6 Required Software
The course makes heavy use of the R statistical programming language, and several related tools, most especially the RStudio development environment. Every bit of this software is free to use, and open-source.
You will need access to a computer to do your work for this class, not just an iPad or other tablet, but an actual computer. You do not need a state of the art machine, nor should you need any special hardware to run things for this course.
- There will be many people in the course for whom R is a new experience. I assume no prior R work in the course. You will know a fair amount of R (and some other things, too) after taking the course, though.
- We’ll be using Quarto for most of our work, as well, which can be used to generate reproducible reports that appear as .html websites, PDF files or Word documents, among other things.
- For some people, working with R and Quarto is the best part of the class, and the part that they’re most excited about.
- For others, it’s a real source of anxiety. We understand and encourage patience. There will definitely be some pain, but our experience is that things are much smoother for most people by early October than they appear to be in August.
6.1 R and RStudio
You will do all of your analysis with the open source (and free!) programming language R. You will use RStudio as the main program to access R. Think of R as an engine and RStudio as a car dashboard. R handles all the calculations and the actual statistics, while RStudio provides a nice interface for running R code.
R is free, but it can sometimes be a pain to install and configure. Information about getting R and RStudio on your computer is provided below, and also on our main website.
Learning R can be difficult at first - it’s like learning a new language, just like Spanish, French, or Chinese. Hadley Wickham-the chief scientist at Posit (makers of Rstudio) and the author of some amazing R packages you’ll be using like ggplot2
made this wise observation:
It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.
If you’re finding yourself taking way too long hitting your head against a wall and not understanding, take a break, talk to the teaching assistants, talk to classmates, ask questions, e-mail Professor Love, etc.
I promise you can do this.
Some of this material is also borrowed from Andrew Heiss, especially here and here.
6.2 Getting the Software
Everything is free, but it does require some patience to get control over your computer.
6.2.1 System Requirements
You will need access to a computer to do your work for this class, not just an iPad or other tablet, but an actual computer. Whether or not you want to bring that computer to class is up to you. All of the software we will use in this class is either free and open source, or available to you for free through your affiliation with CWRU, so there is nothing to buy in terms of software.
- We’ve made some effort in terms of course requirements to set the bar low. You do not need a state of the art machine, nor should you need any special hardware to run things for this course.
- You will need a computer, either PC (running Windows 10 or 11, ideally) or Macintosh (running a reasonably recent OS), but your choice should be determined by your personal preferences and how you believe you will use the machine in your research life. RStudio and R will look and work the same on either a PC or a Macintosh.
- We do not recommend the use of a Chromebook for 431 or 432.
- R and RStudio Desktop also run on Linux systems but Professor Love knows essentially nothing about that. Consult the documentation at CRAN for R and at the download page for RStudio.
6.2.2 What will I need for 431?
These instructions are also available on the Software page at our main course website: https://thomaselove.github.io/431-2024/.
6.3 Installing R and R Studio
The steps you need to complete are:
- Download and install the latest version of R (version 4.4.1 or later) from https://cran.case.edu/ or, if you prefer, from https://cloud.r-project.org/, which automatically chooses a fast, nearby mirror for you.
- If you have a pre-existing installation of R and/or RStudio, you will need to re-install both to get current.
- Download and install RStudio Desktop (Open Source Edition - the free version 2024.04.2+764 or later) by visiting https://posit.co/download/rstudio-desktop/.
6.4 Installing R Packages and Data/Code for 431
After you’ve completed steps 1 and 2 above, move on to these tasks:
- Install some R packages - an R “package” is a collection of functions, data, and documentation that extends the capabilities of R, and is the critical way to get R doing interesting work.
- Visit https://github.com/THOMASELOVE/431-packages for details and the list of packages we’ll be using in 431.
- Download data and code (functions) we’ve developed specifically for 431 at https://github.com/THOMASELOVE/431-data.
- Follow the instructions you’ll find there.
6.5 Need Installation Help?
If you need more help, you might look at this terrific resource for Installing R and RStudio from Jenny Bryan and the STAT 545 project. These are the people responsible for the great Happy Git with R project, which will also be worth your time when we are using Git and GitHub.
- If you’re having trouble with installation before our first class, don’t worry too much. The TAs and Professor Love will be available to help once the class gets going.
- Once the class starts, if you’re having installation problems or problems getting started in R, please consider visiting TA office hours or perhaps asking a question on Campuswire. We want to hear from you!
6.6 Updating Your R Packages
Every once in a while, it’s a good idea to update your R packages. To do so,
- Go to the Packages tab on the right side of your RStudio screen, and click on Update.
- This will bring up a dialog box. I usually click Select All, then click Install Updates.
- A popup box may appear, asking “Do you want to install from sources the packages which need compilation?” to which I usually answer No. A Yes response leads to a slower installation, but can solve problems if you still have them after updating.
- This may take a few minutes. As long as you’re seeing activity in the Console window, things are progressing.
- Eventually, you’ll get a message that “The downloaded source packages are in …” with a directory name. That’s the sign that the updating is done.
- Updating packages is something you’ll do occasionally throughout the semester, mostly when a problem happens.
- Finally, choose File … Quit Session from the top menu, and accept or deny (I usually deny) RStudio’s request to save your workspace. Then close RStudio, and re-open it if you want to do some work.
6.7 Why do we teach R in 431-432?
Why do we teach using R, rather than SAS or SPSS or Python or whatever?
- Because it is by far the better choice for what we’re trying to do, which is to help you become effective data scientists. And effective scientists, period.
- Because being a data scientist means writing code and actually doing (not just talking about) replicable research, which R facilitates in an immense variety of ways.
- Because R is free to you, me and everyone, and its community is a daily delight.
To read comments from other people on the subject, I suggest reading Why R? from Chester Ismay and Patrick Kennedy.
Also, the question of “Why R and not SPSS?” was nicely addressed by Greg Snow in this 2010 post at StackOverflow…
When talking about user friendliness of computer software I like the analogy of cars vs. busses: Busses are very easy to use, you just need to know which bus to get on, where to get on, and where to get off (and you need to pay your fare). Cars on the other hand require much more work, you need to have some type of map or directions (even if the map is in your head), you need to put gas in every now and then, you need to know the rules of the road (have some type of drivers licence). The big advantage of the car is that it can take you a bunch of places that the bus does not go and it is quicker for some trips that would require transfering between busses. Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed. R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.
6.8 Additional Resources
If you’re interested in getting started with the tools you’ll be using in 431 before the class gets rolling, the great folks at RStudio Education provide these 6 ways to begin learning R. Pick the one that appeals to you, and give it a shot.
- Note that if you’re having trouble installing things, you can still learn a lot about R, RStudio and Data Science basics with the interactive tutorials at https://rstudio.cloud/learn/primers.
- If you’re already a strong coder, and have some R experience, there are also learning pathways for intermediates at RStudio Education which might be appealing to you.
Our goal is to get everyone well into the intermediate level by December. Some people will get there in September, for others it will take longer. But you can do this, and we’ll be there to help you.
In addition, there are many, many online resources to help you with working in R and in Quarto, and we’ll point you to some of the best of them during the semester.