Section 4 Software

The course makes heavy use of the R statistical programming language and several related tools, especially the R Studio development environment. All of this software is free to use and open source.

  • Many people in the course will be new to R, so I assume no prior experience with it. By the end of the course, though, you will know a fair amount of R (and some other things, too).
  • We’ll also be using the R Markdown tool within R Studio. We will teach R Markdown in class; it generates reproducible reports as HTML files, PDF files, or Word documents.
  • For some people, working with R is the best part of the class, and the part that they’re most excited about.
  • For others, it’s a real source of anxiety. We understand and encourage patience. There will definitely be some pain, but our experience is that things are much smoother for most people by early October than they appear to be in August.

4.1 System Requirements

You will need a laptop computer for this class: an actual computer, not just an iPad or other tablet. All of the software we will use is either free and open source or available to you for free through your affiliation with CWRU, so there is nothing to buy if you have a laptop.

  • We’ve tried to keep the hardware requirements low. You do not need a state-of-the-art machine, nor should you need any special hardware to run things for this course.
    • You will need a laptop, either PC or Mac; which one is up to your personal preferences and how you expect to use the machine in your research life.
    • In this class, you’ll be using R Studio and R, which look and work the same on either a PC or a Mac.
    • Any reasonably recent PC or Macintosh machine will work well.
    • We do not recommend using a Chromebook (or other Chrome OS device) for this class.
    • R and R Studio also run on Linux systems. If you use one, you know more than Dr. Love does about how to accomplish that.

4.2 How Do I Install The Software?

Complete instructions, with a step-by-step walkthrough for PC or Mac machines, are available at https://github.com/THOMASELOVE/431-2018/blob/master/software/installation.md

That page gives specific instructions for installing everything you need:

  • [R] The latest version of the R statistical software.
  • [R Studio] The latest version of the R Studio development environment.
  • [Packages] Some R “packages” of functions, data and documentation.
  • [431 Data] Some data and functions specific to the 431 class.

In brief, the steps you need to take for 431 are:

  1. Download and install the latest version of R (version 3.5.1 at this writing) at http://cran.case.edu/ or https://cran.r-project.org/.
  2. Download and install R Studio (version 1.1.456 or later at this writing) at https://www.rstudio.com/products/rstudio/download/#download. If you prefer, you can run the Preview Version of R Studio to get the very latest features, but that requires you to update your setup more frequently, and, very occasionally, deal with some additional troubleshooting.
  3. Install some R packages. An R “package” is a collection of functions, data, and documentation that extends the capabilities of R, and packages are the critical way to get R doing interesting work. To install the packages for our course, follow the instructions in the Packages description on our Software page.
  4. Download the data and code (functions) we’ve developed specifically for this course from our Data page.
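As a rough sketch of steps 1 and 3, the R code below (run from the R Studio console) prints the version of R you have installed and then installs only those packages from a list that are not already present. The package names are placeholders chosen for illustration, not the official 431 list, which lives on the Software page; the helper name missing_packages() is likewise hypothetical.

```r
# Sketch only: run in the R Studio console after installing R and R Studio.

# Step 1 check: which version of R is running? (3.5.1 was current at this writing)
print(R.version.string)
print(getRversion() >= "3.5.1")

# Hypothetical helper: return the entries of `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# Placeholder list, NOT the official course list; use the Software page instead
course_pkgs <- c("tidyverse", "rmarkdown", "knitr")

to_install <- missing_packages(course_pkgs)
print(to_install)

# Install only what is missing, and only in an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cran.r-project.org")
}
```

Checking before installing avoids reinstalling packages you already have; once everything on the list is present, the script simply prints character(0).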

4.3 Need More Help?

If you need more help, you might look at this terrific resource for Installing R and R Studio from Jenny Bryan and the STAT 545 project. These are the people responsible for the great Happy Git with R project, which is worth your time, too, if you intend to use Git and GitHub.

If you’re having installation problems or problems getting started in R, please consider emailing a question to us at 431-help at case dot edu, although a visit to office hours is often more helpful, since it’s difficult for us to diagnose your problem without seeing your computer.

4.4 Getting Started with the Software, Once It’s Installed

  1. Dr. Love will demonstrate the use of R, R Studio and R Markdown in class, starting with Class 2.
  2. Dr. Love has also prepared a downloadable template for your first few R Markdown attempts. Get it by downloading the data and code for the course at https://github.com/THOMASELOVE/431-2018-data. Click on the green Clone or download button, and then select Download ZIP to obtain a Zip file of all posted materials.
  3. Dr. Love’s document Getting Started with R is a good first step. It demonstrates how to use these tools to actually analyze data. Of course, we’ll also do this in class.
  4. See the Datacamp section of this syllabus for details on the educational videos and tools available to you.
  5. We also recommend Chester Ismay and Patrick Kennedy’s Getting Used to R, RStudio and R Markdown as an introduction to the basics.
  6. Dr. Love’s Course Notes are a source of many examples.

4.5 Why do we teach R, instead of SPSS or SAS or whatever, in 431-432?

  1. Because it is by far the better choice for what we’re trying to do, which is to help you become effective data scientists. And effective scientists, period.
  2. Because being a data scientist means writing code and actually doing (not just talking about) replicable research, which R facilitates in an immense variety of ways.
  3. Because R is free to you, me and everyone, and its community is a daily delight.

To read comments from other people on the subject, there’s always Google, but I suggest reading Why R? from Chester Ismay and Patrick Kennedy.