Chapter 4 Texts

4.1 Dr. Love’s Notes

The main text is a set of Notes for the course, maintained by Dr. Love at https://thomaselove.github.io/431notes/.

Although the Notes share some of the features of a textbook, they are neither comprehensive nor completely original. The main purpose is to give 431 students in Section 1 (and Section 2) a set of common materials on which to draw during the course, providing a series of examples using R to work through issues that are likely to come up during the semester. The material will be updated regularly as the semester progresses.

Slides from each session of the class are posted as .pdf files at https://github.com/THOMASELOVE/431slides

4.2 Books To Purchase

In addition, we’ll read two books that you’ll need to purchase (the combined price is about $25):

  1. Nate Silver’s The Signal and the Noise (ISBN-13: 978-1594204111, available on Amazon), and
  2. Jeff Leek’s The Elements of Data Analytic Style, available at https://leanpub.com/datastyle.

With regard to The Signal and the Noise, you can watch Nate discuss the book’s ideas in many places, for instance, at this YouTube link, or this one on the Art and Science of Prediction, or this one at Google. We’ll also spend considerable time (even before we read the book) looking at some articles from the FiveThirtyEight website, where Nate is editor-in-chief.

4.3 Free Resources You’ll Definitely Need To Access

4.3.1 Textbooks

  1. OpenIntro Statistics (OpenStats) by David Diez, Christopher Barr and Mine Cetinkaya-Rundel. This is an excellent resource, with lots of useful information set at a reasonably elementary level.
    • In Part A of the course, you’ll want to look at Chapters 1 and 3, in particular.
    • Part B: Chapters 4, 5, 6
    • Part C: Chapters 7, 8
  2. R for Data Science (R4DS) by Garrett Grolemund and Hadley Wickham - this is a great resource, but may feel a little advanced for those of you brand new to coding, who may want to supplement it.
    • In Part A, we’ll discuss ideas from the Introduction and Explore sections, mostly.
    • Parts B and C will address some issues discussed in the Wrangle, Model and Communicate sections.
  3. Practical Regression and ANOVA using R (Faraway), by Julian J. Faraway, which is one of the “More Free Books” to download at https://www.openintro.org/stat/extras.php. It also uses R, but is much more focused on statistical issues. A more formal presentation is in Linear Models with R, Second Edition by Julian J. Faraway (Chapman and Hall / CRC Texts in Statistical Science) ISBN-13: 978-1439887332, but the free text is sufficient for 431 and, probably, 432.
    • Faraway’s material is mostly a good resource for Part C, although Chapter 16 will help with ANOVA in Part B.

4.3.2 Articles

  1. Several of the guides prepared by Jeff Leek and his group, including:
  2. Part of the Ten Simple Rules series at PLOS Computational Biology, specifically
  3. The American Statistical Association’s Statement on p-Values: Context, Process and Purpose
  4. The preprint from Benjamin D, Berger J, Johannesson M, et al. called “Redefine statistical significance”, which proposes changing the default p-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005. The manuscript will eventually appear in the journal Nature Human Behaviour.

4.4 Supplemental (and Free) Texts That May Be Worth Your Time

  1. Ismay C. Getting Used to R, RStudio and R Markdown - designed to provide new users of R, RStudio, and R Markdown with the introductory steps needed to begin their own reproducible research.
    • We recommend you use this material to help understand some of the basics of these three software tools. Use other sources to supplement statistical content.
  2. Ismay C, Kim AY. ModernDive: An Introduction to Statistical and Data Sciences via R - intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would. Some nice material for all three Parts of our course.
    • In Part A, you’ll be looking at the Data Exploration via the Tidyverse materials in this text.
    • In Part B, we’ll definitely be looking at the Inference materials.
    • Part C expands on what’s in the Data Modeling using Regression section.
  3. Horton NJ, Pruim R, Kaplan DT. A Student’s Guide to R, from Project MOSAIC. Most recent updates (pdf) at this link - you may need to scroll down. Free, downloadable PDF - an excellent guide to getting started with RStudio, and then working through some straightforward examples of how to deal with data in R. Makes heavy use of the mosaic package.
    • Part A of our course discusses ideas from Chapters 3, 13, 15 and some of Chapter 5.
    • Part B discusses ideas shown in Chapters 4, 6, 7 and 12.
    • Part C discusses Chapters 5 and 8, and some of Chapter 10.
  4. Peng RD. Exploratory Data Analysis with R - especially useful material on using R for graphics and general EDA strategies. Covers some basic principles of constructing informative graphs.
    • In Part A, Chapters 3-6 may be helpful. The Case Study in Chapter 16 is interesting, and has a related video.
  5. Peng RD. R Programming for Data Science - designed to help you get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code, which will be more of an issue for us as the semester progresses.
    • Covers some of the same ground as the other Peng book, but at a level geared more for programming.
  6. Harrell FE. Biostatistics for Biomedical Research - this is more a set of course notes than a full-fledged book, and mostly uses R but not RStudio. However, it’s full of great, in-depth information on basic statistical methods, and likely to be very useful for Parts B and C of the course.
    • Chapters 1-3 include introductions to relevant R, algebra and biostatistics.
    • For Part A, the value is in Chapter 4 and some of Chapters 14 and 21.
    • Part B - see Chapters 5-7.
    • Part C - consider Chapters 8-12.
  7. RStudio has great Cheat Sheets for Data Import, Data Transformation, Data Visualization, R Markdown and other topics at https://www.rstudio.com/resources/cheatsheets/