Setting Up R

These Notes make extensive use of R and RStudio, both of which are free, and you’ll need to install them on your machine. Instructions for doing so can be found on the course website.

If you’re new to R and RStudio, or would just like a gentle introduction to them, we encourage you to take a look at https://moderndive.com/, where Ismay and Kim (2022) provide an introduction to statistical and data sciences via R.

Quarto

These Notes were written using Quarto, which we’ll learn to use in 431. Quarto, like R and RStudio, is free and open source.

Quarto is described as a scientific and technical publishing system for data science, which lets you (among many other things):

  • save and execute R code
  • generate high-quality reports that can be shared with an audience

The Quarto Get Started page provides an overview and quick tour of what’s possible with Quarto.

Another excellent resource for learning more about Quarto is the Communicate section (especially the Quarto chapter at https://r4ds.hadley.nz/quarto.html) of Wickham, Çetinkaya-Rundel, and Grolemund (2023).

R Packages

At the start of each chapter that involves R code, I’ll present a series of commands I run to set up R. These commands load several packages (libraries) of functions that expand R’s capabilities, make a specific change to how I want R output to be displayed (that’s the comment = NA piece), and set the theme for most graphs to theme_bw(). A chunk of code like this will appear near the top of any Quarto work.

For example, this is the setup for one of our early chapters that loads four packages.
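A setup chunk of this kind might look like the sketch below. The particular four packages are an assumption made for illustration (all four appear in the package table later in this chapter), and each must already be installed for library() to succeed.

```r
# Sketch of a chapter setup chunk. The choice of these four packages is an
# assumption for illustration; substitute whatever the chapter needs.

# Display R output without the default "##" prefix on each line
knitr::opts_chunk$set(comment = NA)

library(palmerpenguins)  # source of data
library(janitor)         # tabyl and others
library(knitr)           # kable
library(tidyverse)       # loaded last

# Use theme_bw() as the default theme for ggplot2 graphs
theme_set(theme_bw())
```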

You only need to install a package once, but you need to load it (using the library() function) every time you start a new R session. I always load the package called tidyverse last, since doing so avoids some annoying problems, most often another package masking a tidyverse function that shares its name.
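The install-once, load-every-session distinction can be sketched with base R alone. The vector name wanted and the packages in it are placeholders chosen for illustration, and the install.packages() call is left commented out so that nothing is installed without your say-so.

```r
# Packages we would like to use; this particular set is only an example.
wanted <- c("palmerpenguins", "janitor", "knitr", "tidyverse")

# Which of them are already installed on this machine?
is_installed <- wanted %in% rownames(installed.packages())
missing_pkgs <- wanted[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # One-time step; uncomment to install the missing packages.
  # install.packages(missing_pkgs)
}

# Loading, by contrast, is repeated in every new session, for example:
# library(tidyverse)
```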

The Love-boost.R script

In October, when we start Part B of the course, we’ll use some special R functions I’ve gathered for you in a script called Love-boost.R. I’ll tell R about that code using the following command…

source("data/Love-boost.R")

The Love-boost.R script includes four functions:

  • bootdif
  • saifs.ci
  • twobytwo
  • retrodesign
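To see what source() does in general, here is a small self-contained sketch. It writes a stand-in script to a temporary file and then sources it; the function double_it is hypothetical and is not part of Love-boost.R.

```r
# Create a tiny stand-in script in a temporary file.
# `double_it` is a made-up function used only for this demonstration.
script_path <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", script_path)

# source() runs the code in the file, which defines the function
# in the current workspace.
source(script_path)

exists("double_it")   # TRUE once the script has been sourced
double_it(21)         # 42
```

Sourcing Love-boost.R works the same way: once the command has run, the four functions listed above are available for the rest of the session.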

Packages Used in these Notes

A list of all R packages we want you to install this semester (including some packages not used in these Notes) is maintained on our course website.

| Package | Parts | Key functions in the Package |
|---|---|---|
| arm | C | |
| boot | B | |
| broom | A, B, C | tidy, glance, augment (part of tidymodels) |
| car | A, C | boxCox, powerTransform |
| Epi | B | twoby2 |
| fivethirtyeight | Appendix | source of data |
| GGally | A, C | ggpairs |
| ggrepel | C | |
| ggridges | A, B | |
| ggstance | A | |
| gt | A | for presenting tables |
| gtsummary | A | tbl_summary |
| Hmisc | A, B, C | describe and others |
| janitor | A, B, C | tabyl and others |
| kableExtra | A | kbl, kable_styling |
| knitr | A, B, C | kable |
| lvplot | A | geom_lv |
| mice | C | |
| modelsummary | A, C | modelsummary |
| mosaic | A, B, C | favstats, inspect |
| naniar | A | n_miss, miss_case_table, gg_miss_var |
| NHANES | A | source of data |
| palmerpenguins | A | source of data |
| patchwork | A, B, C | for combining/annotating plots |
| psych | A, B | describe |
| pwr | B | |
| rms | C | |
| sessioninfo | Appendix | |
| simputation | A | various imputation functions |
| summarytools | A | descr, dfSummary |
| tidyverse | A, B, C, Appendix | dozens of functions |
| vcd | B | |
| visdat | A | vis_dat, vis_miss |

The tidyverse

The tidyverse package is actually a meta-package which includes the following core packages:

  • ggplot2 for creating graphics
  • dplyr for data manipulation
  • tidyr for creating tidy data
  • readr for reading in rectangular data
  • purrr for working with functions and vectors
  • tibble for creating tibbles - lazy, surly data frames
  • stringr for working with data strings
  • forcats for solving problems with factors

Loading the tidyverse with library(tidyverse) loads those eight packages (and, in tidyverse 2.0.0 and later, lubridate as well).

Installing the tidyverse also installs several other useful packages on your machine, like glue and readxl, for example. Read more about the tidyverse at https://www.tidyverse.org/
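If you’d like to confirm this on your own machine, something like the following sketch should work. The vector name core is our own, and the check is wrapped in requireNamespace() so that it does nothing if the tidyverse is not yet installed.

```r
# The eight core packages listed above
core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "purrr", "tibble", "stringr", "forcats")

if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)

  # Are the core packages attached after library(tidyverse)?
  print(setNames(core %in% .packages(), core))

  # Other packages that come along when the tidyverse is installed
  print(setdiff(tidyverse::tidyverse_packages(include_self = FALSE), core))
}
```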