Using data from an R package
To use data from an R package, for instance the bechdel data from the fivethirtyeight package, load the package with library(), and the data frame will then be available.
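A minimal sketch of this step, assuming the fivethirtyeight and tibble packages are already installed. The call to as_tibble() is an addition here, used only so the data print in the compact tibble format shown in the output:

```r
library(fivethirtyeight)  # makes the bechdel data frame available
library(tibble)           # assumption: used only for compact tibble printing

# bechdel is loaded lazily from the package, so no separate read step is needed
as_tibble(bechdel)
```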
# A tibble: 1,794 × 15
year imdb title test clean_test binary budget domgross intgross code
<int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
1 2013 tt1711425 21 & … nota… notalk FAIL 1.3 e7 25682380 4.22e7 2013…
2 2012 tt1343727 Dredd… ok-d… ok PASS 4.50e7 13414714 4.09e7 2012…
3 2013 tt2024544 12 Ye… nota… notalk FAIL 2 e7 53107035 1.59e8 2013…
4 2013 tt1272878 2 Guns nota… notalk FAIL 6.1 e7 75612460 1.32e8 2013…
5 2013 tt0453562 42 men men FAIL 4 e7 95020213 9.50e7 2013…
6 2013 tt1335975 47 Ro… men men FAIL 2.25e8 38362475 1.46e8 2013…
7 2013 tt1606378 A Goo… nota… notalk FAIL 9.2 e7 67349198 3.04e8 2013…
8 2013 tt2194499 About… ok-d… ok PASS 1.20e7 15323921 8.73e7 2013…
9 2013 tt1814621 Admis… ok ok PASS 1.3 e7 18007317 1.80e7 2013…
10 2013 tt1815862 After… nota… notalk FAIL 1.3 e8 60522097 2.44e8 2013…
# ℹ 1,784 more rows
# ℹ 5 more variables: budget_2013 <int>, domgross_2013 <dbl>,
# intgross_2013 <dbl>, period_code <int>, decade_code <int>
For more on this example, visit Bechdel analysis using the tidyverse.
Using read_rds to read in an R data set
We have provided the nnyfs.Rds data file on the course data page.
Suppose you have downloaded this data file into a directory on your computer called data, which is a sub-directory of the directory where you plan to do your work, perhaps called 431-nnyfs.
Open RStudio and create a new project in the 431-nnyfs directory on your computer. You should see a data subdirectory in the Files window in RStudio after the project is created.
Now, read in the nnyfs.Rds file to a new tibble in R called nnyfs with the following command:
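A sketch of that command, assuming the file sits at data/nnyfs.Rds relative to the project directory described above. The nnyfs_path variable and the file.exists() check are illustrative additions, not part of the required command:

```r
library(readr)  # read_rds() comes from readr, which library(tidyverse) also attaches

# Assumed relative path: the data subdirectory of the 431-nnyfs project
nnyfs_path <- "data/nnyfs.Rds"

if (file.exists(nnyfs_path)) {
  nnyfs <- read_rds(nnyfs_path)  # an .Rds file keeps column types such as factors
  print(nnyfs)
} else {
  message("File not found: ", nnyfs_path, " (check the project's working directory)")
}
```

If the file is not found, running getwd() in the console shows which directory R is treating as the project root.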
Here are the results…
# A tibble: 1,518 × 45
SEQN sex age_child race_eth educ_child language sampling_wt income_pov
<chr> <fct> <dbl> <fct> <dbl> <fct> <dbl> <dbl>
1 71917 Female 15 3_Black No… 9 English 28299. 0.21
2 71918 Female 8 3_Black No… 2 English 15127. 5
3 71919 Female 14 2_White No… 8 English 29977. 5
4 71920 Female 15 2_White No… 8 English 80652. 0.87
5 71921 Male 3 2_White No… NA English 55592. 4.34
6 71922 Male 12 1_Hispanic 6 English 27365. 5
7 71923 Male 12 2_White No… 5 English 86673. 5
8 71924 Female 8 4_Other Ra… 2 English 39549. 2.74
9 71925 Male 7 1_Hispanic 0 English 42333. 0.46
10 71926 Male 8 3_Black No… 2 English 15307. 1.57
# ℹ 1,508 more rows
# ℹ 37 more variables: age_adult <dbl>, educ_adult <fct>, respondent <fct>,
# salt_used <fct>, energy <dbl>, protein <dbl>, sugar <dbl>, fat <dbl>,
# diet_yesterday <fct>, water <dbl>, plank_time <dbl>, height <dbl>,
# weight <dbl>, bmi <dbl>, bmi_cat <fct>, arm_length <dbl>, waist <dbl>,
# arm_circ <dbl>, calf_circ <dbl>, calf_skinfold <dbl>,
# triceps_skinfold <dbl>, subscapular_skinfold <dbl>, active_days <dbl>, …
Using read_csv to read in a comma-separated version of a data file
We have provided the fev_ros.csv data file on the course data page.
Suppose you have downloaded this data file into a directory on your computer called data.
Now, read in the fev_ros.csv file to a new tibble in R called fev_ros with the following command, assuming you also want to convert the character variables to factors, as you will often want to do before analyzing the results.
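One way this could look, assuming the file sits at data/fev_ros.csv and that R 4.1 or later is in use for the native pipe. The fev_path variable and the file.exists() check are illustrative additions, and base R's as.factor() is used for the conversion here, although other conversion functions would also work:

```r
library(readr)  # read_csv()
library(dplyr)  # mutate() and across()

# Assumed relative path: the data subdirectory of the current project
fev_path <- "data/fev_ros.csv"

if (file.exists(fev_path)) {
  fev_ros <- read_csv(fev_path) |>
    # convert every character column (here, sex and smoke) to a factor
    mutate(across(where(is.character), as.factor))
  print(fev_ros)
} else {
  message("File not found: ", fev_path, " (check the project's working directory)")
}
```

Note that as.factor() orders the factor levels alphabetically, which may or may not be the order you want for later summaries and plots.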
Rows: 654 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): sex, smoke
dbl (4): id, age, fev, height
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 654 × 6
id age fev height sex smoke
<dbl> <dbl> <dbl> <dbl> <fct> <fct>
1 301 9 1.71 57 female non-current smoker
2 451 8 1.72 67.5 female non-current smoker
3 501 7 1.72 54.5 female non-current smoker
4 642 9 1.56 53 male non-current smoker
5 901 9 1.90 57 male non-current smoker
6 1701 8 2.34 61 female non-current smoker
7 1752 6 1.92 58 female non-current smoker
8 1753 6 1.42 56 female non-current smoker
9 1901 8 1.99 58.5 female non-current smoker
10 1951 9 1.94 60 female non-current smoker
# ℹ 644 more rows
Note that, for example, sex and smoke are now listed as factor (fct) variables.
For more on factors, visit https://r4ds.had.co.nz/factors.html.