Lab 0

Published

2024-01-10

What is this?

500 Lab 0 is an example with instructions, meant to help you get started on the other Labs. Your task for Lab 0 is to read over these materials and see whether they help you answer the questions that arise as you prepare your responses to your actual Lab assignments (Lab 1 through Lab 4) this semester. It’s not a bad idea to try to build a response yourself, and then check it against my solution, but there’s nothing for you to turn in here.

This work uses a data set called lab0.csv, a comma-separated file suitable for reading into R. You’ll find it in the data folder at our 500-data web site.

Template for Lab 0

We have provided a template for building a response to Lab 0, specifically a Quarto file containing some key bits of code which you can edit to produce a response. You’ll find the Lab 0 template in the templates folder at our 500-data web site.

Answer Sketch for Lab 0

We have also provided an Answer Sketch for Lab 0, including both the Quarto file we used to create the sketch and the resulting HTML file it generates. You’ll find those files in the Labs section of our Shared Google Drive.

Original Instructions for this work

Here are the instructions I gave to students for whom this was a required assignment.

Do professional work with this little problem. What do I mean by this?

- Properly labeled graphs/figures are a minimal expectation for graduate school.
- Use complete English sentences to describe your findings and clarify when annotating code.
- Make sure that the answers include enough of the question that your text responses (in addition to the graphs) stand on their own. Be sure to address all three tasks.
- Present edited code, making an effort to delete false starts, and comment liberally. Don’t present R code without explaining what you’re doing in English. Quarto makes it easy to intersperse code with explanations, so make that happen.
- Use words I know, without simply repeating my explanations verbatim, please.

You are welcome to discuss Lab 0 with anyone, including me or your colleagues, but your answer must be prepared by you alone.

If you are confused by the assignment, or stuck in the development of your response, please ask questions!

The Data

The lab0.csv data file is available on the 500-data web site. Remember to download the raw version of the .csv file.

The file includes 135 subjects, the first 40 of whom have received a particular treatment and the remaining 95 of whom have not received it.

Also provided are five meaningful predictors of treatment status, labeled (imaginatively) cov1, cov2, cov3, cov4 and cov5.
Covariates 1-4 are continuous covariates, gathered at varying levels of precision. The cov5 variable is an indicator of whether the subject has a particular characteristic (1 = yes, 0 = no).
Happily, there are no missing values in the data.

Tasks

1. Build a logistic regression model using the main effects of the five predictors to predict treatment status.
   - Use R to add two columns to the data set, specifically the fitted probability (according to your logistic regression model) of being treated, and the linear component of the logistic regression model (the logit of the probability of being treated).
2. Next, summarize the resulting probabilities across the untreated and treated patients in an appropriate and attractive manner.
   - Raw R code is rarely attractive on its face - build something brief, effective and appropriate for a presentation.
   - Of course, we’d expect that the average probability of being treated will be higher in the patients who are actually treated. Verify that this is the case, in a short numerical and graphical summary of your findings.
3. How much overlap is there between the fitted probabilities of the treated patients and the fitted probabilities of the untreated patients?
   - A graph of this overlap (perhaps a boxplot, but a better option would be a dot chart or density plot of some sort; creativity is welcome here) is crucial, supplemented by a short written description of your findings.
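
The two columns requested in Task 1 carry the same information on different scales: the linear component is the logit of the fitted probability, and the fitted probability is the inverse logit of the linear component. The following is a minimal sketch of that relationship, run on simulated stand-in data rather than lab0.csv. The names `sim`, `x1`, `x2`, `treated` and `fit`, and the coefficients used to simulate the data, are all hypothetical.

```r
# Simulated stand-in data (hypothetical; this is not the lab0.csv file)
set.seed(500)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$treated <- rbinom(n, size = 1,
                      prob = plogis(-0.5 + 0.8 * sim$x1 - 0.4 * sim$x2))

# Main-effects logistic regression for the probability of treatment
fit <- glm(treated ~ x1 + x2, family = binomial(), data = sim)

# Linear predictor (logit scale) and fitted probability for each row
sim$linpred <- as.numeric(predict(fit, type = "link"))
sim$prob <- as.numeric(predict(fit, type = "response"))

# plogis() is the inverse logit and qlogis() is the logit, so each
# column can be recovered from the other
all.equal(plogis(sim$linpred), sim$prob)
all.equal(qlogis(sim$prob), sim$linpred)
```

Both comparisons should agree up to floating-point tolerance. Working on the logit scale is often convenient because it is unbounded, while the probability scale is easier to describe in words.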

R Setup

To start, I’ll request that R sets its responses to be rendered without the default pair of hashtags. Next, I’ll load two R packages that will help me with these instructions. Then, I’ll load the data, and take a look at the variables.

knitr::opts_chunk$set(comment = NA)

library(janitor)
library(tidyverse)

url_0 <- "https://raw.githubusercontent.com/THOMASELOVE/500-data/master/data/lab0.csv"

lab0 <- read_csv(url_0, show_col_types = FALSE) |>
    mutate(subject = as.character(subject),
           treatment = factor(treatment))

dim(lab0)

[1] 135   7
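
Before modeling, it can help to confirm that the data match their description: how many subjects fall in each treatment group, and whether any values are missing. The sketch below runs those checks in base R on a small made-up data frame. Here `toy` is a hypothetical stand-in that borrows the column names of lab0, its values are invented, and "Untreated" is an assumed label for the comparison group.

```r
# Hypothetical stand-in with the same column names as lab0
# (all values are made up; "Untreated" is an assumed group label)
toy <- data.frame(
  subject = as.character(1:6),
  treatment = factor(c("Treated", "Treated", "Untreated",
                       "Untreated", "Untreated", "Untreated")),
  cov1 = c(40.2, 55.1, 61.0, 48.7, 52.3, 66.9),
  cov2 = c(50.5, 47.2, 58.8, 44.1, 63.0, 51.6),
  cov3 = c(12, 15, 18, 11, 20, 16),
  cov4 = c(19, 22, 17, 25, 21, 30),
  cov5 = c(0, 1, 0, 0, 1, 1)
)

# Number of subjects in each treatment group
table(toy$treatment)

# Number of missing values in each column, and an overall yes/no check
colSums(is.na(toy))
anyNA(toy)

# Quick look at the type of each variable
str(toy)
```

Replacing `toy` with `lab0` should run the same checks on the real data.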

A Hint for Task 1

Partial R code you might use to do this work follows…

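# Fit a main-effects logistic regression for the probability of being treated
m1 <- glm((treatment == "Treated") ~ cov1 + cov2 + cov3 + cov4 + cov5,
          family = binomial(), data = lab0)

# Save the linear predictor (logit scale) and the fitted probability
lab0$linpred <- m1$linear.predictors
lab0$prob <- m1$fitted.values

lab0 # note new columns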

# A tibble: 135 × 9
   subject treatment  cov1  cov2  cov3  cov4  cov5 linpred  prob
   <chr>   <fct>     <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 101     Treated    38.4  53.7    13    19     0  -0.602 0.354
 2 102     Treated    39.1  48.5    15    21     0  -0.676 0.337
 3 103     Treated    67.3  53.9    11    16     0   0.759 0.681
 4 104     Treated    61.5  52.2    18    21     1   0.404 0.600
 5 105     Treated    66.4  55.6    19    22     0  -0.392 0.403
 6 106     Treated    57.1  44.4    24    33     0  -0.658 0.341
 7 107     Treated    50.3  66.8    15    16     1  -0.224 0.444
 8 108     Treated    61.7  68.9    15    26     0   0.104 0.526
 9 109     Treated    44.7  54.2    19    18     1  -0.742 0.322
10 110     Treated    56.9  36.8    16    24     1   1.18  0.765
# ℹ 125 more rows
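
Once a `prob` column exists, Tasks 2 and 3 come down to comparing its distribution across the two treatment groups. What follows is one possible sketch, not the Answer Sketch itself, run on simulated stand-in data. The data frame `sim`, its columns `cov_a` and `cov_b`, the simulation coefficients, and the group label "Untreated" are all assumptions made for illustration. It uses dplyr for the numerical summary and ggplot2 for the picture.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in data (hypothetical; swap in lab0 and its prob column)
set.seed(500)
n <- 135
sim <- data.frame(cov_a = rnorm(n), cov_b = rnorm(n))
sim$treated <- rbinom(n, 1, plogis(-1 + 0.9 * sim$cov_a + 0.5 * sim$cov_b))
fit <- glm(treated ~ cov_a + cov_b, family = binomial(), data = sim)
sim$prob <- as.numeric(fitted(fit))
sim$treatment <- factor(ifelse(sim$treated == 1, "Treated", "Untreated"))

# Task 2: numerical summary of the fitted probabilities by group
prob_summary <- sim |>
  group_by(treatment) |>
  summarise(n_subjects = n(),
            mean_prob = mean(prob),
            median_prob = median(prob),
            min_prob = min(prob),
            max_prob = max(prob))
prob_summary

# Task 3: the region of overlap runs from the larger of the two group
# minimums to the smaller of the two group maximums
# (if overlap_lo exceeds overlap_hi, the groups do not overlap at all)
overlap_lo <- max(prob_summary$min_prob)
overlap_hi <- min(prob_summary$max_prob)

overlap_counts <- sim |>
  group_by(treatment) |>
  summarise(n_in_overlap = sum(prob >= overlap_lo & prob <= overlap_hi),
            share_in_overlap = mean(prob >= overlap_lo & prob <= overlap_hi))
overlap_counts

# Picture of the overlap: boxplots with the individual subjects on top
p <- ggplot(sim, aes(x = treatment, y = prob)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0, alpha = 0.5) +
  labs(title = "Fitted probability of treatment, by treatment group",
       subtitle = "Simulated stand-in data",
       x = "Treatment group",
       y = "Fitted probability of being treated")
p
```

The range-based overlap is a simple choice that is sensitive to a single extreme subject, so the plot and a sentence or two of description remain the main evidence. A density plot or dot chart could replace the boxplot layer without changing the rest of the sketch.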

Be sure to include Session Information

Please display your session information at the end of your submission, as shown below.

xfun::session_info()

R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Locale:
  LC_COLLATE=English_United States.utf8 
  LC_CTYPE=English_United States.utf8   
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C                          
  LC_TIME=English_United States.utf8    

Package version:
  askpass_1.2.0       backports_1.4.1     base64enc_0.1.3    
  bit_4.0.5           bit64_4.0.5         blob_1.2.4         
  broom_1.0.5         bslib_0.7.0         cachem_1.0.8       
  callr_3.7.6         cellranger_1.1.0    cli_3.6.2          
  clipr_0.8.0         colorspace_2.1-0    compiler_4.3.3     
  conflicted_1.2.0    cpp11_0.4.7         crayon_1.5.2       
  curl_5.2.1          data.table_1.15.4   DBI_1.2.2          
  dbplyr_2.5.0        digest_0.6.35       dplyr_1.1.4        
  dtplyr_1.3.1        ellipsis_0.3.2      evaluate_0.23      
  fansi_1.0.6         farver_2.1.1        fastmap_1.1.1      
  fontawesome_0.5.2   forcats_1.0.0       fs_1.6.3           
  gargle_1.5.2        generics_0.1.3      ggplot2_3.5.0      
  glue_1.7.0          googledrive_2.1.1   googlesheets4_1.1.1
  graphics_4.3.3      grDevices_4.3.3     grid_4.3.3         
  gtable_0.3.4        haven_2.5.4         highr_0.10         
  hms_1.1.3           htmltools_0.5.8.1   htmlwidgets_1.6.4  
  httr_1.4.7          ids_1.0.1           isoband_0.2.7      
  janitor_2.2.0       jquerylib_0.1.4     jsonlite_1.8.8     
  knitr_1.46          labeling_0.4.3      lattice_0.22.6     
  lifecycle_1.0.4     lubridate_1.9.3     magrittr_2.0.3     
  MASS_7.3.60.0.1     Matrix_1.6.5        memoise_2.0.1      
  methods_4.3.3       mgcv_1.9.1          mime_0.12          
  modelr_0.1.11       munsell_0.5.1       nlme_3.1.164       
  openssl_2.1.1       parallel_4.3.3      pillar_1.9.0       
  pkgconfig_2.0.3     prettyunits_1.2.0   processx_3.8.4     
  progress_1.2.3      ps_1.7.6            purrr_1.0.2        
  R6_2.5.1            ragg_1.3.0          rappdirs_0.3.3     
  RColorBrewer_1.1.3  readr_2.1.5         readxl_1.4.3       
  rematch_2.0.0       rematch2_2.1.2      reprex_2.1.0       
  rlang_1.1.3         rmarkdown_2.26      rstudioapi_0.16.0  
  rvest_1.0.4         sass_0.4.9          scales_1.3.0       
  selectr_0.4.2       snakecase_0.11.1    splines_4.3.3      
  stats_4.3.3         stringi_1.8.3       stringr_1.5.1      
  sys_3.4.2           systemfonts_1.0.6   textshaping_0.3.7  
  tibble_3.2.1        tidyr_1.3.1         tidyselect_1.2.1   
  tidyverse_2.0.0     timechange_0.3.0    tinytex_0.50       
  tools_4.3.3         tzdb_0.4.0          utf8_1.2.4         
  utils_4.3.3         uuid_1.2.0          vctrs_0.6.5        
  viridisLite_0.4.2   vroom_1.6.5         withr_3.0.0        
  xfun_0.43           xml2_1.3.6          yaml_2.3.8