Lab 6

Published

2026-04-01

General Instructions

Submit your work via Canvas.
The deadline for this Lab is specified on the Course Calendar.
- We charge a 5 point penalty for a lab that is 1-48 hours late.
- Labs that are more than 48 hours late will receive 30 points (out of a possible 50).
- No labs may be skipped in 432. Students must submit all seven Labs to pass the course.
Your response should include a Quarto file (.qmd) and an HTML document that is the result of applying your Quarto file to the data we’ve provided.
Our usual advice and templates apply to Lab 6 in the same way as they did in Labs 1-4.

The `topmodels` package

Only a development version of the topmodels package is available so far. It is hosted on R-Forge at https://R-Forge.R-project.org/projects/topmodels/pkg/topmodels/ in a Subversion (SVN) repository. The package can be installed by running the following command in the R console (NOT within a Quarto file).

install.packages("topmodels", repos = "https://R-Forge.R-project.org")

or via

remotes::install_svn("svn://R-Forge.R-project.org/svnroot/topmodels/pkg/topmodels")

With install_svn(), a specific revision can be installed by setting the optional revision argument. The topmodels package page describing this is here.

R Packages and Setup

My answer sketch uses the following R packages and setup.

knitr::opts_chunk$set(comment = NA) 

library(conflicted)
library(janitor)
library(naniar)

library(haven)

library(broom)
library(topmodels)
library(survival)
library(survminer)
library(yardstick)

library(easystats)
library(tidyverse) 

conflicts_prefer(dplyr::filter)

theme_set(theme_lucid())

Tip

Note that my list of R packages does not include separate loading of any of the core tidyverse packages, or the packages in the easystats framework. The core tidyverse packages are listed at https://www.tidyverse.org/packages/#core-tidyverse, and the packages in the easystats framework are listed at https://easystats.github.io/easystats/. If you separately load any of these packages here or in Lab 7 or Project B, you will lose points.

The Data

The chr_2015.csv file (from Lab 1) and the remit48.sav SPSS file appear on the 432 data page.
A detailed codebook for all of the data in the chr_2015 file (and also for the chr_2024 file) is available here. We use the chr_2015 data in Lab 6 Question 1.
The variables included in the remit48 data are described in Question 2, below.

Question 1. (25 points)

Use the chr_2015 data to build a model to predict each county’s percentage of the population ages 16 and older who are unemployed but seeking work, as measured in 2013 (and reported in CHR 2015). Note that each of the unemp values in the data is an integer (between 1 and 28), so we will treat the unemp values as counts in Question 1. You will produce a Poisson regression model for unemp using the main effects of two quantitative predictors: the county’s food environment index and the county’s adult obesity rate.

{10} Produce the Poisson regression model, which I’ll call mod1, then carefully interpret the exponentiated coefficient (the point estimate and a 90% confidence interval around it) for the food_env variable in context. Round your estimates to two decimal places.

Tip

An appropriate response to Question 1a should compare two counties with specified characteristics, and should clearly state assumptions regarding both the way in which the sample was collected and the accuracy of the model.
My response in the answer sketch is four sentences long (111 words), to give you an idea of what we’re looking for.
There is no need to present both the exponentiated and un-exponentiated results in your response.

{5} Produce a rootogram for mod1 and interpret its meaning in a complete sentence. You can find the rootogram here if you have trouble generating it.
{5} An \(R^2\) value for mod1 can be built in at least two different ways:

the Nagelkerke \(R^2\), or
the squared correlation of the observed and model-predicted outcome values.

Produce and specify each of these values, expressed as a proportion (between 0 and 1) rounded to three decimal places.

{5} Use mod1 to make a prediction of the unemp rate (rounded to two decimal places) for Cuyahoga County, in Ohio, based on its values for food_env (6.7) and for obesity (28). Then, in a complete sentence or two, compare the mod1 prediction to the observed unemp rate for Cuyahoga County, as measured in 2013 and reported in CHR 2015.
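
As a generic illustration of the steps in Question 1 (this is not the answer sketch), here is a minimal example on simulated data. Everything named here (sim, x1, x2, y, fit) is hypothetical, and only base R functions are used. The sketch fits a main-effects Poisson model, exponentiates one coefficient with a 90% interval, computes the squared correlation of observed and model-predicted values, and makes a prediction on the count scale. The interval shown is a Wald interval, chosen as a simplifying assumption; other tools may report profile-likelihood intervals, which can differ slightly.

```r
# Simulated data: every name and value here is hypothetical, not from chr_2015
set.seed(432)
n <- 500
sim <- data.frame(x1 = runif(n, 0, 10), x2 = rnorm(n, 30, 4))
sim$y <- rpois(n, lambda = exp(2.5 - 0.10 * sim$x1 + 0.01 * sim$x2))

# Poisson regression using the main effects of two quantitative predictors
fit <- glm(y ~ x1 + x2, family = poisson(link = "log"), data = sim)

# Exponentiated coefficient for x1 with a 90% Wald confidence interval
est <- exp(coef(fit)["x1"])
ci90 <- exp(confint.default(fit, parm = "x1", level = 0.90))
round(c(estimate = unname(est), low = ci90[1], high = ci90[2]), 2)

# Squared correlation between observed and model-predicted counts
r2_cor <- cor(sim$y, fitted(fit))^2
round(r2_cor, 3)

# Prediction on the response (count) scale for one new observation
new_obs <- data.frame(x1 = 6.7, x2 = 28)
pred <- predict(fit, newdata = new_obs, type = "response")
round(unname(pred), 2)
```

In a model like this, the exponentiated coefficient is the multiplicative change in the expected count associated with a one-unit increase in x1, comparing two observations with the same value of x2.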

Question 2. (15 points)

The remit48.sav file gathers initial remission times in days (the variable is called days) for 48 adult subjects with a leukemia diagnosis who were randomly allocated to one of two different treatments, labeled Old and New. Some patients were right-censored before their remission times could be fully determined, as indicated by values of censored = “Yes” in the data set. Note that remission is a good thing, so long times before remission are bad.

Tip

Here is my code creating the tibble for Question 2, which I call lab6q2.

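# here() comes from the here package, which is not attached in the setup above,
# so it is called with its namespace
lab6q2 <- read_spss(here::here("data/remit48.sav"))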

lab6q2$treatment |> attr("label")
lab6q2$censored |> attr("label")

lab6q2 <- lab6q2 |>
  mutate(treatment = 
           fct_recode(factor(treatment), "New" = "1", "Old" = "2"),
         censored = 
           fct_recode(factor(censored), "No" = "1", "Yes" = "2"),
         subject = as.character(subject)) |>
  zap_labels()

Be sure that a glimpse of your lab6q2 tibble produces the following:

> glimpse(lab6q2)

Rows: 48
Columns: 4
$ subject   <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", …
$ treatment <fct> New, New, Old, New, New, Old, Old, Old, New, New, …
$ days      <dbl> 269, 139, 161, 9, 31, 199, 19, 20, 28, 29, …
$ censored  <fct> Yes, No, Yes, No, No, Yes, No, No, No, No, …

{10} Plot appropriate Kaplan-Meier estimates of the survival functions for each of the two treatments in a single plot. Then create a table that shows the restricted mean and median for survival time in days for each of the two treatment groups.

Note

In the answer sketch for Question 2a, I silenced a warning in building the plot. You can do the same, if needed, for this plot.

{5} In a sentence or two, what conclusions can you draw from your plot and table?
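
As a generic illustration of the tools involved in Question 2 (again, not the answer sketch), here is a minimal example that uses the aml data shipped with the survival package rather than remit48. It assumes base graphics for the plot instead of survminer, only to keep the example free of extra dependencies. Note that Surv() expects an event indicator, so a variable that records censoring must be flipped before use.

```r
library(survival)  # provides Surv(), survfit(), and the aml example data

# aml: status is 1 for an observed event and 0 for a censored time.
# If censoring were instead stored as a "Yes"/"No" label, the event
# indicator would be the logical test (label == "No").
km <- survfit(Surv(time, status) ~ x, data = aml)

# Restricted mean (common upper limit across groups) and median, by group
km_table <- summary(km, rmean = "common")$table
km_table

# Kaplan-Meier curves for both groups on a single plot
plot(km, lty = 1:2, xlab = "Time", ylab = "Estimated survival probability")
legend("topright", legend = names(km$strata), lty = 1:2)
```

Using a common upper limit for the restricted mean keeps the two groups comparable, since each restricted mean is then computed over the same time window.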

Question 3. (10 points)

Write an essay of at least 125 words (and a minimum of 5 complete sentences) specifying something from your reading of Jeff Leek’s How To Be a Modern Scientist that strongly resonates with you, and that you want to put into practice. Please be specific about what Leek’s suggestion is, how you hope to accomplish this, and why you want to.

Note

This isn’t the place to complain about something in the book. That will come in Lab 7.
We will award full credit to any student who we believe:
- provides an insightful and enthusiastic response
- provides a response that is written well, using complete sentences
- avoids grammar, syntax and spelling errors
- clearly indicates the source of the advice and context for it
- clearly indicates why this idea resonates with them, and thus why they want to do it
- clearly indicates how they hope to put the idea into action, with specific information about what they plan to do, and
- does so in at least 125 words and five sentences (we won’t worry about the essay being too long)

Use of AI

If you decide to use some sort of AI to help you with this Lab, we ask that you place a note to that effect, describing what you used and how you used it, as a separate section called “Use of AI”, after your answers to our questions, and just before your presentation of the Session Information. Thank you.

Be sure to include Session Information

Please display your session information at the end of your submission, as shown below.

xfun::session_info()

R version 4.5.3 (2026-03-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Locale:
  LC_COLLATE=English_United States.utf8 
  LC_CTYPE=English_United States.utf8   
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C                          
  LC_TIME=English_United States.utf8    

Package version:
  abind_1.4-8            askpass_1.2.1          backports_1.5.1       
  base64enc_0.1.6        bayestestR_0.17.0      bit_4.6.0             
  bit64_4.6.0.1          blob_1.3.0             boot_1.3.32           
  broom_1.0.12           bslib_0.10.0           cachem_1.1.0          
  callr_3.7.6            car_3.1-5              carData_3.0-6         
  cellranger_1.1.0       cli_3.6.6              clipr_0.8.0           
  coda_0.19-4.1          codetools_0.2-20       colorspace_2.1-2      
  commonmark_2.0.0       compiler_4.5.3         conflicted_1.2.0      
  correlation_0.8.8      corrplot_0.95          cowplot_1.2.0         
  cpp11_0.5.4            crayon_1.5.3           curl_7.0.0            
  data.table_1.18.2.1    datasets_4.5.3         datawizard_1.3.0      
  DBI_1.3.0              dbplyr_2.5.2           Deriv_4.2.0           
  digest_0.6.39          distributions3_0.2.3   doBy_4.7.1            
  dplyr_1.2.1            dtplyr_1.3.3           easystats_0.7.5       
  effectsize_1.0.2       emmeans_2.0.3          estimability_1.5.1    
  evaluate_1.0.5         exactRankTests_0.8.36  farver_2.1.2          
  fastmap_1.2.0          fontawesome_0.5.3      forcats_1.0.1         
  forecast_9.0.2         Formula_1.2-5          fracdiff_1.5.3        
  fs_2.0.1               gargle_1.6.1           generics_0.1.4        
  ggplot2_4.0.2          ggpubr_0.6.3           ggrepel_0.9.8         
  ggsci_4.3.0            ggsignif_0.6.4         ggtext_0.1.2          
  glue_1.8.0             googledrive_2.1.2      googlesheets4_1.1.2   
  graphics_4.5.3         grDevices_4.5.3        grid_4.5.3            
  gridExtra_2.3          gridtext_0.1.6         gtable_0.3.6          
  hardhat_1.4.3          haven_2.5.5            highr_0.12            
  hms_1.1.4              htmltools_0.5.9        htmlwidgets_1.6.4     
  httr_1.4.8             ids_1.0.1              insight_1.4.6         
  isoband_0.3.0          janitor_2.2.1          jpeg_0.1.11           
  jquerylib_0.1.4        jsonlite_2.0.0         knitr_1.51            
  labeling_0.4.3         lattice_0.22-9         lifecycle_1.0.5       
  litedown_0.9           lme4_2.0.1             lmtest_0.9.40         
  lubridate_1.9.5        magrittr_2.0.5         markdown_2.0          
  MASS_7.3-65            Matrix_1.7-5           MatrixModels_0.5.4    
  maxstat_0.7.26         memoise_2.0.1          methods_4.5.3         
  mgcv_1.9.4             microbenchmark_1.5.0   mime_0.13             
  minqa_1.2.8            modelbased_0.14.0      modelr_0.1.11         
  multcomp_1.4-30        mvtnorm_1.3-6          naniar_1.1.0          
  nlme_3.1.169           nloptr_2.2.1           nnet_7.3.20           
  norm_1.0.11.1          numDeriv_2016.8.1.1    openssl_2.3.5         
  otel_0.2.0             parallel_4.5.3         parameters_0.28.3     
  patchwork_1.3.2        pbkrtest_0.5.5         performance_0.16.0    
  pillar_1.11.1          pkgconfig_2.0.3        plyr_1.8.9            
  png_0.1.9              polynom_1.4.1          prettyunits_1.2.0     
  processx_3.8.7         progress_1.2.3         ps_1.9.2              
  purrr_1.2.2            quantreg_6.1           R6_2.6.1              
  ragg_1.5.2             rappdirs_0.3.4         rbibutils_2.4.1       
  RColorBrewer_1.1-3     Rcpp_1.1.1             RcppArmadillo_15.2.4.1
  RcppEigen_0.3.4.0.2    Rdpack_2.6.6           readr_2.2.0           
  readxl_1.4.5           reformulas_0.4.4       rematch_2.0.0         
  rematch2_2.1.2         report_0.6.3           reprex_2.1.1          
  rlang_1.2.0            rmarkdown_2.31         rstatix_0.7.3         
  rstudioapi_0.18.0      rvest_1.0.5            S7_0.2.1              
  sandwich_3.1-1         sass_0.4.10            scales_1.4.0          
  see_0.13.0             selectr_0.5.1          snakecase_0.11.1      
  SparseM_1.84.2         sparsevctrs_0.3.6      splines_4.5.3         
  stats_4.5.3            stringi_1.8.7          stringr_1.6.0         
  survival_3.8-6         survminer_0.5.2        sys_3.4.3             
  systemfonts_1.3.2      textshaping_1.0.5      TH.data_1.1-5         
  tibble_3.3.1           tidyr_1.3.2            tidyselect_1.2.1      
  tidyverse_2.0.0        timechange_0.4.0       timeDate_4052.112     
  tinytex_0.59           tools_4.5.3            topmodels_0.3-0       
  tzdb_0.5.0             UpSetR_1.4.0           urca_1.3.4            
  utf8_1.2.6             utils_4.5.3            uuid_1.2.2            
  vctrs_0.7.3            viridis_0.6.5          viridisLite_0.4.3     
  visdat_0.6.0           vroom_1.7.1            withr_3.0.2           
  xfun_0.57              xml2_1.5.2             xtable_1.8-8          
  yaml_2.3.12            yardstick_1.4.0        zoo_1.8-15

After the Lab

We will post an answer sketch to our Shared Google Drive 48 hours after the Lab is due.
We will post grades to our Grading Roster on our Shared Google Drive one week after the Lab is due.
See the Lab Appeal Policy in Section 8.4 of our Syllabus if you are interested in having your Lab grade reviewed, and use the Lab Regrade Request form to complete the task. The deadline for submitting the form (which is optional) is specified in the Calendar. Thank you.