Lab 6
Information to come.
General Instructions
- Submit your work via Canvas.
- The deadline for this Lab is specified on the Course Calendar.
- We charge a 5 point penalty for a lab that is 1-48 hours late.
- Labs that are more than 48 hours late will receive 30 points (out of a possible 50).
- No labs may be skipped in 432. Students must submit all seven Labs to pass the course.
- Your response should include a Quarto file (.qmd) and an HTML document that is the result of applying your Quarto file to the data we’ve provided.
- Our usual advice and templates apply to Lab 6 in the same way as they did in Labs 1-4.
R Packages and Setup
My answer sketch uses the following R packages and set-up.
knitr::opts_chunk$set(comment = NA)
library(conflicted)
library(janitor)
library(naniar)
library(haven)
library(broom)
library(topmodels)
library(survival)
library(survminer)
library(yardstick)
library(easystats)
library(tidyverse)
conflicts_prefer(dplyr::filter)
theme_set(theme_lucid())
Note that my list of R packages does not include separate loading of any of the core tidyverse packages, or the packages in the easystats framework. The core tidyverse packages are listed at https://www.tidyverse.org/packages/#core-tidyverse, and the packages in the easystats framework are listed at https://easystats.github.io/easystats/. If you separately load any of these packages here or in Lab 7 or Project B, you will lose points.
The Data
- The chr_2015.csv file (from Lab 1) and the remit48.sav SPSS file appear on the 432 data page.
- A detailed codebook for all of the data in the chr_2024 file is available here.
- The variables included in the remit48 data are described in Question 2, below.
Question 1. (25 points)
Use the chr_2015 data to build a model to predict each county’s percentage of the population ages 16 and older who are unemployed but seeking work, as measured in 2013 (and reported in CHR 2015). Note that every unemp value in the data is an integer (between 1 and 28), so we will treat the unemp values as counts in Question 1. You will produce a Poisson regression model for unemp using the main effects of two quantitative predictors: the county’s food environment index and the county’s adult obesity rate.
- {10} Produce the Poisson regression model, which I’ll call mod1, then carefully interpret the exponentiated coefficient (the point estimate and a 90% confidence interval around it) for the food_env variable in context. Round your estimates to two decimal places.
- An appropriate response to Question 1a should compare two counties with specified characteristics, and should clearly state assumptions regarding both the way in which the sample was collected and the accuracy of the model.
- My response in the answer sketch is four sentences, including 111 words, to give you an idea of what we’re looking for.
- There is no need to present both the exponentiated and un-exponentiated results in your response.
- {5} Produce and interpret the meaning (in a complete sentence) of a rootogram for mod1.
- {5} An \(R^2\) value for mod1 can be built in at least two different ways:
- the Nagelkerke \(R^2\), or
- the squared correlation of the observed and model-predicted outcome values.
Produce and specify each of these values, expressed as a proportion (between 0 and 1) rounded to three decimal places.
- {5} Use mod1 to make a prediction of the unemp rate (rounded to two decimal places) for Cuyahoga County, in Ohio, based on its values for food_env (6.7) and for obesity (28). Then, in a complete sentence or two, compare the mod1 prediction to the observed unemp rate for Cuyahoga County as reported in 2013 as part of CHR 2015.
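To make the mechanics behind Question 1 concrete, here is a minimal base R sketch. It is not the answer sketch, and it does not use the chr_2015 data: the data frame sim is simulated, and the names sim, fit, fit0, irr, r2_cor, r2_nag and pred are all made up for illustration. It assumes Wald-type 90% intervals and one common definition of the Nagelkerke \(R^2\); the packages loaded in the setup may report slightly different values.

```r
# Simulated stand-in data: these values are invented, NOT taken from chr_2015
set.seed(432)
n <- 500
sim <- data.frame(
  food_env = runif(n, 4, 10),
  obesity  = runif(n, 20, 40)
)
sim$unemp <- rpois(n, lambda = exp(2.2 - 0.10 * sim$food_env + 0.02 * sim$obesity))

# Poisson regression (log link) using main effects only
fit <- glm(unemp ~ food_env + obesity, family = poisson(), data = sim)

# Exponentiated coefficients with 90% Wald intervals
# (assumption: Wald intervals; profile-likelihood intervals differ slightly)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- qnorm(0.95)
irr <- round(cbind(estimate  = exp(est),
                   conf.low  = exp(est - z * se),
                   conf.high = exp(est + z * se)), 2)
print(irr)

# Ingredients of a rootogram: observed vs. model-expected frequency of each count
ks       <- 0:max(sim$unemp)
observed <- as.numeric(table(factor(sim$unemp, levels = ks)))
expected <- sapply(ks, function(k) sum(dpois(k, fitted(fit))))

# R^2 as the squared correlation of observed and fitted counts
r2_cor <- cor(sim$unemp, fitted(fit))^2

# Nagelkerke R^2 from the log-likelihoods of the fitted and intercept-only models
fit0   <- update(fit, . ~ 1)
ll1    <- as.numeric(logLik(fit))
ll0    <- as.numeric(logLik(fit0))
r2_cs  <- 1 - exp((2 / n) * (ll0 - ll1))   # Cox-Snell
r2_nag <- r2_cs / (1 - exp((2 / n) * ll0)) # rescaled so the maximum is 1

# Predicted count (response scale) for one county's predictor values
new_county <- data.frame(food_env = 6.7, obesity = 28)
pred <- predict(fit, newdata = new_county, type = "response")
round(c(r2_cor = r2_cor, r2_nag = r2_nag, pred = unname(pred)), 3)
```

A rootogram plots the square roots of the observed and expected frequencies against each other, so bars that sit well above or below the reference line flag counts that the model under- or over-predicts. Each exponentiated slope is a multiplicative change in the expected count for a one-unit increase in that predictor, holding the other predictor fixed.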
Question 2. (15 points)
The remit48.sav file gathers initial remission times, in days (the variable is called days), for 48 adult subjects with a leukemia diagnosis who were randomly allocated to one of two different treatments, labeled Old and New. Some patients were right-censored before their remission times could be fully determined, as indicated by values of censored = “Yes” in the data set. Note that remission is a good thing, so long times before remission are bad.
Here is my code creating the tibble for Question 2, which I call lab6q2.
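# here() comes from the here package, which is not loaded in the setup, so call it as here::here()
lab6q2 <- read_spss(here::here("data/remit48.sav"))

# check the labels attached to each variable before recoding
lab6q2$treatment |> attr("label")
lab6q2$censored |> attr("label")

# recode the numeric codes as factors, then drop the SPSS labels
lab6q2 <- lab6q2 |>
  mutate(treatment =
           fct_recode(factor(treatment), "New" = "1", "Old" = "2"),
         censored =
           fct_recode(factor(censored), "No" = "1", "Yes" = "2"),
         subject = as.character(subject)) |>
  zap_labels()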
Be sure that a glimpse of your lab6q2 tibble produces the following:
> glimpse(lab6q2)
Rows: 48
Columns: 4
$ subject <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", …
$ treatment <fct> New, New, Old, New, New, Old, Old, Old, New, New, …
$ days <dbl> 269, 139, 161, 9, 31, 199, 19, 20, 28, 29, …
$ censored <fct> Yes, No, Yes, No, No, Yes, No, No, No, No, …
- {10} Plot appropriate Kaplan-Meier estimates of the survival functions for each of the two treatments in a single plot. Then create a table that shows the restricted mean and median for survival time in days for each of the two treatment groups.
In the answer sketch for Question 2a, I silenced a warning in building the plot. You can do the same, if needed, for this plot.
- {5} In a sentence or two, what conclusions can you draw from your plot and table?
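As a rough guide to the workflow in Question 2, the sketch below fits Kaplan-Meier curves by group with the survival package and prints the restricted mean and median for each group. It uses simulated data rather than the remit48 subjects, and the names sim, event and km are invented for illustration. The key assumption worth checking against your own data is the direction of the event indicator: Surv() needs 1 for an observed event, so censored = "No" is what counts as an observed remission.

```r
library(survival)

# Simulated stand-in data: invented values, NOT the remit48 subjects
set.seed(432)
sim <- data.frame(
  treatment = factor(rep(c("New", "Old"), each = 24)),
  days      = round(rexp(48, rate = 1 / 100)) + 1,
  censored  = factor(sample(c("No", "Yes"), 48, replace = TRUE,
                            prob = c(0.7, 0.3)))
)

# Surv() wants an event indicator: 1 = remission observed, 0 = right-censored
sim$event <- as.integer(sim$censored == "No")

# Kaplan-Meier estimates, one curve per treatment group
km <- survfit(Surv(days, event) ~ treatment, data = sim)

# print.rmean = TRUE adds the restricted mean next to the median for each group
print(km, print.rmean = TRUE)
```

Calling plot(km) draws both curves with base graphics, and ggsurvplot(km, data = sim) from survminer is an alternative. Because the event here is remission, each curve estimates the proportion of subjects not yet in remission, so a curve that drops faster reflects the better result.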
Question 3. (10 points)
Write an essay of at least 125 words (and a minimum of 5 complete sentences) specifying something from your reading of Jeff Leek’s How To Be a Modern Scientist that strongly resonates with you, and that you want to put into practice. Please be specific about what Leek’s suggestion is, how you hope to accomplish this, and why you want to.
This isn’t the place to complain about something in the book. That will come in Lab 7.
We will award full credit to any student who we believe:
- provides an insightful and enthusiastic response
- provides a response that is written well, using complete sentences
- avoids grammar, syntax and spelling errors
- clearly indicates the source of the advice and context for it
- clearly indicates why this idea resonates with them, and thus why they want to do it, and
- clearly indicates how they hope to put the idea into action, with specific information about what they plan to do,
- in at least 125 words and five sentences (we won’t worry about the essay being too long)
Use of AI
If you decide to use some sort of AI to help you with this Lab, we ask that you place a note to that effect, describing what you used and how you used it, as a separate section called “Use of AI”, after your answers to our questions, and just before your presentation of the Session Information. Thank you.
Be sure to include Session Information
Please display your session information at the end of your submission, as shown below.
xfun::session_info()
R version 4.5.2 (2025-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)
Locale:
LC_COLLATE=English_United States.utf8
LC_CTYPE=English_United States.utf8
LC_MONETARY=English_United States.utf8
LC_NUMERIC=C
LC_TIME=English_United States.utf8
Package version:
abind_1.4-8 askpass_1.2.1 backports_1.5.0
base64enc_0.1.6 bayestestR_0.17.0 bit_4.6.0
bit64_4.6.0.1 blob_1.3.0 boot_1.3.32
broom_1.0.12 bslib_0.10.0 cachem_1.1.0
callr_3.7.6 car_3.1-5 carData_3.0-6
cellranger_1.1.0 cli_3.6.5 clipr_0.8.0
coda_0.19-4.1 codetools_0.2-20 colorspace_2.1-2
commonmark_2.0.0 compiler_4.5.2 conflicted_1.2.0
correlation_0.8.8 corrplot_0.95 cowplot_1.2.0
cpp11_0.5.3 crayon_1.5.3 curl_7.0.0
data.table_1.18.2.1 datasets_4.5.2 datawizard_1.3.0
DBI_1.3.0 dbplyr_2.5.2 Deriv_4.2.0
digest_0.6.39 distributions3_0.2.3 doBy_4.7.1
dplyr_1.2.0 dtplyr_1.3.3 easystats_0.7.5
effectsize_1.0.1 emmeans_2.0.2 estimability_1.5.1
evaluate_1.0.5 exactRankTests_0.8.35 farver_2.1.2
fastmap_1.2.0 fontawesome_0.5.3 forcats_1.0.1
forecast_9.0.1 Formula_1.2-5 fracdiff_1.5.3
fs_1.6.6 gargle_1.6.1 generics_0.1.4
ggplot2_4.0.2 ggpubr_0.6.3 ggrepel_0.9.7
ggsci_4.2.0 ggsignif_0.6.4 ggtext_0.1.2
glue_1.8.0 googledrive_2.1.2 googlesheets4_1.1.2
graphics_4.5.2 grDevices_4.5.2 grid_4.5.2
gridExtra_2.3 gridtext_0.1.6 gtable_0.3.6
hardhat_1.4.2 haven_2.5.5 highr_0.12
hms_1.1.4 htmltools_0.5.9 htmlwidgets_1.6.4
httr_1.4.8 ids_1.0.1 insight_1.4.6
isoband_0.3.0 janitor_2.2.1 jpeg_0.1.11
jquerylib_0.1.4 jsonlite_2.0.0 knitr_1.51
labeling_0.4.3 lattice_0.22-9 lifecycle_1.0.5
litedown_0.9 lme4_1.1.38 lmtest_0.9.40
lubridate_1.9.5 magrittr_2.0.4 markdown_2.0
MASS_7.3-65 Matrix_1.7-4 MatrixModels_0.5.4
maxstat_0.7.26 memoise_2.0.1 methods_4.5.2
mgcv_1.9.4 microbenchmark_1.5.0 mime_0.13
minqa_1.2.8 modelbased_0.14.0 modelr_0.1.11
multcomp_1.4-29 mvtnorm_1.3-3 naniar_1.1.0
nlme_3.1.168 nloptr_2.2.1 nnet_7.3.20
norm_1.0.11.1 numDeriv_2016.8.1.1 openssl_2.3.5
otel_0.2.0 parallel_4.5.2 parameters_0.28.3
patchwork_1.3.2 pbkrtest_0.5.5 performance_0.16.0
pillar_1.11.1 pkgconfig_2.0.3 plyr_1.8.9
png_0.1.8 polynom_1.4.1 prettyunits_1.2.0
processx_3.8.6 progress_1.2.3 ps_1.9.1
purrr_1.2.1 quantreg_6.1 R6_2.6.1
ragg_1.5.0 rappdirs_0.3.4 rbibutils_2.4.1
RColorBrewer_1.1-3 Rcpp_1.1.1 RcppArmadillo_15.2.3.1
RcppEigen_0.3.4.0.2 Rdpack_2.6.6 readr_2.2.0
readxl_1.4.5 reformulas_0.4.4 rematch_2.0.0
rematch2_2.1.2 report_0.6.3 reprex_2.1.1
rlang_1.1.7 rmarkdown_2.30 rstatix_0.7.3
rstudioapi_0.18.0 rvest_1.0.5 S7_0.2.1
sandwich_3.1-1 sass_0.4.10 scales_1.4.0
see_0.13.0 selectr_0.5.1 snakecase_0.11.1
SparseM_1.84.2 sparsevctrs_0.3.6 splines_4.5.2
stats_4.5.2 stringi_1.8.7 stringr_1.6.0
survival_3.8-6 survminer_0.5.2 sys_3.4.3
systemfonts_1.3.2 textshaping_1.0.4 TH.data_1.1-5
tibble_3.3.1 tidyr_1.3.2 tidyselect_1.2.1
tidyverse_2.0.0 timechange_0.4.0 timeDate_4052.112
tinytex_0.58 tools_4.5.2 topmodels_0.3-0
tzdb_0.5.0 UpSetR_1.4.0 urca_1.3.4
utf8_1.2.6 utils_4.5.2 uuid_1.2.2
vctrs_0.7.1 viridis_0.6.5 viridisLite_0.4.3
visdat_0.6.0 vroom_1.7.0 withr_3.0.2
xfun_0.56 xml2_1.5.2 xtable_1.8-8
yaml_2.3.12 yardstick_1.3.2 zoo_1.8-15
After the Lab
- We will post an answer sketch to our Shared Google Drive 48 hours after the Lab is due.
- We will post grades to our Grading Roster on our Shared Google Drive one week after the Lab is due.
- See the Lab Appeal Policy in Section 8.4 of our Syllabus if you are interested in having your Lab grade reviewed, and use the Lab Regrade Request form to complete the task. Submitting the form is optional, and its deadline is specified in the Calendar. Thank you.