General Instructions
- Submit your work via Canvas.
- The deadline for this Lab is specified on the Course Calendar.
- We charge a 5-point penalty for a lab that is 1-48 hours late.
- We do not grade work that is more than 48 hours late.
- Your response should include a Quarto file (.qmd) and an HTML document that is the result of applying your Quarto file to the data we’ve provided.
You can skip exactly one of Labs 1-5 without penalty, but all students must complete both Lab 6 and Lab 7. If you decide to skip a lab, please submit a note to Canvas by the deadline saying that you are skipping the lab.
Template
There is a Lab 3 Quarto template available on our 432-data page. Please use the template to prepare your response to Lab 3, as it will make things easier for you and for the people grading your work.
Our Best Advice
Review your HTML output file carefully before submission for copy-editing issues (spelling, grammar and syntax). Even with spell-check in RStudio (just hit F7), these errors are hard to spot in your Quarto file as long as it runs without complaint. You really need to look closely at the resulting HTML output.
The Data
- The nh_1500 R data set is available for download on the 432 data page.
- A detailed description of each variable in the nh_1500 (and also the nh_3143) data is available here.
Question 1 (25 points)
In Question 1, you will evaluate a linear regression fit in the nh_1500 data to predict a subject’s red blood cell count using these five predictors:
- the subject’s sex,
- the subject’s race/ethnicity,
- the subject’s waist circumference,
- the subject’s pulse rate, and
- whether or not the subject has smoked at least 100 cigarettes in their life.
Note that the main effects model using all five of these predictors will use 7 degrees of freedom, since there are four race/ethnicity categories, and the other four variables are all either binary or quantitative.
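As a quick check of that count, here is the arithmetic, assuming the 7 degrees of freedom refer to the regression terms only (the intercept is not counted) and that each binary or quantitative predictor enters as a single linear term:

\[
\underbrace{1}_{\text{sex}} + \underbrace{(4 - 1)}_{\text{race/ethnicity}} + \underbrace{1}_{\text{waist}} + \underbrace{1}_{\text{pulse}} + \underbrace{1}_{\text{smoking}} = 7
\]

Under the same assumption, a non-linear term that adds at most 3 degrees of freedom would bring the larger model to at most 10.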
a. {10 points} Use a Spearman \(\rho^2\) plot to identify a single non-linear term which could be added to the model. Your selected non-linear term may add at most 3 degrees of freedom to the main effects model. Specify the added term clearly, and then fit both the main effects model (call it m1_main) and the model with your non-linear term (call it m1_add) using both ols() and lm().
b. {10 points} Which of the two models you fit in part a appears to do a better job when evaluated using bootstrap validation in the development sample? Why? An appropriate response will compare the models in terms of validated R-square and MSE values, using set.seed(2025) and 40 bootstrap replications.
c. {5 points} Plot the effect summary (using plot(summary) after an ols() fit) for the model you preferred in part b, and explain the meaning of the pulse coefficient shown in the plot in a complete English sentence.
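For orientation on part b, and assuming the bootstrap validation is run with validate() from the rms package (the lab names only ols(), so that choice is an assumption), each validated index such as R-square or MSE is the apparent value minus an estimated optimism:

\[
\text{index}_{\text{corrected}} = \text{index}_{\text{orig}} - \text{optimism},
\qquad
\text{optimism} = \overline{\text{training}} - \overline{\text{test}}
\]

Here the bars denote averages over the \(B = 40\) bootstrap replications: "training" is the index for a model fit and evaluated in a bootstrap sample, and "test" is the index for that same fit evaluated in the original data.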
Question 2 (25 points)
Again using the nh_1500 data, we will now build a set of logistic regression models to predict whether a subject is limited in the kind or amount of work they can do by a physical, mental or emotional problem.
a. {10 points} Build a model to predict limited on the basis of self-reported overall health. Call this model2a. Then add the main effects of white blood cell count, waist circumference and age to the model and call this new model model2b.
b. {10 points} Interpret the odds ratio associated with self-reported overall health being Excellent as compared to being Good in each of your two models, and provide a 90% confidence interval for each such estimate.
c. {5 points} As measured by a validated C statistic using a seed of 432 and 40 bootstrap replications, which model performs better, model2a or model2b, and why?
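Two standard relationships may help here, offered as a sketch rather than as the required method. First, if \(\hat{\beta}\) is the fitted log-odds coefficient comparing Excellent to Good and \(\text{SE}(\hat{\beta})\) is its standard error, a Wald-type 90% interval (one common choice; a profile-likelihood interval would differ slightly) is:

\[
\widehat{\text{OR}} = e^{\hat{\beta}},
\qquad
90\%\ \text{CI} = \exp\!\left(\hat{\beta} \pm 1.645 \cdot \text{SE}(\hat{\beta})\right)
\]

Second, assuming the validation is done with validate() from the rms package, which reports Somers' \(D_{xy}\) rather than C, the validated C statistic can be recovered from the corrected \(D_{xy}\):

\[
C = 0.5 + \frac{D_{xy}}{2}
\]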
Use of AI
If you decide to use some sort of AI to help you with this Lab, we ask that you place a note to that effect, describing what you used and how you used it, as a separate section called “Use of AI”, after your answers to our questions, and just before your presentation of the Session Information. Thank you.
After the Lab
- We will post an answer sketch to our Shared Google Drive 48 hours after the Lab is due.
- We will post grades to our Grading Roster on our Shared Google Drive one week after the Lab is due.
- See the Lab Appeal Policy in our Syllabus if you are interested in having your Lab grade reviewed, and use the Lab Regrade Request form specified there to complete the task. Thank you.