This page was last updated: 2021-10-25 09:36:49.
Here is an outline of the “Analysis” section of your final Project A report. Note that none of this material should be included in your Project A proposal.
In this section you will build a simple linear regression model to predict your outcome using one of your two quantitative predictors (it’s your choice.) Start by identifying those variables, and restricting your data set for this analysis to the complete cases on those variables.
Here, you should state your research question clearly. A good research question (and your response to it) is the most important part of this analysis.
How well does a linear model using [predictor] predict [outcome], in [number] counties in the states of [list of your states]?
or
What is the nature of the association between [predictor] and [outcome], in [number] counties in the states of [list of your states]?
Provide an attractive, well-labeled scatterplot of your outcome and predictor, before any transformations, including all code necessary to build it, and describe what the plot tells you about the association of the variables, as detailed in the instructions.
Here, you will provide information about any transformations you chose to apply to the outcome, and explain why you did (or didn’t) choose to use a transformation.
If you decide to use a transformation, specify the transformation you’ve chosen carefully, and show the scatterplot of the transformed data, using the same suggestions as were provided in the previous section to describe the resulting plot. Write a few sentences describing the transformations you considered, and why you thought this was the most promising, and why you eventually decided to use the transformed versions of your variables.
If you decide not to use a transformation, identify the ONE plot which was the most promising of available transformations and show that one. Write a few sentences describing the transformations you considered, and why you thought this was the most promising, and why you eventually decided to stick with the original versions of the variables.
Your response will include ONE plot demonstrating a particular transformation, although you will probably fit and view several plots in order to select a final one for this work.
For this part of Project A, confine your search to either a logarithm, an inverse, or a square as applied to the outcome. If you want to consider one of those transformations for the predictor as well, that’s OK but not crucial. You should select the most promising transformation on the basis of a scatterplot (perhaps with a loess smooth and linear fit) after the transformation has been applied.
You are discouraged from using numerical summaries of fit in this section. Of course, summary statistics like \(R^2\) when the outcome is transformed will definitely not be comparable.
Note that this transformation assessment is part of Analysis 1, but will not be shown in Analyses 2 or 3. Feel free to use the transformation (of the outcome) that you select in Analysis 1 for the other two Analyses, if you like.
Fit your model to use your predictor to predict your outcome (applying your selected transformation) and provide the code you used, and the following summary elements in this section.
A written statement of the full prediction equation, with coefficients nicely rounded, and a careful description of what the coefficients mean in context.
A tidy summary of the model’s coefficients, including 90% confidence interval for model estimates.
The model’s R-squared, residual standard error, the number of observations to which the model was fit, and the Pearson correlation of your predictor and outcome.
Here, you’ll need to do four things.
augment
function would be very helpful here.m1
, you could use something like plot(m1, which = c(1:2))
to obtain these two plots and that’s OK.ggplot2
and patchwork
following the strategy I’ve demonstrated multiple times in the slides and the Course Notes, should you desire.Here, you’ll write two paragraphs.
In the first paragraph, you should provide a clear restatement of your research question, followed by a clear and appropriate response to your research question, motivated by your results.
Then, write a paragraph which summarizes the key limitations of your work in Analysis 1.
In this section you will build a simple linear regression model to predict your outcome using either of your two categorical predictors (it’s your choice.) Start by identifying those variables, and restricting your data set for this analysis to the complete cases on those variables.
Here, you should state your research question clearly. A good research question (and your response to it) is the most important part of this analysis.
Does [category A] or [category B] show higher levels of [outcome], in [number] counties in the states of [list of your states]? (for a binary predictor)
or
Which of [the categories in your predictor] is associated with the highest mean level of [outcome], in [number] counties in the states of [list of your states]? (for a multi-categorical predictor)
In Analysis 2, it is up to decide whether or not a transformation of the outcome would be valuable. Your model will assume that the distribution of the outcome is Normal, with similar variance across each level of your categorical predictor. If you decide to transform the outcome, again, stick with either the logarithm, inverse or square.
Provide an attractive boxplot (with or without a violin plot) showing your outcome broken down by levels of your predictor, after whatever transformation you choose (if any), including all code necessary to build it, and describe what the plot tells you about the association of the variables, as detailed in the instructions.
Fit your model to use your predictor to predict your outcome (applying your selected transformation) and provide the code you used, and the following summary elements in this section.
A written statement of the full prediction equation, with coefficients nicely rounded, and a careful description of what the coefficients mean in context.
A tidy summary of the model’s coefficients, including 90% confidence interval for model estimates.
The model’s R-squared, residual standard error, and the number of observations to which the model was fit.
Here, you’ll need to do three things.
Again, augment
would be very helpful here.
Your first paragraph in this response should provide a clear restatement of your research question, and then a clear and appropriate response to your research question, motivated by your results. This should include a statement comparing the categories on the mean of the outcome you modeled.
Then, in your second and final paragraph in this section, provide a brief description of the limitations of this Analysis, Be specific about your concerns.
In this section you will build a linear regression model to predict your outcome using one of your two quantitative predictors (it’s your choice) and the state
(which should be a factor in your model with Ohio as the baseline category.)
Here, you should state your research question clearly. A good research question (and your response to it) is the most important part of this analysis.
A dull but moderately effective and minimally appropriate research questions in this setting would be:
How well does [predictor] predict [outcome] after accounting for differences between states?
Provide an attractive, well-labeled scatterplot of your outcome and your quantitative predictor, and incorporate some way of providing separate results by state. You can do this in several ways in R, including through the use of facets.
Fit your model to use your predictors to predict your outcome (applying your selected transformation) and provide the code you used, and the following summary elements in this section.
A written statement of the full prediction equation, with coefficients nicely rounded, and a careful description of what the coefficients mean for the intercept, the quantitative predictor and one of the states, in context.
A tidy summary of the model’s coefficients, including 90% confidence interval for model estimates. Be sure that Ohio is used as the baseline state.
The model’s R-squared, residual standard error, the number of observations to which the model was fit.
Do what you did in Analysis 1.
Your first paragraph in this response should provide a clear restatement of your research question, and then a clear and appropriate response to your research question, motivated by your results.
Then, write a paragraph to summarize the key limitations of your work, as you did in the previous analyses.
Note that there is a checklist for the final report as part of the Examples & Hints page that includes many things we’ll be looking for in grading. It’s well worth your time to review the checklist before submitting your work.