Final Report Checklist

Forty Things That Students Should Check in Their Project A Report

This list is not comprehensive. These aren’t the only things that Dr. Love and the TAs will be checking, and we won’t necessarily check all of them in every project. There is no 1:1 correspondence between your grade on the report (which is determined holistically) and the elements on this checklist. It is fair to say that your report will receive a better grade if these elements are complete.

The date of the project is automatically generated by R Markdown, using the inline code `r Sys.Date()` in the YAML, and appears in the format 2022-10-31 in the HTML.
The name(s) of the author(s) are clearly listed in the YAML and in the HTML.
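A minimal sketch of the start of a YAML header that satisfies the two items above. The title and author names are placeholders (assumptions), not required values. The quoted inline R code on the date line is what produces the automatic date.

```yaml
---
title: "Placeholder Title Using CHR-2022 Data"   # placeholder title
author: "Student One and Student Two"            # placeholder author names
date: "`r Sys.Date()`"                           # evaluated at knit time, e.g. 2022-10-31
---
```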
Neither warning = FALSE nor include = FALSE nor eval = FALSE is used anywhere in the project.
All packages are loaded near the top of the document (as opposed to partway through) and message = FALSE is used to suppress the messages created when loading packages.
Code folding and code download are both used, with code_folding: show in the YAML so that the code is visible (by default) in the HTML, as well as code_download: TRUE.
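One way to write these options, assuming the standard rmarkdown html_document output format (note the underscore in code_download). This output entry belongs inside the YAML header, between the --- lines. The toc and number_sections entries are included because other items in this list ask for a table of contents and automatically numbered headings.

```yaml
output:
  html_document:
    toc: TRUE               # table of contents
    number_sections: TRUE   # automatic 1, 1.1, ... heading numbers
    code_folding: show      # code is visible by default, with buttons to hide it
    code_download: TRUE     # readers can download the .Rmd source from the HTML
```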
The option comment = NA is used in the R Markdown file (usually with knitr::opts_chunk$set(comment = NA)) to ensure that R output in the HTML is not preceded by ##.
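A sketch of a setup chunk that covers this item and the package-loading item above. The chunk header is shown only in a comment, since it sits outside the R code itself. The packages listed are an assumption about what a project might load, not a required set.

```r
# Setup chunk. In the .Rmd file its header would read {r setup, message = FALSE},
# which keeps package start-up messages out of the HTML.
knitr::opts_chunk$set(comment = NA)  # printed R output is no longer prefixed with "##"

# Load all packages here, near the top of the document (illustrative choices)
library(broom)
library(dplyr)
```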
There are no warnings in the HTML document.
The project uses read_csv() to read in the data, with show_col_types = FALSE to suppress the message about column specifications.
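A self-contained sketch of the read_csv() call, assuming readr 2.0 or later (where show_col_types exists). The toy CSV, its column names, and its values are hypothetical stand-ins for your own CHR-2022 file.

```r
library(readr)

# Toy CSV written to a temporary file so the sketch runs on its own;
# in your project, point read_csv() at your own data file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("county,state,outcome_pct",
             "County A,OH,12.5",
             "County B,IN,9.8"), csv_path)

# show_col_types = FALSE suppresses the column specification message
chr_raw <- read_csv(csv_path, show_col_types = FALSE)
```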
message = FALSE is used to suppress the message created when running something from the mosaic package (usually favstats) for the first time.
The headings and subheadings (for example, 1 and 1.1) of the entire Project A document are numbered automatically (use number_sections: TRUE in the YAML), appear in the table of contents, and contain no misspelled words.
The headings and first-level subheadings in the outline for Project A Analyses 1, 2, and 3 match those shown in Professor Love’s sample Project A.
Professor Love’s instructions are NOT repeated in your HTML.
The function summary() as applied to a tibble does not appear in your HTML.
The raw data (prior to filtering rows and selecting variables) is not printed or summarized or listed in your HTML.
The tibble is printed in its final form as part of the “Our Analytic Tibble” section, before the Codebook and Analyses.
The final tibble is saved to an .Rds file before the Analyses, and the name of that file contains only letters and numbers (in particular, no spaces). Unlike in the Proposal, please submit the .Rds file to Canvas with your final report.
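A base R sketch of saving the final tibble and checking the file name. The toy data frame and the file name chr2022final.Rds are hypothetical; tempdir() is used only so the example is self-contained, and in a real project you would save into your project folder.

```r
# Toy stand-in for the final analytic tibble (hypothetical values)
chr_final <- data.frame(county = c("County A", "County B"),
                        outcome_pct = c(12.5, 9.8))

# Hypothetical file name: letters and numbers only, no spaces
rds_name <- "chr2022final.Rds"
stopifnot(grepl("^[A-Za-z0-9]+\\.Rds$", rds_name))

# tempdir() keeps the sketch self-contained; use your project folder instead
rds_path <- file.path(tempdir(), rds_name)
saveRDS(chr_final, rds_path)

# Reading the file back confirms it holds the same data
chr_reloaded <- readRDS(rds_path)
```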
The Hmisc::describe results (which follow the printing of the tibble) include all rows shown in the printed tibble, and the variable names shown there match those shown in the codebook and then used in the three Analyses (unless a transformation is used).
There are no avoidable scrolling windows in the HTML document. Avoidable scrolling windows are those you can eliminate by breaking long lines of R code (that is, by hitting ENTER more often as you write it).
The count of counties matches up throughout the document until missingness is accounted for analytically (with either imputation or complete-case analysis).
The variable described in the codebook as the outcome is actually used as the outcome (perhaps after a transformation) in Analyses 1, 2 and 3.
The project demonstrates that all five variables of interest have been cleaned in accordance with the project instructions (for example, proportions are presented and interpreted as percentages).
Analyses 1, 2, and 3 are done on complete cases for the outcome variable. The number of rows used in each Analysis matches the county counts described earlier, less (if complete cases are used) the counties with missing values on the variables used in that Analysis. If a decision is made to use imputation, it is performed appropriately.
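A small base R sketch of how complete-case row counts line up with a model's sample size. The toy data, the county labels, and the variable names x1 and x2 are all hypothetical. Note that a county missing only a variable the Analysis does not use stays in.

```r
# Toy data (hypothetical values): 6 counties, some with missing values
chr <- data.frame(
  county  = paste("County", LETTERS[1:6]),
  outcome = c(12.1, NA, 10.4, 11.8, 9.9, 13.0),
  x1      = c(30, 25, NA, 41, 38, 29),
  x2      = c(1.2, 0.8, 1.5, NA, 1.1, 0.9)
)

# An Analysis using only outcome and x1: drop rows missing either one.
# County D is kept, because its missing x2 is not used here.
vars_a1 <- c("outcome", "x1")
chr_a1  <- chr[complete.cases(chr[, vars_a1]), ]
fit_a1  <- lm(outcome ~ x1, data = chr_a1)

# Rows used = all counties minus those missing outcome or x1
n_missing <- sum(!complete.cases(chr[, vars_a1]))
stopifnot(nobs(fit_a1) == nrow(chr) - n_missing)
```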
The three main functions from the broom package (tidy, glance, and augment) are used appropriately in the Analyses.
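A brief sketch of the three broom functions applied to a linear model. The built-in mtcars data and the model mpg ~ wt are stand-ins chosen only so the example runs on its own; they are not part of the project.

```r
library(broom)

# Built-in mtcars data stands in for the county tibble here
fit <- lm(mpg ~ wt, data = mtcars)

tidy(fit, conf.int = TRUE)  # one row per coefficient, with confidence intervals
glance(fit)                 # one-row model summary (r.squared, sigma, AIC, ...)
augment(fit)                # the modeled rows plus .fitted, .resid, and related columns
```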
The predictor used in Analysis 1 was identified as a quantitative predictor in the codebook, and has at least 15 distinct values in the final tibble.
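One way to check the 15-distinct-values rule, written as a hypothetical helper in base R (dplyr::n_distinct(x, na.rm = TRUE) gives the same count if you prefer dplyr).

```r
# Hypothetical helper: does a predictor have at least min_distinct
# distinct non-missing values?
has_enough_distinct <- function(x, min_distinct = 15) {
  length(unique(x[!is.na(x)])) >= min_distinct
}
```

In a project this might be called as has_enough_distinct(chr_final$x1), where chr_final and x1 are placeholder names for your tibble and predictor.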
The predictor used in Analysis 2 was correctly created as a categorical predictor in the data development section, with the same number of levels and the same level names that are used in Analysis 2.
The order of levels for the predictor in Analysis 2 is appropriate (for a binary predictor, any order is fine if it is clear; for an ordered multi-category variable, the order matters).
The categorical predictor in Analysis 2 shows the same levels as are specified in the codebook.
If a multi-categorical predictor is used in Analysis 2, it is included as a factor. (If a binary predictor is used, then either a 1-0 or factor representation is OK.)
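A base R sketch of building a multi-category factor whose level names and order are set explicitly. The toy percentages, the cutpoints, and the labels are all hypothetical; your own categories should follow your codebook.

```r
# Toy values (hypothetical) of a percentage to be grouped for Analysis 2
pct_x <- c(5, 22, 48, 71, 95, 33)

# cut() builds the factor; labels fix both the level names and their order
x_cat <- cut(pct_x,
             breaks = c(0, 25, 50, 75, 100),
             labels = c("Low", "Medium", "High", "Very High"),
             include.lowest = TRUE)

levels(x_cat)   # the first level is the default baseline in lm()
table(x_cat)    # counts per level, to compare with the codebook
```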
The choice of baseline category for the predictor in Analysis 2 is explicitly stated in the description of Analysis 2 variables and matches what the analysis actually generates.
Ohio is chosen as the baseline state for Analysis 3.
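A sketch of setting Ohio as the baseline with base R's relevel(), then confirming the baseline from the model's coefficient names (the same check works for the Analysis 2 baseline). The other state abbreviations, the predictor x, and all values are hypothetical.

```r
# Toy data (hypothetical values) with a state factor
chr_a3 <- data.frame(
  state   = factor(c("IN", "OH", "MI", "OH", "IN", "MI", "OH", "IN")),
  x       = c(3, 5, 2, 8, 6, 4, 7, 1),
  outcome = c(10, 14, 9, 19, 15, 12, 17, 8)
)

# By default the first level alphabetically ("IN") is the baseline;
# relevel() moves "OH" to the front so Ohio becomes the baseline
chr_a3$state <- relevel(chr_a3$state, ref = "OH")
fit_a3 <- lm(outcome ~ x + state, data = chr_a3)

# The baseline level gets no coefficient of its own
coef_names <- names(coef(fit_a3))
```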
The quantitative predictor in Analysis 3 was identified as a quantitative predictor in the codebook.
In Analysis 1, a plot compares the data without a transformation to the data with the chosen (or best alternative) transformation.
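A base R sketch of a side-by-side comparison plot (ggplot2 would work equally well). The mtcars data stand in for the county data, and log() is just one candidate transformation. Numerical summaries like R-square are deliberately left out, since the decision should rest on the pictures.

```r
# mtcars stands in for the county data; log() is one candidate transformation
par(mfrow = c(1, 2))  # two panels side by side

plot(mpg ~ hp, data = mtcars, main = "Untransformed outcome")
fit_raw <- lm(mpg ~ hp, data = mtcars)
abline(fit_raw, col = "red")  # straight-line fit for comparison

plot(log(mpg) ~ hp, data = mtcars, main = "Log-transformed outcome")
fit_log <- lm(log(mpg) ~ hp, data = mtcars)
abline(fit_log, col = "red")

par(mfrow = c(1, 1))  # reset the plotting layout
```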
Transformation decisions are motivated by patterns shown in the visualizations, without reference to numerical summaries like R-square that are inappropriate for that task.
For each of the three Analyses, the research question is stated clearly, is grammatically correct, and ends with a question mark.
For each of the three Analyses, the first paragraph of the Conclusions section provides a clearly written answer to the research question, clearly motivated by the analysis you did. A second paragraph follows that describes meaningful limitations of that Analysis. (Note that it’s not a good idea to suggest limitations that you could fix with the tools you have; instead, apply those tools and build a better Analysis.)
The project title contains no misspelled words and reflects what is actually studied in the Analyses (or at least in one of the Analyses). Do not use the terms “Project” or “Project A” or “431” in your title. You can use “CHR-2022” as the abbreviation in your title for “County Health Rankings, 2022.” Keep your title to 80 characters or fewer. Subtitles are permitted if desired.
Your residual and prediction analyses include everything required in the outline, including but not limited to:
- An appropriate set of residual plots (for Analyses 1 and 3) or plot (for Analysis 2) with a summary description of your conclusions in complete sentences.
- Your model’s prediction for Cuyahoga County, OH, and a comparison of that prediction to the observed value for that county on the original scale.
- Identification of two counties (by name and state) where the model is least successful at predicting the outcome.
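A base R sketch of the three pieces above: residual plots, a prediction for one named unit compared to its observed value, and the two units with the largest absolute residuals. The mtcars data stand in for the county tibble, its row names play the role of county names, and "Valiant" plays the role of Cuyahoga County, OH; none of this is the project's actual model.

```r
# mtcars stands in for the county tibble; row names act as county names
fit <- lm(mpg ~ wt, data = mtcars)

# Residual plots: residuals vs. fitted values and a normal Q-Q plot
par(mfrow = c(1, 2))
plot(fit, which = 1:2)
par(mfrow = c(1, 1))

# Prediction for one named row ("Valiant" stands in for Cuyahoga County, OH)
target      <- mtcars["Valiant", ]
pred_target <- unname(predict(fit, newdata = target))
obs_target  <- target$mpg
pred_target - obs_target   # how far the prediction misses the observed value

# The two rows the model predicts least successfully (largest |residual|)
worst_two <- names(sort(abs(resid(fit)), decreasing = TRUE))[1:2]
```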
If the outcome is transformed, then all of the following are true.
- The coefficients of the regression model are interpreted correctly in terms of the transformed outcome. If both the outcome and a predictor are transformed, the interpretation of the relevant coefficient estimates describes change in the transformed outcome as a function of a change in the transformed predictor.
- The project describes what is being predicted as, for example, the log of (outcome as measured in units) rather than attributing units directly to log(outcome).
- When the model is used to predict the outcome for individual counties, this includes back-conversion to the original outcome scale.
- In Analysis 1, the Pearson correlation you specify is the correlation after transformation(s), so that the R-squared value reported for your model is the square of the Pearson correlation you report.
- If the outcome is transformed in Analysis 1, the same transformation is used in Analyses 2 and 3, or there is a meaningful explanation (accompanied by a plot) of why you switched. It’s 100% fine to use a transformation in one Analysis that is different from the transformation selected in another Analysis, but this should be justified with an appropriate plot and a description of why the choice was made.
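A base R sketch of two of the points above for a log-transformed outcome: the reported Pearson correlation uses the transformed outcome (so its square matches the model's R-squared), and predictions are back-converted with exp(). The mtcars data, the log transformation, and the new value wt = 3 are stand-ins. If the predictor were also transformed, you would correlate the transformed predictor with the transformed outcome.

```r
# mtcars stands in for the county data; log() is an example transformation
fit_log <- lm(log(mpg) ~ wt, data = mtcars)

# Pearson correlation computed on the transformed outcome
r_log <- cor(mtcars$wt, log(mtcars$mpg))
r_sq  <- summary(fit_log)$r.squared
all.equal(r_log^2, r_sq)   # TRUE: R-squared is the square of this correlation

# Predict on the log scale, then back-convert to the original (mpg) scale
pred_log <- unname(predict(fit_log, newdata = data.frame(wt = 3)))
pred_mpg <- exp(pred_log)  # exp() undoes the natural log
```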
The session information is printed, in a section of its own that appears in the Table of Contents and is the last part of the document.
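A minimal sketch of the final chunk, which would sit under its own heading (for example, "Session Information") at the end of the document. Base R's sessionInfo() is shown; sessioninfo::session_info() is an alternative if that package is installed.

```r
# Final chunk of the document, under its own heading
sessionInfo()
```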
If you are working with a partner, one of you submits the Rmd, HTML and video files to Canvas, and the other partner submits a one-page note to Canvas stating who their partner is and that their partner will submit the materials for their project team. Everyone (including both members of each team) should fill out the self-evaluation form after the Rmd, HTML and video files are posted to Canvas.

Good luck!