431 Project B Instructions

Author

431 Staff

Published

2023-11-16

What is Project B?

Project B is the second of two real data science projects you’ll be doing this semester. It involves the completion of four tasks, which you’ll start working on in early November:

  1. You will complete a Registration Form to obtain Dr. Love’s approval for your plan, let him know if you’re working with a partner, and schedule your oral presentation.
  2. You (and your partner, if applicable) will present your project to Dr. Love, either in person in his office or via Zoom, sometime between December 11 and December 14. The schedule is here.
  3. You will build Quarto and HTML reports describing your work.
  4. Finally, you will complete a Self-Evaluation form.

What’s on this Website

All of the material you need (from a statistical and coding perspective) to do Project B has been or will be covered in our first 24 classes (that is, all classes through the end of November), as well as in the Course Notes and Labs 1-7.

Project B Deliverables

  1. You will complete a Registration Form to obtain Dr. Love’s approval for your plan, let him know if you’re working with a partner, and schedule your oral presentation, by the mid-November deadline on the Course Calendar.
  2. You (and your partner, if applicable) will present your project to Dr. Love, either in person in his office or via Zoom. Details on the Oral Presentation are found in the Checklist menu above. Presentations will take place on December 11-14, at times scheduled through the Registration Form. The schedule is now posted here.
  3. You will build two reports (one for Study 1 and a separate one for Study 2), each prepared in Quarto and rendered to HTML, by the final Project B deadline in the Course Calendar.
    • If you’re not using NHANES data, you’ll also submit your data to Dr. Love at that time.
  4. Finally, you will complete a Self-Evaluation form, as you did in Project A, by the final Project B deadline in the Course Calendar.

Partnerships?

You can work alone or with one other person on this project. If you work as a pair, you will commit to that when you register for the project. Each of you will receive the team grade for the project reports and an individual grade for the other components of the project.

The Data

You will work with the same data source for Study 1 and for Study 2, and these data will be developed either from NHANES or from another public source that you identify.

  • You will find detailed instructions regarding the use of NHANES data for Project B here.
  • If you want to use other data, you’ll need it to meet some specifications we’ll describe, and you’ll have to get Dr. Love’s permission when you register your project.
  • Since most people consider working with NHANES data to be easier, we award four extra points to projects which use non-NHANES data.

Study 1

  • Study 1 is about making descriptive and exploratory comparisons and summaries of data. It’s not about building sophisticated statistical models.
  • You will ingest, merge and clean the data in R, then select variables to complete any four out of five potential analyses, as described in these instructions.
    • You can do all five analyses if you like (as preparation for Quiz 2, for instance), but you will only present four in your report. There is no bonus credit for doing all five analyses.
  • Dr. Love has developed Study 1 Report Specifications and a Study 1 Example Report, which should guide your eventual submitted Study 1 report.
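
As a rough illustration of the ingest, merge, and clean steps described above, here is a minimal sketch in base R. The data frames, variable names (id, age, sex, sbp), and values are all hypothetical stand-ins, and dropping incomplete rows is only one possible cleaning choice. The Study 1 Report Specifications, not this sketch, determine what your report actually requires.

```r
# Hypothetical stand-ins for two files that share a subject identifier.
# In real work these would be imported from files rather than typed in.
demo <- data.frame(
  id  = c(1, 2, 3, 4),
  age = c(34, 51, NA, 46),
  sex = c("F", "M", "F", "M")
)
exam <- data.frame(
  id  = c(1, 2, 3, 5),
  sbp = c(118, 135, 124, 141)
)

# Merge: keep only subjects who appear in both data frames
merged <- merge(demo, exam, by = "id")

# Clean: treat sex as a factor, then keep only rows with no missing values
# (an assumption made for this sketch, not a project requirement)
merged$sex <- factor(merged$sex)
complete <- merged[complete.cases(merged), ]

# One possible descriptive comparison: mean sbp within each sex
sbp_by_sex <- aggregate(sbp ~ sex, data = complete, FUN = mean)
print(sbp_by_sex)
```

Checking the row counts after each step (for example, with nrow(merged) and nrow(complete)) is a simple way to confirm that the merge and the cleaning did what you intended.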

Study 2

  • Study 2 is about building a model and making predictions. You will complete all elements of a data science project designed to create a statistical model for a quantitative outcome, then use it for prediction, and assess the quality of those predictions.
  • Study 2 involves working with data from the same source that you used for Study 1. Again, you will work through all cleaning and data management requirements in your Study 2 report.
    • Study 2 involves the prediction of a quantitative outcome using a key predictor and some additional predictors in two linear regression models, and then comparing those two models.
    • All of the material you need (from a statistical and coding perspective) to do these analyses has been or will be covered in our first 24 classes and in the Course Notes.
  • Dr. Love has developed Study 2 Report Specifications and a Study 2 Example Report, which should guide your eventual submitted Study 2 report.
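
The general shape of the Study 2 comparison (one linear model with the key predictor alone, a second that adds more predictors, then an assessment of predictions) might look something like the base R sketch below. Everything in it is an assumption made for illustration: the data are simulated, the variable names (outcome, key_pred, extra1, extra2) are hypothetical, and the training/test split and the choice of summaries are just one reasonable approach. The Study 2 Report Specifications and Example Report are the actual guide.

```r
set.seed(431)  # arbitrary seed so the simulated data are reproducible

# Simulated data; every variable name here is hypothetical
n <- 200
dat <- data.frame(
  key_pred = rnorm(n),
  extra1   = rnorm(n),
  extra2   = rnorm(n)
)
dat$outcome <- 10 + 2 * dat$key_pred + 1.5 * dat$extra1 + rnorm(n)

# Split into a training sample (to fit models) and a test sample (to assess predictions)
train_rows <- sample(seq_len(n), size = 150)
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# Model 1: key predictor only
# Model 2: key predictor plus the additional predictors
m1 <- lm(outcome ~ key_pred, data = train)
m2 <- lm(outcome ~ key_pred + extra1 + extra2, data = train)

# Root mean squared prediction error for a model on new data
rmspe <- function(model, newdata) {
  sqrt(mean((newdata$outcome - predict(model, newdata = newdata))^2))
}

# Side-by-side comparison of in-sample fit and out-of-sample prediction error
comparison <- data.frame(
  model      = c("key predictor only", "key + additional predictors"),
  adj_r2     = c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared),
  test_rmspe = c(rmspe(m1, test), rmspe(m2, test))
)
print(comparison)
```

Fitting on one sample and predicting into another keeps the assessment of prediction quality separate from the data used to build the models, which is why this sketch splits the data before calling lm().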

Grading

Project B will be graded by Dr. Love on a scale from 0 to 100.

  • On-time successful completion of the Registration Form is worth 5 points.
  • The two study reports (Study 1 and Study 2) due at the final Project B deadline are worth a combined 45 points.
  • The oral presentation is worth 40 points. Details on the Oral Presentation are found in the Checklist.
  • The self-evaluation is worth 10 points.
  • Late work on Project B is unacceptable. All deadlines are in the Course Calendar.

Dr. Love will provide no written feedback on your Project B work, because the grading timeline is simply too tight on his end. He apologizes in advance.

Questions?

If you have questions, let us know about them on Campuswire using the projectB folder, or speak with Dr. Love before or after class, or discuss them with the TAs during office hours.