This page was last updated: 2021-10-25 09:36:55.


Project A Overview

Project A Objectives

The remainder of this page describes the educational objectives for this project and for projects in this course generally.

It is hard to learn statistics (or anything else) passively; concurrent theory and application are essential [1].

Project A is about building linear models and visualizing data from a (fairly clean) data set I provide to you. In Project A, you will complete most of the elements of a data science project designed to create a statistical model for a quantitative outcome, then use it for prediction, and assess the quality of those predictions. Tools necessary for Project A include:

In Project A’s analysis stage, everyone will be working with different parts of the same data set.

“Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what comparisons you’d like to make.” Andrew Gelman

Why Two Projects?

The main reason is that I can’t figure out a way to get you to think about all of the things I hope you’ll learn from this course in a single Project. Another important reason is that I want you to be able to make mistakes during the semester without worrying about it too much, and having two projects spreads out this learning a bit.

  1. I set different tasks for Project A and for Project B, allowing us to touch on a wider range of the things I hope you’ll learn in 431.
  2. I give more guidance (including a sample Project) in Project A than in Project B.
  3. I have to evaluate each of your projects, and there are many students in the class. Knowing at least one of the data sets you’ll be working with helps me manage this.
  4. Having a broad range of activities to evaluate helps reduce the cost of a mistake on any one of them, so that we can build on what you do well.
  5. All of Project A can be done using materials discussed in classes 1-17.

Educational Objectives

“Statistics has no reason for existence except as the catalyst for investigation and discovery.” George E. P. Box

I am primarily interested in your learning something interesting, useful and even valuable from your project experiences in 431. In particular, an effective Project A will demonstrate:

  1. The ability to create and formulate research questions that are statistically and scientifically appropriate.
  2. The ability to turn research questions into measures of interest.
  3. The ability to pull, merge, clean and tidy data, then present the data set following Jeff Leek’s guide to sharing data with a statistician.
  4. The ability to build a reasonable (if simplistic) linear model, assess the quality of the model, and use it to make predictions.
  5. The ability to identify and (with help) solve problems that crop up.
  6. The ability to comment on your work within code, and in written and oral presentation.
  7. The ability to build a Markdown-based report and to give a short presentation based on key findings from that report.
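As a rough illustration of objective 4 (build a simple linear model, assess its quality, and use it for prediction), here is a minimal sketch. It makes several assumptions that are not from this page: it uses Python purely for illustration (the course's own tools may differ), the data values are made up, and the helper names `fit_simple_linear_model`, `predict` and `rmse` are hypothetical. It fits a one-predictor least squares line on a training sample, predicts for a held-out sample, and summarizes prediction error with the root mean squared error.

```python
import math


def fit_simple_linear_model(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    """Return fitted values for new x values from an (intercept, slope) pair."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Root mean squared prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up illustrative data (NOT the Project A data set).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [6.0, 7.0]
test_y = [12.0, 13.8]

# Build the model on the training sample only.
model = fit_simple_linear_model(train_x, train_y)

# Use the model to predict the held-out outcomes, then assess those predictions.
test_predictions = predict(model, test_x)
test_rmse = rmse(test_y, test_predictions)

print("intercept and slope:", model)
print("held-out predictions:", test_predictions)
print("held-out RMSE:", test_rmse)
```

Holding out the test observations while fitting is the design choice that matters here: the quality of the predictions is judged on data the model has not seen, which is closer to how the model would actually be used.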

  1. Though by no means an original idea, this particular phrasing is stolen from Harry Roberts.
