Required Study 1 Analyses

Most Recent Update: 2022-12-15 12:21:10.

In your four analyses (chosen from five possibilities), you will be doing:

Descriptive and exploratory summaries of the data across the groups for each of your chosen outcomes, including, of course, attractive and well-constructed visualizations, graphs and tables.
Comparisons of the population mean difference for at least one quantitative outcome across a set of two (or three) groups, including appropriate demonstrations of the reasons behind the choices you made between parametric, non-parametric and bootstrap procedures.
Comparisons of population proportions for categorical outcomes across exposure groups, including appropriately interpreted point estimates and confidence intervals, where available.

Data Management

All data merging and cleaning must be included in your R Markdown report, starting from the raw data obtained either through the nhanesA package or via reading in your non-NHANES raw data, so that we can replicate your work.
You can use the data as they were collected (quantitative, binary or multi-categorical), but you can also create categorical variables from the quantitative variables provided, should that be of interest in one or more of your analyses.
- Should you decide a categorical variable from a quantitative one, be sure to describe that process carefully, and demonstrate that each level of your created categorical variable contains the minimum number of observations specified on the Data Development page.

Analyses You’ll Do

You will complete any four of the following five Analyses, and present these results in your Study 1 Report.

For each of these analyses, you will provide complete code (including whatever you did to clean the raw data for the variables you are studying), appropriate visualizations and detailed explanations of your analytic decisions, and conclusions. Use a 90% confidence level for all Study 1 work, please.

Again, if you are using NHANES data, you will need between 500 and 8,750 observations with a minimum of 500 observations containing complete data on all of the variables you will use in Study 1.
If you are using any other data source, you will need between 250 and 10,000 observations, and at least 250 with complete data on all variables you will use in Study 1.

Each analysis should be self-contained (so that I don’t have to read Analysis A first to understand Analysis C, for example). Present each new analysis as a subsection with an appropriate heading in the table of contents.

Analysis A. Compare two means/medians using paired samples

Here, you will need to identify two quantitative variables (outcomes) which are paired (so that they have a natural link between them, and use the same units of measurement.) You’ll analyze the results and build a confidence interval for the population mean difference with an appropriate t-based or bootstrap procedure. Again, we require that all variables treated as quantitative in Study 1 (Analyses A, B, or C) contain at least 15 unique values.

Analysis B. Compare two means/medians using independent samples

Here, you will need to identify one quantitative (outcome) and one categorical variable (binary - 2 levels.) You’ll analyze the results and build a confidence interval for the difference in means with an appropriate t-based or bootstrap procedure. Note that it’s generally easier to find independent samples comparisons than paired samples comparisons in most of the data I expect you’ll be using. This would require a quantitative outcome (with at least 15 unique values), and a binary categorical variable which divides the data into two subgroups, so that each subgroup has a minimum of 30 observations.

Analysis C. Compare 3-6 means/medians using independent samples

Here, you will need to identify one quantitative (outcome) and one categorical variable (multi-categorical with 3-6 levels.) Here, you should be thinking about an analysis of variance with pre-planned Tukey HSD pairwise comparisons. This would require a quantitative outcome (with at least 15 unique values), and a multi-categorical variable with 3-6 categories which divides the data into subgroups, so that each subgroup has a minimum of 30 observations.

Analysis D. Create and analyze a \(2 \times 2\) table

Here, you will need to identify two categorical (binary) variables. Each cell of the resulting 2 x 2 table should contain a minimum of 30 subjects. You should be focused on the relative risk, odds ratio and risk difference comparisons.

Analysis E. Create and analyze a \(J \times K\) table, where \(2 \leq J \leq 5\) and \(3 \leq K \leq 5\)

Here, you will need to identify two categorical variables, at least one of which should contain 3-5 levels, while the other contains 2-5 levels. Each cell in the cross-tabulation of the two variables within the table should have a minimum of 15 observations. Here, you should be providing the results of an appropriate chi-square test, accompanied by a useful visualization and description of the nature of the observed association.