If you select the NHANES option for Study 2, then you will be using data from the National Health and Nutrition Examination Survey.

If you decide to use some other data set instead for Study 2, then you should visit this page.

About NHANES (from the NHANES website)

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation.

The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

Findings from this survey will be used to determine the prevalence of major diseases and risk factors for diseases. Information will be used to assess nutritional status and its association with health promotion and disease prevention. NHANES findings are also the basis for national standards for such measurements as height, weight, and blood pressure. Data from this survey will be used in epidemiological studies and health sciences research, which help develop sound public health policy, direct and design health programs and services, and expand the health knowledge for the Nation.

General Advice for NHANES: Learning About The Available Data

The links in this section go to the Survey Data and Documentation section of the NHANES website.

  1. We strongly encourage the use of the 2017-18 NHANES data for Project B.
    • This is the most recent public data (it was made available in June 2020).
    • Dr. Love will consider projects using NHANES data from an earlier cycle only if relevant questions are not available in the 2017-18 cycle.
  2. You must use variables from at least three different NHANES data sets: the Demographics data set, plus at least two others drawn from one or more of the other four data groups (Dietary, Examination, Laboratory, and Questionnaire).
  3. You will need to use the nhanesA package in R to import and work with the available data.

Getting the NHANES data

Visit the NHANES website and identify the data you want to view.

Using the nhanesA package

Once you’ve selected the data sets from NHANES that you want to use in your project (remember that you need at least 3), the nhanesA package in R can be used to obtain them.

Here’s a little vignette (now a bit out of date) introducing nhanesA from Christopher Endres, who built the package. The functions I think you are most likely to use are those described in that vignette, and the main one is simply called nhanes.

An Example

For example, suppose we want to load the Blood Pressure data from the 2017-18 Examination files at NHANES (contained in the BPX_J data file) into a tibble called bp_raw in R.

We would use the following code, which will take a few minutes to run.

library(nhanesA)
library(tidyverse)

bp_raw <- nhanes('BPX_J') %>% tibble()

saveRDS(bp_raw, "data/BPX_J.Rds")

After you’ve downloaded the file once, save it as an Rds file (as the saveRDS line above does), and then comment out the code you used to pull down the data. Then, when you rerun your work, R will read the saved file instead of downloading it again. Remember to create a data subfolder in your R Project directory for Study 2 before you run this code, since saveRDS will fail if that folder does not exist.

So your final presentation in Project B should instead look like this, which will run much more quickly.

library(nhanesA)
library(tidyverse)

# pull in data from BPX_J from NHANES and save it

# bp_raw <- nhanes('BPX_J') %>% tibble()

# saveRDS(bp_raw, "data/BPX_J.Rds")

# Now that data are saved, I can just read in the tibble

bp_raw <- readRDS("data/BPX_J.Rds")
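If you would rather not comment lines in and out by hand, one alternative is to check whether the saved file exists before downloading. The sketch below is only an illustration: read_or_fetch is a hypothetical helper I made up (it is not part of nhanesA), and fake_fetch is a stand-in for the slow nhanes('BPX_J') call so that the example runs without touching the NHANES website.

```r
# Hypothetical helper (not part of nhanesA): read a saved .Rds file if it
# exists; otherwise call fetch(), save what it returns, and return it.
read_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  # create the folder (for example, "data") if it is not there yet
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  result <- fetch()
  saveRDS(result, path)
  result
}

# Demonstration with a stand-in for the slow download step.
# In a real project, fetch might be: function() nhanes('BPX_J') %>% tibble()
demo_path <- file.path(tempdir(), "data", "demo_cache.Rds")
unlink(demo_path)  # start from a clean slate for the demo

n_fetches <- 0
fake_fetch <- function() {
  n_fetches <<- n_fetches + 1  # count how often the "download" runs
  data.frame(SEQN = c(1, 2, 3), BPXSY1 = c(120, 134, NA))  # made-up values
}

first  <- read_or_fetch(demo_path, fake_fetch)  # runs fake_fetch and saves
second <- read_or_fetch(demo_path, fake_fetch)  # reads the saved file instead
```

After both calls, n_fetches is 1, because the second call found the saved file and skipped the "download."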

Merging NHANES files

You will need to combine data from multiple tibbles (data sets) in your project. I suggest you first select, from each tibble you have created, only the variables you intend to use in your analytic data file. Always include the SEQN variable (the respondent sequence number) in every tibble, since that is what you will use to match up responses across tibbles.
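As a small sketch of that selection step, the code below builds a toy tibble standing in for a downloaded blood pressure file and keeps only SEQN and two measurements. The values are invented for illustration, and which variables you keep is an assumption here; your own choices will depend on your research question.

```r
library(dplyr)

# Toy stand-in for a downloaded file (values are made up);
# the real NHANES files have many more columns and rows.
bp_raw <- tibble(
  SEQN   = c(93703, 93704, 93705),
  BPXSY1 = c(NA, 112, 134),
  BPXDI1 = c(NA, 70, 82),
  BPXPLS = c(NA, 76, 68)
)

# Keep SEQN (needed for merging) plus only the variables we plan to use
bp_small <- bp_raw %>%
  select(SEQN, BPXSY1, BPXDI1)

names(bp_small)
```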

To merge a demographics tibble called DEMO with a BPX tibble to create a tibble called NEW that contains the variables from both DEMO and BPX for all of the subjects contained in DEMO, I’d use a left_join, as follows.

NEW <- left_join(DEMO, BPX, by = "SEQN")

I’d then use another left_join to merge this NEW result with another tibble (say, the HDL_J tibble) and so on.

NEW2 <- left_join(NEW, HDL_J, by = "SEQN")
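To see what a left_join does to subjects who appear in one tibble but not the other, it can help to try it on toy data first. The sketch below uses made-up values in tiny DEMO and BPX tibbles (not real NHANES records) and then runs a few checks I would suggest after any merge.

```r
library(dplyr)

# Toy data (made up): subject 3 has no blood pressure record,
# and subject 5 has a blood pressure record but is not in DEMO.
DEMO <- tibble(SEQN = c(1, 2, 3, 4), RIDAGEYR = c(34, 51, 8, 67))
BPX  <- tibble(SEQN = c(1, 2, 4, 5), BPXSY1 = c(118, 136, 142, 110))

NEW <- left_join(DEMO, BPX, by = "SEQN")

# Check 1: each SEQN appears only once in each tibble,
# so the join cannot create extra rows
anyDuplicated(DEMO$SEQN) == 0
anyDuplicated(BPX$SEQN) == 0

# Check 2: a left_join keeps every row of DEMO, and only those rows
nrow(NEW) == nrow(DEMO)

# Check 3: subjects in DEMO without a match in BPX get NA for BPX variables
sum(is.na(NEW$BPXSY1))

# Which DEMO subjects had no match in BPX?
unmatched <- anti_join(DEMO, BPX, by = "SEQN")
unmatched
```

Here NEW has four rows, subject 3 has a missing BPXSY1, and subject 5 is dropped because that subject is not in DEMO.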

Then, when I was done merging and cleaning the data, I would save that result as a new Rds file, in case I needed it again.

Which variables / subjects should I use?

That’s up to you. Find variables of interest in the description files, and pull them out and see if they will work for you.
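One way to see whether a set of variables will work for you is to count the missing values in each one and see how many subjects remain after you apply your restrictions. The sketch below uses a toy tibble with invented values, and the age cutoff of 21 is purely an assumption for illustration, not a requirement of the project.

```r
library(dplyr)

# Toy merged data (values are made up)
analytic <- tibble(
  SEQN     = 1:5,
  RIDAGEYR = c(34, 51, 8, 67, 45),
  BPXSY1   = c(118, NA, NA, 142, 126)
)

# Count the missing values in each variable
missing_counts <- analytic %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
missing_counts

# Hypothetical restriction: adults (age 21+) with a systolic BP measurement
adults_complete <- analytic %>%
  filter(RIDAGEYR >= 21, !is.na(BPXSY1))

nrow(adults_complete)
```

If very few subjects survive a step like this in the real data, that is a sign the variable (or the restriction) may not work for your study.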


This page was last updated: 2020-12-06 13:42:31.
