Using NHANES Data

If you select the NHANES option for Study 2, then you will be using data from the National Health and Nutrition Examination Survey.

If you decide to use some other data set instead for Study 2, then you should visit this page.

About NHANES (from the NHANES website)

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation.

The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

Findings from this survey will be used to determine the prevalence of major diseases and risk factors for diseases. Information will be used to assess nutritional status and its association with health promotion and disease prevention. NHANES findings are also the basis for national standards for such measurements as height, weight, and blood pressure. Data from this survey will be used in epidemiological studies and health sciences research, which help develop sound public health policy, direct and design health programs and services, and expand the health knowledge for the Nation.

General Advice for NHANES: Learning About The Available Data

The links in this section go to the Survey Data and Documentation section of the NHANES website.

We strongly encourage the use of the 2017-18 NHANES data for Project B.
- This is the most recent public data (it was made available in June 2020).
- Dr. Love will consider projects using NHANES data from an earlier cycle only if relevant questions are not available in the 2017-18 cycle.
You are required to use variables taken from at least three different NHANES data sets. These must include the Demographics data set, plus at least two other data sets taken from one or more of the other four available data groups (Dietary, Examination, Laboratory, and Questionnaire).
- The Demographics data group should be part of all projects, and it contains a single data set.
- The Dietary data group includes 14 different data sets.
- The Examination data group includes 7 data sets.
- The Laboratory data group contains over 30 data sets.
- The Questionnaire data group also contains over 30 data sets.
You will need to use the nhanesA package in R to import and work with the available data.

Getting the NHANES data

Visit the NHANES website and identify the data you want to view.

For example, the Demographic Variables and Sample Weights for NHANES 2017-18 are described here.
Each NHANES data set is associated with a Doc File (which stands for Data Documentation, Codebook and Frequencies). For instance, here’s the one for Demographics in 2017-18. This file can be viewed online (it’s an HTML file) and it will tell you what variables are included in that data set.
Each NHANES data set is available as a SAS transport file. For example, it’s the DEMO_J file for Demographics in 2017-18, as you can see here.

Using the nhanesA package

Once you’ve selected the data sets from NHANES that you want to use in your project (remember that you need at least 3), the nhanesA package in R can be used to obtain them.

Here’s a little vignette (now a bit out of date) introducing nhanesA from Christopher Endres, who built the package. The key functions in the nhanesA package that I think you might use are those described in that vignette, but the main one is simply called nhanes.
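Before downloading anything, it can help to browse what is available from within R. Here is a minimal sketch, assuming you have an internet connection and that your installed version of nhanesA provides nhanesTables and nhanesTableVars with the arguments shown. The object names exam_tables and bp_vars are just illustrative.

```r
library(nhanesA)

# List the data sets (tables) in the Examination group for the 2017-18 cycle.
# Data groups are referred to by short codes such as "DEMO", "EXAM" and "LAB".
exam_tables <- nhanesTables("EXAM", 2017)
head(exam_tables)

# List the variables in the BPX_J (Blood Pressure) table, with descriptions.
bp_vars <- nhanesTableVars("EXAM", "BPX_J")
head(bp_vars)
```

The Doc File on the NHANES website remains the authoritative description of each variable, so treat this listing as a quick way to find candidates rather than a replacement for reading the documentation.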

An Example

For example, suppose we want to load the Blood Pressure data from the 2017-18 Examination files at NHANES (contained in the BPX_J data file) into a tibble called bp_raw in R.

We would use the following code, which will take a few minutes to run.

library(nhanesA)
library(tidyverse)

bp_raw <- nhanes('BPX_J') %>% tibble()

saveRDS(bp_raw, "data/BPX_J.Rds")

After you’ve downloaded the file once, save it as an Rds file (as the saveRDS call above does), and then comment out the code you used to pull down the data. Then, when you rerun your work, R will read the saved copy instead of downloading the data again. Remember to create a data subfolder in your R Project directory for Study 2 before you run this code.

So your final presentation in Project B should instead look like this, which will run much more quickly.

library(nhanesA)
library(tidyverse)

# pull in data from BPX_J from NHANES and save it

# bp_raw <- nhanes('BPX_J') %>% tibble()

# saveRDS(bp_raw, "data/BPX_J.Rds")

# Now that data are saved, I can just read in the tibble

bp_raw <- readRDS("data/BPX_J.Rds")
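If you would rather not comment code in and out, one alternative is to download only when the saved file is missing. The sketch below is not part of the nhanesA package: cached_rds is a hypothetical helper written here for illustration, and it uses only base R.

```r
# Hypothetical helper (not part of nhanesA): return the object stored at
# `path`, calling `fetch()` to create and save it only if the file is missing.
cached_rds <- function(path, fetch) {
  if (!file.exists(path)) {
    # Create the folder (for example, "data") if it is not there yet
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch(), path)
  }
  readRDS(path)
}

# Intended use, assuming nhanesA and tidyverse are loaded:
# bp_raw <- cached_rds("data/BPX_J.Rds",
#                      function() nhanes("BPX_J") %>% tibble())
```

The first run downloads and saves the data; later runs just read the Rds file. To force a fresh download, delete the saved file.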

Merging NHANES files

You will need to include data from multiple tibbles (data sets) pulled down in your project. I suggest you first select only those variables you intend to use in your analytic data file from each individual tibble you have created. This should always include the SEQN variable in every tibble, since that is what you will use to match up responses across those tibbles.

To merge a demographics tibble called DEMO with a BPX tibble to create a tibble called NEW that contains the variables from both DEMO and BPX for all of the subjects contained in DEMO, I’d use a left_join, as follows.

NEW <- left_join(DEMO, BPX, by = "SEQN")

I’d then use another left_join to merge this NEW result with another tibble (say, the HDL_J tibble) and so on.

NEW2 <- left_join(NEW, HDL_J, by = "SEQN")

Then, when I was done merging and cleaning the data, I would save that result as a new Rds file, just in case I needed it again.
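The select-then-join workflow described above can be sketched with small made-up tibbles standing in for the downloaded data. The variable names (RIDSTATR, RIDAGEYR, BPXSY1, LBDHDD) are real NHANES names, but every value below is invented for illustration, and the object names demo, bpx, hdl and analytic are arbitrary.

```r
library(dplyr)

# Toy stand-ins for DEMO_J, BPX_J and HDL_J (all values are made up)
demo <- tibble(SEQN = c(1, 2, 3, 4),
               RIDSTATR = c(2, 2, 1, 2),
               RIDAGEYR = c(34, 61, 12, 45),
               RIAGENDR = c(1, 2, 2, 1))
bpx <- tibble(SEQN = c(1, 2, 4),
              BPXSY1 = c(118, 134, NA),
              BPXDI1 = c(76, 82, NA))
hdl <- tibble(SEQN = c(1, 2),
              LBDHDD = c(55, 41))

# SEQN should identify exactly one row in each tibble before joining
stopifnot(!anyDuplicated(demo$SEQN),
          !anyDuplicated(bpx$SEQN),
          !anyDuplicated(hdl$SEQN))

# Keep only the variables needed (always including SEQN), then join
analytic <- demo %>%
  select(SEQN, RIDSTATR, RIDAGEYR) %>%
  left_join(bpx %>% select(SEQN, BPXSY1), by = "SEQN") %>%
  left_join(hdl %>% select(SEQN, LBDHDD), by = "SEQN")

# A left join keeps every subject in demo, filling in NA where no match exists
nrow(analytic) == nrow(demo)
```

Checking that SEQN is unique in each tibble before joining is a cheap safeguard: a duplicated key would silently add rows to the merged result.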

Which variables / subjects should I use?

That’s up to you. Find variables of interest in the description files, and pull them out and see if they will work for you.

Focus on subjects who have a RIDSTATR value of 2 (meaning they were both interviewed and examined) - this variable is part of the Demographics file.
- For 2017-18, there are 8,704 such subjects.
I encourage you to filter your final set of variables to complete cases, and reflect on how many observations you wind up with.
- In many cases, you should have well over 8000, but depending on what you select, you may have a much smaller subset, and you should be able to explain to us why that is the case, if it is.
- For example, if you’re studying something that is only measured in females, or in children, you’ll have a smaller sample for that reason, and you need to make that clear to us in your report.
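As a rough illustration of those two filtering steps, here is a sketch using a small made-up tibble (the values are invented, and merged, examined and complete are arbitrary names). It assumes RIDSTATR is stored as the numeric code 2. Depending on your version of nhanesA, the downloaded variable may instead hold a text label, in which case you would filter on that label.

```r
library(dplyr)

# Toy merged data (values are made up for illustration)
merged <- tibble(SEQN = c(1, 2, 3, 4, 5),
                 RIDSTATR = c(2, 2, 1, 2, 2),
                 BPXSY1 = c(118, NA, 110, 134, 126),
                 LBDHDD = c(55, 41, 60, NA, 48))

# Step 1: keep subjects who were both interviewed and examined
examined <- merged %>% filter(RIDSTATR == 2)

# Step 2: keep only rows with no missing values on the selected variables
complete <- examined %>% filter(complete.cases(.))

# Compare the sample sizes at each stage, and be ready to explain the drop
c(all = nrow(merged), examined = nrow(examined), complete = nrow(complete))
```

Reporting the count at each stage, as in the last line, makes it easier to explain in your report where observations were lost.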

Video to help with cleaning NHANES data

On 2020-11-17, Dr. Love recorded a video called video_nhanes_example.mp4, which is available on our shared Google Drive in the Project B folder.
- In the video, Dr. Love downloads data using nhanesA from three data sets in NHANES 2017-18, selects a few variables, merges the files, and then cleans up one of the variables to turn “Don’t Know” and “Refused” responses to missing values.
- Dr. Love made a mistake at the very end of the video when he suggested you ask questions via Canvas. Of course, he meant via Piazza.
The R Markdown file Dr. Love displayed and ran in the video is also available.
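The sketch below shows one way to do the kind of cleaning described in the video. It is not the code from the video. It assumes the variable is stored as numeric codes and uses DMDEDUC2 (education level for adults) from the Demographics file, where 7 means “Refused” and 9 means “Don’t Know”. The codes differ from variable to variable, so check the Doc File before reusing this, and note that all values below are made up.

```r
library(dplyr)

# Toy data (values are made up); 7 = Refused, 9 = Don't Know for DMDEDUC2
demo_small <- tibble(SEQN = c(1, 2, 3, 4, 5),
                     DMDEDUC2 = c(1, 3, 7, 5, 9))

# Recode the "Refused" and "Don't Know" codes as missing values
demo_clean <- demo_small %>%
  mutate(DMDEDUC2 = if_else(DMDEDUC2 %in% c(7, 9), NA_real_, DMDEDUC2))

# Check the result: counts of each remaining code, plus the new NAs
demo_clean %>% count(DMDEDUC2)
```

If your download contains text labels rather than numeric codes, the same idea applies, but you would match on the label text instead of on 7 and 9.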

This page was last updated: 2020-12-06 13:42:31.
