3 Summarizing Penguins

We will again use the data contained in the palmerpenguins package in this chapter. Here, we present a few of the more appealing ways to obtain numerical and graphical summaries, without much explanation. We’ll discuss these issues further in the rest of Part A of these Course Notes.

3.1 Setup: Packages Used Here

Here, we’ll add several new packages to allow us to display some additional summaries, and present our tables and plots in different ways.

We will also use functions from the mosaic and Hmisc packages here, though I won’t load them into our session at this time.
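The setup code for this chapter is not reproduced in this section. The block below is a hedged reconstruction. It assumes that each function called below without a package prefix comes from the package listed beside it. Treat the list as an inference from this chapter's code, not as the author's exact setup.

```r
# Hypothetical setup chunk, inferred from the unprefixed calls in this chapter
library(palmerpenguins) # penguins data
library(kableExtra)     # kbl(), kable_styling()
library(gtsummary)      # tbl_summary()
library(summarytools)   # descr(), dfSummary()
library(visdat)         # vis_dat(), vis_miss()
library(lvplot)         # geom_lv()
library(tidyverse)      # ggplot2 and dplyr: ggplot(), filter(), group_by(), ...

# mosaic, Hmisc and psych are always called with ::, so they only need to be
# installed, not loaded.
```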

3.2 Our Data Set

Let’s look again at the penguins data contained in the palmerpenguins package.

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
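The printed tibble hides two columns, sex and year, because they do not fit in the console width. If the dplyr package (part of the tidyverse) is available, glimpse() is one way to see every column at once. It prints one line per variable:

```r
library(palmerpenguins)
library(dplyr)

# One line per column, showing its type and first few values
glimpse(penguins)

# dim() confirms the size reported in the tibble header
penguins_dim <- dim(penguins)
penguins_dim
```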

3.3 Numerical Summaries for a Tibble

Note that in this work, I sometimes don’t explain all of the numerical summaries provided. Some of that discussion is postponed to Chapter 7.

3.3.1 Using summary()

We have several ways to obtain useful summaries of all variables in the penguins data.

penguins |>
  summary()
      species          island    bill_length_mm  bill_depth_mm  
 Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
 Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
                                 Mean   :43.92   Mean   :17.15  
                                 3rd Qu.:48.50   3rd Qu.:18.70  
                                 Max.   :59.60   Max.   :21.50  
                                 NA's   :2       NA's   :2      
 flipper_length_mm  body_mass_g       sex           year     
 Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
 1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
 Median :197.0     Median :4050   NA's  : 11   Median :2008  
 Mean   :200.9     Mean   :4202                Mean   :2008  
 3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
 Max.   :231.0     Max.   :6300                Max.   :2009  
 NA's   :2         NA's   :2                                 
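This minimal base R sketch shows where these numbers come from. It assumes only that palmerpenguins is installed. R's quantile() function uses the type 7 algorithm by default, and it should reproduce the quartiles that summary() rounds for display. Note that quantile() stops with an error on missing values unless na.rm = TRUE is set.

```r
library(palmerpenguins)

bill <- penguins$bill_length_mm

# Quartiles with R's default (type 7) algorithm; summary() shows the same
# values rounded for display (39.23 rather than 39.225)
bill_quartiles <- quantile(bill, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
bill_quartiles

# The mean, also computed after dropping the 2 missing values
bill_mean <- mean(bill, na.rm = TRUE)
bill_mean
```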

3.3.2 Using inspect() from mosaic

Some people like the inspect() function from the mosaic package.

penguins |> 
  mosaic::inspect()

categorical variables:  
     name  class levels   n missing
1 species factor      3 344       0
2  island factor      3 344       0
3     sex factor      2 333      11
                                   distribution
1 Adelie (44.2%), Gentoo (36%) ...             
2 Biscoe (48.8%), Dream (36%) ...              
3 male (50.5%), female (49.5%)                 

quantitative variables:  
               name   class    min       Q1  median     Q3    max       mean
1    bill_length_mm numeric   32.1   39.225   44.45   48.5   59.6   43.92193
2     bill_depth_mm numeric   13.1   15.600   17.30   18.7   21.5   17.15117
3 flipper_length_mm integer  172.0  190.000  197.00  213.0  231.0  200.91520
4       body_mass_g integer 2700.0 3550.000 4050.00 4750.0 6300.0 4201.75439
5              year integer 2007.0 2007.000 2008.00 2009.0 2009.0 2008.02907
           sd   n missing
1   5.4595837 342       2
2   1.9747932 342       2
3  14.0617137 342       2
4 801.9545357 342       2
5   0.8183559 344       0

Daniel Kaplan’s Statistical Modeling, 2nd edition, provides an entire course that coordinates nicely with the tools available in the mosaic package. In our course, we’ll most often use this inspect() tool and a related one, favstats().
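You can check the n and missing columns reported by inspect() directly. This base R sketch counts the missing values in each column of penguins. It also computes the percentage of valid (non-missing) values, which some of the other tools below report as well.

```r
library(palmerpenguins)

# Number of missing values in each column; compare the "missing" column
# reported by inspect() above
n_missing <- colSums(is.na(penguins))
n_missing

# Percentage of values in each column that are not missing
pct_valid <- round(100 * colMeans(!is.na(penguins)), 2)
pct_valid
```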

3.3.3 Using favstats() from mosaic

The favstats function lets us look at some common summaries for a single variable, or for one variable divided into groups by another. We’ll also return to this approach in Chapter 7.

mosaic::favstats(~ bill_length_mm, data = penguins) |>
  kbl() |>
  kable_styling()
 min     Q1 median   Q3  max     mean       sd   n missing
32.1 39.225  44.45 48.5 59.6 43.92193 5.459584 342       2
mosaic::favstats(bill_length_mm ~ species, data = penguins) |>
  kbl() |>
  kable_styling()
species    min    Q1 median     Q3  max     mean       sd   n missing
Adelie    32.1 36.75  38.80 40.750 46.0 38.79139 2.663405 151       1
Chinstrap 40.9 46.35  49.55 51.075 58.0 48.83382 3.339256  68       0
Gentoo    40.9 45.30  47.30 49.550 59.6 47.50488 3.081857 123       1

3.3.4 Using describe() from psych

The describe() function from the psych package provides some additional summaries, if we’re interested. As with favstats() above, we pipe the result into the kbl() and kable_styling() functions from the kableExtra package to make the table look appealing in HTML. More on the use of the kableExtra package is available here. We’ll also return to this approach in Chapter 7.

penguins |>
  psych::describe() |>
  kbl() |>
  kable_styling()
                  vars   n        mean          sd  median     trimmed       mad    min    max  range       skew   kurtosis         se
species*             1 344    1.918605   0.8933198    2.00    1.898551   1.48260    1.0    3.0    2.0  0.1591315 -1.7318773  0.0481646
island*              2 344    1.662791   0.7261940    2.00    1.579710   1.48260    1.0    3.0    2.0  0.6086049 -0.9064333  0.0391538
bill_length_mm       3 342   43.921930   5.4595837   44.45   43.906934   7.04235   32.1   59.6   27.5  0.0526530 -0.8931397  0.2952205
bill_depth_mm        4 342   17.151170   1.9747932   17.30   17.172628   2.22390   13.1   21.5    8.4 -0.1422086 -0.9233523  0.1067846
flipper_length_mm    5 342  200.915205  14.0617137  197.00  200.335766  16.30860  172.0  231.0   59.0  0.3426554 -0.9991866  0.7603704
body_mass_g          6 342 4201.754386 801.9545357 4050.00 4154.014598 889.56000 2700.0 6300.0 3600.0  0.4662117 -0.7395200 43.3647348
sex*                 7 333    1.504504   0.5007321    2.00    1.505618   0.00000    1.0    2.0    1.0 -0.0179376 -2.0056743  0.0274400
year                 8 344 2008.029070   0.8183559 2008.00 2008.036232   1.48260 2007.0 2009.0    2.0 -0.0532601 -1.5092478  0.0441228
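In the psych output, an asterisk after a variable name (species*, island*, sex*) marks a factor that was converted to numeric codes before it was summarized. Its summaries therefore describe the codes and are of limited use. Three of the less familiar columns can be reproduced in base R, assuming describe() uses its default 10% trim for the trimmed mean. Here is a sketch for flipper length:

```r
library(palmerpenguins)

flipper <- penguins$flipper_length_mm

# trimmed: mean after dropping 10% of the non-missing values from each end
flipper_trimmed <- mean(flipper, trim = 0.1, na.rm = TRUE)

# mad: median absolute deviation from the median, scaled by 1.4826
# (base R's default constant) so it estimates the SD for normal data
flipper_mad <- mad(flipper, na.rm = TRUE)

# se: standard error of the mean, sd / sqrt(n) using the non-missing n
flipper_se <- sd(flipper, na.rm = TRUE) / sqrt(sum(!is.na(flipper)))

c(trimmed = flipper_trimmed, mad = flipper_mad, se = flipper_se)
```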

3.3.5 Using describe() from Hmisc

One approach Frank Harrell has developed that I find helpful is the describe() function within his Hmisc package, which produces these results. We’ll also return to this approach in Chapter 7.

penguins |> 
  Hmisc::describe() 
penguins 

 8  Variables      344  Observations
--------------------------------------------------------------------------------
species 
       n  missing distinct 
     344        0        3 
                                        
Value         Adelie Chinstrap    Gentoo
Frequency        152        68       124
Proportion     0.442     0.198     0.360
--------------------------------------------------------------------------------
island 
       n  missing distinct 
     344        0        3 
                                        
Value         Biscoe     Dream Torgersen
Frequency        168       124        52
Proportion     0.488     0.360     0.151
--------------------------------------------------------------------------------
bill_length_mm 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     342        2      164        1    43.92    6.274    35.70    36.60 
     .25      .50      .75      .90      .95 
   39.23    44.45    48.50    50.80    51.99 

lowest : 32.1 33.1 33.5 34   34.1, highest: 55.1 55.8 55.9 58   59.6
--------------------------------------------------------------------------------
bill_depth_mm 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     342        2       80        1    17.15    2.267     13.9     14.3 
     .25      .50      .75      .90      .95 
    15.6     17.3     18.7     19.5     20.0 

lowest : 13.1 13.2 13.3 13.4 13.5, highest: 20.7 20.8 21.1 21.2 21.5
--------------------------------------------------------------------------------
flipper_length_mm 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     342        2       55    0.999    200.9    16.03    181.0    185.0 
     .25      .50      .75      .90      .95 
   190.0    197.0    213.0    220.9    225.0 

lowest : 172 174 176 178 179, highest: 226 228 229 230 231
--------------------------------------------------------------------------------
body_mass_g 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     342        2       94        1     4202    911.8     3150     3300 
     .25      .50      .75      .90      .95 
    3550     4050     4750     5400     5650 

lowest : 2700 2850 2900 2925 2975, highest: 5850 5950 6000 6050 6300
--------------------------------------------------------------------------------
sex 
       n  missing distinct 
     333       11        2 
                        
Value      female   male
Frequency     165    168
Proportion  0.495  0.505
--------------------------------------------------------------------------------
year 
       n  missing distinct     Info     Mean      Gmd 
     344        0        3    0.888     2008   0.8919 
                            
Value       2007  2008  2009
Frequency    110   114   120
Proportion 0.320 0.331 0.349

For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
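Hmisc reports Gmd (Gini's mean difference) as a measure of spread. It is the average absolute difference over all pairs of distinct observations. The base R sketch below is not Hmisc's own code. It computes Gmd for bill length using dist(), which returns every pairwise absolute difference for a single numeric vector. The result should be close to the 6.274 shown above.

```r
library(palmerpenguins)

bill <- penguins$bill_length_mm
bill <- bill[!is.na(bill)]

# Mean of all pairwise |x_i - x_j| with i < j
gmd <- mean(as.vector(dist(bill)))
gmd

# Toy check: the pairs (1, 2), (1, 4), (2, 4) differ by 1, 3 and 2
toy_gmd <- mean(as.vector(dist(c(1, 2, 4))))
toy_gmd  # 2
```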

3.3.6 Using tbl_summary() from gtsummary

If you want to produce results which look like you might expect to see in a published paper, the tbl_summary() function from the gtsummary package has many nice features.

penguins |> 
  tbl_summary()
Characteristic N = 344¹
species
    Adelie 152 (44%)
    Chinstrap 68 (20%)
    Gentoo 124 (36%)
island
    Biscoe 168 (49%)
    Dream 124 (36%)
    Torgersen 52 (15%)
bill_length_mm 44.5 (39.2, 48.5)
    Unknown 2
bill_depth_mm 17.30 (15.60, 18.70)
    Unknown 2
flipper_length_mm 197 (190, 213)
    Unknown 2
body_mass_g 4,050 (3,550, 4,750)
    Unknown 2
sex
    female 165 (50%)
    male 168 (50%)
    Unknown 11
year
    2007 110 (32%)
    2008 114 (33%)
    2009 120 (35%)
¹ n (%); Median (IQR)

A vignette explaining the use of the gtsummary package is available here. We’ll also return to this approach in Chapter 7.
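You can change the statistics that tbl_summary() reports through its statistic argument. The sketch below assumes a recent version of gtsummary. For a few selected columns, it requests the mean (SD) for continuous variables in place of the default median (IQR).

```r
library(palmerpenguins)
library(gtsummary)

# Mean (SD) for continuous variables; categorical ones keep n (%)
tbl <- penguins |>
  dplyr::select(species, bill_length_mm, flipper_length_mm) |>
  tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})"))
tbl
```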

3.3.6.1 Using descr() from summarytools

The descr() function from the summarytools package can also be used to provide numerical descriptions of all of the numerical variables contained within a tibble.

penguins |> 
  descr(stats = "common")
Non-numerical variable(s) ignored: species, island, sex
Descriptive Statistics  
penguins  
N: 344  

                  bill_depth_mm   bill_length_mm   body_mass_g   flipper_length_mm      year
--------------- --------------- ---------------- ------------- ------------------- ---------
           Mean           17.15            43.92       4201.75              200.92   2008.03
        Std.Dev            1.97             5.46        801.95               14.06      0.82
            Min           13.10            32.10       2700.00              172.00   2007.00
         Median           17.30            44.45       4050.00              197.00   2008.00
            Max           21.50            59.60       6300.00              231.00   2009.00
        N.Valid          342.00           342.00        342.00              342.00    344.00
      Pct.Valid           99.42            99.42         99.42               99.42    100.00

An introduction to the summarytools package is available here, and illustrates some other ways to modify this output to suit your needs. We’ll also return to this approach in Chapter 7.

3.3.7 dfSummary() from summarytools

The dfSummary() function from the summarytools package can be used to provide some additional descriptions of all variables within a tibble. You’ll find more information about these numerical descriptions in Chapter 7.

dfSummary(penguins, 
          plain.ascii  = FALSE, 
          style        = "grid", 
          graph.magnif = 0.75, 
          valid.col    = FALSE)
### Data Frame Summary  
#### penguins  
**Dimensions:** 344 x 8  
**Duplicates:** 0  

+----+--------------------+---------------------------+---------------------+------------------------+---------+
| No | Variable           | Stats / Values            | Freqs (% of Valid)  | Graph                  | Missing |
+====+====================+===========================+=====================+========================+=========+
| 1  | species\           | 1\. Adelie\               | 152 (44.2%)\        | IIIIIIII \             | 0\      |
|    | [factor]           | 2\. Chinstrap\            | 68 (19.8%)\         | III \                  | (0.0%)  |
|    |                    | 3\. Gentoo                | 124 (36.0%)         | IIIIIII                |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 2  | island\            | 1\. Biscoe\               | 168 (48.8%)\        | IIIIIIIII \            | 0\      |
|    | [factor]           | 2\. Dream\                | 124 (36.0%)\        | IIIIIII \              | (0.0%)  |
|    |                    | 3\. Torgersen             | 52 (15.1%)          | III                    |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 3  | bill_length_mm\    | Mean (sd) : 43.9 (5.5)\   | 164 distinct values | \ \ \ \ . \ \ \ \ . :\ | 2\      |
|    | [numeric]          | min < med < max:\         |                     | \ \ . : : : : :\       | (0.6%)  |
|    |                    | 32.1 < 44.5 < 59.6\       |                     | \ \ : : : : : :\       |         |
|    |                    | IQR (CV) : 9.3 (0.1)      |                     | \ \ : : : : : : .\     |         |
|    |                    |                           |                     | : : : : : : : : .      |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 4  | bill_depth_mm\     | Mean (sd) : 17.2 (2)\     | 80 distinct values  | \ \ \ \ \ \ \ \ \ \ :\ | 2\      |
|    | [numeric]          | min < med < max:\         |                     | \ \ \ \ \ \ \ \ : :\   | (0.6%)  |
|    |                    | 13.1 < 17.3 < 21.5\       |                     | \ \ : . : : : .\       |         |
|    |                    | IQR (CV) : 3.1 (0.1)      |                     | . : : : : : :\         |         |
|    |                    |                           |                     | : : : : : : : . .      |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 5  | flipper_length_mm\ | Mean (sd) : 200.9 (14.1)\ | 55 distinct values  | \ \ \ \ \ \ :\         | 2\      |
|    | [integer]          | min < med < max:\         |                     | \ \ \ \ . :\           | (0.6%)  |
|    |                    | 172 < 197 < 231\          |                     | \ \ \ \ : : : \ \ . .\ |         |
|    |                    | IQR (CV) : 23 (0.1)       |                     | \ \ . : : : \ \ : : :\ |         |
|    |                    |                           |                     | \ \ : : : : : : : : :  |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 6  | body_mass_g\       | Mean (sd) : 4201.8 (802)\ | 94 distinct values  | \ \ \ \ :\             | 2\      |
|    | [integer]          | min < med < max:\         |                     | \ \ . :\               | (0.6%)  |
|    |                    | 2700 < 4050 < 6300\       |                     | \ \ : : : :\           |         |
|    |                    | IQR (CV) : 1200 (0.2)     |                     | \ \ : : : : : .\       |         |
|    |                    |                           |                     | . : : : : : :          |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 7  | sex\               | 1\. female\               | 165 (49.5%)\        | IIIIIIIII \            | 11\     |
|    | [factor]           | 2\. male                  | 168 (50.5%)         | IIIIIIIIII             | (3.2%)  |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
| 8  | year\              | Mean (sd) : 2008 (0.8)\   | 2007 : 110 (32.0%)\ | IIIIII \               | 0\      |
|    | [integer]          | min < med < max:\         | 2008 : 114 (33.1%)\ | IIIIII \               | (0.0%)  |
|    |                    | 2007 < 2008 < 2009\       | 2009 : 120 (34.9%)  | IIIIII                 |         |
|    |                    | IQR (CV) : 2 (0)          |                     |                        |         |
+----+--------------------+---------------------------+---------------------+------------------------+---------+
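The IQR (CV) line in the dfSummary() output reports two quantities: the interquartile range and the coefficient of variation (the standard deviation divided by the mean). The base R sketch below computes both for the four measurement columns. Once rounded, its values should match those shown above.

```r
library(palmerpenguins)

num_vars <- c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g")

# One column per variable, with rows IQR and CV
iqr_cv <- sapply(penguins[num_vars], function(x) {
  c(IQR = IQR(x, na.rm = TRUE),
    CV  = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))
})
round(iqr_cv, 2)
```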

3.3.8 Visualizing with visdat functions

The vis_dat() function from the visdat package shows the type of each variable (here, factor, numeric, or integer) along with the location of any missing values. This gives a quick visual overview of what’s inside the tibble.

vis_dat(penguins)

We can explore the missing data further using the vis_miss function.

vis_miss(penguins)

A vignette explaining the use of the visdat package is available here.

3.4 Histograms for a Variable

The most common tool we use to produce a graphical summary of a variable, like the penguins’ flipper lengths, is a histogram. Here’s one option.

ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

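This approach produces two messages, along with a fairly unattractive plot. The first message says that ggplot2 fell back on its default of 30 bins and suggests choosing a better binwidth. The second says that 2 rows (the penguins with a missing flipper length) were dropped from the plot.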

This time, we’ll first exclude the two penguins without a measured flipper length, and then set the binwidth to 10 mm. How well does that work?

penguins2 <- 
  penguins |>
  filter(complete.cases(flipper_length_mm))

ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 10)

Now we’ve eliminated the messages. It would still be nice to have more granularity in the bars, so we’d like a smaller binwidth. I’d also like to separate the bars more clearly with fill and outline colors, and to add a title. Like this:

ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5, fill = "orange", col = "navy") +
  labs(title = "Distribution of Flipper Length in 342 Palmer Penguins")
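Choosing a binwidth is a judgment call. One common data-driven starting point, not used elsewhere in these notes, is the Freedman-Diaconis rule. It sets the binwidth to 2 * IQR / n^(1/3), which works out to roughly 6.6 mm for these 342 penguins. Here is a sketch:

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

penguins2 <- penguins |>
  filter(complete.cases(flipper_length_mm))

# Freedman-Diaconis rule: binwidth = 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(penguins2$flipper_length_mm) /
  nrow(penguins2)^(1/3)
fd_width

p <- ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = fd_width, fill = "orange", col = "navy") +
  labs(title = "Flipper Length with a Freedman-Diaconis Binwidth")
p
```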

There are some other options for creating a graphical summary of a variable’s distribution. For example, we might consider a density plot, as well as a rug plot along the horizontal (X) axis:

ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_density(col = "navy") +
  geom_rug(col = "red") +
  labs(title = "Density and Rug Plot of Flipper Length in 342 Palmer Penguins")

Or perhaps a dotplot would provide a useful look…

ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_dotplot(binwidth = 1, fill = "orange", col = "navy") +
  labs(title = "Dot Plot of Flipper Length in 342 Palmer Penguins")

We’ll learn about several other approaches to summarizing the distribution of a variable graphically later in the course.
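A histogram and a density curve can also share one plot. The sketch below assumes ggplot2 3.3.0 or later, which provides after_stat(). It rescales the histogram bars to the density scale so that the curve from geom_density() lines up with them.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

penguins2 <- penguins |>
  filter(complete.cases(flipper_length_mm))

# Bars on the density scale (total bar area 1), so the density curve fits
p <- ggplot(data = penguins2, aes(x = flipper_length_mm)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 fill = "orange", col = "white") +
  geom_density(col = "navy") +
  labs(title = "Histogram and Density of Flipper Length")
p
```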

3.5 Comparing Penguins by Species Numerically

We have data from three different species of penguin. Can we compare their flipper lengths numerically, perhaps by calculating the mean flipper length within each species?

penguins |>
  group_by(species) |>
  summarise(mean(flipper_length_mm))
# A tibble: 3 × 2
  species   `mean(flipper_length_mm)`
  <fct>                         <dbl>
1 Adelie                          NA 
2 Chinstrap                      196.
3 Gentoo                          NA 

Well, that’s a problem. The NA results appear because the Adelie and Gentoo groups each include a penguin with a missing flipper length, and mean() returns NA whenever any of its inputs is missing. Can we fix that, and also provide some additional summaries, like the sample size (n) and the median and standard deviation within each species? While we’re at it, can we make it prettier, with kbl() and kable_styling()?

penguins |>
  filter(complete.cases(species, flipper_length_mm)) |>
  group_by(species) |>
  summarise(n = n(), 
            mean = mean(flipper_length_mm), 
            sd = sd(flipper_length_mm), 
            median = median(flipper_length_mm)) |>
  kbl() |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
species     n     mean       sd median
Adelie    151 189.9536 6.539457    190
Chinstrap  68 195.8235 7.131894    196
Gentoo    123 217.1870 6.484976    216
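Filtering to complete cases is one fix. Another is to pass na.rm = TRUE to each summary function. With that approach, note that n() within summarise() counts rows whether or not the value is missing, so sum(!is.na(x)) gives the matching sample size. This base R illustration uses the first five flipper lengths printed at the start of the chapter, one of which is missing.

```r
# First five flipper lengths from the penguins tibble (row 4 is missing)
flipper_5 <- c(181, 186, 195, NA, 193)

mean(flipper_5)                # NA: one missing value makes the mean unknown
mean(flipper_5, na.rm = TRUE)  # 188.75: mean of the four observed values
sum(!is.na(flipper_5))         # 4: the sample size behind that mean
```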

3.6 Using favstats() from the mosaic package

As we noted previously, we can also use favstats() from the mosaic package to help us look at the results for a single variable, split into groups by another, like this:

mosaic::favstats(bill_length_mm ~ species, data = penguins) |>
  kbl() |>
  kable_styling()
species    min    Q1 median     Q3  max     mean       sd   n missing
Adelie    32.1 36.75  38.80 40.750 46.0 38.79139 2.663405 151       1
Chinstrap 40.9 46.35  49.55 51.075 58.0 48.83382 3.339256  68       0
Gentoo    40.9 45.30  47.30 49.550 59.6 47.50488 3.081857 123       1

One advantage of this approach is that (as you’ll note) it handles missing data the way we’d probably expect. Each summary is computed from the non-missing values, and the number of missing values is reported separately in the missing column.
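For comparison, the dplyr sketch below (not the mosaic implementation) builds a similar table for bill length by species. It uses na.rm = TRUE inside each summary and counts observed and missing values separately. With R's default quantile() settings, its results should match the favstats() output above.

```r
library(palmerpenguins)
library(dplyr)

fav_like <- penguins |>
  group_by(species) |>
  summarise(min     = min(bill_length_mm, na.rm = TRUE),
            Q1      = quantile(bill_length_mm, 0.25, na.rm = TRUE),
            median  = median(bill_length_mm, na.rm = TRUE),
            Q3      = quantile(bill_length_mm, 0.75, na.rm = TRUE),
            max     = max(bill_length_mm, na.rm = TRUE),
            mean    = mean(bill_length_mm, na.rm = TRUE),
            sd      = sd(bill_length_mm, na.rm = TRUE),
            n       = sum(!is.na(bill_length_mm)),  # observed values
            missing = sum(is.na(bill_length_mm)))   # missing values
fav_like
```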

3.7 Using tbl_summary() to summarize the tibble

The tbl_summary() function from the gtsummary package can also do the job of summarizing all of the other variables in the tibble, broken down by species, very nicely.

penguins |> 
  tbl_summary(by = species)
Characteristic Adelie, N = 152¹ Chinstrap, N = 68¹ Gentoo, N = 124¹
island
    Biscoe 44 (29%) 0 (0%) 124 (100%)
    Dream 56 (37%) 68 (100%) 0 (0%)
    Torgersen 52 (34%) 0 (0%) 0 (0%)
bill_length_mm 38.8 (36.8, 40.8) 49.6 (46.4, 51.1) 47.3 (45.3, 49.6)
    Unknown 1 0 1
bill_depth_mm 18.40 (17.50, 19.00) 18.45 (17.50, 19.40) 15.00 (14.20, 15.70)
    Unknown 1 0 1
flipper_length_mm 190 (186, 195) 196 (191, 201) 216 (212, 221)
    Unknown 1 0 1
body_mass_g 3,700 (3,350, 4,000) 3,700 (3,488, 3,950) 5,000 (4,700, 5,500)
    Unknown 1 0 1
sex
    female 73 (50%) 34 (50%) 58 (49%)
    male 73 (50%) 34 (50%) 61 (51%)
    Unknown 6 0 5
year
    2007 50 (33%) 26 (38%) 34 (27%)
    2008 50 (33%) 18 (26%) 46 (37%)
    2009 52 (34%) 24 (35%) 44 (35%)
¹ n (%); Median (IQR)

3.8 Comparing Penguins by Species Graphically

3.8.1 Faceting Histograms with facet_wrap()

We could compare the distributions of the flipper lengths across the three species, by creating a set of faceted histograms, like so…

penguins3 <- 
  penguins |>
  filter(complete.cases(flipper_length_mm, species))

ggplot(data = penguins3, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(binwidth = 5, col = "white") +
  facet_wrap(~ species) +
  labs(title = "Distribution of Flipper Length in Palmer Penguins, by Species")

We might add in the command

  guides(fill = "none") +

to eliminate the redundant legend on the right-hand side of the plot.
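Adding that line to the faceted histogram code gives the sketch below. The only change from the code above is the added guides() call.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

penguins3 <- penguins |>
  filter(complete.cases(flipper_length_mm, species))

# The facet labels already name each species, so the fill legend is dropped
p <- ggplot(data = penguins3, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(binwidth = 5, col = "white") +
  facet_wrap(~ species) +
  guides(fill = "none") +
  labs(title = "Distribution of Flipper Length in Palmer Penguins, by Species")
p
```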

3.8.2 Using facet_grid() instead

The facet_wrap() approach has created three histograms, side by side. Alternatively, we could stack the species vertically using facet_grid(), so that the three histograms share a common horizontal axis. This makes it clear which species tends to have the longest flippers, especially if we reduce the width of the bins a bit.

ggplot(data = penguins3, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(binwidth = 2, col = "white") +
  facet_grid(species ~ .) +
  guides(fill = "none") +
  labs(title = "Distribution of Flipper Length in Palmer Penguins, by Species")

We’ll use facets like this all the time in what follows.

3.8.3 Boxplots

Another very common tool we’ll use for looking simultaneously at the distributions of a variable across two or more categories is a boxplot. More on this later, but here’s one example of what this might look like.

ggplot(data = penguins3, aes(x = flipper_length_mm, y = species, 
                             fill = species)) +
  geom_boxplot() +
  guides(fill = "none") +
  labs(title = "Distribution of Flipper Length in Palmer Penguins, by Species")
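By default, geom_boxplot() draws each box from the first to the third quartile. It flags as outliers any points more than 1.5 times the IQR beyond the box. The dplyr sketch below assumes those defaults and computes the quartiles, the fences, and the number of flagged points for each species.

```r
library(palmerpenguins)
library(dplyr)

fences <- penguins |>
  filter(complete.cases(flipper_length_mm)) |>
  group_by(species) |>
  summarise(Q1 = quantile(flipper_length_mm, 0.25),
            Q3 = quantile(flipper_length_mm, 0.75),
            lower_fence = Q1 - 1.5 * (Q3 - Q1),
            upper_fence = Q3 + 1.5 * (Q3 - Q1),
            # points outside the fences would be drawn as outliers
            n_outside = sum(flipper_length_mm < lower_fence |
                              flipper_length_mm > upper_fence))
fences
```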

3.8.4 Adding Violins

And here’s a somewhat fancier version, including a violin plot, and with the coordinates flipped so the plots are shown vertically rather than horizontally.

ggplot(data = penguins3, aes(x = flipper_length_mm, y = species)) +
  geom_violin(aes(col = species)) +
  geom_boxplot(aes(fill = species), width = 0.3) +
  guides(col = "none", fill = "none") +
  coord_flip() +
  labs(title = "Augmented Boxplot of Flipper Length in Penguins by Species")
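Since ggplot2 3.3.0, geoms such as geom_boxplot() and geom_violin() take their orientation from whichever axis the categorical variable is mapped to. As an alternative sketch, mapping species to x and flipper length to y produces the vertical layout directly, without coord_flip().

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

penguins3 <- penguins |>
  filter(complete.cases(flipper_length_mm, species))

# species on x, flipper length on y: vertical violins and boxes
p <- ggplot(data = penguins3, aes(x = species, y = flipper_length_mm)) +
  geom_violin(aes(col = species)) +
  geom_boxplot(aes(fill = species), width = 0.3) +
  guides(col = "none", fill = "none") +
  labs(title = "Augmented Boxplot of Flipper Length in Penguins by Species")
p
```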

3.8.5 Letter-Value Plots (Boxplots for Large Data)

We might also consider a letter-value plot, using the geom_lv() function from the lvplot package in R, although I rarely use such a plot unless I have at least 1000 observations to work with.

ggplot(data = penguins3, aes(x = species, y = flipper_length_mm)) +
  geom_lv(aes(fill = after_stat(LV))) +
  scale_fill_brewer() +
  labs(title = "Letter-Value Plot of Flipper Length in Penguins by Species")

3.9 Coming Up

You’re probably tiring of the penguins now. Next, we’ll look at some data on people, taken from the National Health and Nutrition Examination Survey, or NHANES.