Chapter 20 Confidence Intervals from Two Paired Samples of Quantitative Data

Here, we’ll consider the problem of estimating a confidence interval to describe the difference in population means (or medians) based on a comparison of two samples of quantitative data, gathered using a matched pairs design. Specifically, we’ll use as our example the Lead in the Blood of Children study, described in Section 17.

Recall that in that study, we measured blood lead content, in mg/dl, for 33 matched pairs of children, one of which was exposed (had a parent working in a battery factory) and the other of which was control (no parent in the battery factory, but matched to the exposed child by age, exposure to traffic and neighborhood). We then created a variable called leaddiff which contained the (exposed - control) differences within each pair.

# A tibble: 33 x 4
   pair  exposed control leaddiff
   <fct>   <int>   <int>    <int>
 1 P01        38      16       22
 2 P02        23      18        5
 3 P03        41      18       23
 4 P04        18      24       -6
 5 P05        37      19       18
 6 P06        36      11       25
 7 P07        23      10       13
 8 P08        62      15       47
 9 P09        31      16       15
10 P10        34      18       16
# ... with 23 more rows

20.1 t-based CI for Population Mean of Paired Differences, \(\mu_d\).

In R, there are at least three different methods for obtaining the t-based confidence interval for the population difference in means between paired samples. They are all mathematically identical. The key idea is to calculate the paired differences (exposed - control, for example) in each pair, and then treat the result as if it were a single sample and apply the methods discussed in Section 19.

20.1.1 Method 1

We can use the single-sample approach, applied to the variable containing the paired differences. Let’s build a 90% two-sided confidence interval for the population mean of the difference in blood lead content across all possible pairs of an exposed (parent works in a lead-based industry) and a control (parent does not) child, \(\mu_d\).


    One Sample t-test

data:  bloodlead$leaddiff
t = 6, df = 30, p-value = 2e-06
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
 11.3 20.6
sample estimates:
mean of x 
       16 

The 90% confidence interval is (11.29, 20.65) according to this t-based procedure. An appropriate interpretation of the 90% two-sided confidence interval would be:

  • (11.29, 20.65) milligrams per deciliter is a 90% two-sided confidence interval for the population mean difference in blood lead content between exposed and control children.
  • Our point estimate for the true population difference in mean blood lead content is 15.97 mg.dl. The values in the interval (11.29, 20.65) mg/dl represent a reasonable range of estimates for the true population difference in mean blood lead content, and we are 90% confident that this method of creating a confidence interval will produce a result containing the true population mean difference.
  • Were we to draw 100 samples of 33 matched pairs from the population described by this sample, and use each such sample to produce a confidence interval in this manner, approximately 90 of those confidence intervals would cover the true population mean difference in blood lead content levels.

20.1.2 Method 2

Or, we can apply the single-sample approach to a calculated difference in blood lead content between the exposed and control groups. Here, we’ll get a 95% two-sided confidence interval for \(\mu_d\), instead of the 90% interval we obtained above.


    One Sample t-test

data:  bloodlead$exposed - bloodlead$control
t = 6, df = 30, p-value = 2e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 10.3 21.6
sample estimates:
mean of x 
       16 

20.1.3 Method 3

Or, we can provide R with two separate samples (unaffected and affected) and specify that the samples are paired. Here, we’ll get a 99% one-sided confidence interval (lower bound) for \(\mu_d\), the population mean difference in blood lead content.


    Paired t-test

data:  bloodlead$exposed and bloodlead$control
t = 6, df = 30, p-value = 1e-06
alternative hypothesis: true difference in means is greater than 0
99 percent confidence interval:
 9.21  Inf
sample estimates:
mean of the differences 
                     16 

Again, the three different methods using t.test for paired samples will all produce identical results if we feed them the same confidence level and type of interval (two-sided, greater than or less than).

20.1.4 Assumptions

If we are building a confidence interval for the mean \(\mu\) of a population or process based on a sample of observations drawn from that population, then we must pay close attention to the assumptions of those procedures. The confidence interval procedure for the population mean \(\mu\) using the t distribution assumes that:

  1. We want to estimate the population mean \(\mu\).
  2. We have drawn a sample of observations at random from the population of interest.
  3. The sampled observations are drawn from the population independently and have identical distributions.
  4. The population follows a Normal distribution. At the very least, the sample itself is approximately Normal.

20.1.5 Using broom for a t test using paired samples

The broom package places the results of a t test, among other things, into a tidy data frame.

# A tibble: 1 x 8
  estimate statistic p.value parameter conf.low conf.high method
     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr> 
1     16.0      5.78 2.04e-6        32     10.3      21.6 One S~
# ... with 1 more variable: alternative <chr>

20.2 Bootstrap CI for mean difference using paired samples

The same bootstrap approach is used for paired differences as for a single sample. We again use the smean.cl.boot() function in the Hmisc package to obtain bootstrap confidence intervals for the population mean, \(\mu_d\), of the paired differences in blood lead content.

 Mean Lower Upper 
 16.0  10.8  21.3 

Note that in this case, the confidence interval for the difference in means is a bit less wide than the 95% confidence interval generated by the t test, which was (10.34, 21.59). It’s common for the bootstrap to produce a narrower range (i.e. an apparently more precise estimate) for the population mean, but it’s not automatic that the endpoints from the bootstrap will be inside those provided by the t test, either.

For example, this bootstrap CI doesn’t contain the t-test based interval, since its upper bound exceeds that of the t-based interval:

 Mean Lower Upper 
 16.0  11.0  21.8 

And neither does this one, which actually covers a wider range than the t-based interval.

 Mean Lower Upper 
 16.0  10.3  21.8 

This demonstration aside, the appropriate thing to do when applying the bootstrap to specify a confidence interval is select a seed and the number (B = 1,000 or 10,000, usually) of desired bootstrap replications, then run the bootstrap just once and move on, rather than repeating the process multiple times looking for a particular result.

20.2.1 Assumptions

The bootstrap confidence interval procedure for the population mean (or median) assumes that:

  1. We want to estimate the population mean \(\mu\) (or the population median).
  2. We have drawn a sample of observations at random from the population of interest.
  3. The sampled observations are drawn from the population independently and have identical distributions.
  4. We are willing to put up with the fact that different people (not using the same random seed) will get somewhat different confidence interval estimates using the same data.

As we’ve seen, a major part of the bootstrap’s appeal is the ability to relax some assumptions.

20.3 Wilcoxon Signed Rank-based CI for paired samples

We could also use the Wilcoxon signed rank procedure to generate a CI for the pseudo-median of the paired differences.


    Wilcoxon signed rank test with continuity correction

data:  bloodlead$leaddiff
V = 500, p-value = 1e-05
alternative hypothesis: true location is not equal to 0
90 percent confidence interval:
 11.0 20.5
sample estimates:
(pseudo)median 
          15.5 

As in the one sample case, we can revise this code slightly to specify a different confidence level, or gather a one-sided rather than a two-sided confidence interval.

20.3.1 Assumptions

The Wilcoxon signed rank confidence interval procedure assumes that:

  1. We want to estimate the population median.
  2. We have drawn a sample of observations at random from the population of interest.
  3. The sampled observations are drawn from the population independently and have identical distributions.
  4. The population follows a symmetric distribution. At the very least, the sample itself shows no substantial skew, so that the sample pseudo-median is a reasonable estimate for the population median.

20.3.2 Using broom and the tidy function with a Wilcoxon procedure

# A tibble: 1 x 7
  estimate statistic  p.value conf.low conf.high method        alternative
     <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>         <chr>      
1     15.5       499  1.15e-5     11.0      20.5 Wilcoxon sig~ two.sided  

20.4 Choosing a Confidence Interval Approach

Suppose we want to find a confidence interval for the mean of a population, \(\mu\), or, the population mean difference \(\mu_{d}\) between two populations based on matched pairs.

  1. If we are willing to assume that the population distribution is Normal
    • and that the population SD \(\sigma\) is known, we can use a Z-based CI.
    • and the population SD \(\sigma\) isn’t known, we use a t-based CI.
  2. If we are unwilling to assume that the population is Normal,
    • use a bootstrap procedure to get a CI for the population mean, or even the median
    • but are willing to assume the population is symmetric, consider a Wilcoxon signed rank procedure to get a CI for the median, rather than the mean.

The two methods you’ll use most often are the bootstrap (especially if the data don’t appear to be at least pretty well fit by a Normal model) and the t-based confidence intervals (if the data do appear to fit a Normal model well.)