Chapter 21 Confidence Intervals from Two Independent Samples of Quantitative Data

Here, we’ll consider the problem of building a confidence interval for the difference in population means (or medians), based on two samples of quantitative data gathered using an independent samples design. As our example, we’ll use the randomized controlled trial of Ibuprofen in Sepsis patients, as described in Section @ref(Sepsis_RCT).

In that trial, 300 patients meeting specific criteria (including elevated temperature) for a diagnosis of sepsis were randomly assigned to either the Ibuprofen group (150 patients) or the Placebo group (150 patients). Group information (our exposure) is contained in the treat variable. The key outcome of interest to us is temp_drop, the change in body temperature (in \(^{\circ}\)C) from baseline to 2 hours later, defined so that positive numbers indicate drops in temperature (a good outcome).

sepsis
# A tibble: 300 x 6
      id     treat     race apache temp_0 temp_drop
   <chr>    <fctr>   <fctr>  <int>  <dbl>     <dbl>
 1  S002 Ibuprofen AfricanA     14   38.7       1.4
 2  S004 Ibuprofen    White      3   38.3       0.4
 3  S005   Placebo    White      5   38.6       0.0
 4  S006 Ibuprofen    White     13   38.2      -0.2
 5  S009 Ibuprofen    White     25   38.2       0.6
 6  S011 Ibuprofen    White     21   38.1      -0.4
 7  S012   Placebo    White     14   38.6      -0.1
 8  S014   Placebo    White     23   37.9       0.3
 9  S016   Placebo    White     16   38.1       0.1
10  S020 Ibuprofen    Other     20   39.2       1.5
# ... with 290 more rows

21.1 t-based CI for population mean difference \(\mu_1 - \mu_2\) from Independent Samples

21.1.1 The Welch t procedure

By default, R’s t test for independent samples, and the confidence interval based on it, uses the Welch procedure, which does not assume that the two populations being compared have the same variance. Each population is still assumed to follow a Normal distribution.

t.test(sepsis$temp_drop ~ sepsis$treat, conf.level = 0.90, alt = "two.sided")

    Welch Two Sample t-test

data:  sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 0.191 0.432
sample estimates:
mean in group Ibuprofen   mean in group Placebo 
                  0.464                   0.153 
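To make the Welch calculation concrete, here is a minimal sketch that rebuilds the interval by hand and compares it with t.test. It runs on simulated stand-in data (the sim data frame and its parameters are assumptions made for illustration, not the sepsis trial data), so its numbers will not match the output above.

```r
# Simulated stand-in data (hypothetical; NOT the sepsis trial data)
set.seed(2021)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.45, sd = 0.70),
                rnorm(150, mean = 0.15, sd = 0.55))
)

x <- sim$temp_drop[sim$treat == "Ibuprofen"]
y <- sim$temp_drop[sim$treat == "Placebo"]

# Estimated variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Standard error of the difference, without pooling the variances
se_welch <- sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# 90% interval: point estimate +/- t quantile * standard error
welch_ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.95, df_welch) * se_welch

# The same interval from R's built-in Welch procedure
builtin_ci <- t.test(temp_drop ~ treat, data = sim, conf.level = 0.90)$conf.int

welch_ci
builtin_ci
```

The two printed intervals should agree, since t.test uses the same standard error and degrees of freedom when var.equal is left at its default of FALSE.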

21.1.2 The Pooled t procedure

The most commonly used t-procedure for building a confidence interval assumes not only that each of the two populations being compared follows a Normal distribution, but also that they have the same population variance. This is the pooled t-test, and it is what people usually mean when they describe a two-sample t test.

t.test(sepsis$temp_drop ~ sepsis$treat, conf.level = 0.90, alt = "two.sided", var.equal = TRUE)

    Two Sample t-test

data:  sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 0.191 0.432
sample estimates:
mean in group Ibuprofen   mean in group Placebo 
                  0.464                   0.153 
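The pooled interval can be rebuilt by hand in the same way. The sketch below uses simulated values (the vectors x and y are hypothetical stand-ins, not the trial data) and is meant only to show where the pooled variance and the n1 + n2 - 2 degrees of freedom enter.

```r
# Simulated stand-in groups (hypothetical; NOT the sepsis trial data)
set.seed(2021)
x <- rnorm(150, mean = 0.45, sd = 0.6)  # plays the role of Ibuprofen
y <- rnorm(150, mean = 0.15, sd = 0.6)  # plays the role of Placebo
nx <- length(x)
ny <- length(y)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)

# Standard error of the difference under the equal-variance assumption
se_pooled <- sqrt(sp2 * (1 / nx + 1 / ny))
df_pooled <- nx + ny - 2

# 90% interval: point estimate +/- t quantile * standard error
pooled_ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.95, df_pooled) * se_pooled

# The same interval from t.test with var.equal = TRUE
builtin_ci <- t.test(x, y, var.equal = TRUE, conf.level = 0.90)$conf.int

pooled_ci
builtin_ci
```

When the two groups have the same size, as they do in the sepsis trial, the pooled and Welch standard errors are algebraically identical and only the degrees of freedom differ, which is why the two procedures give nearly the same interval here.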

21.1.3 Using linear regression to obtain a pooled t confidence interval

A linear regression model, using the same outcome and predictor (group) as the pooled t procedure, produces the same confidence interval, again, under the assumption that the two populations we are comparing follow a Normal distribution with the same (population) variance.

model1 <- lm(temp_drop ~ treat, data = sepsis)
model1

Call:
lm(formula = temp_drop ~ treat, data = sepsis)

Coefficients:
 (Intercept)  treatPlacebo  
       0.464        -0.311  

confint(model1, level = 0.90)
                5 %   95 %
(Intercept)   0.379  0.549
treatPlacebo -0.432 -0.191

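The point estimate from the linear regression model for the treatPlacebo coefficient is -0.311. Because Ibuprofen is the reference level of treat, this is the Placebo minus Ibuprofen difference in mean temp_drop, so Ibuprofen subjects have higher temp_drop values, on average, than Placebo subjects. The 90% confidence interval for this difference ranges from -0.432 to -0.191, which is the pooled t interval from the previous section with the signs reversed.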
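If a positive Ibuprofen minus Placebo difference is easier to read, one option is to change the reference level of the group factor before fitting. The sketch below shows this on simulated stand-in data; the sim data frame and the treat2 column are hypothetical names introduced for illustration.

```r
# Simulated stand-in data (hypothetical; NOT the sepsis trial data)
set.seed(2021)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.45, sd = 0.6),
                rnorm(150, mean = 0.15, sd = 0.6))
)

# Default: Ibuprofen is the reference level (alphabetical order),
# so the slope estimates Placebo minus Ibuprofen
m_default <- lm(temp_drop ~ treat, data = sim)
ci_default <- confint(m_default, level = 0.90)["treatPlacebo", ]

# Make Placebo the reference level, so the slope estimates
# Ibuprofen minus Placebo instead
sim$treat2 <- relevel(sim$treat, ref = "Placebo")
m_flipped <- lm(temp_drop ~ treat2, data = sim)
ci_flipped <- confint(m_flipped, level = 0.90)["treat2Ibuprofen", ]

ci_default
ci_flipped
```

The second interval is the first with its signs flipped and its endpoints swapped, so nothing about the inference changes except the direction in which the difference is reported.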

We can obtain a t-based confidence interval for each of the parameter estimates in a linear model directly using confint. The summary of a linear model, by contrast, reports the estimate and standard error for each coefficient but no confidence interval. Remember that, in large samples, a reasonable approximation to a 95% confidence interval for a regression coefficient (slope or intercept) is estimate \(\pm\) 2 \(\times\) standard error.

summary(model1)

Call:
lm(formula = temp_drop ~ treat, data = sepsis)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8527 -0.3640 -0.0527  0.3473  2.6360 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.4640     0.0516    8.99  < 2e-16 ***
treatPlacebo  -0.3113     0.0730   -4.27  2.7e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.632 on 298 degrees of freedom
Multiple R-squared:  0.0575,    Adjusted R-squared:  0.0544 
F-statistic: 18.2 on 1 and 298 DF,  p-value: 2.68e-05

So, in the case of the treatPlacebo estimate, we can obtain an approximate 95% confidence interval with -0.311 \(\pm\) 2 \(\times\) 0.073, or (-0.457, -0.165). Compare this to the 95% confidence interval available from the model directly, shown below, and you’ll see only a small difference.

confint(model1, level = 0.95)
              2.5 % 97.5 %
(Intercept)   0.362  0.566
treatPlacebo -0.455 -0.168
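The small gap between the two intervals comes entirely from the multiplier. As a quick check, the sketch below takes the estimate, standard error and residual degrees of freedom reported by summary(model1) and compares the exact t multiplier with the rule-of-thumb value of 2. The object names are chosen here for illustration.

```r
# Values copied from the summary(model1) output above
est <- -0.3113       # treatPlacebo estimate
se <- 0.0730         # its standard error
df_resid <- 298      # residual degrees of freedom

# Exact multiplier for a two-sided 95% interval
t_mult <- qt(0.975, df = df_resid)
t_mult

# Interval using the exact multiplier (what confint reports)
exact_ci <- est + c(-1, 1) * t_mult * se

# Interval using the rule-of-thumb multiplier of 2
approx_ci <- est + c(-1, 1) * 2 * se

round(exact_ci, 3)
round(approx_ci, 3)
```

With 298 degrees of freedom the exact multiplier is a little below 2, so the rule-of-thumb interval is slightly wider than the one from confint.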

21.2 Bootstrap CI for \(\mu_1 - \mu_2\) from Independent Samples

The bootdif function that we will use in this setting is contained in the Love-boost.R script, and is a slightly edited version of the function at http://biostat.mc.vanderbilt.edu/wiki/Main/BootstrapMeansSoftware. Note that this approach uses a comma, rather than a tilde, to separate the outcome variable (here, temp_drop) from the variable identifying the exposure groups (here, treat). In the output below, the mean difference is reported as Placebo minus Ibuprofen, so its sign is reversed relative to the t test results above.

set.seed(431212)
bootdif(sepsis$temp_drop, sepsis$treat, conf.level = 0.90)
Loading required package: Hmisc
Loading required package: lattice

Attaching package: 'lattice'
The following object is masked from 'package:boot':

    melanoma
Loading required package: survival

Attaching package: 'survival'
The following object is masked from 'package:boot':

    aml
Loading required package: Formula

Attaching package: 'Hmisc'
The following objects are masked from 'package:dplyr':

    combine, src, summarize
The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units
Mean Difference            0.05            0.95 
         -0.311          -0.431          -0.197 
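For readers who want to see the idea behind a bootstrap interval for a difference in means, here is a minimal percentile bootstrap sketch in base R. It is not the bootdif implementation, it uses simulated stand-in data rather than the trial data, and the object names and the choice of 2000 resamples are assumptions made for illustration.

```r
# Simulated stand-in groups (hypothetical; NOT the sepsis trial data)
set.seed(431)
x <- rnorm(150, mean = 0.45, sd = 0.6)  # plays the role of Ibuprofen
y <- rnorm(150, mean = 0.15, sd = 0.6)  # plays the role of Placebo

B <- 2000  # number of bootstrap resamples (an arbitrary choice)

# Resample each group separately, with replacement, and record
# the difference in resampled means (second group minus first)
boot_diffs <- replicate(B, {
  mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
})

# Observed difference and a 90% percentile interval
obs_diff <- mean(y) - mean(x)
boot_ci <- quantile(boot_diffs, probs = c(0.05, 0.95))

obs_diff
boot_ci
```

Resampling within each group separately respects the independent samples design. The interval relies on no Normality assumption, but it will change slightly from run to run unless the seed is fixed.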

21.3 Wilcoxon Rank Sum-based CI from Independent Samples

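As in the one-sample case, a rank-based alternative attributed to Wilcoxon (and sometimes to Mann and Whitney) provides a two-sample comparison of the location of temp_drop in the two treat groups. This is called a rank sum test, as distinct from the signed rank test used for a single sample. The interval R reports is for a location shift, estimated by the median of all pairwise differences between an Ibuprofen value and a Placebo value, which is not in general the same as the difference in sample medians. Here’s the resulting 90% confidence interval.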

wilcox.test(sepsis$temp_drop ~ sepsis$treat, 
            conf.int = TRUE, conf.level = 0.90, 
            alt = "two.sided")

    Wilcoxon rank sum test with continuity correction

data:  sepsis$temp_drop by sepsis$treat
W = 10000, p-value = 7e-06
alternative hypothesis: true location shift is not equal to 0
90 percent confidence interval:
 0.2 0.4
sample estimates:
difference in location 
                   0.3 
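The "difference in location" that wilcox.test reports can be reproduced directly as the median of all pairwise differences between the two groups. The sketch below checks this on small simulated samples without ties, where R uses the exact procedure; the vectors x and y are hypothetical stand-ins, not the trial data.

```r
# Small simulated samples without ties (hypothetical; NOT the trial data)
set.seed(431)
x <- rnorm(20, mean = 0.45, sd = 0.6)  # plays the role of Ibuprofen
y <- rnorm(20, mean = 0.15, sd = 0.6)  # plays the role of Placebo

# All 20 x 20 pairwise differences (x value minus y value)
pairwise <- outer(x, y, "-")

# Location shift estimate: the median of those differences
hl_estimate <- median(pairwise)

# The same estimate, with a 90% interval, from wilcox.test
wt <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.90)

hl_estimate
wt$estimate
wt$conf.int

# For comparison: the difference in sample medians is a different quantity
median(x) - median(y)
```

With large samples or tied values, as in the sepsis data, R switches to a Normal approximation with a continuity correction, so the reported estimate may differ slightly from this direct calculation.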

21.4 Using the tidy function from broom for t and Wilcoxon procedures

The tidy function is again available to us in dealing with a t-test or Wilcoxon rank sum test.

broom::tidy(t.test(sepsis$temp_drop ~ sepsis$treat, 
                   conf.level = 0.90, 
                   alt = "two.sided"))
  estimate estimate1 estimate2 statistic  p.value parameter conf.low
1    0.311     0.464     0.153      4.27 2.71e-05       288    0.191
  conf.high                  method alternative
1     0.432 Welch Two Sample t-test   two.sided

broom::tidy(wilcox.test(sepsis$temp_drop ~ sepsis$treat, 
                        conf.int = TRUE, 
                        conf.level = 0.90, 
                        alt = "two.sided"))
  estimate statistic  p.value conf.low conf.high
1      0.3     14614 7.28e-06      0.2       0.4
                                             method alternative
1 Wilcoxon rank sum test with continuity correction   two.sided

We can also use broom functions to place the elements of the linear model model1 into a tidy data frame. This provides the estimate of the Placebo minus Ibuprofen difference, and its standard error, which we could use to formulate a confidence interval.

broom::tidy(model1)
          term estimate std.error statistic  p.value
1  (Intercept)    0.464    0.0516      8.99 2.91e-17
2 treatPlacebo   -0.311    0.0730     -4.27 2.68e-05

rm(model1)
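Rather than building the interval from the estimate and standard error by hand, tidy can also append confidence limits to its output for a linear model through its conf.int and conf.level arguments. The sketch below shows this on simulated stand-in data; the sim data frame and sim_model object are hypothetical names, and the broom package is assumed to be installed.

```r
library(broom)

# Simulated stand-in data (hypothetical; NOT the sepsis trial data)
set.seed(2021)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.45, sd = 0.6),
                rnorm(150, mean = 0.15, sd = 0.6))
)

sim_model <- lm(temp_drop ~ treat, data = sim)

# Ask tidy for 90% confidence limits alongside each estimate;
# they appear as the conf.low and conf.high columns
tidied <- tidy(sim_model, conf.int = TRUE, conf.level = 0.90)
tidied

# The limits match what confint reports for the same model
confint(sim_model, level = 0.90)
```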