Chapter 21 Confidence Intervals from Two Independent Samples of Quantitative Data

Here, we’ll consider the problem of estimating a confidence interval to describe the difference in population means (or medians) based on a comparison of two samples of quantitative data, gathered using an independent samples design. Specifically, we’ll use as our example the randomized controlled trial of Ibuprofen in Sepsis patients, as described in Section @ref(Sepsis_RCT).

In that trial, 300 patients meeting specific criteria (including elevated temperature) for a diagnosis of sepsis were randomly assigned to either the Ibuprofen group (150 patients) or the Placebo group (150 patients). Group information (our exposure) is contained in the treat variable. The key outcome of interest to us is temp_drop, the change in body temperature (in \(^{\circ}\)C) from baseline to 2 hours later, defined so that positive numbers indicate drops in temperature (a good outcome).

# A tibble: 300 x 6
   id    treat     race     apache temp_0 temp_drop
   <chr> <fct>     <fct>     <int>  <dbl>     <dbl>
 1 S002  Ibuprofen AfricanA     14   38.7       1.4
 2 S004  Ibuprofen White         3   38.3       0.4
 3 S005  Placebo   White         5   38.6       0  
 4 S006  Ibuprofen White        13   38.2      -0.2
 5 S009  Ibuprofen White        25   38.2       0.6
 6 S011  Ibuprofen White        21   38.1      -0.4
 7 S012  Placebo   White        14   38.6      -0.1
 8 S014  Placebo   White        23   37.9       0.3
 9 S016  Placebo   White        16   38.1       0.1
10 S020  Ibuprofen Other        20   39.2       1.5
# ... with 290 more rows

21.1 t-based CI for population mean difference \(\mu_1 - \mu_2\) from Independent Samples

21.1.1 The Welch t procedure

By default, R’s t test for independent samples uses the Welch procedure, which does not assume that the two populations being compared have the same variance. Each population is still assumed to follow a Normal distribution.


    Welch Two Sample t-test

data:  sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 0.191 0.432
sample estimates:
mean in group Ibuprofen   mean in group Placebo 
                  0.464                   0.153 
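
Output of this form is what t.test produces when given a formula and conf.level = 0.90. Here is a minimal sketch, assuming simulated stand-in data: the data frame sim below is hypothetical, not the sepsis trial data, so its numbers will differ from those shown above.

```r
# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Welch is the default in t.test: var.equal = FALSE unless we say otherwise
welch <- t.test(temp_drop ~ treat, data = sim, conf.level = 0.90)

welch$conf.int   # 90% CI for mean(Ibuprofen) - mean(Placebo)
welch$parameter  # Welch degrees of freedom, usually not a whole number
```

The order of subtraction follows the order of the factor levels, so here the difference is Ibuprofen minus Placebo.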

21.1.2 The Pooled t procedure

The most commonly used t-procedure for building a confidence interval assumes not only that each of the two populations being compared follows a Normal distribution, but also that they have the same population variance. This is the pooled t-test, and it is what people usually mean when they describe a two-sample t test.


    Two Sample t-test

data:  sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 0.191 0.432
sample estimates:
mean in group Ibuprofen   mean in group Placebo 
                  0.464                   0.153 
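
To request the pooled procedure from t.test, we add var.equal = TRUE. The sketch below again assumes simulated stand-in data (the hypothetical sim data frame, not the trial data), and also rebuilds the interval by hand from the pooled standard deviation, to show where the equal-variance assumption enters.

```r
# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Pooled t procedure: assume equal population variances
pooled <- t.test(temp_drop ~ treat, data = sim,
                 var.equal = TRUE, conf.level = 0.90)

# The same interval by hand, using the pooled standard deviation
x <- sim$temp_drop[sim$treat == "Ibuprofen"]
y <- sim$temp_drop[sim$treat == "Placebo"]
n1 <- length(x)
n2 <- length(y)
sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)
by_hand <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.95, df = n1 + n2 - 2) * se

pooled$conf.int  # interval from t.test
by_hand          # should agree with the line above
```

With 150 subjects in each group, the pooled procedure uses 150 + 150 - 2 = 298 degrees of freedom.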

21.1.3 Using linear regression to obtain a pooled t confidence interval

A linear regression model, using the same outcome and predictor (group) as the pooled t procedure, produces the same confidence interval, again, under the assumption that the two populations we are comparing follow a Normal distribution with the same (population) variance.


Call:
lm(formula = temp_drop ~ treat, data = sepsis)

Coefficients:
 (Intercept)  treatPlacebo  
       0.464        -0.311  
                5 %   95 %
(Intercept)   0.379  0.549
treatPlacebo -0.432 -0.191

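We see that our point estimate from the linear regression model for the difference in mean temp_drop (Placebo minus Ibuprofen) is -0.311, so that Ibuprofen subjects have higher temp_drop values, on average, than do Placebo subjects. The 90% confidence interval for this difference ranges from -0.432 to -0.191. This is the pooled t interval (0.191, 0.432) with the signs reversed, because the model subtracts in the opposite order.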
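
A sketch of how a model like model1 and its 90% interval might be obtained, assuming simulated stand-in data (the hypothetical sim data frame, not the trial data). It also checks that the regression interval is the pooled t interval with the signs flipped.

```r
# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Linear model with treatment group as the only predictor
model1 <- lm(temp_drop ~ treat, data = sim)
ci_lm <- confint(model1, level = 0.90)

# Pooled t interval for comparison (Ibuprofen minus Placebo)
pooled <- t.test(temp_drop ~ treat, data = sim,
                 var.equal = TRUE, conf.level = 0.90)

ci_lm["treatPlacebo", ]            # Placebo minus Ibuprofen
-rev(as.numeric(pooled$conf.int))  # pooled t interval, signs reversed
```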

We can obtain a t-based confidence interval for each of the parameter estimates in a linear model directly using confint. The model summary, by contrast, reports the estimate and standard error for each coefficient, but not a confidence interval. Remember that a reasonable approximation in large samples to a 95% confidence interval for a regression estimate (slope or intercept) can be obtained from estimate \(\pm\) 2 \(\times\) standard error.


Call:
lm(formula = temp_drop ~ treat, data = sepsis)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8527 -0.3640 -0.0527  0.3473  2.6360 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.4640     0.0516    8.99  < 2e-16 ***
treatPlacebo  -0.3113     0.0730   -4.27  2.7e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.632 on 298 degrees of freedom
Multiple R-squared:  0.0575,    Adjusted R-squared:  0.0544 
F-statistic: 18.2 on 1 and 298 DF,  p-value: 2.68e-05

So, in the case of the treatPlacebo estimate, we can obtain an approximate 95% confidence interval with -0.311 \(\pm\) 2 x 0.073 or (-0.457, -0.165). Compare this to the 95% confidence interval available from the model directly, shown below, and you’ll see only a small difference.

              2.5 % 97.5 %
(Intercept)   0.362  0.566
treatPlacebo -0.455 -0.168
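
The small difference arises because the exact interval uses a t multiplier slightly smaller than 2. The arithmetic can be sketched using the estimate, standard error and residual degrees of freedom reported in the summary above (rounded as printed there, so the last digit may differ slightly from confint).

```r
# Values read from the model summary above
est <- -0.3113   # treatPlacebo estimate
se  <- 0.0730    # its standard error
df  <- 298       # residual degrees of freedom

# Rule of thumb: estimate plus or minus 2 standard errors
approx_ci <- est + c(-1, 1) * 2 * se

# t-based interval: replace 2 with the 97.5th percentile of t on 298 df
t_crit   <- qt(0.975, df = df)
exact_ci <- est + c(-1, 1) * t_crit * se

round(approx_ci, 3)
round(t_crit, 3)
round(exact_ci, 3)
```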

21.2 Bootstrap CI for \(\mu_1 - \mu_2\) from Independent Samples

The bootdif function that we will use in this setting is contained in the Love-boost.R script, and is a slightly edited version of the function at http://biostat.mc.vanderbilt.edu/wiki/Main/BootstrapMeansSoftware. Note that this approach uses a comma to separate the outcome variable (here, temp_drop) from the variable identifying the exposure groups (here, treat).

Mean Difference            0.05            0.95 
         -0.311          -0.431          -0.197 
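
To show the idea behind a bootstrap interval for a difference in means, here is a minimal percentile bootstrap sketch in base R. This is not the bootdif function itself, whose internals are not shown here, and it assumes simulated stand-in data (the hypothetical sim data frame, not the trial data). The number of resamples B = 2000 is also an assumption.

```r
# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

ibu <- sim$temp_drop[sim$treat == "Ibuprofen"]
pla <- sim$temp_drop[sim$treat == "Placebo"]

# Observed difference in means, Placebo minus Ibuprofen
point <- mean(pla) - mean(ibu)

# Resample each group separately, with replacement, B times
B <- 2000
boot_diffs <- replicate(B, {
  mean(sample(pla, replace = TRUE)) - mean(sample(ibu, replace = TRUE))
})

# Percentile 90% interval: the 5th and 95th percentiles of the resampled differences
ci <- quantile(boot_diffs, c(0.05, 0.95))

point
ci
```

Because the resampling is random, the endpoints change a little from run to run unless the seed is fixed.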

21.3 Wilcoxon Rank Sum-based CI from Independent Samples

As in the one-sample case, a rank-based alternative attributed to Wilcoxon (and sometimes to Mann and Whitney) provides a two-sample comparison of the locations of the temp_drop distributions in the two treat groups. This is called a rank sum test, to distinguish it from the signed rank test for a single sample. The estimate it reports is a shift in location between the two groups, rather than a difference in means. Here’s the resulting 90% confidence interval.


    Wilcoxon rank sum test with continuity correction

data:  sepsis$temp_drop by sepsis$treat
W = 10000, p-value = 7e-06
alternative hypothesis: true location shift is not equal to 0
90 percent confidence interval:
 0.2 0.4
sample estimates:
difference in location 
                   0.3 
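
A sketch of a call that yields output of this form, assuming simulated stand-in data (the hypothetical sim data frame, not the trial data). The conf.int = TRUE argument is needed because wilcox.test does not report an interval by default.

```r
# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Rank sum procedure with a 90% confidence interval for the location shift
w <- wilcox.test(temp_drop ~ treat, data = sim,
                 conf.int = TRUE, conf.level = 0.90)

w$estimate   # estimated shift, Ibuprofen relative to Placebo
w$conf.int

# The estimate is, up to numerical tolerance, the median of all
# pairwise differences between the two groups
x <- sim$temp_drop[sim$treat == "Ibuprofen"]
y <- sim$temp_drop[sim$treat == "Placebo"]
hl <- median(outer(x, y, "-"))
hl
```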

21.4 Using the tidy function from broom for t and Wilcoxon procedures

The tidy function is again available to us in dealing with a t-test or Wilcoxon rank sum test.

# A tibble: 1 x 10
  estimate estimate1 estimate2 statistic p.value parameter conf.low
     <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
1    0.311     0.464     0.153      4.27 2.71e-5      288.    0.191
# ... with 3 more variables: conf.high <dbl>, method <chr>,
#   alternative <chr>
# A tibble: 1 x 7
  estimate statistic  p.value conf.low conf.high method        alternative
     <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>         <chr>      
1    0.300    14614.  7.28e-6    0.200     0.400 Wilcoxon ran~ two.sided  
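
A sketch of how tidy might be applied to these two procedures, assuming the broom package is installed and using simulated stand-in data (the hypothetical sim data frame, not the trial data).

```r
library(broom)

# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Welch t procedure, tidied into a one-row tibble
t_tidy <- tidy(t.test(temp_drop ~ treat, data = sim, conf.level = 0.90))

# Wilcoxon rank sum procedure, tidied the same way
w_tidy <- tidy(wilcox.test(temp_drop ~ treat, data = sim,
                           conf.int = TRUE, conf.level = 0.90))

# The interval endpoints are now ordinary columns
t_tidy[, c("estimate", "conf.low", "conf.high")]
w_tidy[, c("estimate", "conf.low", "conf.high")]
```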

We can also use broom functions to place the elements of the linear model model1 into a tidy data frame. This provides the estimate of the difference in means (Placebo minus Ibuprofen) and its standard error, which we could use to formulate a confidence interval.

# A tibble: 2 x 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     0.464    0.0516      8.99 2.91e-17
2 treatPlacebo   -0.311    0.0730     -4.27 2.68e- 5
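
Going from the tidied estimate and standard error to a confidence interval can be sketched as follows, assuming the broom package is installed and using simulated stand-in data (the hypothetical sim data frame, not the trial data). The sketch builds a 90% interval by hand, then asks tidy for the same interval directly.

```r
library(broom)

# Simulated stand-in for the sepsis data (hypothetical values, not the trial results)
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.46, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

model1 <- lm(temp_drop ~ treat, data = sim)
coefs <- tidy(model1)

# By hand: estimate plus or minus t multiplier times standard error
row <- coefs[coefs$term == "treatPlacebo", ]
t_crit <- qt(0.95, df = df.residual(model1))
manual_ci <- row$estimate + c(-1, 1) * t_crit * row$std.error

# Or let tidy add conf.low and conf.high columns
tidy_ci <- tidy(model1, conf.int = TRUE, conf.level = 0.90)

manual_ci
tidy_ci[, c("term", "estimate", "conf.low", "conf.high")]
```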