Chapter 21 Confidence Intervals from Two Independent Samples of Quantitative Data
Here, we’ll consider the problem of estimating a confidence interval to describe the difference in population means (or medians) based on a comparison of two samples of quantitative data, gathered using an independent samples design. Specifically, we’ll use as our example the randomized controlled trial of Ibuprofen in Sepsis patients, as described in Section @ref(Sepsis_RCT).
In that trial, 300 patients meeting specific criteria (including elevated temperature) for a diagnosis of sepsis were randomly assigned to either the Ibuprofen group (150 patients) or the Placebo group (150 patients). Group information (our exposure) is contained in the treat variable. The key outcome of interest to us is temp_drop, the change in body temperature (in \(^{\circ}\)C) from baseline to 2 hours later, so that positive numbers indicate drops in temperature (a good outcome).
# A tibble: 300 x 6
id treat race apache temp_0 temp_drop
<chr> <fct> <fct> <int> <dbl> <dbl>
1 S002 Ibuprofen AfricanA 14 38.7 1.4
2 S004 Ibuprofen White 3 38.3 0.4
3 S005 Placebo White 5 38.6 0
4 S006 Ibuprofen White 13 38.2 -0.2
5 S009 Ibuprofen White 25 38.2 0.6
6 S011 Ibuprofen White 21 38.1 -0.4
7 S012 Placebo White 14 38.6 -0.1
8 S014 Placebo White 23 37.9 0.3
9 S016 Placebo White 16 38.1 0.1
10 S020 Ibuprofen Other 20 39.2 1.5
# ... with 290 more rows
21.1 t-based CI for population mean difference \(\mu_1 - \mu_2\) from Independent Samples
21.1.1 The Welch t procedure
The default confidence interval based on the t test for independent samples in R uses something called the Welch test, in which the two populations being compared are not assumed to have the same variance. Each population is assumed to follow a Normal distribution.
Welch Two Sample t-test
data: sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
0.191 0.432
sample estimates:
mean in group Ibuprofen mean in group Placebo
0.464 0.153
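For reference, the interval reported above has the standard Welch form. The notation here is introduced only for illustration and does not appear in the R output: \(\bar{x}_i\), \(s_i\) and \(n_i\) are the sample mean, standard deviation and size in group \(i\), and \(\nu\) is the approximate (Welch-Satterthwaite) degrees of freedom.

```latex
\[
(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,\nu}\,\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]
```

Setting \(\alpha = 0.10\) gives a 90% interval. The value of \(\nu\) is generally not a whole number and cannot exceed \(n_1 + n_2 - 2\), which is 298 for these data.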
21.1.2 The Pooled t procedure
The most commonly used t-procedure for building a confidence interval assumes not only that each of the two populations being compared follows a Normal distribution, but also that they have the same population variance. This is the pooled t-test, and it is what people usually mean when they describe a two-sample t test.
Two Sample t-test
data: sepsis$temp_drop by sepsis$treat
t = 4, df = 300, p-value = 3e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
0.191 0.432
sample estimates:
mean in group Ibuprofen mean in group Placebo
0.464 0.153
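As a check on the output above, the pooled interval can be rebuilt by hand. The symbols are illustrative notation. The numerical inputs (pooled standard deviation 0.632 on 298 degrees of freedom, and a difference in means of 0.3113) are taken from the regression summary shown later in this chapter, and \(t_{0.95,\,298} \approx 1.650\) is the usual t quantile.

```latex
\[
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\; n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\]
\[
0.3113 \pm 1.650 \times 0.632 \sqrt{\frac{1}{150} + \frac{1}{150}}
\approx 0.3113 \pm 0.1204 = (0.191,\ 0.432)
\]
```

Because the two groups have the same size here (150 each), the Welch and pooled standard errors coincide, and only the degrees of freedom differ. That is why the two intervals agree to three decimal places.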
21.1.3 Using linear regression to obtain a pooled t confidence interval
A linear regression model, using the same outcome and predictor (group) as the pooled t procedure, produces the same confidence interval, again under the assumption that the two populations we are comparing follow a Normal distribution with the same (population) variance. The only change is the sign: the model reports the Placebo minus Ibuprofen difference, so the endpoints are negated and swapped.
Call:
lm(formula = temp_drop ~ treat, data = sepsis)
Coefficients:
(Intercept) treatPlacebo
0.464 -0.311
5 % 95 %
(Intercept) 0.379 0.549
treatPlacebo -0.432 -0.191
We see that our point estimate from the linear regression model is that the difference in temp_drop (Placebo minus Ibuprofen) is -0.311, so Ibuprofen subjects have higher temp_drop values than do Placebo subjects, and that the 90% confidence interval for this difference ranges from -0.432 to -0.191.
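The agreement between the pooled t interval and the regression interval can be checked on any two-group data set. The sketch below uses simulated data (not the sepsis trial data), so the means, standard deviations and seed are assumptions chosen only for illustration.

```r
# Simulated data for illustration only (NOT the sepsis trial data).
set.seed(431)
sim <- data.frame(
  treat = factor(rep(c("Ibuprofen", "Placebo"), each = 150)),
  temp_drop = c(rnorm(150, mean = 0.45, sd = 0.65),
                rnorm(150, mean = 0.15, sd = 0.60))
)

# Pooled t interval: first factor level minus second (Ibuprofen - Placebo)
pooled <- t.test(temp_drop ~ treat, data = sim,
                 var.equal = TRUE, conf.level = 0.90)
ci_t <- as.numeric(pooled$conf.int)

# Regression interval: the coefficient is Placebo - Ibuprofen
fit <- lm(temp_drop ~ treat, data = sim)
ci_lm <- as.numeric(confint(fit, parm = "treatPlacebo", level = 0.90))

# Same interval once the regression endpoints are negated and swapped
ci_t
-rev(ci_lm)
```

The two printed vectors should match, since both procedures use the same pooled standard error and the same residual degrees of freedom.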
We can obtain a t-based confidence interval for each of the parameter estimates in a linear model directly using confint. The summary of a linear model reports the estimate and standard error for each coefficient, but not a confidence interval. Remember that a reasonable approximation in large samples to a 95% confidence interval for a regression estimate (slope or intercept) can be obtained from estimate \(\pm\) 2 \(\times\) standard error.
Call:
lm(formula = temp_drop ~ treat, data = sepsis)
Residuals:
Min 1Q Median 3Q Max
-2.8527 -0.3640 -0.0527 0.3473 2.6360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4640 0.0516 8.99 < 2e-16 ***
treatPlacebo -0.3113 0.0730 -4.27 2.7e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.632 on 298 degrees of freedom
Multiple R-squared: 0.0575, Adjusted R-squared: 0.0544
F-statistic: 18.2 on 1 and 298 DF, p-value: 2.68e-05
So, in the case of the treatPlacebo estimate, we can obtain an approximate 95% confidence interval with -0.311 \(\pm\) 2 \(\times\) 0.073, or (-0.457, -0.165). Compare this to the 95% confidence interval available from the model directly, shown below, and you’ll see only a small difference.
2.5 % 97.5 %
(Intercept) 0.362 0.566
treatPlacebo -0.455 -0.168
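The small gap between the two intervals comes from replacing the exact t quantile with 2. A minimal sketch of the arithmetic, using only the estimate, standard error and residual degrees of freedom printed in the regression summary above (the variable names are illustrative):

```r
# Values copied from the regression summary above
est <- -0.3113   # treatPlacebo estimate
se  <- 0.0730    # its standard error
df  <- 298       # residual degrees of freedom

# Rough 95% interval: estimate +/- 2 standard errors
approx_ci <- est + c(-1, 1) * 2 * se

# t-based 95% interval, using the t quantile on the residual df
exact_ci <- est + c(-1, 1) * qt(0.975, df) * se

round(approx_ci, 3)
round(exact_ci, 3)
```

With 298 degrees of freedom the t quantile is about 1.97, so the exact interval is only slightly narrower than the rough one.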
21.2 Bootstrap CI for \(\mu_1 - \mu_2\) from Independent Samples
The bootdif function that we will use in this setting is contained in the Love-boost.R script, and is a slightly edited version of the function at http://biostat.mc.vanderbilt.edu/wiki/Main/BootstrapMeansSoftware. Note that this approach uses a comma (rather than the tilde of a formula) to separate the outcome variable (here, temp_drop) from the variable identifying the exposure groups (here, treat).
Mean Difference 0.05 0.95
-0.311 -0.431 -0.197
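To show the idea behind a percentile bootstrap interval for a difference in means, here is a minimal base R sketch. The helper boot_mean_diff is hypothetical and is not the bootdif function from Love-boost.R; the data are simulated, not the sepsis trial data. The difference is taken as second group minus first, mirroring the sign of the output above.

```r
# Hypothetical helper for illustration; this is NOT the bootdif function.
boot_mean_diff <- function(y, g, conf.level = 0.90, B = 2000) {
  g <- as.factor(g)
  y1 <- y[g == levels(g)[1]]
  y2 <- y[g == levels(g)[2]]
  # Resample each group independently, with replacement, B times
  diffs <- replicate(B, mean(sample(y2, replace = TRUE)) -
                        mean(sample(y1, replace = TRUE)))
  alpha <- 1 - conf.level
  # Percentile interval: middle conf.level share of the bootstrap differences
  c(estimate = mean(y2) - mean(y1),
    lower = unname(quantile(diffs, alpha / 2)),
    upper = unname(quantile(diffs, 1 - alpha / 2)))
}

# Simulated data for illustration only (NOT the sepsis trial data)
set.seed(431)
g <- rep(c("Ibuprofen", "Placebo"), each = 150)
y <- c(rnorm(150, mean = 0.45, sd = 0.65),
       rnorm(150, mean = 0.15, sd = 0.60))
res <- boot_mean_diff(y, g)
res
```

Because the interval depends on random resampling, the endpoints change slightly from run to run unless the seed is fixed.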
21.3 Wilcoxon Rank Sum-based CI from Independent Samples
As in the one-sample case, a rank-based alternative attributed to Wilcoxon (and sometimes to Mann and Whitney) provides a two-sample comparison of the locations of the two treat groups in terms of temp_drop. This is called a rank sum test, as distinct from the signed rank test used for a single sample. The interval describes the shift in location between the groups (what R labels the "difference in location"), rather than a difference in means. Here’s the resulting 90% confidence interval.
Wilcoxon rank sum test with continuity correction
data: sepsis$temp_drop by sepsis$treat
W = 10000, p-value = 7e-06
alternative hypothesis: true location shift is not equal to 0
90 percent confidence interval:
0.2 0.4
sample estimates:
difference in location
0.3
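The "difference in location" reported by the rank sum procedure is based on the median of all pairwise differences between the two groups. The sketch below checks this on small simulated samples (not the sepsis data), where R uses the exact method. With large samples or tied values, as in the sepsis output above, R uses a Normal approximation, and the reported estimate may differ slightly from the hand calculation.

```r
# Simulated small samples for illustration only (NOT the sepsis trial data).
set.seed(431)
x <- rnorm(12, mean = 0.45, sd = 0.65)
y <- rnorm(12, mean = 0.15, sd = 0.60)

# Point estimate: median of all 12 * 12 pairwise differences x_i - y_j
hl <- median(outer(x, y, "-"))

# wilcox.test reports the same quantity as "difference in location"
wt <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.90)
c(by_hand = hl, wilcox = unname(wt$estimate))
```

The confidence interval in wt$conf.int is built from the same sorted pairwise differences.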
21.4 Using the tidy function from broom for t and Wilcoxon procedures
The tidy function is again available to us in dealing with a t-test or Wilcoxon rank sum test.
# A tibble: 1 x 10
estimate estimate1 estimate2 statistic p.value parameter conf.low
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.311 0.464 0.153 4.27 2.71e-5 288. 0.191
# ... with 3 more variables: conf.high <dbl>, method <chr>,
# alternative <chr>
# Wilcoxon rank sum test, tidied into a one-row tibble with a 90% interval
broom::tidy(wilcox.test(sepsis$temp_drop ~ sepsis$treat,
                        conf.int = TRUE,
                        conf.level = 0.90,
                        alternative = "two.sided"))
# A tibble: 1 x 7
estimate statistic p.value conf.low conf.high method alternative
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 0.300 14614. 7.28e-6 0.200 0.400 Wilcoxon ran~ two.sided
We can also use broom functions to place the elements of the linear model model1 into a tidy data frame. This provides the estimate of the Placebo minus Ibuprofen difference, and its standard error, which we could use to formulate a confidence interval.
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.464 0.0516 8.99 2.91e-17
2 treatPlacebo -0.311 0.0730 -4.27 2.68e- 5