Chapter 30 Comparing Population Rates / Proportions
Now that we have built some methods for making statistical inferences about a single population proportion, the next step is to compare two proportions.
For instance, recall the Ebola Virus Disease study from the New England Journal of Medicine. Suppose we want to compare the proportion of deaths among hospitalized cases with a definitive outcome to the proportion of deaths among non-hospitalized cases with a definitive outcome.
- We can summarize the data behind the two proportions we are comparing in a contingency table with two rows which identify the exposure or treatment of interest, and two columns to represent the outcomes of interest.
- In this case, we are comparing two groups of Ebola victims: those who were hospitalized and those who were not. The outcome of interest is whether the patient died or not.
- Our exposure is hospitalization and our outcome is death, and in the table we place the frequency for each combination of a row and a column.
- The rows need to be mutually exclusive and collectively exhaustive: each patient must either be hospitalized or not hospitalized. Similarly, the columns must meet the same standard: every patient is either dead or alive.
The article reports that of the 1,737 cases with a definitive outcome, 1,153 were hospitalized. Among those 1,153 hospitalized cases, 741 people (64.3%) died. Among the remaining 584 non-hospitalized cases, 488 people (83.6%) died.
Here is the initial contingency table, using only the numbers from the previous paragraph.
Initial Ebola Table | Deceased | Alive | Total |
---|---|---|---|
Hospitalized | 741 | – | 1153 |
Not Hospitalized | 488 | – | 584 |
Total | – | – | 1737 |
Now, we can use arithmetic to complete the table, since the rows and the columns are each mutually exclusive and collectively exhaustive.
Ebola 2x2 Table | Deceased | Alive | Total |
---|---|---|---|
Hospitalized | 741 | 412 | 1153 |
Not Hospitalized | 488 | 96 | 584 |
Total | 1229 | 508 | 1737 |
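The arithmetic used to complete the table can be sketched in R. This is a minimal illustration using only the four counts quoted from the article; the object names (`ebola`, `hosp_total`, and so on) are placeholders chosen here, not part of the original analysis.

```r
# Counts quoted from the article
total_cases  <- 1737   # cases with a definitive outcome
hosp_total   <- 1153   # hospitalized cases
hosp_dead    <- 741    # deaths among hospitalized cases
nonhosp_dead <- 488    # deaths among non-hospitalized cases

# The rows are mutually exclusive and collectively exhaustive,
# so the remaining cells follow by subtraction
nonhosp_total <- total_cases - hosp_total

ebola <- matrix(
  c(hosp_dead,    hosp_total - hosp_dead,
    nonhosp_dead, nonhosp_total - nonhosp_dead),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Hospitalized", "Not Hospitalized"),
    c("Deceased", "Alive")
  )
)

# Add row and column totals
addmargins(ebola)
```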
We want to compare the fatality risk (probability of being in the deceased column) for the population of people in the hospitalized row to the population of people in the not hospitalized row.
We do this by means of a hypothesis testing or confidence interval framework. The tricky part is that we have multiple ways to describe the relationship between hospitalization and death. We might compare the risks directly using the difference in probabilities, or the ratio of the two probabilities, or we might convert the risks to odds, and compare the ratio of those odds. In any case, we’ll get slightly different p values and confidence intervals, all of which will help us answer the question about whether there is a statistically significant difference in fatality rates between those people who were hospitalized and those who were not. We’ll return to this set of questions after discussing some of those approaches in a somewhat less depressing example.
30.1 Amoxicillin vs. Placebo for Otitis Media with Effusion
Van Balen et al. (1996) reported a double-blind placebo-controlled study of amoxicillin versus placebo for persistent otitis media with effusion (OME) in general practice. The research question was whether antibiotic treatment is any better than watchful waiting. In this study, 162 children were randomized to receive amoxicillin or placebo. The outcome was the absence of persistent OME after two weeks of treatment.
30.2 The 2 by 2 Table
Data for the 149 children completing the two-week follow-up period are shown below:
Treatment Arm | Without OME | With OME | Total |
---|---|---|---|
Amoxicillin | 37 | 42 | 79 |
Placebo | 11 | 59 | 70 |
Total | 48 | 101 | 149 |
This is an example of a 2x2 table, where we have the two treatments in the rows of the table, and the two possible outcomes in the columns of the table. This is an especially appropriate way to look at counts describing where subjects fall in the relationship between a categorical exposure/treatment and a categorical outcome.
30.3 Relating a Treatment to an Outcome
The question of interest is whether the percentage of amoxicillin kids without OME differs from (specifically, is larger than) the percentage of placebo kids without OME.
Treatment Arm | Without OME | With OME | Total | Proportion without OME |
---|---|---|---|---|
Amoxicillin | 37 | 42 | 79 | 0.468 |
Placebo | 11 | 59 | 70 | 0.157 |
In other words, what is the relationship between the treatment and the outcome?
30.4 Definitions of Probability and Odds
- Proportion = Probability = Risk of the trait = number with trait / total
- Odds of having the trait = (number with the trait / number without the trait) to 1
If p is the proportion of subjects with a trait, then the odds of having the trait are \(\frac{p}{1-p}\) to 1.
So, the probability of a good result (without OME) in this case is \(\frac{37}{79} = 0.4684\) in the amoxicillin group. The odds of a good result are thus \(\frac{0.4684}{1-0.4684} = 0.8811\) to 1.
Treatment | Without OME | With OME | Total | Pr(without OME) | Odds(without OME) |
---|---|---|---|---|---|
Amoxicillin | 37 | 42 | 79 | 0.4684 | 0.8811 |
Placebo | 11 | 59 | 70 | 0.1571 | 0.1864 |
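These definitions translate directly into R. The sketch below works from the counts in the table; the vector names are illustrative choices, not objects from the original analysis.

```r
# Counts from the OME table
without_ome <- c(Amoxicillin = 37, Placebo = 11)
with_ome    <- c(Amoxicillin = 42, Placebo = 59)

# Probability (risk) = number with the trait / total
prob <- without_ome / (without_ome + with_ome)

# Odds = p / (1 - p), which equals number with / number without
odds <- prob / (1 - prob)

round(prob, 4)
round(odds, 4)
```

Working from the unrounded counts, the amoxicillin odds are 37/42, or about 0.881. A small difference in the fourth decimal place from the table can arise because the table's value was computed from the rounded probability 0.4684.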
30.5 Defining the Relative Risk
Among the amoxicillin subjects, the risk of a good outcome (without OME) is 46.84% or, stated as a proportion, 0.4684. Among the placebo subjects, the risk of a good outcome (without OME) is 15.71% or, stated as a proportion, 0.1571.
So our “crude” estimate of the relative risk of a good outcome for amoxicillin subjects as compared to placebo subjects is the ratio of these two risks, or 0.4684/0.1571 = 2.98.
- The fact that this relative risk is greater than 1 indicates that the probability of a good outcome is higher for amoxicillin subjects than for placebo subjects.
- A relative risk of 1 would indicate that the probability of a good outcome is the same for amoxicillin subjects and for placebo subjects.
- A relative risk less than 1 would indicate that the probability of a good outcome is lower for amoxicillin subjects than for placebo subjects.
30.6 Defining the Risk Difference
Our “crude” estimate of the risk difference of a good outcome for amoxicillin subjects as compared to placebo subjects is 0.4684 - 0.1571 = 0.3113, or 31.1 percentage points.
- The fact that this risk difference is greater than 0 indicates that the probability of a good outcome is higher for amoxicillin subjects than for placebo subjects.
- A risk difference of 0 would indicate that the probability of a good outcome is the same for amoxicillin subjects and for placebo subjects.
- A risk difference less than 0 would indicate that the probability of a good outcome is lower for amoxicillin subjects than for placebo subjects.
30.7 Defining the Odds Ratio, or the Cross-Product Ratio
Among the amoxicillin subjects, the odds of a good outcome (without OME) are 0.8811. Among the placebo subjects, the odds of a good outcome (without OME) are 0.1864.
So our “crude” estimate of the odds ratio of a good outcome for amoxicillin subjects as compared to placebo subjects is 0.8811 / 0.1864 = 4.73.
Another way to calculate this odds ratio is to calculate the cross-product ratio, which is equal to (ad) / (b c), for the 2 by 2 table with counts specified as shown:
A Generic Table | Good Outcome | Bad Outcome |
---|---|---|
Treatment Group 1 | a | b |
Treatment Group 2 | c | d |
So, for our table, we have a = 37, b = 42, c = 11, and d = 59, so the cross-product ratio is \(\frac{37 \times 59}{42 \times 11} = \frac{2183}{462} = 4.73\). As expected, this is the same as the “crude” odds ratio estimate.
- The fact that this odds ratio is greater than 1 indicates that the odds of a good outcome are higher for amoxicillin subjects than for placebo subjects.
- An odds ratio of 1 would indicate that the odds of a good outcome are the same for amoxicillin subjects and for placebo subjects.
- An odds ratio less than 1 would indicate that the odds of a good outcome are lower for amoxicillin subjects than for placebo subjects.
So, we have several different ways to compare the outcomes across the treatments. Are these differences and ratios large enough to rule out chance?
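As a quick check, all three “crude” comparisons can be computed from the four cell counts. This is a minimal sketch; the `cells` vector simply uses the a, b, c, d labels from the generic table.

```r
# Cell counts labelled as in the generic table
cells <- c(a = 37, b = 42, c = 11, d = 59)

# Risk of a good outcome in each treatment group
risk_amox    <- cells[["a"]] / (cells[["a"]] + cells[["b"]])
risk_placebo <- cells[["c"]] / (cells[["c"]] + cells[["d"]])

relative_risk   <- risk_amox / risk_placebo
risk_difference <- risk_amox - risk_placebo

# Cross-product ratio (ad) / (bc)
odds_ratio <- (cells[["a"]] * cells[["d"]]) / (cells[["b"]] * cells[["c"]])

round(c(RR = relative_risk, RD = risk_difference, OR = odds_ratio), 4)
```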
30.8 Comparing Rates in a 2x2 Table
The key question is whether the percentage of amoxicillin kids without OME is statistically significantly different from (specifically, larger than) the percentage of placebo kids without OME. In other words, what is the relationship between the treatment and the outcome in the following two-by-two table?
Treatment Arm | (Good Outcome) Without OME | (Bad Outcome) With OME | Total |
---|---|---|---|
Amoxicillin | 37 | 42 | 79 |
Placebo | 11 | 59 | 70 |
Total | 48 | 101 | 149 |
30.9 The twobytwo function in R
I built the twobytwo function in R (based on existing functions in the Epi library, which you need to have in your available Packages list in order for this to work) to do the work for this problem. All that is required is a single command, and a two-by-two table like this one, in standard epidemiological format (with the outcomes in the columns, and the treatments in the rows).
The command just requires you to read off the cells of the table, followed by the labels for the two treatments, then the two outcomes, in this order:
twobytwo(37,42,11,59, "Amoxicillin", "Placebo", "Good", "Bad")
The resulting output follows. We’ll walk through it all in a moment.
2 by 2 table analysis:
------------------------------------------------------
Outcome : Good
Comparing : Amoxicillin vs. Placebo
Good Bad P(Good) 95% conf. interval
Amoxicillin 37 42 0.468 0.3615 0.578
Placebo 11 59 0.157 0.0892 0.262
95% conf. interval
Relative Risk: 2.980 1.650 5.383
Sample Odds Ratio: 4.725 2.164 10.316
Conditional MLE Odds Ratio: 4.675 2.051 11.384
Probability difference: 0.311 0.164 0.439
Exact P-value: 0
Asymptotic P-value: 1e-04
------------------------------------------------------
The main conclusion, using any of these tests and confidence intervals, is that the probability of a good outcome (i.e. no OME at two weeks) is significantly higher with amoxicillin than with placebo, at the 5% significance level.
30.10 Walking through the twobytwo function’s Results
30.10.1 Outcome Probabilities and Confidence Intervals Within the Treatment Groups
The output starts with estimates of the probability (risk) of a Good Outcome among patients who fall into the two treatment groups (Amoxicillin or Placebo), along with 95% confidence intervals for each of these probabilities.
2 by 2 table analysis:
------------------------------------------------------
Outcome : Good
Comparing : Amoxicillin vs. Placebo
Good Bad P(Good) 95% conf. interval
Amoxicillin 37 42 0.4684 0.3615 0.5781
Placebo 11 59 0.1571 0.0892 0.2619
The conditional probability of a Good outcome given that the patient is in the Amoxicillin treatment arm, is symbolized as Pr(Good | Amoxicillin) = 0.4684.
- Note that if these two confidence intervals do not overlap (as is the case here), then we would expect to find a statistically significant difference in the probability of a good outcome when we compare amoxicillin to placebo.
- If the two confidence intervals overlap, then we cannot yet tell whether the difference will be statistically significant.
30.10.2 Relative Risk, Odds Ratio and Risk Difference, with Confidence Intervals
These elements are followed by estimates of the relative risk, odds ratio, and risk difference, each with associated 95% confidence intervals.
95% conf. interval
Relative Risk: 2.9804 1.6501 5.3832
Sample Odds Ratio: 4.7251 2.1643 10.3157
Conditional MLE Odds Ratio: 4.6746 2.0509 11.3837
Probability difference: 0.3112 0.1636 0.4391
- The relative risk, or the ratio of P(Good Outcome | Amoxicillin) to P(Good Outcome | Placebo), is shown first. Note that the 95% confidence interval is entirely greater than 1, suggesting that the true relative risk is significantly greater than 1, and thus that a good outcome is significantly more likely with amoxicillin.
- The odds ratio is presented using two different definitions (the sample odds ratio is the cross-product ratio we mentioned earlier). Note that the 95% confidence interval using either approach is entirely greater than 1, suggesting that the true odds ratio is significantly greater than 1. If that is true, then the odds (and thus also the probability) of a good outcome are significantly higher with amoxicillin.
- The probability (or risk) difference [P(Good Outcome | Amoxicillin) - P(Good Outcome | Placebo)] is presented last. Note that the 95% confidence interval is entirely greater than 0, suggesting that the true risk difference is significantly greater than 0, and thus that a good outcome is significantly more likely with amoxicillin.
- Note carefully that if there had been no difference between Amoxicillin and Placebo, the relative risk and odds ratios would be 1, but the probability difference would be zero.
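One common way to obtain intervals like these is a large-sample (Wald) interval computed on the log scale and then exponentiated. The sketch below is an assumption about the method rather than a description of the function's internals, although for these data it reproduces the relative risk and sample odds ratio limits shown above to three decimal places. It does not attempt the conditional MLE odds ratio or the probability difference interval.

```r
# Cell counts: good and bad outcomes by treatment arm
good <- c(Amoxicillin = 37, Placebo = 11)
bad  <- c(Amoxicillin = 42, Placebo = 59)
n    <- good + bad

z <- qnorm(0.975)  # multiplier for a 95% interval

# Relative risk, with a Wald interval on the log scale
rr       <- (good[[1]] / n[[1]]) / (good[[2]] / n[[2]])
se_logrr <- sqrt(1 / good[[1]] - 1 / n[[1]] + 1 / good[[2]] - 1 / n[[2]])
rr_ci    <- exp(log(rr) + c(-1, 1) * z * se_logrr)

# Sample odds ratio, with a Wald interval on the log scale
or       <- (good[[1]] * bad[[2]]) / (bad[[1]] * good[[2]])
se_logor <- sqrt(sum(1 / c(good, bad)))
or_ci    <- exp(log(or) + c(-1, 1) * z * se_logor)

round(c(RR = rr, lower = rr_ci[1], upper = rr_ci[2]), 3)
round(c(OR = or, lower = or_ci[1], upper = or_ci[2]), 3)
```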
30.10.3 Hypothesis Testing Results
Finally, the output gives p values for both a Fisher’s exact test (exact) and a large-sample \(\chi^2\) test (asymptotic) of the hypotheses
- H0: Rows and Columns are statistically independent. vs.
- HA: Rows and Columns are associated with (or dependent on) each other.
Exact P-value: 0
Asymptotic P-value: 1e-04
Here, the tiny p values (in both cases, p < 0.001) suggest that we should reject H0 and conclude that the outcome probabilities are associated with the treatment received. In other words, which Treatment group you’re in significantly affects the probability of obtaining a Good outcome, which is the same conclusion we’ve drawn from each of the confidence intervals presented above.
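Tests of the same null hypothesis are also available in base R. The sketch below uses `fisher.test` and an uncorrected Pearson `chisq.test`; the matrix name `ome` is an illustrative choice. The p values need not agree digit for digit with the output above, which may be based on a different large-sample statistic, but the conclusion is the same.

```r
# The OME table in standard epidemiological format
ome <- matrix(
  c(37, 42,
    11, 59),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Treatment = c("Amoxicillin", "Placebo"),
    Outcome   = c("Good", "Bad")
  )
)

# Fisher's exact test of independence
exact_test <- fisher.test(ome)

# Pearson chi-square test, without the continuity correction
asymptotic_test <- chisq.test(ome, correct = FALSE)

exact_test$p.value
asymptotic_test$p.value
```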
30.11 Estimating a Rate More Accurately: Use (x + 1)/(n + 2) rather than x/n
Suppose you have some data involving n independent tries, with x successes. A natural estimate of the “success rate” in the data is x / n.
But, strangely enough, it turns out this isn’t an entirely satisfying estimator. Alan Agresti provides substantial motivation for the (x + 1)/(n + 2) estimate as an alternative. This is sometimes called a Bayesian augmentation.
- The big problem with x / n is that it estimates p = 0 or p = 1 when x = 0 or x = n.
- It’s also tricky to compute confidence intervals at these extremes, since the usual standard error for a proportion, \(\sqrt{p (1-p) / n}\), gives zero, which isn’t quite right.
- (x + 1)/(n + 2) is much cleaner, especially when you build a confidence interval for the rate.
- The only place where (x + 1)/(n + 2) will go wrong (as in the SAIFS approach) is if n is small and the true probability is very close to 0 or 1.
For example, if n = 10, and p is 1 in a million, then x will almost certainly be zero, and an estimate of 1/12 is much worse than the simple 0/10. However, how big a deal is this? If p might be 1 in a million, you’re not going to estimate it with an n = 10 experiment.
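The two estimators are easy to compare side by side. The helper functions below are hypothetical names introduced only for this illustration.

```r
# Plain and augmented estimates of a success rate
plain_rate     <- function(x, n) x / n
augmented_rate <- function(x, n) (x + 1) / (n + 2)

# No successes in 10 tries: the plain estimate is exactly zero
plain_rate(0, 10)
augmented_rate(0, 10)   # equals 1/12

# Amoxicillin arm of the OME study: the two estimates nearly agree
plain_rate(37, 79)
augmented_rate(37, 79)
```

With a moderate sample and a rate well away from 0 and 1, as in the amoxicillin arm, the augmentation changes the estimate only in the third decimal place.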
30.12 Back to the OME example
Returning to our example, let’s run an augmented analysis (with one extra “Good” and one extra “Bad” in each of the treatment groups)
2 by 2 table analysis:
------------------------------------------------------
Outcome : Good
Comparing : Amoxicillin vs. Placebo
Good Bad P(Good) 95% conf. interval
Amoxicillin 38 43 0.469 0.3635 0.578
Placebo 12 60 0.167 0.0972 0.271
95% conf. interval
Relative Risk: 2.815 1.598 4.96
Sample Odds Ratio: 4.419 2.071 9.43
Conditional MLE Odds Ratio: 4.375 1.965 10.33
Probability difference: 0.302 0.156 0.43
Exact P-value: 1e-04
Asymptotic P-value: 1e-04
------------------------------------------------------
Note that the augmentation moves both the estimate and interval endpoints towards 0.50.
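The augmented counts are simply the original cell counts plus one. A minimal sketch, with illustrative object names, that reproduces the augmented point estimates shown above:

```r
ome <- matrix(
  c(37, 42,
    11, 59),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Amoxicillin", "Placebo"), c("Good", "Bad"))
)

# Add one extra "Good" and one extra "Bad" to each treatment group
ome_aug <- ome + 1

# Augmented probabilities of a good outcome, and the resulting ratios
p_good <- ome_aug[, "Good"] / rowSums(ome_aug)
rr_aug <- p_good[["Amoxicillin"]] / p_good[["Placebo"]]
or_aug <- (ome_aug[1, 1] * ome_aug[2, 2]) / (ome_aug[1, 2] * ome_aug[2, 1])

ome_aug
round(p_good, 3)
round(c(RR = rr_aug, OR = or_aug), 3)
```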
30.13 Does the Bayesian Augmentation (x + 1)/(n + 2) Matter, Practically?
Generally, this augmentation doesn’t matter much at all in any setting where you have a reasonably large sample size, or where the sample probability of success in each group isn’t too close to 0 or 1.
Suppose you have 50 subjects who were exposed to some stimulus, and another 45 who were not. Of the 50 exposed subjects, 20 have the outcome of interest, while this is true for 9 of the unexposed subjects. What conclusions do we draw, first without and then with this Bayesian augmentation?
First, without the augmentation:
2 by 2 table analysis:
------------------------------------------------------
Outcome : Has Outcome
Comparing : Exposed vs. Not Exposed
Has Outcome No Outcome P(Has Outcome) 95% conf. interval
Exposed 20 30 0.4 0.275 0.540
Not Exposed 9 36 0.2 0.107 0.342
95% conf. interval
Relative Risk: 2.00 1.0175 3.931
Sample Odds Ratio: 2.67 1.0585 6.718
Conditional MLE Odds Ratio: 2.64 0.9746 7.629
Probability difference: 0.20 0.0144 0.365
Exact P-value: 0.0451
Asymptotic P-value: 0.0375
------------------------------------------------------
And now, with the augmentation:
2 by 2 table analysis:
------------------------------------------------------
Outcome : Has Outcome
Comparing : Exposed vs. Not Exposed
Has Outcome No Outcome P(Has Outcome) 95% conf. interval
Exposed 21 31 0.404 0.280 0.541
Not Exposed 10 37 0.213 0.118 0.352
95% conf. interval
Relative Risk: 1.898 0.999 3.605
Sample Odds Ratio: 2.506 1.028 6.113
Conditional MLE Odds Ratio: 2.483 0.950 6.859
Probability difference: 0.191 0.008 0.355
Exact P-value: 0.0517
Asymptotic P-value: 0.0434
------------------------------------------------------
It is likely that the augmented version is a more accurate estimate here, but the two estimates will be comparable, generally, so long as either (a) the sample size in each exposure group is more than, say, 30 subjects, and/or (b) the sample probability of the outcome is between 10% and 90% in each exposure group.
30.14 Hypothesis Testing About a Population Proportion
To perform a hypothesis test about a population proportion, we’ll usually use the prop.test or binom.test approaches in R.
- The null hypothesis is that the population proportion is equal to some pre-specified value. Often, this is taken to be 0.5, but it can be any value between 0 and 1, which we call \(\pi_0\).
- The alternative hypothesis may be one-sided or two-sided. If it is two-sided, it will be that the population proportion is not equal to the value \(\pi_0\) specified by the null hypothesis.
- In the two-sided case, we have \(H_0: \pi = \pi_0\) and \(H_A: \pi \neq \pi_0\)
- In the one-sided “greater than” case, we have \(H_0: \pi \leq \pi_0\) and \(H_A: \pi > \pi_0\)
As an example, suppose we want to see if the evidence available so far is enough to conclude that the population case fatality rate across the countries included in the WHO’s report is more than 67% (i.e. more than two-thirds of those with definitive outcomes will die), and we want to do this using a 5% significance level.
We could use prop.test or binom.test here.
Exact binomial test
data: 1229 and 1737
number of successes = 1229, number of trials = 1737, p-value = 4e-04
alternative hypothesis: true probability of success is greater than 0.67
95 percent confidence interval:
0.689 1.000
sample estimates:
probability of success
0.708
1-sample proportions test with continuity correction
data: 1229 out of 1737, null probability 0.67
X-squared = 10, df = 1, p-value = 5e-04
alternative hypothesis: true p is greater than 0.67
95 percent confidence interval:
0.689 1.000
sample estimates:
p
0.708
- What conclusion should we draw here?
- Does it matter which of the two test procedures we use?
- Do the p values match up with the 95% confidence intervals?
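Calls of the following form should produce output like that shown above. The original commands are not shown in the text, so the arguments here are reconstructed from the printed output (1229 deaths out of 1737 cases, a null value of 0.67, and a one-sided “greater” alternative) and should be read as an assumption.

```r
deaths <- 1229   # deaths among cases with a definitive outcome
cases  <- 1737   # cases with a definitive outcome

# H0: pi <= 0.67 versus HA: pi > 0.67, at the 5% significance level
exact_fit <- binom.test(x = deaths, n = cases, p = 0.67,
                        alternative = "greater", conf.level = 0.95)

approx_fit <- prop.test(x = deaths, n = cases, p = 0.67,
                        alternative = "greater", conf.level = 0.95)

exact_fit
approx_fit
```

Because the alternative is one-sided, each 95% confidence interval is also one-sided, with an upper limit of 1.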
30.15 Assumptions for Inferences about a Population Proportion
- There are n identical trials.
- There are two possible outcomes (designated as success and failure) for each trial.
- The true probability of success, \(\pi\), remains constant across trials.
- Each trial is independent of all of the other trials.
In order for the confidence intervals and tests we produce to remain reasonably accurate, we’d also like to see that both np = the observed number of successes and n(1-p) = the observed number of failures are greater than 5. If not, then the intervals may be incorrect (shifted away from the true value of \(\pi\)), and also less efficient (wider) than necessary.
30.16 Building a 2x2 Table in R from a Data Frame
Remember our first-day survey? It’s in the surveyday1.csv file on our website, and loaded as the survey1 data frame here. Two of the questions on that survey asked you to specify your sex and whether English was your first language. Do men and women have statistically significantly different probabilities of being native speakers of English?
n y
f 13 57
m 16 68
I would like to make those a little easier to read, so I’m going to change the labels for the levels without changing their order.
survey1$sex.new <- factor(survey1$sex, labels = c("Female", "Male"))
survey1$lang1 <- factor(survey1$english, labels = c("Not English", "English"))
table(survey1$sex.new, survey1$lang1)
Not English English
Female 13 57
Male 16 68
30.17 Standard Epidemiological Format
Now, suppose we want this in standard epidemiological format, which means that:
- The rows of the table describe the “treatment” (which we’ll take here to be sex). The more interesting (sometimes also the more common) “treatment” is placed in the top row.
- The columns of the table describe the “outcome” (which we’ll take here to be whether English was your first language.) Typically, the more common “outcome” is placed to the left.
So, for standard format, we want to get the “Female” and “English” cell to the top left of the table, not the “Female” and “Not English” cell that is there now.
So, we are going to reorder the levels of the lang1 variable (our relabeled version of english) to accomplish this:
survey1$lang.new <- factor(survey1$lang1, levels=c("English", "Not English"))
table(survey1$sex.new, survey1$lang.new)
English Not English
Female 57 13
Male 68 16
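To try these steps without the course file, the sketch below builds a hypothetical stand-in for survey1 whose counts match the table above, then repeats the relabeling and reordering. Only the sex and english columns are mimicked; the real data frame presumably contains more. The final call assumes the Epi package is installed and is skipped otherwise.

```r
# Hypothetical stand-in for survey1, constructed to match the counts above
survey1 <- data.frame(
  sex     = rep(c("f", "f", "m", "m"), times = c(13, 57, 16, 68)),
  english = rep(c("n", "y", "n", "y"), times = c(13, 57, 16, 68))
)

# Relabel the levels without changing their (alphabetical) order
survey1$sex.new <- factor(survey1$sex, labels = c("Female", "Male"))
survey1$lang1   <- factor(survey1$english, labels = c("Not English", "English"))

# Reorder the levels so that "English" lands in the left column
survey1$lang.new <- factor(survey1$lang1, levels = c("English", "Not English"))

tab <- table(survey1$sex.new, survey1$lang.new)
tab

# If the Epi package is available, pass the 2x2 table to twoby2()
if (requireNamespace("Epi", quietly = TRUE)) {
  Epi::twoby2(tab)
}
```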
And now, we can seamlessly grab these results and insert them into the twoby2 function from the Epi package…
2 by 2 table analysis:
------------------------------------------------------
Outcome : English
Comparing : Female vs. Male
English Not English P(English) 95% conf. interval
Female 57 13 0.814 0.706 0.889
Male 68 16 0.809 0.711 0.880
95% conf. interval
Relative Risk: 1.0059 0.864 1.172
Sample Odds Ratio: 1.0317 0.458 2.324
Conditional MLE Odds Ratio: 1.0315 0.424 2.545
Probability difference: 0.0048 -0.122 0.126
Exact P-value: 1
Asymptotic P-value: 0.94
------------------------------------------------------
30.18 Use the Bayesian Augmentation (x + 1)/(n + 2)
As a default estimate for a rate, (x + 1)/(n + 2) is a better choice than x / n. Add a success and a failure to your data to get a better estimate (especially a confidence interval) of a population rate. This is sometimes called a Bayesian augmentation of the data. Occasionally, statisticians will use an even more extensive adjustment, like (x + 2) / (n + 4).
If we like, we can analyze the sex-language relationship including the Bayesian augmentation:
2 by 2 table analysis:
------------------------------------------------------
Outcome : English
Comparing : Female vs. Male
English Not English P(English) 95% conf. interval
Female 58 14 0.806 0.698 0.881
Male 69 17 0.802 0.705 0.873
95% conf. interval
Relative Risk: 1.0040 0.860 1.172
Sample Odds Ratio: 1.0207 0.464 2.246
Conditional MLE Odds Ratio: 1.0206 0.431 2.446
Probability difference: 0.0032 -0.124 0.125
Exact P-value: 1
Asymptotic P-value: 0.959
------------------------------------------------------
30.19 Returning to the Ebola Virus Disease Survival Example
Recall our 2x2 table comparing case fatality rates by whether the subject was hospitalized.
Ebola 2x2 Table | Deceased | Alive | Total |
---|---|---|---|
Hospitalized | 741 | 412 | 1153 |
Not Hospitalized | 488 | 96 | 584 |
Total | 1229 | 508 | 1737 |
We can run these data through R, using the augmentation (adding a death and a survival to the hospitalized and also to the not hospitalized groups.)
2 by 2 table analysis:
------------------------------------------------------
Outcome : Deceased
Comparing : Hospitalized vs. Not Hospitalized
Deceased Alive P(Deceased) 95% conf. interval
Hospitalized 742 413 0.642 0.614 0.670
Not Hospitalized 489 97 0.835 0.802 0.862
95% conf. interval
Relative Risk: 0.770 0.728 0.814
Sample Odds Ratio: 0.356 0.278 0.457
Conditional MLE Odds Ratio: 0.357 0.275 0.460
Probability difference: -0.192 -0.232 -0.150
Exact P-value: 0
Asymptotic P-value: 0
------------------------------------------------------
- What conclusions can you draw from the R output above?
Now, in the same New England Journal of Medicine article, data are provided for the percentage of deaths among male and female patients, for a slightly different group of EVD patients. In that group, there were 874 men, of whom 631 died, and 818 women, of whom 572 died.
- Specify the null and alternative hypotheses that can be tested for these new data.
- Develop the appropriate 2x2 table and get it into R for analysis.
- What conclusions can we draw from your comparison of fatality risks by sex?
30.19.1 Answer Sketch for questions 2-4
The null hypothesis is that the population death rate among men is the same as the population death rate among women, against a two-sided alternative (that the rates are not the same).
Here is the appropriate set of 2x2 table results, including the Bayesian augmentation.
2 by 2 table analysis:
------------------------------------------------------
Outcome : Died
Comparing : Men vs. Women
Died Survived P(Died) 95% conf. interval
Men 632 244 0.722 0.691 0.750
Women 573 247 0.699 0.666 0.729
95% conf. interval
Relative Risk: 1.0325 0.9714 1.0973
Sample Odds Ratio: 1.1165 0.9051 1.3774
Conditional MLE Odds Ratio: 1.1165 0.9000 1.3851
Probability difference: 0.0227 -0.0205 0.0658
Exact P-value: 0.309
Asymptotic P-value: 0.303
------------------------------------------------------
Our conclusion, from any of the comparisons, is that the death rates do not differ significantly by sex, at least at the 5% significance level. We could use the relative risk, odds ratio, probability difference, or even the hypothesis tests to see this.
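The augmented table can be built directly from the counts quoted from the article. The sketch below uses base R only, with a two-sample `prop.test` standing in for the hypothesis tests in the output above, so its p value will not match those exactly. Object names are illustrative.

```r
# Counts reported in the article
men_total   <- 874; men_died   <- 631
women_total <- 818; women_died <- 572

sex_tab <- matrix(
  c(men_died,   men_total - men_died,
    women_died, women_total - women_died),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Men", "Women"), c("Died", "Survived"))
)

# Bayesian augmentation: one extra death and one extra survival per group
sex_aug <- sex_tab + 1

p_died <- sex_aug[, "Died"] / rowSums(sex_aug)
rr     <- p_died[["Men"]] / p_died[["Women"]]

# Two-sided test of H0: the two population death rates are equal
sex_test <- prop.test(x = sex_aug[, "Died"], n = rowSums(sex_aug))

sex_aug
round(p_died, 3)
round(rr, 4)
sex_test$p.value
```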