13  Proportions and Rates

This is a DRAFT version of this Chapter.

This is a sketchy draft. I’ll remove this notice when I post a version of this Chapter that is essentially finished.

13.1 R setup for this chapter

Note

Appendix A lists all R packages used in this book, and also provides R session information.

13.2 Data: strep_tb data from the medicaldata R package

Note

Appendix C provides further guidance on pulling data from other systems into R, while Appendix D gives more information (including download links) for all data sets used in this book. Appendix B describes the 431-Love.R script, and demonstrates its use.

See pages 51-52 of R&OS for standard errors and confidence intervals for proportions, and for what to do when y = 0 or y = n.

My source for these data is Higgins (2023).

From the strep_tb data in the medicaldata R package, we'll look at the study arm (Streptomycin or Control) and the dichotomous outcome improved (TRUE or FALSE). Because improved is stored as a logical variable, we'll build a factor version of it. We'll also keep the patient ID.

See https://higgi13425.github.io/medicaldata/ for more details.

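# mutate() is from dplyr; fct_recode() and fct_relevel() are from forcats
strep <- medicaldata::strep_tb |>
  mutate(
    # convert the logical outcome into a factor with readable labels
    imp_f = factor(improved),
    imp_f = fct_recode(imp_f,
      "Improved" = "TRUE",
      "Worsened" = "FALSE"
    ),
    # list "Improved" first so it appears in the left column of tables
    imp_f = fct_relevel(imp_f, "Improved")
  )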
strep
# A tibble: 107 × 14
   patient_id arm     dose_strep_g dose_PAS_g gender baseline_condition
   <chr>      <fct>          <dbl>      <dbl> <fct>  <fct>             
 1 0001       Control            0          0 M      1_Good            
 2 0002       Control            0          0 F      1_Good            
 3 0003       Control            0          0 F      1_Good            
 4 0004       Control            0          0 M      1_Good            
 5 0005       Control            0          0 F      1_Good            
 6 0006       Control            0          0 M      1_Good            
 7 0007       Control            0          0 F      1_Good            
 8 0008       Control            0          0 M      1_Good            
 9 0009       Control            0          0 F      2_Fair            
10 0010       Control            0          0 M      2_Fair            
# ℹ 97 more rows
# ℹ 8 more variables: baseline_temp <fct>, baseline_esr <fct>,
#   baseline_cavitation <fct>, strep_resistance <fct>, radiologic_6m <fct>,
#   rad_num <dbl>, improved <lgl>, imp_f <fct>
table(strep$arm, strep$imp_f)
              
               Improved Worsened
  Streptomycin       38       17
  Control            17       35

13.3 Estimating a Proportion

Among the 55 subjects who received Streptomycin, 38 improved and 17 did not. Can we build a confidence interval for the proportion of subjects who would improve in the population that this Streptomycin group represents?

13.3.1 Using a Bayesian augmentation

binom.test(x = 38 + 2, n = 55 + 4, conf.level = 0.95)

    Exact binomial test

data:  38 + 2 and 55 + 4
number of successes = 40, number of trials = 59, p-value = 0.008641
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5436200 0.7937535
sample estimates:
probability of success 
             0.6779661 
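
The call above adds two imaginary successes and two imaginary failures to the observed data (38 + 2 successes in 55 + 4 trials) before computing an exact binomial interval. As a rough cross-check, the sketch below applies a plain estimate plus-or-minus standard error calculation to the same augmented counts. The variable names are my own, and the normal-approximation interval is an assumption chosen for comparison, not necessarily the method this chapter intends.

```r
# Observed data: 38 improved out of 55 Streptomycin subjects
x <- 38
n <- 55

# Augment with 2 imaginary successes and 2 imaginary failures
x_aug <- x + 2
n_aug <- n + 4

p_aug  <- x_aug / n_aug                      # augmented point estimate
se_aug <- sqrt(p_aug * (1 - p_aug) / n_aug)  # standard error on augmented data

# Normal-approximation 95% interval (assumption: not the exact binomial method)
z <- qnorm(0.975)
ci_aug <- c(lower = p_aug - z * se_aug, upper = p_aug + z * se_aug)
round(ci_aug, 3)
```

The augmented point estimate here is the same 40/59 reported by binom.test(), while the interval endpoints should land near, but not exactly on, the exact binomial limits.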

13.3.2 SAIFS: single augmentation with an imaginary failure or success

The saifs_ci() function

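# saifs_ci() returns a tibble, so the tibble package must be loaded
saifs_ci <- function(x, n, conf.level = 0.95, dig = 3) {
  p.sample <- round(x / n, digits = dig)

  # lower bound adds one imaginary failure; upper bound adds one imaginary success
  p1 <- x / (n + 1)
  p2 <- (x + 1) / (n + 1)

  # standard errors use the original sample size n
  se1 <- sqrt(p1 * (1 - p1) / n)
  se2 <- sqrt(p2 * (1 - p2) / n)

  # two-sided t cutoff with n - 1 degrees of freedom
  lowq <- (1 - conf.level) / 2
  tcut <- qt(lowq, df = n - 1, lower.tail = FALSE)

  lower.bound <- round(p1 - tcut * se1, digits = dig)
  upper.bound <- round(p2 + tcut * se2, digits = dig)

  tibble(
    sample_x = x,
    sample_n = n,
    sample_p = p.sample,
    lower = lower.bound,
    upper = upper.bound,
    conf_level = conf.level
  )
}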

Using the saifs_ci() function from Love-431.R

saifs_ci(x = 38, n = 55, conf.level = 0.95, dig = 3)
# A tibble: 1 × 6
  sample_x sample_n sample_p lower upper conf_level
     <dbl>    <dbl>    <dbl> <dbl> <dbl>      <dbl>
1       38       55    0.691 0.552 0.821       0.95
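
To see where those bounds come from, the arithmetic inside saifs_ci() can be traced by hand for x = 38 and n = 55. This is only a sketch in base R that mirrors the function's steps; the object names (p_low, p_high, and so on) are mine and do not appear in the original script.

```r
x <- 38
n <- 55
conf_level <- 0.95

# One imaginary failure for the lower bound, one imaginary success for the upper
p_low  <- x / (n + 1)
p_high <- (x + 1) / (n + 1)

# Standard errors use the original n, as in saifs_ci()
se_low  <- sqrt(p_low * (1 - p_low) / n)
se_high <- sqrt(p_high * (1 - p_high) / n)

# Two-sided t cutoff with n - 1 degrees of freedom
t_cut <- qt(1 - (1 - conf_level) / 2, df = n - 1)

saifs_by_hand <- c(lower = p_low - t_cut * se_low,
                   upper = p_high + t_cut * se_high)
round(saifs_by_hand, 3)
```

Rounded to three digits, these values should reproduce the lower and upper columns of the tibble above.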

13.4 Assessing the 2 x 2 table

The table of arm by improvement status shown in Section 13.2 is in standard epidemiological format, which means that:

  • The rows of the table describe the “treatment” (which we’ll take here to be arm).
    • The more interesting (sometimes also the more common) “treatment” is placed in the top row. That’s Streptomycin here.
  • The columns of the table describe the “outcome” (which we’ll take here to be whether the subject improved or not).
    • Typically, the more common or more interesting “outcome” is placed to the left. Here, we’ll use “improved” on the left.
# twoby2() is provided by the Epi package
twoby2(table(strep$arm, strep$imp_f))
2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : Improved 
Comparing : Streptomycin vs. Control 

             Improved Worsened    P(Improved) 95% conf. interval
Streptomycin       38       17         0.6909    0.5579   0.7984
Control            17       35         0.3269    0.2139   0.4644

                                   95% conf. interval
             Relative Risk: 2.1134    1.3773   3.2429
         Sample Odds Ratio: 4.6021    2.0389  10.3877
Conditional MLE Odds Ratio: 4.5304    1.8962  11.2779
    Probability difference: 0.3640    0.1754   0.5182

             Exact P-value: 0.0002 
        Asymptotic P-value: 0.0002 
------------------------------------------------------
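
The three comparative summaries in this output can be rebuilt from the four cell counts. The sketch below does so in base R, and adds a Wald-type interval for the relative risk computed on the log scale. That interval formula is a common textbook choice and an assumption on my part about what to compare against; the variable names are also mine.

```r
# Cell counts from the 2 x 2 table of arm by improvement
strep_imp <- 38
strep_wor <- 17
ctrl_imp  <- 17
ctrl_wor  <- 35

strep_n <- strep_imp + strep_wor
ctrl_n  <- ctrl_imp + ctrl_wor

# Sample proportions improved in each arm
p_strep <- strep_imp / strep_n
p_ctrl  <- ctrl_imp / ctrl_n

rel_risk   <- p_strep / p_ctrl                                # ratio of proportions
odds_ratio <- (strep_imp * ctrl_wor) / (strep_wor * ctrl_imp) # cross-product ratio
prob_diff  <- p_strep - p_ctrl                                # difference in proportions

# Wald-type 95% interval for the relative risk on the log scale (assumed method)
se_log_rr <- sqrt(1 / strep_imp - 1 / strep_n + 1 / ctrl_imp - 1 / ctrl_n)
rr_ci <- exp(log(rel_risk) + c(-1, 1) * qnorm(0.975) * se_log_rr)

round(c(rel_risk = rel_risk, odds_ratio = odds_ratio, prob_diff = prob_diff), 4)
round(rr_ci, 4)
```

The point estimates should agree with the Relative Risk, Sample Odds Ratio, and Probability difference lines above, up to rounding.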

13.5 Ebola Virus Study

The World Health Organization’s Ebola Response Team published an article [1] in the October 16, 2014 issue of the New England Journal of Medicine, which contained some data I will use in this example, focusing on their Table 2.

Suppose we restrict attention to cases that had a definitive outcome. Within that group, we want to compare the proportion of deaths among hospitalized cases to the proportion of deaths among cases that were not hospitalized.

The article suggests that of the 1,737 cases with a definitive outcome, there were 1,153 hospitalized cases, which leaves 1,737 - 1,153 = 584 non-hospitalized cases. Across the 1,153 hospitalized cases, 741 people (64.3%) died. Across the 584 non-hospitalized cases, 488 people (83.6%) died.

Here is the initial contingency table, using only the numbers from the previous paragraph.

Initial Ebola Table   Deceased   Alive   Total
Hospitalized               741            1153
Not Hospitalized           488             584
Total                                     1737

Now, we can use arithmetic to complete the table, since the rows and the columns are each mutually exclusive and collectively exhaustive.

Ebola 2x2 Table       Deceased   Alive   Total
Hospitalized               741     412    1153
Not Hospitalized           488      96     584
Total                     1229     508    1737
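
The subtraction that fills in the missing cells can be written out in a few lines of base R. This is a minimal sketch using only the four figures quoted from the article; the object names are hypothetical, and addmargins() labels the totals "Sum" rather than "Total".

```r
# Figures quoted from the article
total_cases  <- 1737
hosp_total   <- 1153
hosp_dead    <- 741
nonhosp_dead <- 488

# Everything else follows by subtraction
nonhosp_total <- total_cases - hosp_total

ebola <- matrix(
  c(hosp_dead,    hosp_total - hosp_dead,
    nonhosp_dead, nonhosp_total - nonhosp_dead),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Hospitalized", "Not Hospitalized"),
                  c("Deceased", "Alive"))
)

# Append row and column totals
ebola_full <- addmargins(ebola)
ebola_full
```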

We want to compare the fatality risk (probability of being in the deceased column) for the population of people in the hospitalized row to the population of people in the not hospitalized row.

See sections 25.4 and 26.11 in the 2023 course notes.

twobytwo(741, 412, 488, 96, 
         "Hosp", "Not Hosp", "Dead", "Alive", 
         conf.level = 0.95)
2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : Dead 
Comparing : Hosp vs. Not Hosp 

         Dead Alive    P(Dead) 95% conf. interval
Hosp      741   412     0.6427    0.6146   0.6698
Not Hosp  488    96     0.8356    0.8033   0.8635

                                    95% conf. interval
             Relative Risk:  0.7691    0.7271   0.8135
         Sample Odds Ratio:  0.3538    0.2756   0.4542
Conditional MLE Odds Ratio:  0.3540    0.2726   0.4566
    Probability difference: -0.1929   -0.2325  -0.1508

             Exact P-value: 0.0000 
        Asymptotic P-value: 0.0000 
------------------------------------------------------

Reversing the order of the outcome columns lets us treat survival (Alive), rather than death, as the outcome of interest.

twobytwo(412, 741, 96, 488, 
         "Hosp", "Not Hosp", "Alive", "Dead", 
         conf.level = 0.95)
2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : Alive 
Comparing : Hosp vs. Not Hosp 

         Alive Dead    P(Alive) 95% conf. interval
Hosp       412  741      0.3573    0.3302   0.3854
Not Hosp    96  488      0.1644    0.1365   0.1967

                                   95% conf. interval
             Relative Risk: 2.1737    1.7823   2.6512
         Sample Odds Ratio: 2.8264    2.2016   3.6284
Conditional MLE Odds Ratio: 2.8248    2.1900   3.6678
    Probability difference: 0.1929    0.1508   0.2325

             Exact P-value: 0.0000 
        Asymptotic P-value: 0.0000 
------------------------------------------------------
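
Comparing the Dead and Alive analyses shows which summaries simply flip when the outcome is reversed. The sketch below, in base R with variable names of my own choosing, checks this from the cell counts: the odds ratio inverts and the probability difference changes sign, but the relative risk for Alive is not the reciprocal of the relative risk for Dead.

```r
# Cell counts from the completed Ebola table
hosp_dead     <- 741
hosp_alive    <- 412
nonhosp_dead  <- 488
nonhosp_alive <- 96

hosp_n    <- hosp_dead + hosp_alive
nonhosp_n <- nonhosp_dead + nonhosp_alive

# Outcome = Dead
rr_dead   <- (hosp_dead / hosp_n) / (nonhosp_dead / nonhosp_n)
or_dead   <- (hosp_dead * nonhosp_alive) / (hosp_alive * nonhosp_dead)
diff_dead <- hosp_dead / hosp_n - nonhosp_dead / nonhosp_n

# Outcome = Alive
rr_alive   <- (hosp_alive / hosp_n) / (nonhosp_alive / nonhosp_n)
or_alive   <- (hosp_alive * nonhosp_dead) / (hosp_dead * nonhosp_alive)
diff_alive <- hosp_alive / hosp_n - nonhosp_alive / nonhosp_n

# Compare the Alive summaries with the flipped Dead summaries
round(c(or_alive = or_alive, inverse_or_dead = 1 / or_dead), 4)
round(c(diff_alive = diff_alive, negative_diff_dead = -diff_dead), 4)
round(c(rr_alive = rr_alive, inverse_rr_dead = 1 / rr_dead), 4)
```

This asymmetry is one reason the choice of which outcome to place in the left column matters when reporting a relative risk.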

13.6 For More Information


  1. WHO Ebola Response Team (2014). Ebola virus disease in West Africa: The first 9 months of the epidemic and forward projections. New Engl J Med 371: 1481-1495. doi: 10.1056/NEJMoa1411100