Chapter 32 Larger Contingency Tables - Testing for Independence

What should we do with tables describing data from more than two categories at a time, returning to the notion of independent (rather than paired or matched) samples? The chi-square tests we have already seen in our twobytwo output extend nicely to this scenario, especially the Pearson \(\chi^2\) (asymptotic) test.

32.1 A 2x3 Table: Comparing Response to Active vs. Placebo

The table below, containing 2 rows and 3 columns of data (ignoring the marginal totals), shows the number of patients who exhibited a complete, partial, or no response after treatment with either active medication or a placebo.

Group     None  Partial  Complete
Active      16       26        29
Placebo     24       26        18

Is there a statistically significant association here? That is to say, is there a statistically significant difference between the treatment groups in the distribution of responses?

32.1.1 Getting the Table into R

To answer this, we’ll have to get the data from this contingency table into a matrix in R. Here’s one approach…
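One way to build it, as a sketch (the object name T1 matches the test output later in this chapter):

T1 <- matrix(c(16, 26, 29,
               24, 26, 18), byrow = TRUE, nrow = 2)
rownames(T1) <- c("Active", "Placebo")
colnames(T1) <- c("None", "Partial", "Complete")
T1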

        None Partial Complete
Active    16      26       29
Placebo   24      26       18

32.1.2 Manipulating the Table's Presentation

We can add margins to the matrix to get a table including row and column totals.
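A sketch, using addmargins:

addmargins(T1)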

        None Partial Complete Sum
Active    16      26       29  71
Placebo   24      26       18  68
Sum       40      52       47 139

Instead of the counts, we can tabulate the proportion of all patients within each cell.
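A sketch, using prop.table and rounding to three decimal places:

round(prop.table(T1), 3)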

         None Partial Complete
Active  0.115   0.187    0.209
Placebo 0.173   0.187    0.129

Or, we can tabulate the percentages, rather than the proportions, after some rounding.
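A sketch, scaling the cell proportions up to percentages:

round(100 * prop.table(T1), 1)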

        None Partial Complete
Active  11.5    18.7     20.9
Placebo 17.3    18.7     12.9

R can also tabulate the percentages conditional on the rows…
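A sketch, using the margin argument of prop.table to condition on the rows:

round(100 * prop.table(T1, margin = 1), 1)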

        None Partial Complete
Active  22.5    36.6     40.8
Placebo 35.3    38.2     26.5

The 40.8 in Active/Complete means that of the Active medication patients, 40.8% had a complete response.

R can also tabulate the percentages conditional on the columns…
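A sketch, conditioning on the columns instead:

round(100 * prop.table(T1, margin = 2), 1)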

        None Partial Complete
Active    40      50     61.7
Placebo   60      50     38.3

If we add the row of column totals for these percentages as shown below, it clarifies that the 61.7 in Active/Complete here means that of the patients with a Complete response, 61.7% were on the active medication.
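One way to add that row of column totals (a sketch):

addmargins(round(100 * prop.table(T1, margin = 2), 1), margin = 1)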

        None Partial Complete
Active    40      50     61.7
Placebo   60      50     38.3
Sum      100     100    100.0

32.2 Getting the Chi-Square Test Results

Now, to actually obtain a p value and perform the significance test with H0: rows and columns are independent vs. HA: rows and columns are associated, we simply run a Pearson chi-square test on T1 …
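That is a single call (the object name T1 matches the output below):

chisq.test(T1)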


    Pearson's Chi-squared test

data:  T1
X-squared = 4, df = 2, p-value = 0.1

With a p-value of about 0.13 from the Pearson \(\chi^2\) test (shown rounded in the output above), our conclusion would be to retain the null hypothesis of independence in this setting.

We could have run a Fisher’s exact test, too, if we needed it.
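A sketch of that call:

fisher.test(T1)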


    Fisher's Exact Test for Count Data

data:  T1
p-value = 0.1
alternative hypothesis: two.sided

The Fisher exact test p value is also 0.13. Either way, there is insufficient evidence to conclude that there is a (true) difference in the distributions of responses.

32.3 Getting a 2x3 Table into R Using a .csv File

Suppose we have a table like this available, and we want to compute the Pearson \(\chi^2\) and Fisher exact test results in R, without having to set up the matrix by hand as we did above.

Group     None  Partial  Complete
Active      16       26        29
Placebo     24       26        18

We could do so by building a .csv file (a spreadsheet) containing the table above. In particular, I built a file called active2x3.csv, available on the course website. It is simply the table below.

Group   None  Partial   Complete
Active    16         26       29
Placebo   24         26       18

When we pull this .csv file into R, it emerges as a data frame and deliberately NOT as a tibble.
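A sketch of the import, assuming the file sits in the working directory and using the hypothetical name dat2x3 for the resulting data frame (read.csv returns a plain data frame, not a tibble):

dat2x3 <- read.csv("active2x3.csv")
dat2x3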

    Group None Partial Complete
1  Active   16      26       29
2 Placebo   24      26       18

32.4 Turning the Data Frame into a Table That R Recognizes

We need to turn this data frame into something that R can recognize as a contingency table. The steps to do this are:

  1. Establish row names from the Group variable.
  2. Delete the Group variable, retaining only the variables containing counts.
  3. Convert the data frame into a matrix.

Specifically in this case, the steps are:
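A sketch of those three steps, continuing with the hypothetical dat2x3 data frame and producing the tab2x3 object referred to below:

tab2x3 <- dat2x3
rownames(tab2x3) <- tab2x3$Group   # 1. row names from Group
tab2x3$Group <- NULL               # 2. drop Group, keeping only the counts
tab2x3 <- as.matrix(tab2x3)        # 3. convert to a matrix
tab2x3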

        None Partial Complete
Active    16      26       29
Placebo   24      26       18

And this resulting tab2x3 object can be treated as the matrix was previously, using addmargins or chisq.test or prop.table or whatever. For instance,
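we can re-run the Fisher exact test on the new object (a sketch):

fisher.test(tab2x3)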


    Fisher's Exact Test for Count Data

data:  tab2x3
p-value = 0.1
alternative hypothesis: two.sided

32.5 Collapsing Levels / Categories in a Factor

Returning to the survey1 data, let’s build a table of the relationship between response to the sex and favorite color questions.
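A sketch of the cross-tabulation and test that produce the output below:

table(survey1$sex, survey1$color)
chisq.test(table(survey1$sex, survey1$color))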

   
    aqua aquamarine black blue brown chartreuse dark blue depends gray
  f    2          0     1   21     0          0         2       0    0
  m    0          1     2   41     3          1         0       1    1
   
    green grey light blue navy none orange pink purple red royal blue
  f    11    1          1    0    1      0    4     10   5          1
  m    12    2          0    2    0      3    1      1   8          0
   
    sapphire silver teal violet white yellow
  f        1      1    1      1     3      3
  m        0      0    0      0     2      1
Warning in chisq.test(table(survey1$sex, survey1$color)): Chi-squared
approximation may be incorrect

    Pearson's Chi-squared test

data:  table(survey1$sex, survey1$color)
X-squared = 40, df = 20, p-value = 0.02

Note the warning message. With all of those small cell counts and, in particular, so many zeros, this test might not be so useful.

We might reasonably consider collapsing the color data into four new categories:

  • Blue, which will include the old aqua, aquamarine, blue, dark blue, light blue, navy, royal blue, sapphire and teal;
  • Green, which will include the old chartreuse and green;
  • Red, which will include the old orange, pink and red; and
  • Other, which will include all of the other colors, including purple and violet.

One way to do this sort of work uses the plyr library in R, and the revalue function in particular. I learned about this in Chapter 15 of Chang's R Graphics Cookbook, which is invaluable.
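First, a look at the 25 levels of the original color variable (a sketch, assuming color is stored as a factor):

levels(survey1$color)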

 [1] "aqua"       "aquamarine" "black"      "blue"       "brown"     
 [6] "chartreuse" "dark blue"  "depends"    "gray"       "green"     
[11] "grey"       "light blue" "navy"       "none"       "orange"    
[16] "pink"       "purple"     "red"        "royal blue" "sapphire"  
[21] "silver"     "teal"       "violet"     "white"      "yellow"    

So, what I've done here is create a new color variable, color.new, that assigns the original colors to the four categories (Blue, Green, Red and Other) defined above. Let's run a sanity check on our recoding, then look at the relationship between sex and this new four-category variable, in a 2x4 table…
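A sketch of both tables:

table(survey1$color, survey1$color.new)   # sanity check on the recoding
table(survey1$sex, survey1$color.new)     # sex by the collapsed categories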

            
             Blue Other Green Red
  aqua          2     0     0   0
  aquamarine    1     0     0   0
  black         0     3     0   0
  blue         62     0     0   0
  brown         0     3     0   0
  chartreuse    0     0     1   0
  dark blue     2     0     0   0
  depends       0     1     0   0
  gray          0     1     0   0
  green         0     0    23   0
  grey          0     3     0   0
  light blue    1     0     0   0
  navy          2     0     0   0
  none          0     1     0   0
  orange        0     0     0   3
  pink          0     0     0   5
  purple        0    11     0   0
  red           0     0     0  13
  royal blue    1     0     0   0
  sapphire      1     0     0   0
  silver        0     1     0   0
  teal          1     0     0   0
  violet        0     1     0   0
  white         0     5     0   0
  yellow        0     4     0   0
   
    Blue Other Green Red
  f   29    21    11   9
  m   44    13    13  12

To neaten this output up, we might want to have Other show up last among the color.new categories.
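A sketch, re-specifying the factor levels in the desired order as a new variable called color.new2 (that name appears in the test output below):

survey1$color.new2 <- factor(survey1$color.new,
                             levels = c("Blue", "Green", "Red", "Other"))
table(survey1$sex, survey1$color.new2)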

   
    Blue Green Red Other
  f   29    11   9    21
  m   44    13  12    13

Now, we run the new Pearson \(\chi^2\) test, and conclude that there is no evidence of statistically significant association (at the 5% significance level) between sex and these collapsed favorite color categories.
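A sketch of that call (matching the output below):

chisq.test(table(survey1$sex, survey1$color.new2))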


    Pearson's Chi-squared test

data:  table(survey1$sex, survey1$color.new2)
X-squared = 5, df = 3, p-value = 0.2

32.6 Accuracy of Death Certificates (A 6x3 Table)

The table below compiles data from six studies designed to investigate the accuracy of death certificates. A total of 5373 autopsies were compared to the causes of death listed on the certificates. Of those, 3726 were confirmed to be accurate, 783 either lacked information or contained inaccuracies but did not require recoding of the underlying cause of death, and 864 were incorrect and required recoding. Do the results across studies appear consistent?

Date of Study   Confirmed (Accurate)   Inaccurate (No Change)   Incorrect (Recoding)   Total
1955-1965                       2040                      367                    327    2734
1970                             149                       60                     48     257
1970-1971                        288                       25                     70     383
1975-1977                        703                      197                    252    1152
1977-1978                        425                       62                     88     575
1980                             121                       72                     79     272
Total                           3726                      783                    864    5373

32.7 The Pearson Chi-Square Test of Independence

We can assess the homogeneity of the confirmation results (columns) we observe in the table using a Pearson chi-squared test of independence.

  • The null hypothesis is that the rows and columns are independent.
  • The alternative hypothesis is that there is an association between the rows and the columns.
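A sketch that builds the 6x3 table as a matrix named z (the name matches the test output below) and then adds the marginal totals:

z <- matrix(c(2040, 367, 327,
               149,  60,  48,
               288,  25,  70,
               703, 197, 252,
               425,  62,  88,
               121,  72,  79), byrow = TRUE, nrow = 6)
rownames(z) <- c("1955-65", "1970", "1970-71", "1975-77", "1977-78", "1980")
colnames(z) <- c("Confirmed", "Inaccurate", "Incorrect")
addmargins(z)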
        Confirmed Inaccurate Incorrect  Sum
1955-65      2040        367       327 2734
1970          149         60        48  257
1970-71       288         25        70  383
1975-77       703        197       252 1152
1977-78       425         62        88  575
1980          121         72        79  272
Sum          3726        783       864 5373

To see the potential heterogeneity across rows in these data, we should perhaps also look at the proportions of autopsies in each of the three accuracy categories for each study.
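A sketch, conditioning on the rows and then appending the row totals:

addmargins(round(100 * prop.table(z, margin = 1), 1), margin = 2)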

        Confirmed Inaccurate Incorrect Sum
1955-65      74.6       13.4      12.0 100
1970         58.0       23.3      18.7 100
1970-71      75.2        6.5      18.3 100
1975-77      61.0       17.1      21.9 100
1977-78      73.9       10.8      15.3 100
1980         44.5       26.5      29.0 100

In three of the studies, approximately 3/4 of the results were confirmed. In the other three, 45%, 58% and 61% were confirmed. It looks like there’s a fair amount of variation in results across studies. To see if this is true, formally, we run Pearson’s chi-square test of independence, where the null hypothesis is that the rows and columns are independent, and the alternative hypothesis is that there is an association between the rows and the columns.
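The test itself is one call (matching the output below):

chisq.test(z)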


    Pearson's Chi-squared test

data:  z
X-squared = 200, df = 10, p-value <2e-16

The chi-square test statistic is approximately 200 on 10 degrees of freedom, yielding p < 0.0001.

Autopsies are not performed at random; in fact, many are done because the cause of death listed on the certificate is uncertain. What problems may arise if you attempt to use the results of these studies to make inference about the population as a whole?