19  Two Factor ANOVA

This is a DRAFT version of this Chapter.

This is a sketchy draft. I’ll remove this notice when I post a version of this Chapter that is essentially finished.

19.1 R setup for this chapter

Note

Appendix A lists all R packages used in this book, and also provides R session information. Appendix B describes the 431-Love.R script, and demonstrates its use.

19.2 Data are Rosner’s FEV Data

Note

Appendix C provides further guidance on pulling data from other systems into R, while Appendix D gives more information (including download links) for all data sets used in this book.

Rosner’s FEV data in https://hbiostat.org/data/ FEV (quant) as a function of age, height, sex, smoker should be useful for a two-way anova perhaps adjusting for two additional continuous covariates

fev_ros <- read_csv("data/fev_ros.csv", show_col_types = FALSE) |>
  mutate(across(where(is.character), as_factor)) |>
  mutate(id = as.character(id))

fev_ros
# A tibble: 654 × 6
   id      age   fev height sex    smoke             
   <chr> <dbl> <dbl>  <dbl> <fct>  <fct>             
 1 301       9  1.71   57   female non-current smoker
 2 451       8  1.72   67.5 female non-current smoker
 3 501       7  1.72   54.5 female non-current smoker
 4 642       9  1.56   53   male   non-current smoker
 5 901       9  1.90   57   male   non-current smoker
 6 1701      8  2.34   61   female non-current smoker
 7 1752      6  1.92   58   female non-current smoker
 8 1753      6  1.42   56   female non-current smoker
 9 1901      8  1.99   58.5 female non-current smoker
10 1951      9  1.94   60   female non-current smoker
# ℹ 644 more rows
n_miss(fev_ros)
[1] 0

19.3 Interaction Plot

19.3.1 Simple Plot of Means

fev_ros |> group_by(sex, smoke) |>
  summarise(mean_fev = mean(fev)) |>
  ggplot(aes(x = sex, y = mean_fev, group = smoke, col = smoke)) +
  geom_point() +
  geom_line() +
  scale_color_metro_d() +
  labs(x = "sex", y = "Mean of FEV")
`summarise()` has grouped output by 'sex'. You can override using the `.groups`
argument.

19.3.2 Labeled Interaction Plot

fev_ros |> group_by(sex, smoke) |>
  summarise(mean_fev = mean(fev)) |>
  ggplot(aes(x = smoke, y = mean_fev, group = sex, col = sex)) +
  geom_line() +
  geom_label(aes(label = round(mean_fev,2))) +
  scale_color_see_d() +
  labs(x = "Smoking Status", y = "Mean of FEV", 
       title = "Interaction of Sex and Smoking Status on Mean of FEV",
       subtitle = "Rosner data")
`summarise()` has grouped output by 'sex'. You can override using the `.groups`
argument.

19.4 ANOVA models

fit1 <- aov(fev ~ sex * smoke, data = fev_ros)

model_parameters(fit1, es_type = "eta", ci = 0.95)
Parameter | Sum_Squares |  df | Mean_Square |     F |      p | Eta2 (partial) |  Eta2 95% CI
--------------------------------------------------------------------------------------------
sex       |       21.32 |   1 |       21.32 | 31.98 | < .001 |           0.05 | [0.02, 1.00]
smoke     |       33.68 |   1 |       33.68 | 50.51 | < .001 |           0.07 | [0.04, 1.00]
sex:smoke |        2.51 |   1 |        2.51 |  3.77 | 0.053  |       5.76e-03 | [0.00, 1.00]
Residuals |      433.40 | 650 |        0.67 |       |        |                |             

Anova Table (Type 1 tests)
fit2 <- lm(fev ~ sex * smoke, data = fev_ros)

model_parameters(fit2, ci = 0.95)
Parameter                           | Coefficient |   SE |        95% CI | t(650) |      p
------------------------------------------------------------------------------------------
(Intercept)                         |        2.38 | 0.05 | [ 2.28, 2.48] |  48.67 | < .001
sex [male]                          |        0.36 | 0.07 | [ 0.22, 0.49] |   5.27 | < .001
smoke [current smoker]              |        0.59 | 0.14 | [ 0.31, 0.86] |   4.20 | < .001
sex [male] × smoke [current smoker] |        0.42 | 0.22 | [ 0.00, 0.85] |   1.94 | 0.053 

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

19.4.1 Bootstrapping CIs for estimates

set.seed(431)
model_parameters(fit2, 
           ci = 0.95, ci_method = "quantile", 
           bootstrap = TRUE, iteractions = 1000)
Parameter                           | Coefficient |        95% CI |      p
--------------------------------------------------------------------------
(Intercept)                         |        2.38 | [ 2.31, 2.45] | < .001
sex [male]                          |        0.35 | [ 0.22, 0.49] | < .001
smoke [current smoker]              |        0.58 | [ 0.44, 0.73] | < .001
sex [male] × smoke [current smoker] |        0.43 | [-0.01, 0.81] | 0.052 

Uncertainty intervals (equal-tailed) are naıve bootstrap intervals.

19.5 ANOVA without interaction

fit3 <- aov(fev ~ sex + smoke, data = fev_ros)

model_parameters(fit3, es_type = "eta", ci = 0.95)
Parameter | Sum_Squares |  df | Mean_Square |     F |      p | Eta2 (partial) |  Eta2 95% CI
--------------------------------------------------------------------------------------------
sex       |       21.32 |   1 |       21.32 | 31.85 | < .001 |           0.05 | [0.02, 1.00]
smoke     |       33.68 |   1 |       33.68 | 50.30 | < .001 |           0.07 | [0.04, 1.00]
Residuals |      435.91 | 651 |        0.67 |       |        |                |             

Anova Table (Type 1 tests)
anova(fit3, fit1)
Analysis of Variance Table

Model 1: fev ~ sex + smoke
Model 2: fev ~ sex * smoke
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1    651 435.91                              
2    650 433.40  1    2.5127 3.7684 0.05266 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

19.6 Considering Model Diagnostics

check_model(fit1)

check_model(fit3)

19.7 Comparing Model Performance

compare_performance(fit1, fit3)
# Comparison of Model Performance Indices

Name | Model |  AIC (weights) | AICc (weights) |  BIC (weights) |    R2 | R2 (adj.) |  RMSE | Sigma
---------------------------------------------------------------------------------------------------
fit1 |   aov | 1596.9 (0.709) | 1597.0 (0.706) | 1619.3 (0.206) | 0.117 |     0.113 | 0.814 | 0.817
fit3 |   aov | 1598.7 (0.291) | 1598.7 (0.294) | 1616.6 (0.794) | 0.112 |     0.109 | 0.816 | 0.818
plot(compare_performance(fit1, fit3))

19.8 For More Information