Chapter 48 Building Predictions from Our Models

The predict function, when applied to a linear regression model without any new data, returns the fitted values, just as the fitted function did. As we’ve seen, it can also generate a prediction interval for a single new observation, or a confidence interval for the mean response across all observations that share the same predictor values.
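As a quick check of the claim about fitted values, here is a minimal sketch. It uses R's built-in mtcars data rather than our island model, and the name fit_cars and its predictors are assumptions chosen only for illustration.

```r
# Illustrative model on built-in data (not the island model from this chapter)
fit_cars <- lm(mpg ~ wt + hp, data = mtcars)

# With no newdata argument, predict() returns one value per row of the
# original data, and those values agree with fitted()
same_values <- all.equal(predict(fit_cars), fitted(fit_cars))
same_values
```

The same comparison should work for any model fit with lm, including model1.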

48.1 Predictions for a “typical” island

Let us, just for a moment, consider a “typical” island, exemplified by the median value of each predictor. One way to create this is to take the column medians of the model matrix and store them in a vector I will call x.medians.

# Model matrix: one column per coefficient, including the intercept
x <- model.matrix(model1)
# Column-wise medians (margin 2 = columns)
x.medians <- apply(x, 2, median)
x.medians
(Intercept)        area   elevation     nearest       scruz    adjacent 
       1.00        2.59      192.00        3.05       46.65        2.59 
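Because the intercept column of the model matrix is a constant 1, the vector of medians lines up with the model's coefficients, and the fitted value at the medians is just the sum of each coefficient times its median. The sketch below illustrates this on the built-in mtcars data. The names fit_cars, x_cars and cars_medians are assumptions for illustration, not part of the island analysis, and the shortcut is easiest to justify when every predictor enters the model as a plain numeric term.

```r
# Illustrative model on built-in data (not the island model from this chapter)
fit_cars <- lm(mpg ~ wt + hp, data = mtcars)

# Column medians of the model matrix, intercept column included
x_cars <- model.matrix(fit_cars)
cars_medians <- apply(x_cars, 2, median)

# Fitted value by hand: sum of coefficient * median, term by term
by_hand <- sum(coef(fit_cars) * cars_medians)

# The same fitted value from predict(), using a one-row data frame
via_predict <- predict(fit_cars, data.frame(t(cars_medians)))

all.equal(unname(via_predict), by_hand)
```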

We want to use the model to predict our outcome (species) on the basis of the inputs above: a new island with values of all predictors equal to the median of the existing islands. As before, building an interval forecast around a fitted value requires us to decide whether we are:

  • predicting the number of species for one particular island with the specified characteristics (in which case we use something called a prediction interval) or
  • predicting the mean number of species across all islands that have the specified characteristics (in which case we use the confidence interval).

# predict() looks up predictors in a data frame, so turn the medians into a single row
newdata <- data.frame(t(x.medians))
predict(model1, newdata, interval="prediction", level = 0.95)
  fit   lwr upr
1  57 -72.1 186
predict(model1, newdata, interval="confidence", level = 0.95)
  fit  lwr  upr
1  57 28.5 85.4
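To see where the two intervals come from, it may help to rebuild them by hand. The sketch below again uses the built-in mtcars data, with fit_cars and new_car as illustrative assumptions rather than anything from the island analysis. For lm, the confidence interval uses only the standard error of the fitted mean, while the prediction interval also adds the residual variance for a single new observation.

```r
# Illustrative model and new observation (not the island model)
fit_cars <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = median(mtcars$wt), hp = median(mtcars$hp))

# se.fit = TRUE returns the fit, its standard error, the residual
# degrees of freedom and the residual standard deviation
p <- predict(fit_cars, new_car, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)   # two-sided 95% critical value

# Half-widths: mean response vs. a single new observation
ci_half <- t_crit * p$se.fit
pi_half <- t_crit * sqrt(p$se.fit^2 + p$residual.scale^2)

ci_by_hand <- c(p$fit - ci_half, p$fit + ci_half)
pi_by_hand <- c(p$fit - pi_half, p$fit + pi_half)

# Built-in intervals for comparison
ci_builtin <- predict(fit_cars, new_car, interval = "confidence", level = 0.95)
pi_builtin <- predict(fit_cars, new_car, interval = "prediction", level = 0.95)
```

Comparing ci_by_hand with ci_builtin, and pi_by_hand with pi_builtin, should show matching limits around the same fitted value.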

48.1.1 Questions about the Prediction and Confidence Interval Methods

  1. What is the 95% prediction interval for this new observation? Does that make sense?
  2. Which interval (prediction or confidence) is wider? Does that make sense?
  3. Is there an island whose characteristics match the median values stored in x.medians?
  4. What happens if we don’t specify new data in making a prediction?

48.2 Making a Prediction with New Data

  1. How does the output below help us to make a prediction for a new data point, or for a series of them? Interpret the resulting intervals.

# A new observation needs one named column for each predictor in the model
newdata2 <- data.frame(area = 2, elevation = 100, nearest = 3, 
                       scruz = 5, adjacent = 1)
predict(model1, newdata2, interval="prediction", level = 0.95)
   fit   lwr upr
1 37.7 -92.5 168
predict(model1, newdata2, interval="confidence", level = 0.95)
   fit  lwr upr
1 37.7 4.39  71
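For a series of new data points, the same call can take a data frame with several rows. A minimal sketch, assuming the built-in mtcars data and hypothetical values for new_cars (none of this comes from the island data):

```r
# Illustrative model on built-in data (not the island model from this chapter)
fit_cars <- lm(mpg ~ wt + hp, data = mtcars)

# Three hypothetical cars; each row is one new observation
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(100, 150, 200))

# One row of output (fit, lwr, upr) per row of new data
pred_many <- predict(fit_cars, new_cars, interval = "prediction", level = 0.95)
pred_many
```

Each row of the result is read on its own: the interval in row i applies to a single new observation with the predictor values in row i of the new data.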