Chapter 48 Building Predictions from Our Models
The predict function, when applied to a linear regression model, produces the fitted values, just as the fitted function did. As we’ve seen, it can also be used to generate a prediction interval for a single new observation, or a confidence interval for the mean outcome across a group of new observations that share the same predictor values.
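As a quick check of the first claim, here is a minimal sketch. It assumes nothing about model1: the built-in mtcars data and the model called fit are stand-ins chosen only so the code runs on its own.

```r
# Illustrative stand-in: mtcars ships with R; 'fit' is NOT the chapter's model1.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# With no new data supplied, predict() returns one fitted value per
# observation used to fit the model, matching what fitted() returns.
same <- all.equal(predict(fit), fitted(fit))
same

# First few fitted values, named by row
head(predict(fit), 3)
```

The same comparison should work with model1 in place of fit.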
48.1 Predictions for a “typical” island
Let us, just for a moment, consider a “typical” island, exemplified by the median value of each predictor. There’s a trick to creating this and storing it in a vector I will call x.medians:
x <- model.matrix(model1)          # design matrix, including the intercept column
x.medians <- apply(x, 2, median)   # median of each column
x.medians
(Intercept) area elevation nearest scruz adjacent
1.00 2.59 192.00 3.05 46.65 2.59
We want to use the model to predict our outcome (species) on the basis of the inputs above: a new island with values of all predictors equal to the median of the existing islands. As before, building an interval forecast around a fitted value requires us to decide whether we are:
- predicting the number of species for one particular island with the specified characteristics (in which case we use something called a prediction interval) or
- predicting the mean number of species across all islands that have the specified characteristics (in which case we use the confidence interval).
newdata <- data.frame(t(x.medians))
predict(model1, newdata, interval="prediction", level = 0.95)
fit lwr upr
1 57 -72.1 186
predict(model1, newdata, interval="confidence", level = 0.95)
fit lwr upr
1 57 28.5 85.4
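To see where the two kinds of interval come from, the sketch below rebuilds both by hand using the standard least-squares formulas and compares them with what predict reports. It is an illustration under stated assumptions: the built-in mtcars data, the model fit, and the values in new_car are hypothetical stand-ins, not the island model.

```r
# Illustrative stand-in: mtcars ships with R; 'fit' is NOT the chapter's model1.
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 3, hp = 150)   # hypothetical predictor values

x0 <- c(1, new_car$wt, new_car$hp)        # intercept, wt, hp
xtx_inv <- summary(fit)$cov.unscaled      # (X'X)^{-1}
s <- summary(fit)$sigma                   # residual standard error
tcrit <- qt(0.975, df.residual(fit))      # t multiplier for a 95% interval

fit0 <- sum(x0 * coef(fit))               # point prediction
h0 <- drop(t(x0) %*% xtx_inv %*% x0)      # x0' (X'X)^{-1} x0

se_mean <- s * sqrt(h0)       # uncertainty in the estimated mean
se_new  <- s * sqrt(1 + h0)   # adds the scatter of one new observation

manual_conf <- c(fit0, fit0 - tcrit * se_mean, fit0 + tcrit * se_mean)
manual_pred <- c(fit0, fit0 - tcrit * se_new,  fit0 + tcrit * se_new)

r_conf <- predict(fit, new_car, interval = "confidence", level = 0.95)
r_pred <- predict(fit, new_car, interval = "prediction", level = 0.95)

# Both comparisons should report TRUE
all.equal(unname(r_conf[1, ]), manual_conf)
all.equal(unname(r_pred[1, ]), manual_pred)
```

The only difference between the two standard errors is the extra 1 under the square root, which accounts for the variation of a single new observation around the mean.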
48.1.1 Questions about the Prediction and Confidence Interval Methods
- What is the 95% prediction interval for this new observation? Does that make sense?
- Which interval (prediction or confidence) is wider? Does that make sense?
- Is there an island whose characteristics match the medians stored in x.medians?
- What happens if we don’t specify new data in making a prediction?
48.2 Making a Prediction with New Data
- How does the output below help us to make a prediction with a new data point, or series of them? Interpret the resulting intervals.
newdata2 <- data.frame(area = 2, elevation = 100, nearest = 3,
scruz = 5, adjacent = 1)
predict(model1, newdata2, interval="prediction", level = 0.95)
fit lwr upr
1 37.7 -92.5 168
predict(model1, newdata2, interval="confidence", level = 0.95)
fit lwr upr
1 37.7 4.39 71
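The same call handles a series of new data points: give the new data frame one row per point, and predict returns one row of fit, lwr and upr for each. The sketch below assumes a stand-in model on the built-in mtcars data; the names fit and new_cars and the values used are hypothetical.

```r
# Illustrative stand-in: mtcars ships with R; 'fit' is NOT the chapter's model1.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Three hypothetical new observations, one per row
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(100, 150, 200))

pred <- predict(fit, new_cars, interval = "prediction", level = 0.95)

# Put the inputs and their intervals side by side
result <- cbind(new_cars, pred)
result
```

Binding the predictors to the output keeps each interval next to the values that produced it, which helps when interpreting several predictions at once.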