Generalized Additive Models (GAMs) are designed to capitalize on the strengths
of GLMs (ability to fit logistic and poisson regressions) without requiring the
problematic steps of *a priori* estimation of response curve shape or a
specific parametric response function. They employ a class of equations called
"smoothers" or "scatterplot smoothers" that attempt to generalize data into
smooth curves by local fitting to subsections of the data. The design and
development of smoothers is a very active area of research in statistics, and a
broad array of such functions has been developed. The simplest example that is
likely to be familiar to most ecologists is the running average, where you
calculate the average value of data in a "window" along some gradient. While
the running average is an example of a smoother, it's rather primitive and much
more efficient smoothers have been developed.

The idea behind GAMs is to "plot" (conceptually, not literally) the value of the dependent variable along a single independent variable, and then to calculate a smooth curve that goes through the data as well as possible, while being parsimonious. The trick is in the parsimony. It would be possible using a polynomial of high enough order to get a curve that went through every point. It is likely, however, that the curve would "wiggle" excessively, and not represent a parsimonious fit. The approach generally employed with GAMs is to divide the data into some number of sections, using "knots" at the ends of the sections. Then a low order polynomial or spline function is fit to the data in the section, with the added constraint that the derivative of the function at the knots must be the same for both sections sharing that knot. This eliminates kinks in the curve, and ensures that it is smooth and continuous at all points.

The problem with GAMs is that they are simultaneously very simple and extraordinarily complex. The idea is simple; let the data speak, and draw a simple smooth curve through the data. Most ecologists are quite capable of doing this by eye. The problem is determining goodness-of-fit and error terms for a curve fit by eye. GAMs make this unnecessary, and fit the curve algorithmically in a way that allows error terms to be estimated precisely. On the other hand, the algorithm that fits the curve is usually iterative and non-parametric, masking a great deal of complex numerical processing. It's much too complex to address here, but there are at least two significant approaches to solving the GAM parsimony problem, and it is an area of active research.

As a practical matter, we can view GAMs as non-parametric curve fitters that attempt to achieve an optimal compromise between goodness-of-fit and parsimony of the final curve. Similar to GLMs, on species data they operate on deviance, rather than variance, and attempt to achieve the minimal residual deviance on the fewest degrees of freedom. One of the interesting aspects of GAMs is that they can only approximate the appropriate number of degrees of freedom, and that the number of degrees of freedom is often not an integer, but rather a real number with some fractional component. This seems very odd at first, but is actually a fairly straight forward extension of the concepts you are already familiar with. A second order polynomial (or quadratic equation) in a GLM uses two degrees of freedom (plus one for the intercept). A curve that is slightly less regular than a quadratic might require two and a half degrees of freedom (plus one for the intercept), but might fit the data better.

The other aspect of GAMS that is different is that they don't handle interaction well. Rather than fit multiple variables simultaneously, the algorithm fits a smooth curve to each variable and then combines the results additively, thus giving rise to the name "Generalized Additive Models." The point is minimized here, somewhat, in that we never fit interaction terms in our GLMs in the previous lab. For example, it is possible that slope only matters at some range of elevations, giving rise to an interaction of slope and elevation. In practice, interaction terms can be significant, but often require fairly large amounts of data. The Bryce data set is relatively small, and tests of interaction were generally not significant.

result <- gam(y~s(x))

where
The Hastie and Tibshirani version has a slightly broader range of smoothers (including
`lo()` for loess), while Wood's function is restricted to `s()`.
Fortunately, both
versions have `predict()`, `fitted()`, and most importantly `plot()`
methods. To use the Wood function, download and install package `mgcv`;
to use the Hastie ans Tibshirani function, download and install package
`gam`. Unfortunately, because both functions are named `gam()` it
is awkward to have both packages loaded at the same time.

In my tests, the two implementations are similar but not identical. In addition, there are slight differences in the functions (e.g. default values for arguments). Accordingly, I will present the two implementations separately. You can look at which ever program you use, but I recommend reviewing both to see the effects of the difference in underlying code (Hastie and Tibshirani versus Wood).

To go to the Simon Wood's version, click here

bere1.gam <- gam(bere>0~s(elev),family=binomial)

Just as for the GLM, to get a logistic fit as appropriate for presence/absence
values we specify family=binomial. To see the result, use the
summary(bere1.gam)

Call: gam(formula = bere > 0 ~ s(elev), family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -2.2252 -0.6604 -0.4280 0.5318 1.9649 (Dispersion Parameter for binomial family taken to be 1) Null Deviance: 218.1935 on 159 degrees of freedom Residual Deviance: 133.4338 on 155 degrees of freedom AIC: 143.4339 Number of Local Scoring Iterations: 8 DF for Terms and Chi-squares for Nonparametric Effects Df Npar Df Npar Chisq P(Chi) (Intercept) 1 s(elev) 1 3 18.2882 0.0004

To really see what the results look like, use the `plot()` function.

plot(bere1.gam,se=TRUE)

The default plot shows several things. The solid line is the
predicted value of the
dependent variable as a function of the x axis.
The `se=TRUE` means to plot two times the standard errors of the
estimates (in dashed lines). The small lines along the x axis are the "rug",
showing the location of the sample plots. The y axis is
in the linear units, which in this case is logits, so that the values are
centered on 0 (50/50 odds), and extend
to both positive and negative values. To see the predicted values on the
probability scale we need to use
the back transformed values, which are available as fitted(bere1.gam). So,

plot(elev,fitted(bere1.gam))

Notice how the curve has multiple inflection points and modes, even though we did not specify a high order function. This is the beauty and bane of GAMs. In order to fit the data, the function fit a bimodal curve. It seems unlikely ecologically that a species would have a multi-modal response to a single variable. Rather, it would appear to suggest competitive displacement by some other species from 7300 to 8000 feet, or the effects of another variable that interacts strongly over that elevation range.

To compare to the GLM fit, we can superimpose the GLM predictions on the plot.

That's the predicted values from the first order logistic GLM in red, and the quadratic GLM in green. Notice how even though the GAM curve is quite flexible, it avoids the problematic upturn at low values shown by the quadratic GLM.

The GAM function achieves a residual deviance of 133.4 on 155 degrees of freedom. The following table gives the comparison to the GLMS.

model | residual deviance | residual degrees of
freedom | |

GAM | 133.43 | 155 | |

linear GLM | 151.95 | 158 | |

quadratic GLM | 146.63 | 157 |

How do we compare the fits? Well, since they're not nested, we can't properly use the anova method from last lab. However, since they're fit to the same data, we can at least heuristically compare the goodness-of-fits and residual degrees of freedom. As I mentioned previously, GAMs often employ fractional degrees of freedom.

anova(bere3.glm,bere1.gam,test='Chi')

Analysis of Deviance Table Model 1: berrep > 0 ~ elev + I(elev^2) Model 2: berrep > 0 ~ s(elev) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 157.0000 146.627 2 155.0000 133.434 2.0000 13.193 0.001 AIC(bere3.glm,bere1.gam) df AIC bere3.glm 3 152.6268 bere1.gam 2 143.4339

The GAM clearly achieves lower residual deviance, and at a cost of only two degrees of freedom compared to the quadratic GLM. The probability of achieving a reduction in deviance of 13.2 for 2 degrees of freedom in a NESTED model is about 0.001.

While this probability is not strictly correct for a non-nested test, I think it's safe to say that the GAM is a better fit. In addition, the AIC results are in agreement. One approach to a simple evaluation is to compare the residuals in a boxplot.

boxplot(bere1.glm$residuals,bere3.glm$residuals,bere1.gam$residuals, names=c("GLM 1","GLM 2","GAM"))

The plot shows that the primary difference among the models is in the number of extreme residuals; the mid-quartiles of all models are very similar.

bere2.gam <- gam(bere>0~s(elev)+s(slope),family=binomial)

The interpretation of the slope plot is that *Berberis* is most likely on
midslopes (10-50%), but distinctly less likely on flats or steep slopes.
Could a variable with that much error contribute significantly to the
fit?
Using the `anova()` function for GAMs, so we
can look directly at the influence of slope on the fit.

anova(bere2.gam,bere1.gam,test="Chi")

Analysis of Deviance Table Model 1: berrep > 0 ~ s(elev) + s(slope) Model 2: berrep > 0 ~ s(elev) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 151 113.479 2 155 133.434 -4 -19.954 0.001

plot(elev,fitted(bere2.gam))

Where before we had a smoothly varying line we now have a scatterplot with some significant departures from the line, especially at low elevations, due to slope effects. As for slope itself

plot(slope,fitted(bere2.gam))

The plot suggests a broad range of values for any specific slope, but there is a modal trend to the data.

Can a third variable help still more. Let's try aspect value (av).

bere3.gam <- gam(berrep>0~s(elev)+s(slope)+s(av),family=binomial) anova(bere3.gam,bere2.gam,test="Chi")

Analysis of Deviance Table Model 1: berrep > 0 ~ s(elev) + s(slope) + s(av) Model 2: berrep > 0 ~ s(elev) + s(slope) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 147.0001 100.756 2 151.0000 113.479 -3.9999 -12.723 0.013 AIC(bere3.gam,bere2.gam) df AIC bere3.gam 4 126.7560 bere2.gam 3 131.4795

Surprisingly, since aspect value is designed to linearize aspect, the
response curve is modal. *Berberis* appears to prefer warm aspects, but
not south slopes.
While we might expect some interaction here with elevation
(preferring different aspects at different elevations to compensate for
temperature), GAMs are not directly useful for interaction terms. Although
it's statistically significant, its effect size is small, as demonstrated in
the following plot.

Notice how almost all values are possible at most aspect values. Nonetheless, it is statistically significant.

bere1.gam <- gam(bere>0~s(elev),family=binomial) plot(bere1.gam)

Just as for the GLM, to get a logistic fit as appropriate for presence/absence
values we specify `family=binomial`.
The default plot shows several things. The solid line is the predicted value of
the dependent variable as a function of the x axis. The dotted lines are
plus-or-minus two standard errors, and the small lines along the x axis are the
"rug", showing the location of the sample plots. The y axis is in the linear
units, which in this case is logits, so that the values are centered on 0
(50/50 odds), and extend to both positive and negative values. To see the
predicted values on the probability scale we need to use the back transformed
values, which are available as `fitted(bere1.gam)`. So,

plot(elev,fit(bere1.gam))

Notice how the curve has multiple inflection points and modes, even though we did not specify a high order function. This is the beauty and bane of GAMs. In order to fit the data, the function fit a bimodal curve. It seems unlikely ecologically that a species would have a multi-modal response to a single variable. Rather, it would appear to suggest competitive displacement by some other species from 7200 to 8000 feet, or the effects of another variable that interacts strongly over that elevation range.

To compare to the GLM fit, we can superimpose the GLM predictions on the plot.

points(elev,fitted(bere1.glm),col=2) points(elev,fitted(bere3.glm),col=3)

That's the predicted values from the first order logistic GLM in red, and the quadratic GLM in green. Notice how even though the GAM curve is quite flexible, it avoids the problematic upturn at low values shown by the quadratic GLM.

How do we compare the fits? Well, since they're not nested, we can't properly
use the `anova` method from last lab. However, since they're fit to the
same data, we can at least heuristically compare the goodness-of-fits and
residual degrees of freedom. As I mentioned previously, GAMs often employ
fractional degrees of freedom. In this case, using the R `gam()`
function, the algorithm achieved a residual deviance of 130.83 and
used 4.95 degrees of freedom, stored in bere1.gam$deviance and bere1.gam$edf
respectively. The following table gives the respective values.

model | residual deviance | residual degrees of
freedom | |

GAM | 129.80 | 153.72 | |

linear GLM | 151.95 | 158 | |

quadratic GLM | 146.63 | 157 |

The GAM clearly achieves lower residual deviance, and at a cost of 3.28 degrees of freedom compared to the quadratic GLM. The probability of achieving a reduction in deviance of 16.83 for 3.28 degrees of freedom in a NESTED model is about 0.001

anova(bere3.glm,bere1.gam,test='Chi')

Analysis of Deviance Table Model 1: berrep > 0 ~ elev + I(elev^2) Model 2: berrep > 0 ~ s(elev) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 157.0000 146.627 2 153.7189 129.804 3.2811 16.823 0.001

boxplot(bere1.glm$residuals,bere3.glm$residuals,bere1.gam$residuals, names=c("GLM 1","GLM 2","GAM"))

The plot shows that the primary difference among the models is in the number of extreme residuals; the mid-quartiles of all models are very similar.

bere2.gam <- gam(bere>0~s(elev)+s(slope),family=binomial)

On the most recent version of Warning message: fitted probabilities numerically 0 or 1 occurred in: gam.fit2(x = args$X, y = args$y, sp = lsp, S = args$S, rS = args$rS,

plot(bere2.gam)

The default is to plot both smooths on the same scale. Accordingly, the result for elevation looks ridiculously flat compared to before (even though it's the same curve) because the standard errors on the smooth of slope are so large.

The interpretation of the slope plot seems to be that *Berberis* is
slightly more likely on moderate slopes (10-30%), but distinctly less likely on
steep slopes.
Could a variable with that much error contribute
significantly to the fit? Since these are nested models we can use
`anova()` to check.

anova(bere1.gam,bere2.gam,test='Chi')

Analysis of Deviance Table Model 1: berrep > 0 ~ s(elev) Model 2: berrep > 0 ~ s(elev) + s(slope) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 153.7189 129.804 2 149.4435 108.700 4.2753 21.105 0.0003968

plot(elev,fitted(bere2.gam))

Where before we had a smoothly varying line we now have a scatterplot with some significant departures from the line, especially at low elevations, due to slope effects. As for slope itself

plot(slope,fitted(bere2.gam))

The plot suggests that all possible values obtain at low slopes, where the
probabilities are dominated by elevation effects, but that
moderate slopes increase the likelihood of *Berberis*, followed by
decreases on steep slopes.

Can a third variable help still more. Let's try aspect value (av).

bere3.gam <- gam(berrep>0~s(elev)+s(slope)+s(av),family=binomial)

anova.gam(bere3.gam,bere2.gam,test='Chi') Analysis of Deviance Table Model 1: berrep > 0 ~ s(elev) + s(slope) + s(av) Model 2: berrep > 0 ~ s(elev) + s(slope) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 144.8843 95.407 2 149.4435 108.700 -4.5593 -13.293 0.015

plot(bere3.gam,scale=0)

Surprisingly, since aspect value is designed to linearize aspect, the response
curve is modal. *Berberis* appears to prefer east and west slopes to
north or south. While we might expect some interaction here with elevation
(preferring different aspects at different elevations to compensate for
temperature), GAMs are not directly useful for interaction terms. Although it's
statistically significant, its effect size is small, as demonstrated in the
following plot.

plot(av,fitted(bere3.gam))

Notice how almost all values are possible at most aspect values. Nonetheless, there is a small modal trend.

My own feeling is that given the ease of creating and presenting graphical output, the increased goodness-of-fit of GAMs and the insights available from analysis of their curvature over sub-regions of the data make GAMs a superior tool in the analysis of ecological data. Certainly, you can argue either way.

Like GLMs, they are suitable for fitting logistic and poisson regressions, which is of course very useful in ecological analysis.

Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

Wood, S.N. (2004a) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686

Wood, S.N. (2004b) On confidence intervals for GAMs based on penalized regression splines. Technical Report 04-12 Department of Statistics, University of Glasgow.

Wood, S.N. (2004c) Low rank scale invariant tensor product smooths for generalized additive mixed models. Technical Report 04-13 Department of Statistics, University of Glasgow.