Lab 12 — Canonical Correspondence Analysis

In the previous labs we have been following a general procedure of: (1) calculating a dissimilarity/distance matrix, (2) projecting or mapping the matrix to lower dimensions, and (3) fitting surfaces to the scatter diagram. The exception to this approach was RDA, where we used linear regression of individual species abundances on environmental variables to create a new "fitted vegetation matrix" which we then subjected to PCA. In general, approaches which use environmental variables in the construction of the ordination are called "constrained ordination" because the positions of the samples in the ordination are constrained by the environmental variables, rather than having the environmental variables enter in post-hoc analyses. While RDA did not work very well on our data, one significant reason is the use of correlation or variance/covariance matrices in PCA, which we have observed to be quite problematic so far.

One alternative would be to use a similar approach but to replace the calculation of the correlation matrix with something more suitable, and then to project the matrix to lower dimensions. This idea has lead to one of the most productive and widely-used methods in the history of multivariate analysis in ecology --- canonical correspondence analysis or CCA. Just as RDA relates to PCA, CCA relates to CA. That is, (1) start with a Chi-square vegetation matrix [(actual - predicted)/sqrt(predicted)], (2) regress the differences from expectation on environmental variables to get fitted values, using a weighted regression where total abundance by plots is used as the weights, and (3) calculate the Euclidean distance of the fitted vegetation matrix and project by eigen-analysis. The importance of specific environmental variables is then assessed by their correlation to the projected scatter diagram.

Just as CA, there are several algorithms available to calculate CCA. The approach outlined above follows the Legendre and Legendre (1988) approach. Ter Braak (19xx) describes an algorithm based on reciprocal averaging that is employed by the popular program CANOCO. The result is the same either way.

In addition, there is also more than one S-Plus/R algorithm to compute CCA. Stephane Dray contributed CAIV, while Jari Oksanen contributed a cca() function as part of his vegan package (version 1-3.9 or later). The two differ slightly in the conventions for scaling the results. Because the vegan cca() function returns results identical to CANOCO, and because we already load the vegan library, we will use the vegan cca() function. However, to keep the plots produced by cca() more comparable to those we have produced from other programs, we will replace the plotting routines supplied with the vegan cca() function with others.

Running cca()

To calculate a CCA, select those environmental variables you have reason to believe are important, and enter them into the cca() function in formula notation, just like we did for GLMs and GAMS. The full taxon matrix goes on the left-hand side of the equation, with selected environmental variables on the right.

attach(site)
cca.1 <- cca(bryceveg~elev+slope+av)
cca.1

Call:
cca(formula = bryceveg ~ elev + slope + av, data = env)

Inertia Rank
Total         10.8656
Constrained    0.6975    3
Unconstrained 10.1680  147
Inertia is mean squared contingency coefficient

Eigenvalues for constrained axes:
CCA1    CCA2    CCA3
0.51942 0.11098 0.06715

Eigenvalues for unconstrained axes:
CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8
0.6912 0.5571 0.5278 0.4311 0.3752 0.3487 0.2900 0.2605
(Showed only 8 of all 147 unconstrained eigenvalues)

As you can see from the above, vegan has supplied a nice default print() function for cca. The first items after the "Call:" section give the "mean squared contingency coefficients" of the analysis. These play the same role in CCA that total variance plays in RDA. Remember, the veg matrix is converted to Chi-square, so the entries are contingency coefficients. The top row, or "Mean squared contingency coefficient" is the total variability before the matrix is subjected to weighted regression. This is the variability that COULD BE explained. The second line ("Constrained:") is the variability in the vegetation matrix after weighted regression; this is the variability that will be explained by the axes in the CCA. The last line ("Unconstrained:") is the variance of the residuals of the regression, which can be subjected to CA.

In this particular example, the CCA was not very successful. Only 0.6975/10.8656 or 0.064 of the total variability was captured in the CCA. Clearly, the weighted regression step was not very successful at capturing the variability in vegetation composition, but after glm() and gam() we should not be too surprised.

The next set of lines gives the eigenvalues associated with the projection. The top line gives the "constrained" eigenvalues. Because we only had three variables in our environmental dataframe we can only have three constrained eigenvalues. The three values sum to 0.69755. so

0.51942/0.69755
[1] 0.7446348
0.11098/0.69755
[1] 0.1590997
0.06715/0.69755
[1] 0.0962655
shows that the first accounts for approximately 74% of the constrained variability, with the second at 16% and the third at 10%. Remember, this is a fraction of the constrained variability, which is a small fraction of the total variability. The first unconstrained eigenvalue is almost as large as the first three constrained values combined!

Plotting the CCA

To plot your results, use the plot() function as follows:
cca1.plot <- plot(cca.1,choices=c(1,2))
Notice that we put a new variable on the left hand side of an assignment operator. Package vegan uses a different convention than labdsv, where the plot() function not only produces the graphical plot, but also stores information about the scaling and axes in a new object, called an "ordiplot", for future use. The "choices=c(1,2)" following the cca.1 are the axes to plot. Since 1 and 2 are the defaults, they could be omitted.

As for CA, the species are shown as red crosses and samples as black circles. In this analysis, the first axis is associated with decreasing slope and increasing aspect value (av), while the second is associated with increasing elevation.

As you can see, the species are pretty well condensed in the center of the ordination. To get a better look, we can specify "scaling=1" to mean "samples as weighted averages of the species."

cca2.plot <- plot(cca.1,scaling=1)

Package vegan supplies a number of graphing functions for ordiplots, including points() and identify. We can use the identify function to identify specific samples or species. Depending on whether you want a clearer picture of samples of species, you can plot using the appropriate scaling, and then use the identification functions with the same scaling.

cca1.plot <- plot(cca.1)
identify(cca1.plot,what='sites')
}
else {