One alternative would be to use a similar approach but to replace the calculation of the correlation matrix with something more suitable, and then to project the matrix to lower dimensions. This idea has lead to one of the most productive and widely-used methods in the history of multivariate analysis in ecology --- canonical correspondence analysis or CCA. Just as RDA relates to PCA, CCA relates to CA. That is, (1) start with a Chi-square vegetation matrix [(actual - predicted)/sqrt(predicted)], (2) regress the differences from expectation on environmental variables to get fitted values, using a weighted regression where total abundance by plots is used as the weights, and (3) calculate the Euclidean distance of the fitted vegetation matrix and project by eigen-analysis. The importance of specific environmental variables is then assessed by their correlation to the projected scatter diagram.

Just as CA, there are several algorithms available to calculate CCA. The approach outlined above follows the Legendre and Legendre (1988) approach. Ter Braak (19xx) describes an algorithm based on reciprocal averaging that is employed by the popular program CANOCO. The result is the same either way.

In addition, there is also more than one S-Plus/R algorithm to compute CCA. Stephane Dray contributed CAIV, while Jari Oksanen contributed a cca() function as part of his vegan package (version 1-3.9 or later). The two differ slightly in the conventions for scaling the results. Because the vegan cca() function returns results identical to CANOCO, and because we already load the vegan library, we will use the vegan cca() function. However, to keep the plots produced by cca() more comparable to those we have produced from other programs, we will replace the plotting routines supplied with the vegan cca() function with others.

To calculate a CCA, select those environmental variables you have reason to
believe are important, and enter them into the `cca()` function in
formula notation, just like we did for GLMs and GAMS. The full taxon matrix goes on the
left-hand side of the equation, with selected environmental variables on the right.

attach(site) cca.1 <- cca(bryceveg~elev+slope+av) cca.1

Call: cca(formula = bryceveg ~ elev + slope + av, data = env) Inertia Rank Total 10.8656 Constrained 0.6975 3 Unconstrained 10.1680 147 Inertia is mean squared contingency coefficient Eigenvalues for constrained axes: CCA1 CCA2 CCA3 0.51942 0.11098 0.06715 Eigenvalues for unconstrained axes: CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 0.6912 0.5571 0.5278 0.4311 0.3752 0.3487 0.2900 0.2605 (Showed only 8 of all 147 unconstrained eigenvalues)

In this particular example, the CCA was not very successful. Only 0.6975/10.8656 or 0.064 of the total variability was captured in the CCA. Clearly, the weighted regression step was not very successful at capturing the variability in vegetation composition, but after glm() and gam() we should not be too surprised.

The next set of lines gives the eigenvalues associated with the projection. The top line gives the "constrained" eigenvalues. Because we only had three variables in our environmental dataframe we can only have three constrained eigenvalues. The three values sum to 0.69755. so

0.51942/0.69755

[1] 0.7446348

0.11098/0.69755

[1] 0.1590997

0.06715/0.69755

[1] 0.0962655

shows that the first accounts for approximately 74% of the
constrained variability, with the second at 16% and the third at 10%. Remember,
this is a fraction of the constrained variability, which is a small fraction of
the total variability. The first unconstrained eigenvalue is almost as large as
the first three constrained values combined!

cca1.plot <- plot(cca.1,choices=c(1,2))

Notice that we put a new variable on the left hand side of an assignment operator.
Package vegan uses a different convention than labdsv, where the

As for CA, the species are shown as red crosses and samples as black circles. In this analysis, the first axis is associated with decreasing slope and increasing aspect value (av), while the second is associated with increasing elevation.

As you can see, the species are pretty well condensed in the center of the ordination. To get a better look, we can specify "scaling=1" to mean "samples as weighted averages of the species."

cca2.plot <- plot(cca.1,scaling=1)

Package vegan supplies a number of graphing functions for ordiplots, including `points()`
and `identify`. We can use the `identify` function to identify specific
samples or species.
Depending on whether you want a clearer picture of samples of
species, you can plot using the appropriate scaling, and then use the identification
functions with the same scaling.

cca1.plot <- plot(cca.1) identify(cca1.plot,what='sites') points(cca1.plot,what='sites',bryceveg$arcpat>0,col=3)

cca2.plot <- plot(cca.2caling=1) identify(cca2.plot,what='species')

cca.3 <- cca(bryceveg~elev+slope+av+pos) cca.3

Call: cca(formula = veg ~ elev + slope + av + pos) Inertia Rank Total 10.866 Constrained 1.199 7 Unconstrained 9.666 147 Inertia is mean squared contingency coefficient Eigenvalues for constrained axes: CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7 0.52682 0.27050 0.12919 0.08585 0.07230 0.06379 0.05095 Eigenvalues for unconstrained axes: CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 0.6105 0.5365 0.4349 0.4074 0.3525 0.3274 0.2727 0.2598 (Showed only 8 of all 147 unconstrained eigenvalues) plot(cca.3)

Notice how different this plot is from the first. While the total variability explained did not increase very much (and it can't go down with an increase of degrees of freedom), regressing the vegetation against topographic position in addition to the other variables results in a quite different perspective on the variability. Each possible topographic position is plotted at the centroid of the samples on that type, shown as an "X". To find out which one is which, look at last element of the summary of the cca object.

summary(cca.3)

. . . . . . . . . Centroids for factor constraints CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 posbottom -0.27212 -2.2070 -0.2082 -0.6002 0.7309 0.08555 poslow_slope -0.08944 -0.3111 0.7894 0.7438 -1.4639 -0.11783 posmid_slope 0.13114 0.5434 -0.1861 0.7238 0.7434 0.33647 posridge 0.45090 0.6428 0.6320 -0.6593 0.8699 -0.66962 posup_slope -0.19154 0.7557 -0.8329 -1.6023 -0.7748 -0.23554

"The functions find statistics that resemble ‘deviance’ and ‘AIC’ in constrained ordination. Actually, constrained ordination methods do not have a log-Likelihood, which means that they cannot have AIC and deviance. Therefore you should not use these functions, and if you use them, you should not trust them. If you use these functions, it remains as your responsibility to check the adequacy of the result."

The function below does not make use of log-likelihood directly, but rather employs a rather brutish permutation approach and tests whether adding a variable explains more inertia than expected at random. Nonetheless, I'm sure Jari disapproves and I include it here for whatever good it might serve.

The argument
`start` must be either a single continuous variable or a data.frame. the argument
`add`must be a data.frame of variables to consider adding to the model. Variable must
be continuous variables (i.e. not factors).

step.cca(bryceveg,start=elev,add=data.frame(slope,av))

Baseline = 0.5050382 variable delta_eig p_val 1 slope 0.11530671 0.11 2 av 0.07753032 0.13

step.cca <- function (taxa,start,add,numitr=99) { perm.cca <- function(taxa,start,add,numitr) { rndeig <- rep(0,numitr) if (!is.null(start)) { for (i in 1:numitr) { tmp <- data.frame(start,sample(add,replace=FALSE)) tmp2 <- cca(taxa,tmp) rndeig[i] <- sum(tmp2$CCA$eig) } } else { for (i in 1:numitr) { tmp <- data.frame(sample(add,replace=FALSE)) tmp2 <- cca(taxa,tmp) rndeig[i] <- sum(tmp2$CCA$eig) } } return(rndeig) } res <- data.frame(names(add),rep(NA,ncol(add)),rep(NA,ncol(add))) names(res) <- c('variable','delta_eig','p_val') for (i in 1:ncol(add)) { if (!any(is.na(add[,i]))) { if (!is.null(start)) tmp <- data.frame(start,add[,i]) else tmp <- add[,i] tmp <- cca(taxa,tmp) res[i,2] <- sum(tmp$CCA$eig) } rndeig <- perm.cca(taxa,start,add[,i],numitr) if (!is.null(start)) { full <- data.frame(start,add[,i]) base <- cca(taxa,start) basval <- sum(base$CCA$eig) } else { full <- add[,i] basval <- 0 } res[i,3] <- (sum(rndeig>=res[i,2])+1)/(numitr+1) res[i,2] <- res[i,2] - basval } cat(paste("Baseline = ", format(basval, 3), "\n")) print(res) } step.cap <- function (diss,start,add,numitr=99) { perm.capscale <- function(diss,start,add,numitr) { rndeig <- rep(0,numitr) if (!is.null(start)) { for (i in 1:numitr) { tmp <- data.frame(start,sample(add,replace=FALSE)) tmp2 <- capscale(diss~.,data=tmp) rndeig[i] <- sum(tmp2$CCA$eig) } } else { for (i in 1:numitr) { tmp <- data.frame(sample(add,replace=FALSE)) tmp2 <- capscale(diss~.,data=tmp) rndeig[i] <- sum(tmp2$CCA$eig) } } return(rndeig) } res <- data.frame(names(add),rep(NA,ncol(add)),rep(NA,ncol(add))) names(res) <- c('variable','delta_eig','p_val') for (i in 1:ncol(add)) { if (!any(is.na(add[,i]))) { if (!is.null(start)) tmp <- data.frame(start,add[,i]) else tmp <- data.frame(add[,i]) tmp <- capscale(diss~.,data=tmp) res[i,2] <- sum(tmp$CCA$eig) } rndeig <- perm.capscale(diss,start,add[,i],numitr) if (!is.null(start)) { full <- data.frame(start,add[,i]) start <- data.frame(start) base <- capscale(diss~.,data=start) basval <- sum(base$CCA$eig) } else { full <- add[,i] basval <- 0 } res[i,3] <- (sum(rndeig>=res[i,2])+1)/(numitr+1) res[i,2] <- res[i,2] - basval } cat(paste("Baseline = ", format(basval, 3), "\n")) print(res) }