PCA is an eigenvector technique that attempts to explain as much variance as possible on each of a series of orthogonal vectors spanning the vegetation space (see Kent and Coker Pp. 186- 203 for a gentle presentation; alternatively Legendre and Legendre 1998 Pp. 391-424 for a more thorough treatment). Eigenanalysis is based in linear or matrix algebra, and has a wide range of uses. Our particular interest is in finding an alternative description of our vegetation data in a low dimensional space. The first vector is chosen to account for as much variance in plot dispersion from the centroid as possible; the second is chosen by the same criterion but subject to the constraint of being orthogonal the first, and so on. The approach can be likened to regression by least squares, except that the residuals from the vector are measured perpendicular to the vector rather than perpendicular to the x or y axis.

In general, PCA is performed on a correlation or variance/covariance matrix, although equivalently, a matrix of sums-of-squares and-cross-products can be used. In R, however, the functions we will use convert the basic vegetation data into the appropriate forms, usually requiring only an argument to be specified in the function.

There are two ways to perform PCA in R: `princomp()` and
`prcomp()`. Essentially, they compute the same values (technically,
`princomp` computes an eigenanalysis and `prcomp` computes
a singular value decomposition). To make all of the ordination labs more similar,
the **LabDSV** library includes a function called `pca()` that I will
use in the following. `pca()` simply calls `prcomp`, but includes
different conventions for plotting and analysis that we will use with other
ordination techniques in later labs.

There are subtle differences in the Splus and R `princomp` function, but
they're essentially the same (at least for recent versions).
It is worth noting, however,
that the `princomp` function calculates the population
variance/covariance (or correlation), rather than the sample variance/covariance;
accordingly, values in the cross-products matrix are divided by N, rather than
N-1.

So, we first have to load the **LabDSV** library, if you haven't already.
If the libraries are installed properly

library(labdsv)

will do the trick.
(see R for Ecologists if you don't remember how to install and access libraries in S-Plus or R)

The `pca()` function accepts taxon matrices or dataframes, computing
**loadings** for each column and **scores** for each row in the matrix or dataframe.
The loadings are the contribution of the column vector to each of the
eigenvectors. A large positive component means that that column (species in our
case) is positively correlated with that eigenvector; a large negative values is
negative correlation; and small values mean that the species is unrelated to
that eigenvector. Often, we're more interested in the position of the sample
plots in the reduced-dimensional space; these positions are given by the
scores, or coordinates along the eigenvectors.

The `pca()` function allows you to specify a number of parameters
concerning the calculation, The first is whether you want to use a correlation
or covariance matrix. Again, this is a subject too detailed to expound on
adequately in HTML, but generally, PCA is sensitive to the scale of measurement
of the data. If all the data are not measured on the same scale, using
covariance means that the result will be determined mostly by the variable
with the largest values, as it will have the highest variance. Using a
correlation matrix treats all variables the same (standardized to mean=0 and
std. dev.=1). Even if all species were measured on the same scale (e.g. percent
cover), to prevent the dominant species from determining the results, you
probably want to use correlation. In `pca()`, this means specifying
`cor=TRUE` in the function call. In addition, by default `pca()` will generate
one eigenvector for every column (species), and scores for each row (plot) on
each of the eigenvectors. Generally, the vast majority of the variance is
described on the first few eigenvectors, and we can save space by only
calculating the scores for only the first few eigenvectors.
This can be specified by
including `dim=n` where `n` equals the number of dimensions you
want scores and loadings for. So, for example

pca.1 <- pca(veg,cor=TRUE,dim=10)

calculates a principal components analysis of the Bryce Canyon vegetation, using a correlation matrix and only calculating scores for the first 10 eigenvectors. The results of a PCA are complex, including:

- variance explained by eigenvector
- cumulative variance explained by eigenvector
- species loading by eigenvector
- plot scores by vector

summary(pca.1,dim=10)

The output looks like this:
Importance of components: numeric matrix: 3 rows, 169 columns. Comp. 1 Comp. 2 Comp. 3 Comp. 4 Comp. 5 Standard deviation 3.65275545 2.77224856 2.66016722 2.5459628 2.38617490 Proportion of Variance 0.07895043 0.04547552 0.04187272 0.0383546 0.03369131 Cumulative Proportion 0.07895043 0.12442594 0.16629866 0.2046533 0.23834456 Comp. 6 Comp. 7 Comp. 8 Comp. 9 Comp. 10 Standard deviation 2.31459860 2.1179390 2.06713660 1.90778099 1.85804579 Proportion of Variance 0.03170039 0.0265424 0.02528434 0.02153626 0.02042801 Cumulative Proportion 0.27004496 0.2965874 0.32187170 0.34340796 0.36383598

For reasons known only to R/S, the top line is the standard deviation
(rather than the variance) associated with each component. The next two lines are
the items of primary interest, however; proportion of total variance, and
cumulative proportion of variance. The `dim=10` means to list only the
first 10 dimensions; the default is to list ALL of them.
Notice that the variance explained is in order from highest to lowest by
eigenvector; this is by design. Notice also that the first ten eigenvectors
explain a total of 0.3638 of the total variance of the dataset.
The number of significant eigenvectors is at
most `min(#plots, #species)-1`. Most of the eigenvectors after 100 explain a
minute portion of the variance.

We can look at this same information graphically with the ` varplot()` function.

varplot.pca(pca.1)

Hit Return to Continue

The first panel shows the variance associated with each vector, and the second shows the cumulative variance as a fraction of the total.

loadings.pca(pca,dim=5)

A partial output looks like the following:

loadings.pca(pca) Comp. 1 Comp. 2 Comp. 3 Comp. 4 Comp. 5 junost ameuta arcpat 0.157 -0.103 arttri 0.219 -0.147 0.154 atrcan 0.203 -0.144 0.154 berfre ceamar 0.134 cerled cermon chrdep chrnau chrpar chrvis -0.200 eurlan juncom 0.100 0.132 pacmyr 0.145 0.111 0.154 . . . . . . . . . . . . . . . . . .

By default, `loadings.pca()` suppresses small values to emphasize the more important
ones. Accordingly, you can see that eigenvector 1 is positively correlated with
`arcpat, ceamar, juncom,` and `pacmyr`, while negatively
associated with `chrvis` (along with many more species not included in
this excerpt). The `dim=5` means only list the first five components;
the default is 5.

It is often the case that loadings or scores from two different computers or programs will come out with the same values but with opposite signs. The orientation of eigenvectors is arbitrary, and the sign is only meaningful with respect to other values on the same axis.

scores.pca(pca.1,dim=5)

Which looks like:

Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 1 -0.303522539 -0.58234332 0.4964863489 0.446266093 0.0005421279 2 0.106991234 -0.86395259 0.7710460128 1.070231898 -0.1001168298 3 -0.325220583 -0.63188537 0.5977936134 0.816265090 -0.0339421175 4 -0.301010236 -0.55085413 0.4413011878 0.356285781 0.0722309177 5 -3.051327186 0.58628755 -0.0316738283 0.669026942 0.0280048460 6 -0.459006189 -0.65657134 0.6296331674 0.562444071 -0.2394758056 7 -2.988245257 0.69125840 -0.2506843565 0.192719353 0.0176306898 8 -1.057166708 -0.14648668 0.1346286599 0.322338430 0.1677946043 9 0.591108689 -1.09106087 0.7925182118 0.648834725 -0.0341675553 10 -3.767674331 1.09240362 -0.7486658969 -0.148511254 -0.1280649255 . . . . . . . . . . . . . . . . . . 156 0.275149769 -0.65904715 0.4639857495 -1.284139318 -0.2769789761 157 0.755560214 -0.98124047 0.5450052352 -1.019513009 -0.2163151320 158 0.320133102 -0.74440896 0.3037215946 -0.663608326 0.0574919737 159 0.818248378 -0.96619683 0.4276821265 -1.105798668 -0.1866257489 160 0.389202051 -0.69695648 0.2635548583 -0.625856568 0.0825366027

These scores are what is typically plotted in a PCA "ordination." This is
easily done with the `plot()` function.
The LabDSV library includes a
special version of the plot function for PCA. In S-Plus and R the default plot
function plots the barplot of variances, but ecologists are usually more
interested in the plot scores. The LabDSV plot function for PCA is designed to
simplify plotting, minimizes typing, and most important, attempts to scale the
axes identically so that
we can compare distances within the ordination
without distortion.
Unfortunately, R/S does not always understand the exact aspect ratio of
computer monitors, and so plots may not be
exactly square. Generally, they're close enough, and hardcopy output will be
exact.

The `plot()` function requires a PCA as its first argument. The default
is to plot axes 1 and 2, but any of the axes computed can be plotted by
specifying x= and y=. In addition, you can add a title. For example (showing R
output)

plot(pca.1,title="Bryce Canyon")

First, we want to center the species abundances around their means and standardize their variance. We can do this fairly directly, by

veg.center <- apply(veg,2,function(x){(x-mean(x))/sd(x)})

As you remember from Lab 1, the veg.center <- scale(veg,center=TRUE,scale=TRUE)

The Next, we want to specify which environmental variables we want to use, and to center them as well. For example:

env <- cbind(elev,av,slope) env.scale <- scale(env,center=TRUE,scale=TRUE)

veg.lm <- lm(veg.center~env.scale)

Now, we create the matrix of fitted values:
veg.fit <- fitted(veg.lm)

It's possible to view the regression coefficients of the models, but every
species is a separate regression, and the details are generally overwhelming.
It's sometimes worth scanning, however, to see if the same variables are judged significant
for a large number of species.
Finally, we calculate the PCA on the fitted values:

rda <- pca(veg.fit,cor=TRUE) varplot.pca(rda,dim=5)

Notice how all of the variation is accounted for in only three eigenvectors. This is necessarily so, as the fitted values subjected to the PCA were the output of a linear regression with only three independent variables. We can view the ordination exactly as any other ordination:

plot(rda,title="Redundancy Analysis")

In our example, the first two axes (`pca.1$scores[,1]` and `
pca.1$scores[,2]`) describe only about 12% of the total variance, but we'll
start with a 2-dimensional approach and go from there. Based on our previous
results, we might hypothesize that elevation plays a big role in the
distribution of vegetation in Bryce Canyon. Let's look:

elev.pca.glm <- glm(elev~pca.1$scores[,1]+pca.1$scores[,2]) summary(elev.pca.glm)

Call: glm(formula = elev ~ pca.1$scores[, 1] + pca.1$scores[, 2]) Deviance Residuals: Min 1Q Median 3Q Max -1123.26 -361.72 19.48 396.99 964.19 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 7858.56 39.89 196.982 < 2e-16 *** pca.1$scores[, 1] -56.33 10.96 -5.142 8.01e-07 *** pca.1$scores[, 2] -105.72 14.44 -7.323 1.19e-11 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for gaussian family taken to be 254655.1) Null deviance: 60370369 on 159 degrees of freedom Residual deviance: 39980847 on 157 degrees of freedom AIC: 2450.7 Number of Fisher Scoring iterations: 2

If you don't recall the syntax and form of the `glm` command, review Lab 4 first. This command is telling us that
`elev` is negatively related to `pca.1$scores[,1]` and negatively
related to `pca.1$scores[,2]`, and that the relation to
`pca.1$scores[,2]` is stronger than to `pca.1$scores[,1]`, as
determined by looking at the coefficients (` -105 vs 56`) or
t-values (

In addition, there are several other items of interest in the summary. First,
because we did not specify otherwise, the family was taken to be
**Gaussian**, meaning we expected normal errors unrelated to the expected
value. This is correct for a variable like elevation, which is unbounded on
either end (for all practical purposes), and represents a continuous value
rather than a count. You should recall from Lab 4 that fitting a GLM with a
Gaussian error model is equivalent to a least-squares regression. Accordingly,
we could have done:

elev.pca.lm<-lm(elev~pca.1$scores[,1]+pca.1$scores[,2]) summary(elev.pca.lm)

Call: lm(formula = elev ~ pca.1$scores[, 1] + pca.1$scores[, 2]) Residuals: Min 1Q Median 3Q Max -1123.26 -361.72 19.48 396.99 964.19 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 7858.56 39.89 196.982 < 2e-16 *** pca.1$scores[, 1] -56.33 10.96 -5.142 8.01e-07 *** pca.1$scores[, 2] -105.72 14.44 -7.323 1.19e-11 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 504.6 on 157 degrees of freedom Multiple R-Squared: 0.3377, Adjusted R-squared: 0.3293 F-statistic: 40.03 on 2 and 157 DF, p-value: 8.927e-15

To see the model,

require(mgcv) # to load the gam() function require(akima) # to load the interp() function

plot(pca.1) contour(interp(pca.1$scores[,1],pca.1$scores[,2],fitted(elev.pca.glm)), add=TRUE,col=2,labcex=0.8)

Let's take this command apart to understand it. First, `contour()` is a
command to plot contour lines on gridded surface. It just so happens that our
contours are straight lines, but that's because we fit a linear model.
`contour()` expects to be given a grid of values, but we only have points.
To generate the grid from the points, we use `interp()`, which interpolates
the points to all locations in the grid. `interp()` expects three values:

- the X axis coordinates
- the Y axis coordinates
- the Z axis values for the grid or surface

It's tempting to contour the actual elevations themselves (substituting
`elev` for `fitted(elev.pca.glm)` in the above `interp`
command, but the results are generally poor. See for yourself:

contour(interp(pca.1$scores[,1],pca.1$scores[,2],elev),add=T,col=3)

The problem is that the actual elevations are too variable at a small scale, and need to be generalized to plot. More importantly, we need some idea of how well the ordination correlates to elevation, and by fitting a formal model we get much better quantification than we get by analysis of the raw data by eye.

Nonetheless, the contours of the raw data give us the impression that the relation of elevation to the axes is non-linear (see the curve for the 7000 foot contour), and that perhaps we should fit a better model. Let's try:

elev.pca2.glm <- glm(elev~pca.1$scores[,1]+I(pca.1$scores[,1]^2)+ pca.1$scores[,2]+I(pca.1$scores[,2]^2)) summary(elev.pca2.glm)

Call: glm(formula = elev ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] + I(pca.1$scores[, 2]^2)) Deviance Residuals: Min 1Q Median 3Q Max -1222.5 -323.1 60.6 258.5 930.7 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 7628.726 64.476 118.320 < 2e-16 *** pca.1$scores[, 1] -117.975 17.264 -6.834 1.78e-10 *** I(pca.1$scores[, 1]^2) 20.813 3.495 5.956 1.67e-08 *** pca.1$scores[, 2] -65.096 14.337 -4.540 1.12e-05 *** I(pca.1$scores[, 2]^2) -6.040 3.050 -1.980 0.0494 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for gaussian family taken to be 203030.2) Null deviance: 60370369 on 159 degrees of freedom Residual deviance: 31469682 on 155 degrees of freedom AIC: 2416.4 Number of Fisher Scoring iterations: 2

anova(elev.pca.glm,elev.pca2.glm,test="Chi")

Analysis of Deviance Table Model 1: elev ~ pca.1$scores[, 1] + pca.1$scores[, 2] Model 2: elev ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] + I(pca.1$scores[, 2]^2) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 157 39980847 2 155 31469682 2 8511165 7.89e-10

Are both axes necessarily quadratic? `pca.1$scores[,2]` is not
convincingly quadratic. Let's check:

anova(elev.pca2.glm,test="Chi")

Analysis of Deviance Table Model: gaussian, link: identity Response: elev Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 159 60370369 pca.1$scores[, 1] 1 6732272 158 53638097 8.492e-09 I(pca.1$scores[, 1]^2) 1 16519266 157 37118831 1.878e-19 pca.1$scores[, 2] 1 4852933 156 32265899 1.013e-06 I(pca.1$scores[, 2]^2) 1 796217 155 31469682 4.767e-02

Then again, I guess a reduction in deviance of 796217 is worth a degree of freedom after all!. To get a clean plot and overlay,

plot(pca.1) contour(interp(pca.1$scores[,1],pca.1$scores[,2],fitted(elev.pca2.glm)),add=T,col=2)

As you can see, the actual relation of elevation to the ordination is highly non-linear, with high elevations at the extremes of the X axis.

To evaluate other environmental variables, we do equivalent analyses, adjusting the model as suitable. For example, to analyze slope, which is bounded at 0 on the low end, we might want to fit a Poisson regression to avoid fitted values with a negative slope.

slope.pca.glm <- glm(slope~pca.1$scores[,1]+pca.1$scores[,2],family=poisson) summary(slope.pca.glm)

Call: glm(formula = slope ~ pca.1$scores[, 1] + pca.1$scores[, 2], family = poisson) Deviance Residuals: Min 1Q Median 3Q Max -3.3070 -1.7456 -1.1582 -0.1780 14.2356 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.63525 0.03523 46.420 < 2e-16 *** pca.1$scores[, 1] -0.03384 0.01050 -3.224 0.001263 ** pca.1$scores[, 2] 0.05057 0.01303 3.880 0.000104 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1145.3 on 159 degrees of freedom Residual deviance: 1121.3 on 157 degrees of freedom AIC: 1606.4 Number of Fisher Scoring iterations: 6

While the z-values for the coefficients are reasonably large (and statistically significant), the model has a very poor fit (D^2=0.02). Does this mean that slope is not related to the ordination, or simply that it's so strongly non-linear that we don't see it. Possibly, we have fit the wrong model. Repeating the analysis with a Gaussian model gives us:

test <- glm(slope~pca.1$scores[,1]+pca.1$scores[,2])

summary(test) Call: glm(formula = slope ~ pca.1$scores[, 1] + pca.1$scores[, 2]) Deviance Residuals: Min 1Q Median 3Q Max -5.4880 -3.5196 -2.4581 -0.4257 59.4864 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.2125 0.6998 7.449 5.89e-12 *** pca.1$scores[, 1] -0.1462 0.1922 -0.761 0.448 pca.1$scores[, 2] 0.2375 0.2532 0.938 0.350 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for gaussian family taken to be 78.3473) Null deviance: 12415 on 159 degrees of freedom Residual deviance: 12301 on 157 degrees of freedom AIC: 1156.8 Number of Fisher Scoring iterations: 2 range(fitted(test)) [1] 2.300952 6.962712 1-(12301/12415) [1] 0.00918244

slope.pca.gam <- gam(slope~s(pca.1$scores[,1])+s(pca.1$scores[,2]), family=poisson)

summary(slope.pca.gam) Family: poisson Link function: log Formula: slope ~ s(pca.1$scores[, 1]) + s(pca.1$scores[, 2]) Parametric coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.53611 0.03834 40.06 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Approximate significance of smooth terms: edf Est.rank Chi.sq p-value s(pca.1$scores[, 1]) 8.012 9.000 107.01 < 2e-16 *** s(pca.1$scores[, 2]) 8.832 9.000 73.45 3.20e-12 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 R-sq.(adj) = -0.00758 Deviance explained = 17.9% UBRE score = 5.0988 Scale est. = 1 n = 160

The plot is certainly non-linear, and the fit is somewhat better

1-(940.1238/1145.295)

[1] 0.1791427

test <- glm(slope~pca.1$scores[,1]+I(pca.1$scores[,1]^2)+ pca.1$scores[,2]+I(pca.1$scores[,2]^2),family=poisson) summary(test)

Call: glm(formula = slope ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] + I(pca.1$scores[, 2]^2), family = poisson) Deviance Residuals: Min 1Q Median 3Q Max -3.62409 -1.81801 -0.94481 0.07151 13.03008 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.957678 0.062405 31.370 < 2e-16 *** pca.1$scores[, 1] 0.050336 0.017972 2.801 0.00510 ** I(pca.1$scores[, 1]^2) -0.019874 0.003944 -5.039 4.67e-07 *** pca.1$scores[, 2] 0.037873 0.016818 2.252 0.02433 * I(pca.1$scores[, 2]^2) -0.011350 0.003570 -3.180 0.00147 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1145.3 on 159 degrees of freedom Residual deviance: 1085.2 on 155 degrees of freedom AIC: 1574.3 Number of Fisher Scoring iterations: 6

anova(test,test="Chi")

Analysis of Deviance Table Model: poisson, link: log Response: slope Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 159 1145.30 pca.1$scores[, 1] 1 9.01 158 1136.29 2.687e-03 I(pca.1$scores[, 1]^2) 1 38.33 157 1097.96 5.974e-10 pca.1$scores[, 2] 1 1.54 156 1096.41 0.21 I(pca.1$scores[, 2]^2) 1 11.19 155 1085.22 8.218e-04

anova(test,slope.pca.gam,test="Chi")

Analysis of Deviance Table Model 1: slope ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] + I(pca.1$scores[, 2]^2) Model 2: slope ~ s(pca.1$scores[, 1]) + s(pca.1$scores[, 2]) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 155.000 1085.22 2 142.156 940.12 12.844 145.10 1.641e-24

plot(pca.1) surf(pca.1,elev)

To assess the quality of the fit in more detail than the simple D^2 and GCV score, the surf function will produce the plots of the fitted values and standard errors that we saw in Lab 4 in a separate window. For example, produces

and

In this case the model is Gaussian, and so the values on the y axis are in the same units (feet). You can see that the model is estimating usually < +/- 250 feet. For models which are poisson or binomial, the y value is the log or logit respectively.

Just like `plot()` you can specify dimensions other than 1 or 2 (but they
MUST match the dimensions of the previous plot), specify the color of the
contours (with `col=`), and change the size of the contour labels (with
`labcex=`). You can plot as many surfaces in a single plot as makes
sense. For example

plot(pca.1) surf(pca.1,elev,labcex=1.0) surf(pca.1,slope,col=4,labcex=1.0)

And, you can add annotation with the `text(x,y,text,...)` command as desired. It's
easiest to show an example.

text(-5,6,"Elevation",col=2,adj=0) text(-5,5,"Slope",col=4,adj=0)

The x and y coordinates come first, then the text desired in quotes, the color you want, and optionally, test justification (adj=0 means left-justified, adj=0.5 means center, and adj=1 means right justified).

In the next few labs we'll use the `surf()` function repeatedly.

As a first alternative, we can still employ eigen analysis, but on a more suitable matrix. If we use a distance (or dissimilarity) matrix instead of correlation or covariance, we get Principal Coordinates Analysis (PCoA or PCO), the subject of the next lab, Lab 7.

On the other hand, PCA is widely employed (often inappropriately). More importantly, it allowed us to work through many areas of ordination analysis that we will draw on in future labs.

pca <- function (mat, cor = FALSE, dim = min(nrow(mat), ncol(mat))) { tmp <- prcomp(mat, retx = TRUE, center = TRUE, scale = cor) out <- list() out$scores <- tmp$x[, 1:dim] out$loadings <- tmp$rotation[, 1:dim] out$sdev <- tmp$sdev[1:dim] out$totdev <- sum(tmp$sdev^2) class(out) <- "pca" return(out) }

loadings.pca <- function (x, dim = length(x$sdev), digits = 3, cutoff = 0.1) { if (dim > ncol(x$loadings)) { cat("Only", ncol(x$loadings), "axes available\n") dim <- ncol(x$loadings) } cat("\nLoadings:\n") cx <- format(round(x$loadings[, 1:dim], digits = digits)) cx[abs(x$loadings[, 1:dim]) < cutoff] <- substring(" ", 1, nchar(cx[1, 1])) print(cx, quote = FALSE) invisible() }

scores.pca <- function (x, labels = NULL, dim = length(x$sdev)) { if (dim > length(x$sdev)) { cat("Only", length(x$sdev), " axes available\n") dim <- length(x$sdev) } if (!is.null(labels)) { cbind(labels, x$scores[, 1:dim]) } else { x$scores[, 1:dim] } }

plot(pca.1)

the generic plot function notices that plot.pca <- function (x, ax = 1, ay = 2, col = 1, title = "", pch = 1, ...) { if (class(x) != "pca") stop("You must specify a an object of class pca") plot(x$scores[, ax], x$scores[, ay], asp = 1, col = col, xlab = paste("PCA", ax), ylab = paste("PCA", ay), pch = pch, main = title) invisible() }

require(mgcv) require(akima)

surf.pca <- function (pca, var, x = 1, y = 2, col = 2, family = gaussian, gof=F, labcex=0.8, ...) { require(mgcv) require(akima) if (missing(pca)) { stop("You must specify a list obkect from cmdscale()") } if (missing(var)) { stop("You must specify a variable to surface") } xval <- pca$scores[, x] yval <- pca$scores[, y] if (is.logical(var)) { tmp <- gam(var ~ s(xval) + s(yval), family = binomial) } else { tmp <- gam(var ~ s(xval) + s(yval), family = family) } if (gof!=F) { x11() plot.gam(tmp) readline("Hit Return to Continue\n") dev.off() } contour(interp(xval, yval, fitted(tmp)), add = T, col = col, labcex=labcex, ...) }

varplot.pca <- function(pca,dim=10) { var <- pca$sdev^2 barplot(var[1:dim],ylab="Variance") readline("Hit Return to Continue\n") barplot(cumsum(var/pca$totdev)[1:dim],ylab="Cumulative Variance") }