PCA is an eigenvector technique that attempts to explain as much variance as possible on each of a series of orthogonal vectors spanning the vegetation space (see Kent and Coker pp. 186-203 for a gentle presentation, or Legendre and Legendre 1998 pp. 391-424 for a more thorough treatment). Eigenanalysis is based in linear or matrix algebra and has a wide range of uses. Our particular interest is in finding an alternative description of our vegetation data in a low-dimensional space. The first vector is chosen to account for as much variance in plot dispersion from the centroid as possible; the second is chosen by the same criterion but subject to the constraint of being orthogonal to the first, and so on. The approach can be likened to regression by least squares, except that the residuals from the vector are measured perpendicular to the vector rather than perpendicular to the x or y axis.
In general, PCA is performed on a correlation or variance/covariance matrix, although equivalently, a matrix of sums-of-squares and cross-products can be used. In R, however, the functions we will use convert the basic vegetation data into the appropriate form, usually requiring only an argument to be specified in the function call.
There are two ways to perform PCA in R: princomp() and prcomp(). Essentially, they compute the same values (technically, princomp computes an eigenanalysis and prcomp computes a singular value decomposition). To make all of the ordination labs more similar, the LabDSV package includes a function called pca() that I will use in the following. pca() simply calls prcomp, but includes different conventions for plotting and analysis that we will use with other ordination techniques in later labs.
So, we first have to load the LabDSV package, if you haven't already (see R for Ecologists if you don't remember how to install and access packages in R).
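Assuming the package is installed, loading it is a single call (note that the package name is lower-case in the call itself):

library(labdsv)   # pca(), plot.pca(), surf(), and the other functions listed at the end of this lab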
The pca() function accepts taxon matrices or dataframes, computing loadings for each column and scores for each row. The loadings are the contributions of the column vectors to each of the eigenvectors. A large positive loading means that that column (species, in our case) is positively correlated with that eigenvector; a large negative value means negative correlation; and small values mean that the species is unrelated to that eigenvector. Often, we're more interested in the position of the sample plots in the reduced-dimensional space; these positions are given by the scores, or coordinates along the eigenvectors.
The pca() function allows you to specify a number of parameters concerning the calculation. The first is whether you want to use a correlation or covariance matrix. Again, this is a subject too detailed to expound on adequately in HTML, but generally, PCA is sensitive to the scale of measurement of the data. If the data are not all measured on the same scale, using covariance means that the result will be determined mostly by the variable with the largest values, as it will have the highest variance. Using a correlation matrix treats all variables the same (standardized to mean = 0 and standard deviation = 1). Even if all species were measured on the same scale (e.g., percent cover), you probably want to use correlation to prevent the dominant species from determining the results. In pca(), this means specifying cor=TRUE in the function call. In addition, by default pca() will generate one eigenvector for every column (species), and scores for each row (plot) on each of the eigenvectors. Generally, the vast majority of the variance is described on the first few eigenvectors, and we can save space by calculating scores for only those. This can be specified by including dim=n, where n equals the number of dimensions you want scores and loadings for. So, for example,
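(a sketch, assuming the Bryce Canyon species abundance data are in a dataframe called bryceveg, as in previous labs)

pca.1 <- pca(bryceveg, cor=TRUE, dim=10)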
calculates a principal components analysis of the Bryce Canyon vegetation, using a correlation matrix and calculating scores for only the first 10 eigenvectors. The results of a PCA are complex; a good first look is the variance summary.
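For instance, assuming a summary method is available for objects of class pca:

summary(pca.1)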
Importance of components (numeric matrix: 3 rows, 169 columns):
                          Comp. 1    Comp. 2    Comp. 3   Comp. 4    Comp. 5
Standard deviation     3.65275545 2.77224856 2.66016722 2.5459628 2.38617490
Proportion of Variance 0.07895043 0.04547552 0.04187272 0.0383546 0.03369131
Cumulative Proportion  0.07895043 0.12442594 0.16629866 0.2046533 0.23834456
                          Comp. 6   Comp. 7    Comp. 8    Comp. 9   Comp. 10
Standard deviation     2.31459860 2.1179390 2.06713660 1.90778099 1.85804579
Proportion of Variance 0.03170039 0.0265424 0.02528434 0.02153626 0.02042801
Cumulative Proportion  0.27004496 0.2965874 0.32187170 0.34340796 0.36383598
For reasons known only to R, the top line is the standard deviation (rather than the variance) associated with each component. The next two lines are the items of primary interest, however; proportion of total variance, and cumulative proportion of variance. The dim=10 means to list only the first 10 dimensions; the default is to list ALL of them. Notice that the variance explained is in order from highest to lowest by eigenvector; this is by design. Notice also that the first ten eigenvectors explain a total of 0.3638 of the total variance of the dataset. The number of significant eigenvectors is at most min(#plots, #species)-1. Most of the eigenvectors after 100 explain a minute portion of the variance.
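One common way to decide how many eigenvectors are worth interpreting is the broken-stick model; the brokenstick() function listed at the end of this lab computes the expected values. A minimal sketch (the variable names are arbitrary; 169 is the number of species columns reported in the summary above):

obs <- pca.1$sdev^2 / pca.1$totdev        # observed proportions of variance, axes 1-10
null <- (brokenstick(169) / 169)[1:10]    # broken-stick expected proportions
obs > null                                # TRUE where an axis beats the null model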
We can look at this same information graphically with the varplot() function.
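For example (calling varplot.pca() directly, as listed at the end of this lab):

varplot.pca(pca.1, dim=10)   # hit Return to advance from the first panel to the second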
The first panel shows the variance associated with each vector, and the second shows the cumulative variance as a fraction of the total.
To examine the species loadings on each eigenvector, use the loadings.pca() function (listed at the end of this lab). A partial output looks like the following:
loadings.pca(pca.1, dim=5)

Loadings:
       Comp. 1 Comp. 2 Comp. 3 Comp. 4 Comp. 5
junost
ameuta
arcpat   0.157  -0.103
arttri   0.219  -0.147   0.154
atrcan   0.203  -0.144   0.154
berfre
ceamar   0.134
cerled
cermon
chrdep
chrnau
chrpar
chrvis  -0.200
eurlan
juncom   0.100   0.132
pacmyr   0.145   0.111   0.154
...
By default, loadings.pca() suppresses small values to emphasize the more important ones. Accordingly, you can see that eigenvector 1 is positively correlated with arcpat, ceamar, juncom, and pacmyr, while negatively associated with chrvis (along with many more species not included in this excerpt). The dim=5 means to list only the first five components; the default is to list all of them.
It is often the case that loadings or scores from two different computers or programs will come out with the same values but with opposite signs. The orientation of eigenvectors is arbitrary, and the sign is only meaningful with respect to other values on the same axis.
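To extract the plot scores themselves, we can use the scores.pca() function (also listed at the end of this lab); a sketch:

scores.pca(pca.1, dim=5)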
Which looks like:
          Comp.1      Comp.2        Comp.3       Comp.4        Comp.5
1   -0.303522539 -0.58234332  0.4964863489  0.446266093  0.0005421279
2    0.106991234 -0.86395259  0.7710460128  1.070231898 -0.1001168298
3   -0.325220583 -0.63188537  0.5977936134  0.816265090 -0.0339421175
4   -0.301010236 -0.55085413  0.4413011878  0.356285781  0.0722309177
5   -3.051327186  0.58628755 -0.0316738283  0.669026942  0.0280048460
6   -0.459006189 -0.65657134  0.6296331674  0.562444071 -0.2394758056
7   -2.988245257  0.69125840 -0.2506843565  0.192719353  0.0176306898
8   -1.057166708 -0.14648668  0.1346286599  0.322338430  0.1677946043
9    0.591108689 -1.09106087  0.7925182118  0.648834725 -0.0341675553
10  -3.767674331  1.09240362 -0.7486658969 -0.148511254 -0.1280649255
...
156  0.275149769 -0.65904715  0.4639857495 -1.284139318 -0.2769789761
157  0.755560214 -0.98124047  0.5450052352 -1.019513009 -0.2163151320
158  0.320133102 -0.74440896  0.3037215946 -0.663608326  0.0574919737
159  0.818248378 -0.96619683  0.4276821265 -1.105798668 -0.1866257489
160  0.389202051 -0.69695648  0.2635548583 -0.625856568  0.0825366027
These scores are what is typically plotted in a PCA "ordination." This is easily done with the plot() function. The LabDSV library includes a special version of the plot function for PCA. In S-Plus and R the default plot function for a PCA plots the barplot of variances, but ecologists are usually more interested in plotting the scores. The LabDSV plot function for PCA is designed to simplify plotting, minimize typing, and, most importantly, scale the axes identically so that we can compare distances within the ordination without distortion. Unfortunately, R/S does not always know the exact aspect ratio of computer monitors, so plots may not be exactly square. Generally, they're close enough, and hardcopy output will be exact.
The plot() function requires a PCA as its first argument. The default is to plot axes 1 and 2, but any of the axes computed can be plotted by specifying ax= and ay=. In addition, you can add a title. For example (showing R output):
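A sketch (the title text is arbitrary):

plot(pca.1, title="PCA of Bryce Canyon Vegetation")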
PCA can also be combined with regression to give a constrained ordination: if we perform the PCA on the fitted values of a regression of species on environment, the result is in effect a redundancy analysis (RDA). First, we want to center the species abundances around their means and standardize their variance. We can do this fairly directly:
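A sketch using the built-in scale() function (veg.scale is an arbitrary name, and bryceveg is assumed as before):

veg.scale <- scale(bryceveg, center=TRUE, scale=TRUE)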
Next, we want to specify which environmental variables we want to use, and to center them as well. For example:
env <- cbind(elev, av, slope)
env.scale <- scale(env, center=TRUE, scale=TRUE)
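We then regress the standardized abundances on the environmental matrix and keep the fitted values. A sketch (veg.lm is an arbitrary name; veg.fit is the object used in the pca() call below), relying on the fact that lm() accepts a matrix response:

veg.lm <- lm(veg.scale ~ env.scale)   # multivariate regression of species on environment
veg.fit <- fitted(veg.lm)             # fitted species abundances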
Finally, we calculate the PCA on the fitted values:
rda <- pca(veg.fit, cor=TRUE)
varplot.pca(rda, dim=5)
Notice how all of the variation is accounted for in only three eigenvectors. This is necessarily so, as the fitted values subjected to the PCA were the output of a linear regression with only three independent variables. We can view the ordination exactly as any other ordination:
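For example (a sketch; rda carries class pca, so the plot.pca() method applies):

plot(rda, title="PCA of Fitted Bryce Canyon Vegetation")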
Returning to the unconstrained PCA: in our example, the first two axes (pca.1$scores[,1] and pca.1$scores[,2]) describe only about 12% of the total variance, but we'll start with a 2-dimensional approach and go from there. Based on our previous results, we might hypothesize that elevation plays a big role in the distribution of vegetation in Bryce Canyon. Let's look:
elev.pca.glm <- glm(elev ~ pca.1$scores[,1] + pca.1$scores[,2])
summary(elev.pca.glm)
Call:
glm(formula = elev ~ pca.1$scores[, 1] + pca.1$scores[, 2])

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1123.26   -361.72     19.48    396.99    964.19

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        7858.56      39.89 196.982  < 2e-16 ***
pca.1$scores[, 1]   -56.33      10.96  -5.142 8.01e-07 ***
pca.1$scores[, 2]  -105.72      14.44  -7.323 1.19e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 254655.1)

    Null deviance: 60370369  on 159  degrees of freedom
Residual deviance: 39980847  on 157  degrees of freedom
AIC: 2450.7

Number of Fisher Scoring iterations: 2
If you don't recall the syntax and form of the glm() command, review Lab 4 first. This summary tells us that elev is negatively related to both pca.1$scores[,1] and pca.1$scores[,2], and that the relation to pca.1$scores[,2] is stronger than to pca.1$scores[,1], as determined by looking at the coefficients (-105.72 vs -56.33) or the t-values (-7.323 vs -5.142).
In addition, there are several other items of interest in the summary. First, because we did not specify otherwise, the family was taken to be Gaussian, meaning we expected normal errors unrelated to the expected value. This is correct for a variable like elevation, which is unbounded on either end (for all practical purposes), and represents a continuous value rather than a count. You should recall from Lab 4 that fitting a GLM with a Gaussian error model is equivalent to a least-squares regression. Accordingly, we could have done:
elev.pca.lm <- lm(elev ~ pca.1$scores[,1] + pca.1$scores[,2])
summary(elev.pca.lm)
Call:
lm(formula = elev ~ pca.1$scores[, 1] + pca.1$scores[, 2])

Residuals:
     Min       1Q   Median       3Q      Max
-1123.26  -361.72    19.48   396.99   964.19

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        7858.56      39.89 196.982  < 2e-16 ***
pca.1$scores[, 1]   -56.33      10.96  -5.142 8.01e-07 ***
pca.1$scores[, 2]  -105.72      14.44  -7.323 1.19e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 504.6 on 157 degrees of freedom
Multiple R-Squared: 0.3377,     Adjusted R-squared: 0.3293
F-statistic: 40.03 on 2 and 157 DF,  p-value: 8.927e-15
To see the model overlaid on the ordination, we first need a couple of helper packages:
require(mgcv)    # to load the gam() function
require(akima)   # to load the interp() function
plot(pca.1)
contour(interp(pca.1$scores[,1], pca.1$scores[,2], fitted(elev.pca.glm)),
        add=TRUE, col=2, labcex=0.8)
Let's take this command apart to understand it. First, contour() is a command to plot contour lines on a gridded surface. It just so happens that our contours are straight lines, but that's because we fit a linear model. contour() expects to be given a grid of values, but we only have points. To generate the grid from the points, we use interp(), which interpolates the points to all locations in the grid. interp() expects three values: the x coordinates of the points (here the axis 1 scores), the y coordinates (the axis 2 scores), and the z values to be interpolated (here the fitted elevations).
It's tempting to contour the actual elevations themselves (substituting elev for fitted(elev.pca.glm) in the above interp() command), but the results are generally poor. See for yourself:
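A sketch (the contour color is arbitrary):

plot(pca.1)
contour(interp(pca.1$scores[,1], pca.1$scores[,2], elev),
        add=TRUE, col=3, labcex=0.8)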
The problem is that the actual elevations are too variable at a small scale and need to be generalized to plot well. More importantly, we need some idea of how well the ordination correlates with elevation, and by fitting a formal model we get much better quantification than we get by analyzing the raw data by eye.
Nonetheless, the contours of the raw data give us the impression that the relation of elevation to the axes is non-linear (see the curve for the 7000 foot contour), and that perhaps we should fit a better model. Let's try:
elev.pca2.glm <- glm(elev ~ pca.1$scores[,1] + I(pca.1$scores[,1]^2) +
                     pca.1$scores[,2] + I(pca.1$scores[,2]^2))
summary(elev.pca2.glm)
Call:
glm(formula = elev ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) +
    pca.1$scores[, 2] + I(pca.1$scores[, 2]^2))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1222.5   -323.1     60.6    258.5    930.7

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             7628.726     64.476 118.320  < 2e-16 ***
pca.1$scores[, 1]       -117.975     17.264  -6.834 1.78e-10 ***
I(pca.1$scores[, 1]^2)    20.813      3.495   5.956 1.67e-08 ***
pca.1$scores[, 2]        -65.096     14.337  -4.540 1.12e-05 ***
I(pca.1$scores[, 2]^2)    -6.040      3.050  -1.980   0.0494 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 203030.2)

    Null deviance: 60370369  on 159  degrees of freedom
Residual deviance: 31469682  on 155  degrees of freedom
AIC: 2416.4

Number of Fisher Scoring iterations: 2
anova(elev.pca.glm,elev.pca2.glm,test="Chi")
Analysis of Deviance Table

Model 1: elev ~ pca.1$scores[, 1] + pca.1$scores[, 2]
Model 2: elev ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] +
    I(pca.1$scores[, 2]^2)
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       157   39980847
2       155   31469682  2  8511165  7.89e-10
Are both axes necessarily quadratic? pca.1$scores[,2] is not convincingly quadratic. Let's check:
anova(elev.pca2.glm,test="Chi")
Analysis of Deviance Table

Model: gaussian, link: identity
Response: elev
Terms added sequentially (first to last)

                       Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                                     159   60370369
pca.1$scores[, 1]       1  6732272       158   53638097 8.492e-09
I(pca.1$scores[, 1]^2)  1 16519266       157   37118831 1.878e-19
pca.1$scores[, 2]       1  4852933       156   32265899 1.013e-06
I(pca.1$scores[, 2]^2)  1   796217       155   31469682 4.767e-02
Then again, I guess a reduction in deviance of 796217 is worth a degree of freedom after all! To get a clean plot and overlay:
plot(pca.1)
contour(interp(pca.1$scores[,1], pca.1$scores[,2], fitted(elev.pca2.glm)),
        add=TRUE, col=2)
As you can see, the actual relation of elevation to the ordination is highly non-linear, with high elevations at the extremes of the X axis.
To evaluate other environmental variables, we do equivalent analyses, adjusting the model as suitable. For example, to analyze slope, which is bounded at 0 on the low end, we might want to fit a Poisson regression to avoid negative fitted values:
slope.pca.glm <- glm(slope ~ pca.1$scores[,1] + pca.1$scores[,2], family=poisson)
summary(slope.pca.glm)
Call:
glm(formula = slope ~ pca.1$scores[, 1] + pca.1$scores[, 2],
    family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.3070  -1.7456  -1.1582  -0.1780  14.2356

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.63525    0.03523  46.420  < 2e-16 ***
pca.1$scores[, 1] -0.03384    0.01050  -3.224 0.001263 **
pca.1$scores[, 2]  0.05057    0.01303   3.880 0.000104 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1145.3  on 159  degrees of freedom
Residual deviance: 1121.3  on 157  degrees of freedom
AIC: 1606.4

Number of Fisher Scoring iterations: 6
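The D^2 statistic (the deviance-based analogue of R^2) comes straight from the null and residual deviances in the summary:

1 - (1121.3/1145.3)   # D^2 = 1 - residual/null deviance, about 0.02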
While the z-values for the coefficients are reasonably large (and statistically significant), the model has a very poor fit (D^2 = 0.02). Does this mean that slope is not related to the ordination, or simply that the relation is so strongly non-linear that we don't see it? Possibly, we have fit the wrong model. Repeating the analysis with a Gaussian model gives us:
test <- glm(slope~pca.1$scores[,1]+pca.1$scores[,2])
summary(test)

Call:
glm(formula = slope ~ pca.1$scores[, 1] + pca.1$scores[, 2])

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.4880  -3.5196  -2.4581  -0.4257  59.4864

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         5.2125     0.6998   7.449 5.89e-12 ***
pca.1$scores[, 1]  -0.1462     0.1922  -0.761    0.448
pca.1$scores[, 2]   0.2375     0.2532   0.938    0.350
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 78.3473)

    Null deviance: 12415  on 159  degrees of freedom
Residual deviance: 12301  on 157  degrees of freedom
AIC: 1156.8

Number of Fisher Scoring iterations: 2

range(fitted(test))
[1] 2.300952 6.962712
1-(12301/12415)
[1] 0.00918244
The Gaussian fit is even worse (D^2 = 0.009). Perhaps a more flexible model will help; let's fit a GAM with smooth functions of each axis:

slope.pca.gam <- gam(slope ~ s(pca.1$scores[,1]) + s(pca.1$scores[,2]),
                     family=poisson)
summary(slope.pca.gam)

Family: poisson
Link function: log

Formula:
slope ~ s(pca.1$scores[, 1]) + s(pca.1$scores[, 2])

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.53611    0.03834   40.06   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                       edf Est.rank Chi.sq  p-value
s(pca.1$scores[, 1]) 8.012    9.000 107.01  < 2e-16 ***
s(pca.1$scores[, 2]) 8.832    9.000  73.45 3.20e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  -0.00758   Deviance explained = 17.9%
UBRE score = 5.0988   Scale est. = 1         n = 160
The plot is certainly non-linear, and the fit is somewhat better:
1-(940.1238/1145.295)
[1] 0.1791427
To check whether a simpler parametric form captures the non-linearity, we can fit a quadratic Poisson GLM:

test <- glm(slope ~ pca.1$scores[,1] + I(pca.1$scores[,1]^2) +
            pca.1$scores[,2] + I(pca.1$scores[,2]^2), family=poisson)
summary(test)
Call:
glm(formula = slope ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) +
    pca.1$scores[, 2] + I(pca.1$scores[, 2]^2), family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.62409  -1.81801  -0.94481   0.07151  13.03008

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)             1.957678   0.062405  31.370  < 2e-16 ***
pca.1$scores[, 1]       0.050336   0.017972   2.801  0.00510 **
I(pca.1$scores[, 1]^2) -0.019874   0.003944  -5.039 4.67e-07 ***
pca.1$scores[, 2]       0.037873   0.016818   2.252  0.02433 *
I(pca.1$scores[, 2]^2) -0.011350   0.003570  -3.180  0.00147 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1145.3  on 159  degrees of freedom
Residual deviance: 1085.2  on 155  degrees of freedom
AIC: 1574.3

Number of Fisher Scoring iterations: 6
anova(test, test="Chi")

Analysis of Deviance Table

Model: poisson, link: log
Response: slope
Terms added sequentially (first to last)

                       Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                                     159    1145.30
pca.1$scores[, 1]       1     9.01       158    1136.29 2.687e-03
I(pca.1$scores[, 1]^2)  1    38.33       157    1097.96 5.974e-10
pca.1$scores[, 2]       1     1.54       156    1096.41      0.21
I(pca.1$scores[, 2]^2)  1    11.19       155    1085.22 8.218e-04
anova(test, slope.pca.gam, test="Chi")

Analysis of Deviance Table

Model 1: slope ~ pca.1$scores[, 1] + I(pca.1$scores[, 1]^2) + pca.1$scores[, 2] +
    I(pca.1$scores[, 2]^2)
Model 2: slope ~ s(pca.1$scores[, 1]) + s(pca.1$scores[, 2])
  Resid. Df Resid. Dev     Df Deviance P(>|Chi|)
1   155.000    1085.22
2   142.156     940.12 12.844   145.10 1.641e-24
General Surface Plotter
Plotting a variable as a surface on an ordination is such a common task that it is worthwhile to create a general ordination surface routine. Therefore, the LabDSV library includes not only a plotting routine, but also a surfacing routine. It employs a GAM to fit the surface (so you don't have to know the shape of the response a priori) and reports back D^2 for the fit. To use it, simply plot first, then call surf(). E.g.
plot(pca.1)
surf(pca.1, elev)
Just like plot(), you can specify dimensions other than 1 and 2 (but they MUST match the dimensions of the previous plot), specify the color of the contours (with col=), and change the size of the contour labels (with labcex=). You can plot as many surfaces in a single plot as makes sense. For example:
plot(pca.1)
surf(pca.1, elev, labcex=1.0)
surf(pca.1, slope, col=4, labcex=1.0)
And, you can add annotation with the text(x,y,text,...) command as desired. It's easiest to show an example.
text(-5,6,"Elevation",col=2,adj=0) text(-5,5,"Slope",col=4,adj=0)
The x and y coordinates come first, then the desired text in quotes, then the color you want, and optionally, text justification (adj=0 means left-justified, adj=0.5 means centered, and adj=1 means right-justified).
In the next few labs we'll use the surf() function repeatedly.
As a first alternative, we can still employ eigenanalysis, but on a more suitable matrix. If we use a distance (or dissimilarity) matrix instead of a correlation or covariance matrix, we get Principal Coordinates Analysis (PCoA or PCO), the subject of the next lab, Lab 7.
On the other hand, PCA is widely employed (often inappropriately). More importantly, it allowed us to work through many areas of ordination analysis that we will draw on in future labs.
pca <- function (mat, cor = FALSE, dim = min(nrow(mat), ncol(mat)))
{
    # a thin wrapper around prcomp(); cor=TRUE standardizes the columns,
    # which is equivalent to PCA on the correlation matrix
    tmp <- prcomp(mat, retx = TRUE, center = TRUE, scale = cor)
    out <- list()
    out$scores <- tmp$x[, 1:dim]           # plot (row) scores
    out$loadings <- tmp$rotation[, 1:dim]  # species (column) loadings
    out$sdev <- tmp$sdev[1:dim]            # standard deviation of each axis
    out$totdev <- sum(tmp$sdev^2)          # total variance in the data
    class(out) <- "pca"
    return(out)
}
loadings.pca <- function (x, dim = length(x$sdev), digits = 3, cutoff = 0.1)
{
    if (dim > ncol(x$loadings)) {
        cat("Only", ncol(x$loadings), "axes available\n")
        dim <- ncol(x$loadings)
    }
    cat("\nLoadings:\n")
    cx <- format(round(x$loadings[, 1:dim], digits = digits))
    # blank out loadings smaller (in absolute value) than the cutoff
    cx[abs(x$loadings[, 1:dim]) < cutoff] <- substring("            ",
        1, nchar(cx[1, 1]))
    print(cx, quote = FALSE)
    invisible()
}
scores.pca <- function (x, labels = NULL, dim = length(x$sdev))
{
    if (dim > length(x$sdev)) {
        cat("Only", length(x$sdev), "axes available\n")
        dim <- length(x$sdev)
    }
    # optionally bind a column of identifiers to the scores
    if (!is.null(labels)) {
        cbind(labels, x$scores[, 1:dim])
    } else {
        x$scores[, 1:dim]
    }
}
plot.pca <- function (x, ax = 1, ay = 2, col = 1, title = "", pch = 1, ...)
{
    if (!inherits(x, "pca"))
        stop("You must specify an object of class pca")
    # asp=1 scales the axes identically so within-plot distances are comparable
    plot(x$scores[, ax], x$scores[, ay], asp = 1, col = col,
         xlab = paste("PCA", ax), ylab = paste("PCA", ay),
         pch = pch, main = title)
    invisible()
}
require(mgcv)
require(akima)
surf.pca <- function (ord, var, ax = 1, ay = 2, col = 2, labcex = 0.8,
    family = gaussian, thinplate = TRUE, grid = 50, gamma = 1, ...)
{
    if (!inherits(ord, "pca"))
        stop("You must specify an object of class pca")
    if (missing(var))
        stop("You must specify a variable to surface")
    x <- ord$scores[, ax]
    y <- ord$scores[, ay]
    # fit a GAM of the variable on the ordination coordinates;
    # logical variables are automatically fit with a binomial family
    if (is.logical(var)) {
        if (thinplate)
            tmp <- gam(var ~ s(x, y), family = binomial, gamma = gamma)
        else
            tmp <- gam(var ~ s(x) + s(y), family = binomial, gamma = gamma)
    } else {
        if (thinplate)
            tmp <- gam(var ~ s(x, y), family = family, gamma = gamma)
        else
            tmp <- gam(var ~ s(x) + s(y), family = family, gamma = gamma)
    }
    # predict on a regular grid, masking grid points outside the
    # convex hull of the data (pip() is a point-in-polygon test)
    new.x <- seq(min(x), max(x), len = grid)
    new.y <- seq(min(y), max(y), len = grid)
    xy.hull <- chull(x, y)
    xy.hull <- c(xy.hull, xy.hull[1])
    new.xy <- expand.grid(x = new.x, y = new.y)
    inside <- as.logical(pip(new.xy$x, new.xy$y, x[xy.hull], y[xy.hull]))
    fit <- predict(tmp, type = "response", newdata = as.data.frame(new.xy))
    fit[!inside] <- NA
    contour(x = new.x, y = new.y, z = matrix(fit, nrow = grid),
        add = TRUE, col = col, labcex = labcex)
    print(tmp)
    # D^2 = proportion of deviance explained by the fit
    d2 <- (tmp$null.deviance - tmp$deviance)/tmp$null.deviance
    cat(paste("D^2 = ", formatC(d2, width = 4), "\n"))
    invisible(tmp)
}
varplot.pca <- function (pca, dim = 10)
{
    var <- pca$sdev^2
    # variance accounted for by each axis
    barplot(var[1:dim], ylab = "Variance")
    readline("Hit Return to Continue\n")
    # cumulative variance as a fraction of the total
    barplot(cumsum(var/pca$totdev)[1:dim], ylab = "Cumulative Variance")
}
brokenstick <- function (n)
{
    # expected eigenvalues under the broken-stick null model;
    # divide by n to convert to expected proportions of variance
    stick <- rep(0, n)
    for (i in 1:n) {
        for (j in i:n) stick[i] <- stick[i] + 1/j
    }
    stick
}