In this lab we want to know the degree to which plot membership in a cluster is predictable from environmental variables not used in the classification. There are several reasons we might want to pursue this effort. Similar to our efforts fitting surfaces on ordinations, one important reason is to determine the relationship between environmental variability and vegetation composition; here, though, we approach the question with discrete classes rather than continuous variability. More important than interpretation, however, is the potential for prediction. That is, given environmental variables, can we predict vegetation composition? Back in Labs 4 and 5 we tried to predict the presence/absence or abundance of single species, but here we're predicting total community composition within classes or types. Specifically, rather than trying to predict a single value (presence or abundance), we're trying to choose among a set of possible alternatives. Every plot must be assigned to a type.

Among the variety of techniques available to us (e.g. linear discriminant
analysis (LDA) and quadratic discriminant analysis (QDA)), in this lab we will
emphasize tree-based classifiers. Tree classifiers are often called CART (for
Classification And Regression Trees), but CART is actually a specific
(copyrighted and trade-marked) example of such approaches. In R the tree
classifier is called `tree()`, from the package of the same name.

hcl1 <- hclust(dis.bc, method="complete")
plot(hcl1)

As we've seen before, using the "complete" method leads to discrete clusters at 1.0 dissimilarity. I'll slice the tree at 0.99 to generate clusters.

hcl1.99 <- cutree(hcl1, h=0.99)
table(hcl1.99)

hcl1.99
 1  2  3  4  5  6  7  8 
38 46 17  2 14 32  7  4 

opt5 <- optpart(dis.bc, numclu=5)
table(opt5$clusid)

 1  2  3  4  5 
38 92 14 12  4 

library(tree)
tree1 <- tree(factor(hcl1.99) ~ elev + av + slope + depth + pos)

To see the tree, enter

plot(tree1)
text(tree1)

The tree is shown with the root at the top, and each sequential split along each branch is labeled with the respective criterion for splitting. Values for which the criterion is true go left; values for which it is false go right. The height of the vertical bar above each split reflects the decrease in deviance associated with that split. To see the output, just type its name.

tree1

node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 145 501.900 2 ( 0.25517 0.27586 0.08966 ... 0.04828 0.01379 )
   2) elev < 7980 83 273.200 6 ( 0.00000 0.18072 ... 0.08434 0.02410 )
     4) elev < 7580 34 87.550 3 ( 0.00000 0.14706 ... 0.00000 0.05882 )
       8) elev < 6980 14 7.205 5 ( 0.00000 0.00000 ... 0.00000 ) *
       9) elev > 6980 20 41.320 3 ( 0.00000 0.25000 ... 0.05000 0.10000 )
        18) pos: bottom,low_slope,mid_slope 8 19.410 2 ( ... 0.12500 ) *
        19) pos: ridge,up_slope 12 13.590 3 ( 0.00000 ... 0.08333 ) *
     5) elev > 7580 49 95.200 6 ( 0.00000 0.20408 0.00000 ... 0.00000 )
      10) slope < 2.5 20 48.230 2 ( 0.00000 0.40000 ... 0.00000 )
        20) pos: bottom 8 6.028 7 ( 0.00000 0.12500 ... 0.00000 ) *
        21) pos: low_slope,mid_slope,ridge,up_slope 12 21.300 2 ( ... )
          42) av < 0.15 5 9.503 6 ( 0.00000 0.20000 ... 0.00000 ) *
          43) av > 0.15 7 5.742 2 ( 0.00000 0.85714 ... 0.00000 ) *
      11) slope > 2.5 29 14.560 6 ( 0.00000 0.06897 ... 0.00000 )
        22) av < 0.94 24 0.000 6 ( 0.00000 0.00000 ... 0.00000 ) *
        23) av > 0.94 5 6.730 6 ( 0.00000 0.40000 ... 0.00000 ) *
   3) elev > 7980 62 83.610 1 ( 0.59677 0.40323 ... 0.00000 )
     6) elev < 8650 45 61.830 2 ( 0.44444 0.55556 ... 0.00000 0.00000 )
      12) pos: low_slope,mid_slope,up_slope 38 52.570 1 ( 0.52632 ... )
        24) av < 0.7 22 25.780 2 ( 0.27273 0.72727 0.00000 ... 0.00000 )
          48) pos: low_slope,up_slope 8 10.590 1 ( 0.62500 ... 0.00000 ) *
          49) pos: mid_slope 14 7.205 2 ( 0.07143 0.92857 ... 0.00000 ) *
        25) av > 0.7 16 12.060 1 ( 0.87500 0.12500 0.00000 ... 0.00000 ) *
      13) pos: ridge 7 0.000 2 ( 0.00000 1.00000 0.00000 ... 0.00000 ) *
     7) elev > 8650 17 0.000 1 ( 1.00000 0.00000 0.00000 ... 0.00000 ) *

That takes a little explaining (and a little shoehorning to fit; the ellipses
represent class probabilities I've deleted to make the lines fit the page).
The tree is a dichotomous key (just
like for plant or animal taxa). The first line is the root node (or trunk of
the tree), which says that
- there are 145 plots (plots with missing values were dropped, so it's fewer than the full 160),
- the null deviance is 501.900,
- the null type (the most likely type) is 2, and
- the respective probabilities for the 8 clusters are 0.25517, 0.27586, ...

The first split is on elevation at 7980; following the `3) elev > 7980` branch, for example,
- there are 62 plots,
- the deviance is 83.610, and
- the most likely type is 1.
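The deviance printed for each node is the multinomial deviance, computed from the counts n_k of each type in the node and the within-node proportions p_k: D = -2 Σ n_k log(p_k). A minimal sketch, using made-up counts rather than the lab's data:

```r
# Multinomial deviance of a single node: D = -2 * sum(n_k * log(p_k))
node.deviance <- function(n) {
    n <- n[n > 0]          # treat 0 * log(0) as 0
    p <- n / sum(n)        # within-node type proportions
    -2 * sum(n * log(p))
}

node.deviance(c(8, 12))    # a mixed node has positive deviance
node.deviance(c(20, 0))    # a pure node has deviance 0
```

A perfectly pure terminal node therefore shows a deviance of 0.000 in the printout above.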

Once we split, we simply proceed with the same objective independently for each branch on the remaining data. For example, on branch 2, where we now have 83 plots, the next split is

4) elev < 7580 34 87.550 3 ( 0.00000 0.14706 0.38235 ... 0.00000 0.05882 )
5) elev > 7580 49 95.200 6 ( 0.00000 0.20408 0.00000 ... 0.14286 0.00000 )

To see a simple summary of the results, enter

summary(tree1)

Classification tree:
tree(formula = factor(hcl1.99) ~ elev + av + slope + depth + pos)
Variables actually used in tree construction:
[1] "elev"  "pos"   "slope" "av"   
Number of terminal nodes:  13 
Residual mean deviance:  0.7428 = 98.05 / 132 
Misclassification error rate: 0.131 = 19 / 145 

- although we included five variables in the model, only `elev`, `pos`, `slope`, and `av` were used; variables that are never used in a split are dropped,
- although there are only 8 types to predict, there are 13 terminal nodes, and
- the misclassification error rate is 0.131, or 19 out of 145.

If there are eight types, why are there 13 terminal nodes? For some types there is more than one route to a terminal node. For example, nodes 22, 23, and 42 all predict cluster 6. Nodes 22 and 23 result from a single split, and yet both predict cluster 6. Why is that? Node 22 predicts type 6 perfectly (p=1.0, dev.=0) while node 23 is less clean (p=0.60, dev.=6.73). Splitting node 11 into 22 and 23 achieved a reduction in deviance of 14.560 - (6.730+0.000) = 7.83 with a single split, even though both branches predicted the same type. Less obvious at first glance is that some types or clusters were never predicted at all. None of the terminal nodes predicts type 4 or 8. There are simply too few plots in those types to predict accurately, and they fall below the minimum splitting rule of at least 5 plots per type.
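The bookkeeping for that split can be checked directly from the deviances in the printout:

```r
# Reduction in deviance from splitting node 11 into nodes 22 and 23
parent   <- 14.560          # deviance of node 11
children <- 6.730 + 0.000   # deviances of nodes 23 and 22
parent - children           # 7.83
```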

The misclassification error is 19 out of 145, or 0.131, or about 13%. That's probably good enough for practical purposes. We could calculate the probability of getting 126/145 correct given our initial distribution, but it's bound to be very significant. Of more interest, perhaps, is which types were confused with which other types. We can view this in a table called a "confusion matrix." First, we need the vector of predicted types for each plot. This we obtain as follows:
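One rough way to see that "bound to be very significant" holds: treat always guessing the modal type (probability 0.27586 at the root) as a naive null model and ask how likely 126 or more correct out of 145 would be. This is only a sketch; it ignores the adaptive way the tree was fit:

```r
# Null model: always predict the modal type (root probability 0.27586).
# How surprising is 126/145 correct under that null?
binom.test(126, 145, p = 0.27586, alternative = "greater")$p.value
```

The resulting p-value is vanishingly small, confirming the intuition without needing the exact number.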

tree1.pred <- predict(tree1, newdata=site, type="class")
tree1.pred

  [1] 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 6 2 2 2 1 1 ... 2 2 1 1 1 1 1 1 1 2 2
 [38] 7 1 2 2 2 2 2 2 1 2 1 2 2 6 2 3 2 2 1 2 1 1 ... 1 1 1 1 1 1 3 2 2 2 2
 [75] 2 2 2 1 2 6 6 2 2 2 1 1 1 5 5 5 5 5 5 5 5 5 ... 5 6 2 6 6 6 6 6 2 6 6
[112] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 ... 7 6 6 7 3 3 3 3 3 3 2
[149] 2 3 5 3 3 3 2 2 5 5 5 3
Levels: 1 2 3 4 5 6 7 8

table(tree1.pred,hcl1.99)

          hcl1.99
tree1.pred  1  2  3  4  5  6  7  8
         1 37  5  0  0  0  0  0  0
         2  1 34  3  1  0  2  0  3
         3  0  2 10  0  0  0  0  1
         4  0  0  0  0  0  0  0  0
         5  0  0  4  0 14  0  0  0
         6  0  4  0  1  0 30  0  0
         7  0  1  0  0  0  0  7  0
         8  0  0  0  0  0  0  0  0
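Notice that the column totals of this table (38, 46, 17, 2, 14, 32, 7, 4) sum to 160, not 145: because we predicted from the full site data frame, the matrix covers all plots, including the 15 dropped from the fit, which is why the off-diagonal count exceeds the 19 training errors. Typing the table in directly (an illustrative check, not part of the lab's workflow):

```r
# The confusion matrix above: rows = predicted type, columns = hcl1.99 type
confus <- matrix(c(37, 5, 0, 0, 0, 0, 0, 0,
                    1,34, 3, 1, 0, 2, 0, 3,
                    0, 2,10, 0, 0, 0, 0, 1,
                    0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 4, 0,14, 0, 0, 0,
                    0, 4, 0, 1, 0,30, 0, 0,
                    0, 1, 0, 0, 0, 0, 7, 0,
                    0, 0, 0, 0, 0, 0, 0, 0),
                 nrow=8, byrow=TRUE)
sum(diag(confus))               # 132 plots classified correctly
sum(confus) - sum(diag(confus)) # 28 errors over all 160 plots
sum(diag(confus)) / sum(confus) # overall accuracy 0.825
```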

How about our other classification using optpart?

tree2 <- tree(factor(opt5$clusid) ~ elev + av + slope + depth + pos)
plot(tree2)
text(tree2)

tree2

node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 145 326.600 2 ( 0.25517 0.57931 0.08966 0.04828 0.02759 )
   2) elev < 7980 83 225.300 1 ( 0.44578 0.26506 0.15663 ... 0.04819 )
     4) elev < 7580 34 71.970 2 ( 0.00000 0.41176 0.38235 ... 0.00000 )
       8) elev < 6980 14 7.205 3 ( 0.00000 0.07143 0.92857 ... 0.00000 ) *
       9) elev > 6980 20 25.900 2 ( 0.00000 0.65000 0.00000 ... 0.00000 )
        18) pos: bottom,low_slope,mid_slope,ridge 12 6.884 2 ( 0.00000 0.91667 0.00000 0.08333 0.00000 ) *
        19) pos: up_slope 8 8.997 4 ( 0.00000 0.25000 ... 0.00000 ) *
     5) elev > 7580 49 69.830 1 ( 0.75510 0.16327 0.00000 ... 0.08163 )
      10) slope < 2.5 20 42.200 1 ( 0.40000 0.40000 0.00000 ... 0.20000 )
        20) elev < 7845 10 12.220 2 ( 0.30000 0.70000 ... 0.00000 ) *
        21) elev > 7845 10 18.870 1 ( 0.50000 0.10000 ... 0.40000 )
          42) av < 0.58 5 5.004 1 ( 0.80000 0.00000 ... 0.20000 ) *
          43) av > 0.58 5 9.503 5 ( 0.20000 0.20000 ... 0.60000 ) *
      11) slope > 2.5 29 0.000 1 ( 1.00000 0.00000 ... 0.00000 ) *
   3) elev > 7980 62 0.000 2 ( 0.00000 1.00000 ... 0.00000 0.00000 ) *

summary(tree2)

Classification tree:
tree(formula = factor(opt5$clusid) ~ elev + av + slope + depth + pos)
Variables actually used in tree construction:
[1] "elev"  "pos"   "slope" "av"   
Number of terminal nodes:  8 
Residual mean deviance:  0.3636 = 49.81 / 137 
Misclassification error rate: 0.06897 = 10 / 145 

tree2.pred <- predict(tree2, newdata=site, type="class")
table(tree2.pred, opt5$clusid)

tree2.pred  1  2  3  4  5
         1 33  1  0  0  1
         2  4 86  0  3  0
         3  0  1 14  3  0
         4  0  3  0  6  0
         5  1  1  0  0  3
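The kappa statistic that the `altconfus()` function below reports (agreement corrected for chance) can be worked out by hand from this table. As an illustrative check, typing the matrix in directly:

```r
# Cohen's kappa from the tree2 confusion matrix (rows = predicted, cols = opt5)
confus <- matrix(c(33, 1, 0, 0, 1,
                    4,86, 0, 3, 0,
                    0, 1,14, 3, 0,
                    0, 3, 0, 6, 0,
                    1, 1, 0, 0, 3),
                 nrow=5, byrow=TRUE)
n       <- sum(confus)                       # 160 plots
correct <- sum(diag(confus))                 # 142 on the diagonal
chance  <- sum(rowSums(confus) * colSums(confus))
kappa   <- (n * correct - chance) / (n^2 - chance)
kappa                                        # about 0.81
```

This is algebraically the same formula used inside `altconfus()`: the usual (p_o - p_e)/(1 - p_e) with numerator and denominator multiplied by n^2.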

classmatch <- function (x, y, type='full')
{
    if (inherits(x, c('partana','clustering','partition'))) x <- x$clustering
    if (inherits(y, c('partana','clustering','partition'))) y <- y$clustering
    tab <- table(x, y)
    total <- sum(tab)
    if (type == 'full') {
        match <- sum(tab > 0)
        pairs <- matrix(0, nrow=match, ncol=3)
        partial <- rep(0, match)
        combo <- rep(0, length(x))
        ord <- matrix(0, nrow=nrow(tab), ncol=ncol(tab))
        running <- 0
        for (i in 1:match) {
            find <- max(tab)
            for (j in 1:nrow(tab)) {
                test <- max(tab[j, ])
                if (test == find) {
                    col <- which.max(as.vector(tab[j, ]))
                    pairs[i, ] <- c(j, col, tab[j, col])
                    tab[j, col] <- 0
                    ord[j, col] <- i
                    break
                }
            }
        }
        for (i in 1:length(x)) {
            for (j in 1:nrow(pairs)) {
                if (x[i] == pairs[j, 1] && y[i] == pairs[j, 2]) {
                    combo[i] <- j
                    break
                }
            }
        }
        partial <- cumsum(pairs[, 3]) / total
        pairs <- data.frame(pairs)
        names(pairs) <- c('row', 'column', 'n')
        res <- list(tab=table(x, y), pairs=pairs, partial=partial,
                    ord=ord, combo=combo)
        return(res)
    } else if (type == 'direct') {
        match <- rep(0, nrow(tab))
        ord <- matrix(0, nrow=nrow(tab), ncol=ncol(tab))
        pairs <- matrix(0, nrow=nrow(tab), ncol=3)
        for (i in 1:nrow(tab)) {
            cols <- max.col(tab)
            vals <- rep(NA, ncol(tab))
            for (j in 1:nrow(tab)) {
                vals[j] <- tab[j, cols[j]]
            }
            row <- which.max(vals)
            col <- which.max(as.vector(tab[row, ]))
            match[i] <- tab[row, col]
            pairs[i, ] <- c(row, col, match[i])
            ord[row, col] <- i
            tab[row, ] <- 0
            tab[, col] <- 0
        }
        res <- list(tab=tab, pairs=pairs, partial=cumsum(match)/total, ord=ord)
        return(res)
    } else stop('argument "type" must be either "full" or "direct"')
}

fuzconfus <- function (part, fitted, dis=NULL)
{
    if (inherits(part, 'partana')) class <- part$clustering
    else if (inherits(part, c('clustering','pam')) && !is.null(dis)) {
        class <- part$clustering
        part <- partana(class, dis)
    } else stop("you must supply an object of class 'partana' \n or an object of class 'clustering' or 'pam' AND an object of class 'dist'")
    numplt <- length(class)
    numclu <- length(table(class))
    pred <- apply(fitted, 1, which.max)
    tmp <- part$ctc / diag(part$ctc)
    fuzerr <- 1 - matrix(pmin(1, tmp), ncol=ncol(tmp))
    diag(fuzerr) <- 1
    # assumed reconstruction: build the crisp confusion matrix
    # (as in altconfus below); these lines were missing as distributed
    res <- matrix(0, nrow=numclu, ncol=numclu)
    for (i in 1:numplt) {
        res[class[i], pred[i]] <- res[class[i], pred[i]] + 1
    }
    fuzres <- res
    # assumed reconstruction: the outer loop over rows was also missing
    for (i in 1:nrow(res)) {
        for (j in 1:ncol(res)) {
            if (i != j) {
                fuzres[i, j] <- res[i, j] * fuzerr[i, j]
                fuzres[i, i] <- fuzres[i, i] + res[i, j] * (1 - fuzerr[i, j])
            }
        }
    }
    correct <- sum(diag(fuzres))
    percent <- correct / numplt
    rowsum <- apply(fuzres, 1, sum)
    colsum <- apply(fuzres, 2, sum)
    summar <- sum(rowsum * colsum)
    kappa <- ((numplt * correct) - summar) / (numplt^2 - summar)
    out <- list(confus=fuzres, correct=correct, percent=percent, kappa=kappa)
    out
}

altconfus <- function (clust, fitted)
{
    if (inherits(clust, c('clustering','partana','pam'))) class <- clust$clustering
    if (is.logical(clust)) class <- as.numeric(factor(clust))
    numplt <- length(class)
    numclu <- length(table(class))
    pred <- apply(fitted, 1, which.max)
    res <- matrix(0, nrow=numclu, ncol=numclu)
    for (i in 1:numplt) {
        res[class[i], pred[i]] <- res[class[i], pred[i]] + 1
    }
    correct <- sum(diag(res))
    percent <- correct / numplt
    rowsum <- apply(res, 1, sum)
    colsum <- apply(res, 2, sum)
    summar <- sum(rowsum * colsum)
    kappa <- ((numplt * correct) - summar) / (numplt^2 - summar)
    out <- list(confus=res, correct=correct, percent=percent, kappa=kappa)
    out
}