Non-metric Multidimensional Scaling

Last lab (Lab 8) we employed an eigenvector technique to project a dissimilarity/distance matrix to fewer dimensions. It's more than I can explain here, but it's possible to prove that this projection is the best possible rigid geometric projection. That means that it is not possible to obtain a projection of strictly parallel lines from a higher dimension to a lower dimension that maximizes the variance of points along the axes as well as PCO.

However, it's not necessarily true that a rigid geometric projection is the best MAPPING, as opposed to projection, from a higher space to a lower. By mapping, it is meant simply that there is a one-to-one correspondence between the points in high-dimensional space and low dimensional space, but that we can violate the parallel lines of geometry at will. Rather than attempting to maximize variance along the axes, instead we can try to maximize the correlation of distances in the graph to the calculated dissimilarities/distances in the matrix.

In contrast to PCO, there is not a single best algorithm for achieving the maximum correlation, and many algorithms have been developed as solutions to this problem. We will focus on just one: Kruskal's non-metric multidimensional scaling (or NMDS). Kruskal's NMDS attempts to minimize something called "stress", the square root of the ratio of the squared differences between a monotonic transformation of the calculated dissimilarities/distances and the plotted distances and the sum of the plotted distances squared. It sound more complicated than it is. Essentially, it tries to maximize the rank correlation between the calculated dissimilarities/distances and the plotted distances, allowing tied distances to not have identical plotted distances, only sequential ranks.

NMDS employs an iterative algorithm where initial estimates of the positions of samples are adjusted to minimize the stress until further iterations do achieve a sufficient improvement. Accordingly, NMDS is somewhat sensitive to the initial positions, and in fact sometimes settles into a local optimum that is not the best solution. This seems especially common with high dimensional solutions (k>4). There are several approaches to minimize the problem. One approach is to try multiple random starts, and to keep the best result (lowest stress). You are never guaranteed that you have found the best solution, but if the majority of your results achieve similar low levels of stress, then you can be reasonably certain that you have at least found a GOOD solution, if not the theoretical best. Another alternative, and one used as the default in S-Plus and R, is to use a PCO as the initial position, and then to improve on that. We already know that PCO is the optimal geometric projection, so that starting there should be a good start. It also guarantees that the NMDS solution will be no worse than the PCO.

NMDS was contributed to S-Plus and R by Venables and Ripley as part of the MASS library, and is called isoMDS in their library. It is their efficient isoMDS routine that establishes the default of starting with a PCO. Once again, to make the syntax of methods more similar, and to simplify plotting, I have written a wrapper function called nmds() that calls isoMDS with the appropriate values.

If you haven't already completed Lab 8, go there now for the discussion of dissimilarity/distance matrices in R. NMDS works on a dissimilarity/distance matrix as input, and I won't repeat that section here.

Calculating a NMDS

To calculate a NMDS, simply use the nmds() function on a dissimilarity/distance matrix as follows.
bc.nmds <- nmds(dis.bc)
initial  value 28.310807 
iter   5 value 19.726103
iter  10 value 19.300332
iter  10 value 19.283586
iter  10 value 19.278411
final  value 19.278411 
converged
I continue the convention of naming ordination objects with an indication of the matrix used (dis.bc, thus bc) and the ordination algorithm (NMDS, thus .nmds). You can name your ordination object anything you like. The default output is to print the stress every five iterations until the algorithm converges, in this case at 19.278. nmds() used the Venables and Ripley MASS isoMDS() function to do the actual calculations, which calculates stress as: \[ s = \sqrt{{\sum_{i>j}^n( d_{ij} - D_{ij})^2} \over \sum_{i>j}^n D{ij}^2} \] where \(d_{ij}\) is dissismilarity of sample \(i\) to \(j\) and \(D\) is the distance between samples \(i\) and \(j\) in the Cartesian space of the ordination

To plot it, simply use the plot() function.

plot(bc.nmds)

As for PCO, when you calculate the NMDS you can specify the number of dimensions you want. In contrast to PCO, however, the first 2 dimensions of a 3 dimensional NMDS are not the same as a 2 dimensional NMDS. Remember, it's trying to minimize stress, and it will take advantage of however many dimensions you give it, and it's not a geometric projection. To specify more dimensions (default=2), simply specify the desired dimension as the second argument, e.g.

bc4d.nmds <- nmds(dis.bc,4)
initial  value 20.167548 
iter   5 value 11.744486
iter  10 value 10.632334
iter  15 value 10.444123
iter  15 value 10.434063
iter  15 value 10.433039
final  value 10.433039 
converged
plot(bc4d.nmds)

In this case, the first two dimensions of a four dimensional NMDS look remarkably like the PCO from last lab. A little later we'll learn how to compare them with Procrustes analysis.

To plot dimensions higher than 1 and 2 (the default), simply include the dimension number in the plot() function. E.g.

plot(bc4d.nmds,2,3)

How good a job does NMDS do? Well, one way to determine this is to look at the correlation between the calculated dissimilarities and the plotted values (after all, that's what it's trying to maximize). We can do that with the LabDSV ordcomp() function.

ordcomp(bc4d.nmds,dis.bc,dim=4)

The dots are plotted for each pair of sample plots, and the correlation is printed in the upper left corner (r = 0.946, not bad at all!). It's not quite linear, but remember, it's not a projection. In only two dimensions, it's

ordcomp(bc.nmds,dis.bc)
0.875, which is still pretty good. For comparison, the two dimensional PCO was 0.788. So, that's an improvement, but how does it translate to interpretation? Just as before, we can use a surf() function to look at the distribution of environmental variables on the ordination.
plot(bc.nmds)
surf(bc.nmds,elev)
Family: gaussian 
Link function: identity 

Formula:
var ~ s(x, y)


Estimated degrees of freedom:
18.643  total = 19.64347 

GCV score: 64371.64
D^2 =  0.8687 

That's a small increase in D^2 compared to the PCO (D^2 = 0.846). In my experience, the better configuration of NMDS generally leads to better fits of environment variables compared to PCO. Twenty tries from random starts produced one result with slightly lower stress than the PCO start.

To simplify mutliple random starts you can use the bestnmds() function in labdsv.

test <- bestnmds(dis.bc,k=4,itr=10)
initial  value 30.726818 
iter   5 value 27.140114
iter  10 value 26.822939
iter  15 value 24.819618
iter  20 value 21.003132
iter  25 value 18.854345
 .     .   .     .
 .     .   .     .
 .     .   .     .
initial  value 30.214216 
iter   5 value 27.136052
final  value 27.048426 
converged
 [1] 12.56452 27.07433 27.06398 26.98964 27.03203 27.03862 27.05357 27.01441
 [9] 10.14390 27.04843

best result = 9
with stress = 10.1439 

The y=matrix(runif(k*attr(dis,'Size')),ncol=k) means to create a matrix of uniform random numbers with k columns. The attr(dis,'Size') part says to have as many rows as the "Size" of the dissimilarity matrix. The maxit=100 means that the maximum number of iterations is 100. The default of isoMDS is 50, which is usually plenty for a PCO initial configuration, but random starts often take a while to settle down.
test <- bestnmds(dis.bc,k=4,itr=10)
initial  value 30.628838
iter   5 value 27.153214
final  value 27.020718
converged
initial  value 29.565576
iter   5 value 27.083334
final  value 27.006409
converged
   .       .       .
   .       .       .
   .       .       .
initial  value 30.168961
iter   5 value 27.148995
iter  10 value 26.891695
iter  15 value 24.792744
iter  20 value 19.792547
iter  25 value 16.762017
iter  30 value 14.486813
iter  35 value 13.165037
iter  40 value 12.296595
iter  45 value 11.683685
iter  50 value 11.234595
iter  55 value 10.885635
iter  60 value 10.606514
iter  65 value 10.378953
iter  70 value 10.253719
final  value 10.173393
converged
initial  value 30.453820
iter   5 value 27.152185
final  value 27.030635
converged
[1] "best result =  5"

The best result of the set is saved into whatever appears on the left side of the assignment arrow, in this case test.

The vegan library has a function called metaMDS that can also be used.

Functions Used In This Lab

nmds()

The nmds() function is simply a wrapper for Venables and Ripley's isoMDS function from the MASS library. The purpose is to give the result class "nmds" so we can use simple plotting and summary functions with the results. The number of dimension desired is specified with the k= argument. The maxit=50 establishes a maximum number of iterations. The default of 50 is usually sufficient if starting from the default PCO configuration, but may need to be increased if starting from a random configuration.

nmds <- function(dis,k=2,y=cmdscale(dis,k),maxit=50)
{
    tmp <- isoMDS(dis,y=y,k=k,maxit=maxit)
    class(tmp) <- "nmds"
    return(tmp)
}

plot.nmds

The plot() function for objects of class "nmds" provides a plot with an aspect ratio of 1.0, with axes 1 and 2 as the defaults. Other axes can be specified by number, the plotting symbol can be changed with pch=, the color can be specified with col=, and a title can be added with title=.

plot.nmds <- function (x, ax = 1, ay = 2, col = 1, title = "", pch = 1, ...) { if (class(x) != "nmds") stop("You must supply an object of class nmds from nmds") plot(x$points[, ax], x$points[, ay], asp = 1, col = col, xlab = paste("NMDS", ax), ylab = paste("NMDS", ay), pch = pch, main = title, ...) invisible() }

points.nmds()

The points function is used to identify specific points in the ordination by color col=, symbol pch= or symbol size cex= arguments. The axes specified (1 and 2 by default) must match those given [previously in the plot() command.
points.nmds <- function (x, which, ax = 1, ay = 2, col = 2, pch = 1, cex = 1, 
    breaks = FALSE, ...) 
{
    if (class(x) != "nmds") 
        stop("You must supply an object of class nmds from nmds")
    if (is.logical(which)) {
        points(x$points[, ax][which], x$points[, ay][which], 
            col = col, pch = pch, cex = cex, ...)
    }
    else if (is.numeric(which)) {
        if (breaks) {
            mask <- !is.na(which)
            cex = (which - min(which[mask]))/(max(which[mask]) - 
                min(which[mask])) * 5
        }
        else {
            cex = 1
        }
        points(x$points[, ax], x$points[, ay], col = col, pch = pch, 
            cex = cex, ...)
    }
}

bestnmds()

bestnmds() is a wrapper to function to facilitate running multiple random dtarts for nmds(). The itr = 20 argument says the default is to run 20 random starts and can be over-ruled by entering any desired number of random starts. The maxit = 100 specifies the masimum number of times each run can iterate.

bestnmds <- function (dis, k = 2, itr = 20, maxit = 100) 
{
    best <- 0
    strss <- rep(0, itr)
    minstr <- 99999
    for (i in 1:itr) {
        tmp <- nmds(dis, k = k, y = matrix(runif(k * attr(dis, 
            "Size")), ncol = k), maxit = maxit)
        strss[i] <- tmp$stress
        if (tmp$stress < minstr) {
            minstr <- tmp$stress
            best <- i
            out <- tmp
        }
    }
    print(strss)
    cat(paste("\nbest result =", best))
    cat(paste("\nwith stress =", format(out$stress, 4), "\n"))
    out
}

alt.bestnmds()

alt.bestnmds() is like bestnmds() except that it runs Peter Minchin's monoMDS function (see just below) instead of Ripley's isoMDS algorithm.

alt.bestnmds <- function (dis, k = 2, itr = 20, maxit = 100)
{
    best <- 0
    strss <- rep(0, itr)
    minstr <- 99999
    for (i in 1:itr) {
        tmp <- monomds(dis, k = k, y = matrix(runif(k * attr(dis,
            "Size")), ncol = k), maxit = maxit)
        strss[i] <- tmp$stress
        if (tmp$stress < minstr) {
            minstr <- tmp$stress
            best <- i
            out <- tmp
        }
    }
    print(strss)
    cat(paste("\nbest result =", best))
    cat(paste("\nwith stress =", format(out$stress, 4), "\n"))
    out
}

monomds()

monomds() is an alternative nmds algorithm written by Peter Minchin as implemented in the metaMDS() function in package vegan. It offers the advantage of weak treatment of ties, and often works better on highly heteogeneous datasets where many dissimilarities = 1.0. It requires package vegan to be loaded to run.

monomds <- function (dis,k=2,y=cmdscale(d=dis,k=k),maxits=200) 
{
    nobj <- attr(dis,'Size')
    nfix <- 0
    ndim <- k
    ndis <- length(dis)
    ngrp <- 1
    diss <- dis
    mat <- as.matrix(dis)
    iidx <- row(mat)[lower.tri(mat)]
    jidx <- col(mat)[lower.tri(mat)]
    #xinit <- runif(nobj*k)-0.5
    xinit <- y
    istart <- 1
    isform <- 1
    ities <- 1
    iregn <- 1
    iscal <- 1
    sratmx <- 0.99999
    strmin <- 1e-07
    sfgrmn <- 1e-05
    dist <- rep(0,ndis)
    dhat <- rep(0,ndis)
    x <- matrix(0,nrow=nobj,ncol=k)
    stress <- 1
    strs <- ngrp
    iters <- 1
    icause <- 1 
    out <- .Fortran('monoMDS',
           as.integer(nobj),
           as.integer(nfix),
           as.integer(ndim),
           as.integer(ndis),
           as.integer(ngrp),
           as.double(diss),
           as.integer(iidx),
           as.integer(jidx),
           as.double(xinit),
           as.integer(istart),
           as.integer(isform),
           as.integer(ities),
           as.integer(iregn),
           as.integer(iscal),
           as.integer(maxits),
           as.double(sratmx),
           as.double(strmin),
           as.double(sfgrmn),
           dist=as.double(dist),
           dhat=as.double(dhat),
           points=as.double(x),
           stress=as.double(stress),
           as.double(strs),
           iters=as.integer(iters),
           cause=as.integer(icause))
    res <- list(points=matrix(out$points,ncol=k),stress=out$stress)
    class(res) <- 'nmds'
    res
}

adpoints.nmds()

addpoints.nmds is a fucntion to add new points to an existing NMDS without changing the locations of the existing points.

addpoints.nmds <- function (nmds,dis,neighbors=5,maxits=200)
{
# bring list component to local matrix

    points <- nmds$points

# get sizes

    oldn <- nrow(points)
    ndim <- ncol(points)
    totn <- attr(dis,'Size')
    newn <- totn - oldn

# test correspondence

    if (!identical(dimnames(points)[[1]],attr(dis,'Labels')[1:oldn]))
         stop('ordination and dissimilarity matrix do not match')

# decompose disimilarity object to isolate new values

    diss <- as.matrix(dis)[1:oldn,(oldn+1):totn]
    
# set up initial conditions

    ndis <- oldn * newn
    tmp <- matrix(rep(0,newn*ndim),ncol=ndim)
    for (i in 1:newn) {
        pnt <- seq(1,oldn)[order(diss[,i])][1:neighbors]
        weight <- 1-diss[pnt,i]
        for (j in 1:ncol(points)) {
            tmp[i,j] <- weighted.mean(points[pnt,j],w=weight)
        }
    }
    
    xinit <- rbind(points,tmp)
    dimnames(xinit)[[1]] <- attr(dis,'Labels')
# set up indices

    iidx <- rep((1:oldn),newn)
    jidx <- NULL
    for (i in (oldn+1):totn) jidx <- c(jidx,rep(i,oldn))
#set up ordination

    nfix <- oldn
    ngrp <- 
    istart <- 1
    isform <- 2
    ities <- 1
    iregn <- 1
    iscal <- 0
    sratmx <- 0.99999
    strmin <- 1e-07
    sfgrmn <- 1e-05
    dist <- rep(0,ndis)
    dhat <- rep(0,ndis)
    x <- matrix(0,nrow=totn,ncol=ndim)
    stress <- 1
    strs <- ngrp
    iters <- 1
    icause <- 1 
    maxits <- maxits
    iwork <- rep(0,ndis)
    grad <- matrix(0,nrow=totn,ncol=ndim)
    grlast <- matrix(0,nrow=totn,ncol=ndim)

    out <- .Fortran('monoMDS',
           nobj=as.integer(totn),
           nfix=as.integer(nfix),
           ndim=as.integer(ndim),
           ndis=as.integer(ndis),
           ngrp=as.integer(ngrp),
           diss=as.double(diss),
           iidx=as.integer(iidx),
           jidx=as.integer(jidx),
           xinit=as.double(xinit),
           istart=as.integer(istart),
           isform=as.integer(isform),
           ities=as.integer(ities),
           iregn=as.integer(iregn),
           iscal=as.integer(iscal),
           maxits=as.integer(maxits),
           sratmx=as.double(sratmx),
           strmin=as.double(strmin),
           sfgrmn=as.double(sfgrmn),
           dist=as.double(dist),
           dhat=as.double(dhat),
           points=as.double(x),
           stress=as.double(stress),
           strs=as.double(strs),
           iters=as.integer(iters),
           cause=as.integer(icause))
    res <- list(points=matrix(out$points,ncol=ndim),
                newpoints=matrix(out$points,ncol=ndim)[(oldn+1):totn,],
                stress=out$stress,
                iters=out$iters,cause=out$cause)
    dimnames(res$points)[[1]] <- attr(dis,'Labels')       
    dimnames(res$newpoints)[[1]] <- attr(dis,'Labels')[(oldn+1):totn]
    class(res) <- 'nmds'
    res 
}

ellip.nmds()

ellip.nmds() is a function to replace chullord by plotting minimum area ellipses that contain all points of a given level of a factor or classification. It was written in response to students in my class requesting an aternative to chullord.

ellip.nmds <- function (ord, overlay, ax = 1, ay = 2, cols = c(2, 3, 4, 5, 
    6, 7), ltys = c(1, 2, 3), ...) 
{
    if (class(ord) != "nmds") 
        stop("You must pass an object of class nmds")
    require(cluster)
    if (inherits(overlay, c("partana", "pam", "clustering"))) {
        overlay <- overlay$clustering
        if (min(overlay) < 0 || (length(table(overlay)) != max(overlay))) {
            cat("WARNING: renumbering clusters to consecutive integers\n")
            overlay <- match(overlay, sort(unique(overlay)))
        }
    }
    else if (is.logical(overlay)) 
        overlay <- as.numeric(overlay)
    else if (is.factor(overlay)) 
        overlay <- as.numeric(overlay)
    pass <- 1
    layer <- 0
    lty <- ltys[pass]
    for (i in 1:max(overlay, na.rm = TRUE)) {
        x <- ord$points[, ax][overlay == i & !is.na(overlay)]
        y <- ord$points[, ay][overlay == i & !is.na(overlay)]
        pts <- chull(x, y)
        layer <- layer + 1
        if (layer > length(cols)) {
            layer <- 1
            pass <- min(pass + 1, length(ltys))
        }
        col <- cols[layer]
        lty = ltys[pass]
        x <- as.matrix(cbind(x[pts], y[pts]))
        elp <- ellipsoidhull(x)
        lines(predict(elp),col=col)
    }
}