1. Package fso
  2. FSO
    1. Calculating an FSO
    2. Plotting an FSO
    3. Statistical Significance
  3. MFSO
    1. Multidimensional FSO
    2. Calculating an MFSO
    3. Plotting an MFSO
  4. Iterative Improvement
  5. A Priori/Screening
  6. Summary
  7. References
  8. Functions

Fuzzy Set Ordination

Fuzzy Set Ordination (FSO) is a technique of ordination quite different from those we have explored so far. First, as the name implies, it is based on the mathematics of fuzzy set theory, rather than matrix or linear algebra. In addition, the protocol for FSO encourages a more interactive approach, where successive refinement of initial hypotheses about the controls on vegetation composition can be employed.

Up to this point we have generally been preforming ordination analyses according to the protocol of:

With the exception of RDA (Lab 7), the configuration of the points in PCA, PCO, and NMDS ordinations has been completely determined by the vegetation compositional data. Alternatively, for CCA and DB-RDA the configuration of points is first calculated as a CA or PCO respectively, which is then subjected to weighted regression against environmental or experimental variables, keeping the fitted values of the regression as coordinates. Accordingly, CCA and DB-RDA are referred to as "constrained ordinations" as they constrain the values of the underlying ordination to achieve what is typically referred to as their "canonical" axes. FSO is the first technique that directly incorporates the environmental data into the calculation of the configuration.

The initial application of FSO to single variables bears more resemblance to the surface fitting routines we have employed so far than to the dimensional mapping routines. That is, when we calculate an FSO on elevation, for example, the result is an estimate of the relative elevation for each plot, based on its composition. If a variable is influential in determining or constraining the distribution of vegetation, then we ought to be able to estimate the value of that variable based on the composition. This is an approach called "calibration" by Ter Braak and colleagues (19xx). Whether or not the ordination achieves an efficient mapping from high-dimensional space to low dimensional space is a secondary consideration, and generally requires solving for multiple dimensions. All this is more easily understood by example.

The FSO is calculated with a function called fso(), which requires two arguments: a variable (or multiple variables in a formula or data.frame) and a similarity matrix. Because the convention in R is to work with dist objects, fso() automatically converts the dist object from dist(), dsvdis() or vegdist to a full similarity matrix as required. From our previous work with other methods, we have good reason to expect that elevation and slope may be important, so let's start with elevation.

First, let's get the fso package with the necessary functions. If you are running Windows, simply use the "packages" drop-down menu feature to get the most recent version, and then enter library(fso). If you are running linux/unix, got to CRAN and select the "packages" link. Scroll down to fso_2.0-1 (unless there is a newer version) and click on the link to fetch the package. As root (if possible), enter R CMD INSTALL fso_2.0-1.tar.gz to install. Then inside of R, enter library(fso).

As we have throughout the previous labs, we'll analyze the Bryce Canyon data. So, if you haven't already

dis.bc <- dsvdis(bryceveg,'bray/curtis')
elev.fso <- fso(elev,dis.bc)

fso() does its work silently. to examine the results, use

plot(elev.fso,title="Bryce Canyon",xlab="Elevation",ylab="Apparent Elevation")

As you can see, the relationship between elevation and "apparent elevation" is pretty good, with a correlation of 0.83 (upper left corner). That means that on average we can predict the elevation of a sample from its composition fairly well. There are some interesting outliers, though, that bear looking into. How about slope?

slope.fso <- fso(slope,dis.bc)
plot(slope.fso,title="Bryce Canyon")

Hmmm. While a correlation of 0.35 is not too bad, the distribution leaves something to be desired. The problem is that the distribution of sample plots is not remotely uniform for slope, with many more plots on gentle slopes than steep. As a possible fix, we can try

slope.log.fso <- fso(log(slope+1),dis.bc)
The distribution gets better (try it and see), but the correlation actually declines a little.

Despite the emphasis on a priori concepts about the control of vegetation composition by environment it has proven helpful to write routines that facilitate rapid screening and analysis of a broad range of possible variables. fso() will accept a formula or dataframe as its first argument, and perform sequential FSOs on the variables. For example,

grads.fso <- fso(~elev+av+slope+annrad+grorad,dis.bc)
Notice that there is no dependent variable. The formula starts out with the ~.

Fuzzy set statistics

  col variable         r            p         d
1   1     elev 0.8303255 0.000000e+00 0.6119908
2   3    slope 0.3495267 5.891172e-06 0.3208229
3   4   annrad 0.3050769 8.758970e-05 0.2255343
4   5   grorad 0.3013482 1.078098e-04 0.5811717
5   2       av 0.1402331 7.694954e-02 0.1985226

Correlations of fuzzy sets

             elev         av      slope     annrad     grorad
elev    1.0000000  0.5936768 -0.5018290 -0.7666193 -0.8255718
av      0.5936768  1.0000000 -0.6119128 -0.9184280 -0.5206253
slope  -0.5018290 -0.6119128  1.0000000  0.6292742  0.1255797
annrad -0.7666193 -0.9184280  0.6292742  1.0000000  0.7297448
grorad -0.8255718 -0.5206253  0.1255797  0.7297448  1.0000000
Notice that the summary lists the variables in order by their correlation coefficient, and includes the probability of obtaining the correlation by chance (more on this below). It also includes the correlation of the pair-wise distances among points in the ordination with the dissimilarity matrix (d).

Finally, in the bottom matrix it presents a correlation matrix of the derived fuzzy sets.

To plot one of the FSOs, specify it by name or column number.


The estimated p-values are computed from a Normal approximation with the appropriate numbers of degrees of freedom. Nonetheless, in my experience they have proven somewhat problematic (slightly high type I error rate), and I have included a permutation-based statistic as an alternative. Essentially, the routine permutes the input vector (permute -1 times), calculates an FSO, and stores the correlation in a vector. The statistic reported is

sum(cors >= obs) + 1 / (number of reps)
This is a conservative estimate of the p-value. If you ask for 1000 permutations, and never in 999 permutations is the correlation equaled or exceeded, then
(0 + 1) / 1000 = 0.001
The number of desired permutations is given with the permute= argument, so that
grads.fso <- fso(~elev+av+slope+annrad+grorad,dis.bc,permute=1000)
Fuzzy set statistics

  col variable         r     p         d
1   1     elev 0.8303255 0.001 0.6119908
2   3    slope 0.3495267 0.001 0.3208229
3   4   annrad 0.3050769 0.001 0.2255343
4   5   grorad 0.3013482 0.001 0.5811717
5   2       av 0.1402331 0.064 0.1985226
gives the permuted probabilities rather than the Normal approximation. In this case it doesn't change our interpretation any. It is, however, much slower to compute.

Multidimensional Plots

The typical FSO plot so far shows the calculated fuzzy set membership values against the original data, similar to a regression plot. How do we get a multidimensional plot? Use the function mfso() for "multi-dimensional fuzzy set ordination."
grads.mfso <- mfso(~elev+slope+grorad,dis.bc)
There are several options to control the relative scaling of MFSOs. The default (scaling=1, lm=TRUE) plots the actual mu values (memberships in fuzzy sets) for the first variable, and then the regression residuals of subsequent mu values on all previous axes, similar to a Gramm-Schmidt algorithm. The last step guarantees a second axis uncorrelated with the first. The alternative scalings are:

1no scaling, use actual mu values
2relativize both fuzzy sets [0,1]
3center and scale proportional to their correlation coefficient

In either case, the regression step (and subsequent rescaling of the second axis to the residuals of the regression) is controlled by lm=.

To see the result, simple use the plot function.


Because in this case the two sets were not very strongly correlated, the second set retains a fairly broad range of mu values. For sets that are more strongly correlated, the Y axis may have a much more restricted range. How good a job does this mapping do? The dis= argument allows you to specify a dissimilarity/distance matrix to compare the ordination distances to. If specified, the values show up in the last plot.


In this case, the correlation between ordination distances and dissimilarities in the original matrix is 0.721. Looking back to Lab 8, the PCO two-dimensional projection of the Bray/Curtis dissimilarity matrix achieved 0.788, so our two-dimensional FSO is almost as good as a PCO in terms of faithful representation of dissimilarities, and has the added benefit of an obvious ecological interpretation.

The alternative, using scaling=2 turns out slightly better.

grads2.mfso <- mfso(~elev+slope+grorad,dis.bc,scaling=2)

Slope and grorad is also an interesting combination.

fso.2 <-mfso(~slope+grorad,dis.bc) 

Iterative Improvement

Each FSO is an estimate of the environmental characteristics we would expect for a site based on its vegetation composition. If we view the operation as similar to regression, then we can ask, "what explains the variability along the FSO," or what explains the "residuals." To find out, we'll use the lm() linear model routine included in R.
elev.fso.lm <- lm(elev.fso$mu~elev)
lm(formula = elev.fso$mu ~ elev)

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.819e-01  3.856e-02  -4.717 5.23e-06 ***
elev         9.162e-05  4.892e-06  18.729  < 2e-16 ***
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 

Residual standard error: 0.03801 on 158 degrees of freedom
Multiple R-Squared: 0.6894,     Adjusted R-squared: 0.6875 
F-statistic: 350.8 on 1 and 158 degrees of freedom,     p-value:     0 
Obviously, the relation is pretty close (R^2=0.6875), but we already knew that. What we want to do is to explain the scatter of the residuals.
elev.fso.res <- residuals(elev.fso.lm)
Positive residuals are plots that appear to be at a higher elevation than they really are, and negative residuals are plots that appear to be at a lower elevation than they really are. One possible explanation is that sites with high radiation loads (e.g. south-facing slopes) would appear to be lower than sites with low radiation loads (e.g. north-facing slopes). We'll compare growing season radiation (grorad) with the elevation FSO residuals using the lm() routine again.
lm(formula = elev.fso.res ~ grorad)

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.401664   0.078248   5.133 8.27e-07 ***
grorad      -0.002460   0.000479  -5.136 8.15e-07 ***
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 

Residual standard error: 0.03519 on 158 degrees of freedom
Multiple R-Squared: 0.1431,     Adjusted R-squared: 0.1377 
F-statistic: 26.38 on 1 and 158 degrees of freedom,     p-value: 8.148e-07 
As we expected, high radiation is negatively associated with the residuals, and highly significant. Thus, while growing season radiation was not a strong relation on its own (r=0.301) it is a strong modifier of elevation effects. To look at the effect, we can compare the largest effect of growing season radiation to the distribution of the residuals.
(-0.002460 * (max(grorad)-min(grorad))) / (max(elev.fso.res)-min(elev.fso.res))
[1] -0.5161084
Growing season radiation is not uniformly distributed, so its total effect size is obviously smaller than the maximum contribution calculated (51% of the range of the residuals), but this gives us a basis for comparison.

How about topographic position? We'll add it to growing season radiation to look for conditional significance. The R lm() routine handles categorical variables like pos without recoding, establishing contrasts to the first category.

lm(formula = elev.fso.res ~ grorad + pos)

       Min         1Q     Median         3Q        Max 
-0.0665706 -0.0239119 -0.0006207  0.0232542  0.0939038 

               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.3714292  0.0816540   4.549 1.09e-05 ***
grorad       -0.0023628  0.0004907  -4.816 3.48e-06 ***
poslow_slope  0.0233538  0.0099608   2.345   0.0203 *  
posmid_slope  0.0148793  0.0093715   1.588   0.1144    
posridge      0.0165458  0.0113768   1.454   0.1479    
posup_slope   0.0118876  0.0098836   1.203   0.2309    
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 

Residual standard error: 0.03499 on 154 degrees of freedom
Multiple R-Squared: 0.174,      Adjusted R-squared: 0.1472 
F-statistic: 6.488 on 5 and 154 degrees of freedom,     p-value: 1.675e-05 
The adjusted R^2 only improved a little, but ecologically it makes some sense. The biggest contributor is low_slope, which causes plots to appear to be at a higher elevation than they are, due I suspect to increased soil depth and finer soil textures. Not too surprisingly, slope is not important (not shown). Collectively, though, topographic position doesn't look to be significant. Let's test.
Analysis of Variance Table

Response: elev.fso.res
           Df   Sum Sq  Mean Sq F value    Pr(>F)    
grorad      1 0.032666 0.032666 26.6776 7.337e-07 ***
pos         4 0.007056 0.001764  1.4406    0.2232    
Residuals 154 0.188566 0.001224                      
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 
Collectively, topographic position is not significant.

To look at the effects of soil depth, remembering from Lab 2 that soil depth is correlated with elevation, we need to correct for elevation first. We'll run the summary of the regression first to get the signs and coefficients and then the anova to check for sequential significance.


lm(formula = elev.fso.res ~ elev + depth)

      Min        1Q    Median        3Q       Max 
-0.075054 -0.024821 -0.002617  0.020813  0.092883 

               Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.120e-02  3.683e-02  -0.304    0.761    
elev         -1.016e-06  4.729e-06  -0.215    0.830    
depthshallow  2.829e-02  5.839e-03   4.845 3.27e-06 ***
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 

Residual standard error: 0.03325 on 142 degrees of freedom
Multiple R-Squared: 0.1467,     Adjusted R-squared: 0.1347 
F-statistic:  12.2 on 2 and 142 degrees of freedom,     p-value: 1.285e-05 
Analysis of Variance Table

Response: elev.fso.res
           Df   Sum Sq  Mean Sq F value    Pr(>F)    
elev        1 0.001033 0.001033  0.9341    0.3354    
depth       1 0.025951 0.025951 23.4743 3.274e-06 ***
Residuals 142 0.156981 0.001105                      
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1 
By entering elevation first we eliminate the effect of correlation with elevation, and see that soil depth is indeed significant. Surprisingly, though, it's shallow soils that appear to be at a higher elevation than they really are. This, I believe, is due to a correlation between soil depth and texture (shallow soils are finer texture) but the data are not present to test that hypothesis. Looking at the coefficient, and remembering that categorical variables are coded as 0 or 1, the effect of shallow soil on elevation residuals is only +0.029. It's statistically significant, but ecologically small. Compared to the range of residuals,
[1] -0.06672459 0.10010083
0.0289 / (0.10010083 - -0.06672459)
[1] 0.173235
it's about 17% of the range.

A Priori hypotheses and variable screening

In the original paper (Roberts 1986) Roberts emphasized the use of FSO in hypothesis testing and iterative analysis. That is, unlike constrained ordinations, in the absence of a specified variable to test, there is no ordination to view or analyze. Nonetheless, FSO has proven very useful for rapid screening of potential variables and step-wise model fits. Accordingly, the fso package now includes a forward step-wise model fitting function called step.mfso(). step.mfso() is a little primitive compared to the typical R step functions, but is designed to handle MFSO directly. To use step.mfso() there are five arguments.

disthe dissimilarity/distance objectno default
startthe variables in the initial model as a data.frameno default, can be NULL
addthe variables to consider for addition to the model as a data.frameno default
numitrthe number of random permutations to use to calculate p-valuesdefault = 100
scalingthe scaling to use on MFSO (see above)default = 1

Here's an example of step.mfso() in use

Baseline =  0 
  variable delta_cor p_val
1     elev 0.6119908  0.08
2    slope 0.3208229  0.66
3       av 0.1985226  0.90
4   grorad 0.5811717  0.08
5   annrad 0.2255343  0.82
The results suggest that elev gives the best result, resulting in a correlation between the differences in fuzzy elevation and the Bray-Curtis dissimilarity of 0.612. It's only marginally significant (p=0.08), but it is often the case that a single variable is insufficient to structure a high-dimensional dissimilarity matrix. To go to the next step, move elev from the add data.frame to the start data.frame and rerun the function.
Baseline =  0.6119908 
  variable    delta_cor p_val
1    slope  0.101305309  0.01
2       av -0.001551554  0.98
3   grorad  0.050204388  0.01
4   annrad -0.002196274  1.00
Notice how when we began with the null model, the baseline was 0. After adding elev to the model the baseline went to 0.6119908, and now variables are considered in addition to that value. slope shows the strongest effect (0.101305309, p-value < 0.01). Notice also how av has a negative value associated with it. In contrast to multiple linear regression (as employed in CCA and DB-RDA) adding a variable will not always improve the model, and may make it worse. To continue, move slope from the add data.frame to the start data.frame.
Baseline =  0.7132961 
  variable     delta_cor p_val
1       av -0.0014125421  0.96
2   grorad  0.0078175173  0.04
3   annrad -0.0009994769  0.99
Notice how the baseline has increased from 0.6119908 to 0.7132961. This is the 0.6119908 attributable to elev plus the 0.101305309 from slope. Adding grorad will increase the correlation by 0.0078175173, which is a little over 1%. Whether or not it is worth adding a third dimension to achieve a 1% improvement is debatable. In addition, the p-values are estimated by permutation tests, and I know that if I rerun the step function a borderline p-value may change, so I'll up the number of iterations to get a more stable estimate of p.
Baseline =  0.7132961 
  variable     delta_cor p_val
1       av -0.0014125421 0.906
2   grorad  0.0078175173 0.035
3   annrad -0.0009994769 0.993
That took several minutes to run on a fast computer, but did clarify somewhat the p-value for grorad.

The outcome of the step-wise function in this case gives us the same model we had analyzed above (elev then slope) with arguably a third dimension for grorad. To screen large numbers of variables it is sometimes advisable to set numitr=1 during the variable selection stage, and then put just the selected variable in the add data.frame for permutation testing.


We have a primary gradient of elevation, with local effects of solar radiation and soil depth on the importance of elevation. Together apparent elevation and apparent slope achieve a 0.747 correlation of ordination distance to dissimilarity. The effect of growing season radiation is stronger than soil depth (Sum Sq. 0.032666 vs 0.026932, maximum effect 51 vs 17%).

We also have a primary gradient for slope which is nearly independent of the elevation gradient. Regressing the residuals of the slope FSO (not shown) only showed one statistically significant effect (annrad), such that sites with high annual radiation appeared to be slightly steeper than they really are. This appears to be because radiation is an asymmetric function of aspect, where level sites have radiation budgets more similar to south slopes than to north slopes.

We don't have sufficient climate data to tease apart the elevation effect; it combines both temperature and presumably precipitation, but we can't partition the two. It's an example of what Austin and Smith (19xx) would call a "complex gradient." The slope gradient is also probably best viewed as a complex gradient. The ecological characteristics of slope (apart from aspect effects ON slopes) is likely soil erosion and consequent soil texture effects. Steep slopes are likely to suffer from removal of fine-textured soil particles and run-off of precipitation, leading to drier sites than level sites at equivalent elevations.

FSO is thus useful for analysis of primary gradients, as well as nested secondary effects within primary gradients. The multi-FSO plots are nearly as efficient as PCO, with much more immediate ecological interpretation possible.


Roberts, D.W. 1986. Ordination on the basis of fuzzy set theory. Vegetatio 66:123–131.

Roberts, D.W. 2008. Statistical analysis of multidimensional fuzzy set ordination. Ecology 89:1256-1260.

Roberts, D.W. 2009. Comparison of multidimensional fuzzy set ordination with CCA and DB-RDA. Ecology 90:2622-2634.

plot.fso <- function (x, which = "all", xlab = x$var, ylab = "mu(x)", title = "", 
    r = TRUE, pch = 1, ...) 
    if (class(x) != "fso") {
        stop("You must specify n object of class fso from fso()")
    if (is.matrix(x$mu)) {
        if (which == "all") {
            for (i in 1:ncol(x$mu)) {
                if (is.na(x$r[i])) {
                  cat(paste("variable ", x$var[i], " has missing values \n"))
                else {
                  plot(x$data[, i], x$mu[, i], xlab = xlab[i], ylab = ylab, 
                    main = title)
                  if (r) {
                    ax <- min(x$data[, i])
                    ay <- max(x$mu[, i])
                    text(ax, ay, paste("r = ", format(x$r[i], 
                      digits = 3)), pos = 4)
                readline("\nHit Return for Next Plot\n")
        else if (is.numeric(which)) {
            for (i in which) {
                plot(x$data[, i], x$mu[, i], xlab = xlab[i], ylab = ylab, 
                  main = title)
                if (r) {
                  ax <- min(x$data[, i])
                  ay <- max(x$mu[, i])
                  text(ax, ay, paste("r = ", format(x$r[i], digits = 3)), 
                    pos = 4)
                readline("\nHit Return for Next Plot\n")
        else {
            for (j in 1:ncol(x$mu)) {
                if (which == x$var[j]) 
                  plot(x$data[, j], x$mu[, j], xlab = xlab[j], ylab = ylab, 
                    main = title)
                if (r) {
                  ax <- min(x$data[, j])
                  ay <- max(x$mu[, j])
                  text(ax, ay, paste("r = ", format(x$r[j], digits = 3)), 
                    pos = 4)
    else {
        plot(x$data, x$mu, xlab = xlab, ylab = ylab, main = title)
        if (r) {
            ax <- min(x$data)
            ay <- max(x$mu)
            text(ax, ay, paste("r = ", format(x$r, digits = 3)), 
                pos = 4)
dummify <- function (var) 
    if (!is.factor(var)) stop('You must pass a factor variable')

    cols <- length(levels(var))
    out <- matrix(0,nrow=length(var),ncol=cols)
    for (i in 1:cols) {
        out[,i] <- var == levels(var)[i]
    out <- data.frame(out)
    names(out) <- levels(var)
ellip.mfso <- function (ord, overlay, ax = 1, ay = 2, cols = c(2, 3, 4, 5, 
    6, 7), ltys = c(1, 2, 3), ...) 
    if (class(ord) != "mfso") 
        stop("You must pass an object of class mfso")
    if (inherits(overlay, c("partana", "pam", "clustering"))) {
        overlay <- overlay$clustering
        if (min(overlay) < 0 || (length(table(overlay)) != max(overlay))) {
            cat("WARNING: renumbering clusters to consecutive integers\n")
            overlay <- match(overlay, sort(unique(overlay)))
    else if (is.logical(overlay)) 
        overlay <- as.numeric(overlay)
    else if (is.factor(overlay)) 
        overlay <- as.numeric(overlay)
    pass <- 1
    layer <- 0
    lty <- ltys[pass]
    for (i in 1:max(overlay, na.rm = TRUE)) {
        x <- ord$mu[, ax][overlay == i & !is.na(overlay)]
        y <- ord$mu[, ay][overlay == i & !is.na(overlay)]
        pts <- chull(x, y)
        layer <- layer + 1
        if (layer > length(cols)) {
            layer <- 1
            pass <- min(pass + 1, length(ltys))
        col <- cols[layer]
        lty = ltys[pass]
        x <- as.matrix(cbind(x[pts], y[pts]))
        elp <- ellipsoidhull(x)