consensusCluster {compdiagTools}    R Documentation

Consensus Clustering

Description

This function performs k-means clustering and generates a consensus matrix.

Usage

consensusCluster(e, nclass = 2, nsamplings = 100, npersample = NULL, 
                 hclust.init=TRUE, ...)

Arguments

e A matrix of expression levels. Rows are features and columns are observations; clustering is performed on the observations.
nclass Number of classes to split the data into.
nsamplings Number of subsamplings to draw to compute frequencies in the consensus matrix.
npersample Number of samples to draw per sampling. Default is 80
hclust.init Logical. Whether kmeans is initialized with centroids determined by hierarchical clustering.
... Further parameters passed on to kmeans.

Details

This function performs standard k-means clustering on the data matrix provided. If hclust.init is TRUE, kmeans is initialized using the topmost branches of a hierarchical clustering dendrogram. The result of the k-means method is returned unchanged. In addition, a consensus matrix is computed from k-means splits of subsamples of the data. Each row and each column of this matrix corresponds to an observation. Rows and columns are ordered in the same way, such that observations from the same overall cluster are adjacent. Each entry stores the relative frequency with which the two corresponding observations are assigned to the same cluster by k-means across the subsamplings.
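
The procedure described above can be sketched in base R. This is a minimal illustration on simulated data, not the compdiagTools implementation: the toy matrix, the subsample size of 80 percent of the observations, and the helper names together and sampled are assumptions made for this sketch. Each pair is counted relative to the number of subsamples in which both observations were drawn, which is one plausible reading of the ratio described above.

```r
# Sketch only: not the compdiagTools implementation.
set.seed(1)

# Toy data: 20 features (rows) x 30 observations (columns), two shifted groups
e <- cbind(matrix(rnorm(20 * 15, mean = 0), nrow = 20),
           matrix(rnorm(20 * 15, mean = 3), nrow = 20))
n <- ncol(e)
nclass <- 2
nsamplings <- 50
npersample <- round(0.8 * n)  # assumed subsample size

together <- matrix(0, n, n)  # how often a pair landed in the same cluster
sampled  <- matrix(0, n, n)  # how often a pair was drawn in the same subsample

for (i in seq_len(nsamplings)) {
  idx <- sample(n, npersample)
  x <- t(e[, idx])  # kmeans clusters rows, so observations go in rows

  # Initial centroids from the topmost branches of a hierarchical clustering
  init <- cutree(hclust(dist(x)), k = nclass)
  centers <- apply(x, 2, function(col) tapply(col, init, mean))

  cl <- kmeans(x, centers = centers)$cluster
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  sampled[idx, idx]  <- sampled[idx, idx] + 1
}

# Entries are NaN for pairs that were never drawn together
consensus <- together / sampled
```

For well-separated groups such as these, most entries of consensus should be close to zero or close to one.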

For stable clusterings, we expect the consensus matrix to hold only values close to zero or close to one, with square blocks of ones along the diagonal of the matrix. As summary statistics, one value per cluster and one value per observation and cluster are computed. To measure the stability of a cluster, the consensus values are averaged over all pairs of observations in which both partners belong to the cluster. Moreover, for observation o and cluster C, the average consensus value between o and all other members of cluster C is computed.
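
The two summary statistics can be illustrated on a small hand-made consensus matrix. This is a sketch under assumptions: the values in cm and the cluster assignment are invented for illustration, and the package may differ in details such as how the diagonal is treated. The names clusterConsensus and itemConsensus mirror the components listed under Value, but the layout shown here is not guaranteed to match the returned object.

```r
# Toy consensus matrix for 4 observations; values are made up for illustration
cm <- matrix(c(1.0, 0.9, 0.1, 0.2,
               0.9, 1.0, 0.0, 0.1,
               0.1, 0.0, 1.0, 0.8,
               0.2, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)
cluster <- c(1, 1, 2, 2)  # assumed overall cluster assignment
classes <- sort(unique(cluster))

# Cluster consensus: mean over all distinct pairs inside the cluster
clusterConsensus <- sapply(classes, function(k) {
  m <- cm[cluster == k, cluster == k, drop = FALSE]
  mean(m[upper.tri(m)])
})

# Item consensus: for observation o and cluster k, mean consensus between
# o and all other members of k (rows: observations, columns: clusters)
itemConsensus <- sapply(classes, function(k) {
  sapply(seq_along(cluster), function(o) {
    others <- setdiff(which(cluster == k), o)
    mean(cm[o, others])
  })
})
```

In this toy case each observation has a high item consensus for its own cluster and a low one for the other cluster, which is the pattern expected for a stable clustering.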

Value

The function returns an object of class consensusCluster. This object contains all items returned by k-means and adds:

consensusMatrix The consensus matrix itself
clusterConsensus consensus summary by cluster
itemConsensus consensus summary by data item

Author(s)

Claudio Lottaz

References

S. Monti, P. Tamayo, J. P. Mesirov, T. R. Golub. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52:1-2, 2003, pp. 91-118.

See Also

image.consensusCluster

Examples

# load data
library(ALL)
data(ALL)

# very clear cluster
samples <- 63:128
vars <- apply(exprs(ALL)[,samples], 1, var)
e <- exprs(ALL)[rank(vars)>12525, samples]
consClust <- consensusCluster(e, nclass=2, nsamplings=100)
image(consClust, col=grey(seq(1,0,-0.01)))

# blurry cluster
samples <- 1:66
vars <- apply(exprs(ALL)[,samples], 1, var)
e <- exprs(ALL)[rank(vars)>12525, samples]
consClust <- consensusCluster(e, nclass=2, nsamplings=100)
image(consClust, col=grey(seq(1,0,-0.01)))

[Package compdiagTools version 1.5.3 Index]