A Practical Approach to Inferring Graphical Models from Short Microarray Time Series Data

Juliane Schäfer, Dept. of Statistics, University of Munich.

Graphical models are convenient tools for describing genetic networks because they separate direct from indirect genetic interactions and at the same time provide a unifying statistical framework for inference. Unfortunately, graphical models are not straightforward to apply to microarray time series data. A major obstacle is dimensionality: the number of genes considered (= nodes in the resulting graph) typically far exceeds the available sample size (usually of order 10-20). This renders many standard algorithms for graphical models impractical or inefficient. As a consequence, graphical models are often applied only to clusters representing "super" or "meta" genes.

Here we suggest an alternative strategy: focusing on one of the simplest classes of graphical models (Graphical Gaussian Models), we develop a new small-sample estimator for the partial correlation between two genes. Using simulated data, where we make the realistic assumption of a sparse "true" network, we then show that our estimator exhibits a remarkably low error even when the actual sample size is small compared to the size of the network. The complete network structure may then be inferred using a multiple testing procedure based on an exact correlation test in which the effective degrees of freedom are estimated from the data. This step also allows longitudinal correlation to be taken into account (i.e. it also works for non-i.i.d. data). Moreover, compared with other approaches, this inference procedure is computationally very efficient. All algorithms and tests have been implemented in the R package "GeneTS" (version 2.0), available from the authors' homepage.
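
To illustrate why partial correlations separate direct from indirect interactions, the following is a minimal Python sketch of the textbook relationship between a covariance matrix and its partial correlations in a Graphical Gaussian Model. It is not the small-sample estimator proposed here and does not reproduce the GeneTS code: the function name partial_correlations_from_cov is hypothetical, and the use of the Moore-Penrose pseudoinverse is only an assumed stand-in for the case where the sample size is smaller than the number of genes.

```python
import numpy as np


def partial_correlations_from_cov(cov):
    """Return the matrix of partial correlations implied by a covariance matrix.

    Textbook identity: with P the inverse covariance (precision) matrix,
    the partial correlation of variables i and j given all others is
        -P[i, j] / sqrt(P[i, i] * P[j, j]).
    """
    cov = np.asarray(cov, dtype=float)
    # Assumption: the pseudoinverse replaces the ordinary inverse so that the
    # sketch still returns a result when cov is singular (few samples, many
    # genes). For a full-rank matrix it coincides with the ordinary inverse.
    precision = np.linalg.pinv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Toy chain network gene0 - gene1 - gene2: gene0 and gene2 interact only
# indirectly, which corresponds to a zero entry in the precision matrix.
chain_precision = np.array([[2.0, -1.0, 0.0],
                            [-1.0, 2.0, -1.0],
                            [0.0, -1.0, 2.0]])
chain_cov = np.linalg.inv(chain_precision)
chain_pcor = partial_correlations_from_cov(chain_cov)
```

In this toy chain the ordinary covariance between gene0 and gene2 is non-zero, whereas their partial correlation vanishes (up to rounding), so only the two direct edges would remain in the inferred graph.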