Anja Wille, Seminar für Statistik, ETH Zürich.
Recent advances in DNA microarray technologies have made it possible to measure the expression levels of thousands of genes simultaneously. Extracting patterns from these gene expression profiles is expected to provide considerable insight into gene function and regulatory systems.
Among the models commonly used for genetic regulatory networks (Boolean networks, differential equations, and probabilistic approaches), graphical Gaussian models (GGMs) are one tool for generating and evaluating hypotheses about complex genetic control mechanisms and network topologies. In a GGM, partial correlation coefficients are estimated to draw conclusions about conditional dependence and independence between genes: a partial correlation of zero between two genes, given all the others, means the two genes are conditionally independent, and the graph has no edge between them.
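As a minimal sketch of the estimation step described above (not the author's implementation), the full-order partial correlation between genes i and j can be read off the inverse P of the sample covariance matrix as -P_ij / sqrt(P_ii * P_jj). The helper names below are hypothetical, and the matrix inversion is a hand-written Gauss-Jordan elimination so the example runs with the Python standard library alone.

```python
import math


def covariance_matrix(data):
    """Sample covariance between the columns of `data` (rows = observations, columns = genes)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises ValueError if singular."""
    size = len(matrix)
    # Augment the matrix with the identity: [A | I] -> [I | A^-1].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular: too few observations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def partial_correlations(data):
    """Partial correlation of each gene pair given all other genes."""
    prec = invert(covariance_matrix(data))  # precision (inverse covariance) matrix
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]


# Hypothetical toy data: 5 observations of 3 genes.
data = [[1, 2, 0], [2, 1, 1], [3, 4, 0], [4, 3, 2], [5, 5, 1]]
for row in partial_correlations(data):
    print([round(v, 3) for v in row])
```

With only two observations of three genes, the sample covariance matrix has rank one, so `invert` raises `ValueError`: this is the small-sample problem that full conditioning runs into.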
Graphical models are very powerful when only a small number of genes is involved. As the number of genes increases, however, conditioning on all other variables requires a large number of observations (with at least as many genes as observations, the sample covariance matrix is singular), a prerequisite that is rarely fulfilled in gene expression profiling. A growing number of genes also tends to produce a large number of edges in the model, which makes the graph difficult to interpret. The major appeal of graphical models, namely using conditional independence to describe the network topology, is then lost. To overcome these problems, one may focus on graphical modeling of small subnetworks of only three genes each and then combine these subnetworks to draw conclusions about the original network. I will describe three different versions of this modified GGM approach and apply them to genetic networks in the plant Arabidopsis thaliana.
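The three-gene idea can be illustrated with first-order partial correlations, which need only the pairwise correlations within a triple: r_ij|k = (r_ij - r_ik * r_jk) / sqrt((1 - r_ik^2) * (1 - r_jk^2)). The sketch below keeps an edge between genes i and j only if their marginal correlation and every r_ij|k over the remaining genes k exceed a fixed threshold. This combination rule, the fixed threshold (a stand-in for a proper significance test), and all function names are assumptions for illustration; the abstract does not say whether any of its three versions works exactly this way.

```python
import math
from itertools import combinations


def correlation_matrix(data):
    """Pearson correlations between columns of `data` (rows = observations).

    Assumes no column is constant."""
    n, p = len(data), len(data[0])
    centered = []
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def first_order_partial(r, i, j, k):
    """Partial correlation of genes i and j given the single gene k."""
    num = r[i][j] - r[i][k] * r[j][k]
    # max(0.0, ...) guards against tiny negative values caused by rounding.
    den = math.sqrt(max(0.0, (1 - r[i][k] ** 2) * (1 - r[j][k] ** 2)))
    if den < 1e-12:
        # k is (almost) perfectly correlated with i or j; treat it as explaining the pair.
        return 0.0
    return num / den


def triple_graph(data, threshold):
    """Hypothetical rule: keep (i, j) only if |r_ij| and every |r_ij|k| exceed threshold."""
    r = correlation_matrix(data)
    p = len(r)
    edges = set()
    for i, j in combinations(range(p), 2):
        if abs(r[i][j]) <= threshold:
            continue  # not even marginally correlated
        others = (k for k in range(p) if k not in (i, j))
        if all(abs(first_order_partial(r, i, j, k)) > threshold for k in others):
            edges.add((i, j))
    return edges


# Toy chain x -> y -> z built from orthogonal centered vectors (4 observations, 3 genes).
data = [[1, 2, 3], [1, 0, -1], [-1, 0, -1], [-1, -2, -1]]
print(sorted(triple_graph(data, threshold=0.1)))  # [(0, 1), (1, 2)]
```

In this toy chain, genes 0 and 2 are marginally correlated (about 0.58), but their partial correlation given gene 1 is zero, so the indirect edge is removed while both direct edges survive. Each test involves only three genes at a time, which is why it remains estimable with very few observations.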