International BCB-Workshop
on Machine Learning in Bioinformatics

October 10, 2005

Magnus-Haus, Am Kupfergraben 7, 10117 Berlin

Cesare Furlanello, ITC-irst Trento

Semi-supervised learning for molecular profiling

Abstract:

Class prediction and feature selection are two learning tasks that are strictly paired in the search for molecular profiles. It is easy to incur a selection bias effect when dealing with high throughput data. Complex validation setups and computational resources are thus required to avoid overly optimistic estimates of the predictive accuracy on novel data and the selection of incorrect gene lists. In this talk, I will discuss how to reuse and analyze the by-products of the validation of microarray studies when profiling with supervised machine learning. A new class of semi-supervised methods for analyzing the stability of the gene lists and for discovering outliers and potential subtypes will be presented. (Joint work with G. Jurman and S. Merler)
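The selection bias mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the speaker's method: the data are pure noise, the classifier is a nearest-centroid rule chosen for brevity, and all function names (`top_k_features`, `cv_accuracy`, `make_noise_data`) are hypothetical. Because the labels carry no information, an honest estimate should hover around chance (0.5), whereas selecting features on all samples before cross-validation typically yields a much more optimistic figure.

```python
import random


def top_k_features(X, y, rows, k):
    """Rank features by absolute difference of class means, using only `rows`."""
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def centroid(X, rows, features):
    """Mean profile of the given rows, restricted to the selected features."""
    return [sum(X[i][j] for i in rows) / len(rows) for j in features]


def cv_accuracy(X, y, k, n_folds, select_inside_folds):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    n = len(X)
    all_rows = list(range(n))
    # Biased variant: features are chosen once, using every label.
    features = top_k_features(X, y, all_rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in all_rows if i % n_folds == fold]
        train = [i for i in all_rows if i % n_folds != fold]
        if select_inside_folds:
            # Unbiased variant: the held-out samples never influence selection.
            features = top_k_features(X, y, train, k)
        c1 = centroid(X, [i for i in train if y[i] == 1], features)
        c0 = centroid(X, [i for i in train if y[i] == 0], features)
        for i in test:
            x = [X[i][j] for j in features]
            d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
            d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
            correct += int((1 if d1 < d0 else 0) == y[i])
    return correct / n


def make_noise_data(n_samples=20, n_features=500, seed=0):
    """Gaussian noise 'expression' matrix with alternating, uninformative labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection outside CV:", cv_accuracy(X, y, 10, 5, False))
    print("selection inside CV: ", cv_accuracy(X, y, 10, 5, True))
```

Repeating the feature selection inside every fold is what makes the validation setup computationally heavier, which is the cost the abstract alludes to.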

Alex Hartemink, Durham, NC

Bayesian machine learning and informative priors in computational systems biology

Abstract:

I have primarily been interested in two different kinds of learning tasks at the intersection of machine learning and computational systems biology: network inference and classification. This talk will present a collection of results in both, omitting some of the details as they are available in our papers over the years. Instead, I will present the methodological highlights and biological results, as part of an effort to summarize the whole endeavor. In particular, I believe the Bayesian statistical paradigm to be compelling for a variety of reasons, not least of which is the ability to naturally incorporate prior information into the learning process. Yet, the vast majority of practitioners of Bayesian methods in computational biology formulate and learn models with uninformative priors (e.g., a uniform distribution, which turns a maximum a posteriori criterion into a traditional maximum likelihood criterion). I will attempt to make a case for the use of informative priors of various kinds, and demonstrate how they can be used to guide and improve the accuracy of our models in a variety of settings.
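The remark that a uniform prior turns the maximum a posteriori criterion into maximum likelihood can be written out in one line. The symbols are assumptions introduced here for illustration: $\theta$ denotes the model parameters, $D$ the data, and $c$ a positive constant.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta}\, p(\theta \mid D)
   = \arg\max_{\theta}\, \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
   = \arg\max_{\theta}\, \bigl[\log p(D \mid \theta) + \log p(\theta)\bigr], \\
\text{if } p(\theta) = c \text{ for all } \theta:\quad
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta}\, \log p(D \mid \theta)
   = \hat{\theta}_{\mathrm{ML}}.
\end{align*}
```

With an informative prior, the term $\log p(\theta)$ is no longer constant and acts as an additional penalty or reward that steers the estimate toward parameter values favored by prior knowledge.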

Jason Hsu, The Ohio State University

Statistical Design and Multiple Testing Analysis of Gene Expression Levels from Microarray Experiments

Abstract:

Microarray experiments are no longer strictly exploratory in nature. There are microarray-based products marketed for cancer recurrence prognosis. These devices might have to meet regulatory sensitivity/specificity requirements for medical devices. It has been observed that reported gene signatures hardly overlap. We believe a contributing factor might be that statistical design principles have not been applied to microarray fabrication and sample hybridization. I will describe preliminary sensitivity/specificity results from a collaborative cancer recurrence prognosis project which designs microarrays and sample hybridization statistically. I will also discuss the statistical issues of (1) the conditions for shortcutting closed/partition tests to step-down and step-up methods, (2) the hypotheses for which permutation tests are appropriate, and (3) the microarray designs and analyses of gene expression levels for which control of the False Discovery Rate (FDR) or the generalized Familywise Error Rate (gFWER) is more appropriate.
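To make the step-down versus step-up distinction concrete, here is a minimal sketch of two standard procedures that are not taken from the talk itself: Holm's step-down method, which controls the familywise error rate, and the Benjamini-Hochberg step-up method, which controls the FDR. The function names and the example p-values are hypothetical.

```python
def holm_step_down(p_values, alpha=0.05):
    """Holm's step-down procedure (controls the familywise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # step-down: stop at the first hypothesis that is retained
    return rejected


def benjamini_hochberg_step_up(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= k * alpha / m.
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True  # step-up: reject everything up to that rank
    return rejected


if __name__ == "__main__":
    p = [0.2, 0.001, 0.030, 0.010, 0.9, 0.024, 0.020, 0.5]
    print("Holm:", holm_step_down(p))
    print("BH:  ", benjamini_hochberg_step_up(p))
```

On these example p-values Holm rejects a single hypothesis while Benjamini-Hochberg rejects five, which illustrates why the choice of error rate matters when thousands of genes are tested at once.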

Gunnar Rätsch, Friedrich Miescher Laboratory, MPI Tuebingen

Detection of Alternative Splicing Events Using Machine Learning

Abstract:

Eukaryotic pre-mRNAs are spliced to form mature mRNA. Pre-mRNA alternative splicing greatly increases the complexity of gene expression. Estimates show that more than half of the human genes and at least a third of the genes of less complex organisms such as nematodes or flies are alternatively spliced. In the talk I will present some recent results on applying state-of-the-art machine learning techniques to the in silico prediction of alternative splicing events.

James F. Reid

Predicting clinical outcome of breast cancer patients treated with Tamoxifen using gene expression data

Abstract:

I will report on recent work in our lab that aims at developing predictive models for tamoxifen treatment of breast cancer. Results using public microarray data and our own will be presented as well as an attempt at validating a recently published two-gene predictor.

Anthony Rossini, Novartis Pharma AG and University of Washington

Machine Learning, Statistical Computing, and Immunology: Mining Flow Cytometry Data

Abstract:

Analytic flow cytometry (aka FACS: fluorescence-activated cell sorting) has been used to immunologically characterize cell populations in tissue and blood compartments. The data characteristics (large numbers of observations, medium-dimensional readouts) present a different form of challenge for statistical computing. We describe the family of assays and some issues around preprocessing, and finish with some work in progress using PRIM to discover populations for characterizing a clinical trial subject's immune status. The challenges faced include processing and statistical inference for very large datasets, normalization and comparability, and finally, discovery.

Alessandro Verri, Università di Genova

Functional methods for learning

Abstract:

In this talk I'll review some of the work our group has done in the last few years on the mathematical foundations of learning. The emphasis is on the use of functional methods to derive rigorous results on mathematical properties of a large class of learning algorithms. In the final part of the talk, I'll argue for the need for a deeper understanding of the theoretical aspects and foundations of problems like feature selection and the use of unlabeled data for bioinformatics applications.

David L. Wild, Keck Graduate Institute of Applied Life Sciences

Graphical Models and Bayesian Methods in Bioinformatics: From Structural to Systems Biology

Abstract:

Graphical models and Bayesian methods can be used for a variety of modeling problems in bioinformatics. They allow robust statistical models to be learned and sources of noise and uncertainty to be included in a principled manner. I will describe the application of these methods in two ongoing projects which span the fields of structural and systems biology: the inference of gene regulatory networks from microarray data and protein structure prediction.

Please send comments on our webpages to jaeger@molgen.mpg.de