|
International BCB-Workshop
on Machine Learning in Bioinformatics
October 10, 2005
Magnus-Haus,
Am Kupfergraben 7, 10117 Berlin
|
Cesare Furlanello, ITC-irst Trento
Semi-supervised learning for molecular profiling
Abstract:
Class prediction and feature selection are two learning tasks that are
strictly paired in the search for molecular profiles. It is easy to incur a
selection bias effect when dealing with high throughput data. Complex
validation setups and computational resources are thus required to avoid
overly optimistic estimates of the predictive accuracy on novel data and the
selection of incorrect gene lists. In this talk, I will discuss how to reuse
and analyze the by-products of the validation of microarray studies when
profiling with supervised machine learning. A new class of semi-supervised
methods for analyzing the stability of the gene lists and for discovering
outliers and potential subtypes will be presented. (Joint work with G.
Jurman and S. Merler)
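
The selection bias effect mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch on synthetic noise data, not the validation setup used in the talk; the function names, the nearest-centroid classifier, the mean-difference feature ranking, and all parameter values are assumptions chosen for illustration. Because the data contain no real signal, an honest accuracy estimate should be near 0.5. Selecting features on the full data set before cross-validation typically reports a much higher figure.

```python
import random


def top_k_features(X, y, rows, k):
    """Rank features by absolute difference of class means over the given rows."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train, test_row, feats):
    """Predict the class whose training centroid is closest on the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        dist = 0.0
        for j in feats:
            centroid = sum(X[i][j] for i in members) / len(members)
            dist += (X[test_row][j] - centroid) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, k, n_folds, select_inside_folds):
    """Cross-validated accuracy, with feature selection inside or outside the folds."""
    n = len(X)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Biased variant: features chosen once, using every sample (test rows included).
    global_feats = top_k_features(X, y, all_rows, k)
    correct = 0
    for test in folds:
        train = [i for i in all_rows if i not in test]
        if select_inside_folds:
            feats = top_k_features(X, y, train, k)  # training rows only
        else:
            feats = global_feats
        correct += sum(
            nearest_centroid_predict(X, y, train, i, feats) == y[i] for i in test
        )
    return correct / n


def selection_bias_demo(seed=0, n=40, p=1000, k=10, n_folds=5):
    """Return (biased, honest) accuracy on pure-noise data with arbitrary labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # labels carry no information about X
    biased = cv_accuracy(X, y, k, n_folds, select_inside_folds=False)
    honest = cv_accuracy(X, y, k, n_folds, select_inside_folds=True)
    return biased, honest


if __name__ == "__main__":
    biased, honest = selection_bias_demo()
    print(f"selection outside CV: {biased:.2f}  selection inside CV: {honest:.2f}")
```

Repeating the feature selection inside every fold is what makes such validation setups computationally expensive when the number of features is large.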
|
Alex Hartemink, Durham, NC
Bayesian machine learning and informative priors in computational systems biology
Abstract:
I have primarily been interested in two different kinds of learning
tasks at the intersection of machine learning and computational systems
biology: network inference and classification. This talk will present
a collection of results in both, omitting some of the details as they
are available in our papers over the years. Instead, I will present
the methodological highlights and biological results, as part of an
effort to summarize the whole endeavor. In particular, I believe the
Bayesian statistical paradigm to be compelling for a variety of
reasons, not least of which is the ability to naturally incorporate
prior information into the learning process. Yet, the vast majority of
practitioners of Bayesian methods in computational biology formulate
and learn models with uninformative priors (e.g., a uniform
distribution, which turns a maximum a posteriori criterion into a
traditional maximum likelihood criterion). I will attempt to make a
case for the use of informative priors of various kinds, and
demonstrate how they can be used to guide and improve the accuracy of
our models in a variety of settings.
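
The parenthetical remark about uniform priors can be spelled out in one line. The notation here is assumed for illustration and is not taken from the talk: \theta denotes the model parameters, D the observed data, and c a positive constant.

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \, p(\theta \mid D)
  = \arg\max_{\theta} \, \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
  = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta),
\qquad
p(\theta) = c
  \;\Rightarrow\;
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \, p(D \mid \theta)
  = \hat{\theta}_{\mathrm{ML}}.
```

With an informative prior, p(\theta) is no longer constant in \theta, so the two estimates can differ.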
|
Jason Hsu, The Ohio State University
Statistical Design and Multiple Testing Analysis of Gene Expression
Levels from Microarray Experiments
Abstract:
Microarray experiments are no longer strictly exploratory in nature.
There are microarray-based products marketed for cancer recurrence
prognosis.
These devices might have to meet regulatory sensitivity/specificity
requirements for medical devices.
It has been observed that reported gene signatures hardly overlap. We
believe a contributing factor might be that statistical design principles
have not been applied to microarray fabrication and sample hybridization. I
will describe preliminary sensitivity/specificity results from a
collaborative cancer recurrence prognosis project which designs
microarrays and sample hybridization statistically.
I will also discuss three statistical issues: (1) the conditions under
which closed/partition tests can be shortcut to step-down and step-up
methods, (2) the hypotheses for which permutation tests are appropriate,
and (3) the microarray designs and analyses of gene expression levels for
which control of the False Discovery Rate (FDR) or of the generalized
Familywise Error Rate (gFWER) is more appropriate.
|
Gunnar Rätsch, Friedrich Miescher Laboratory, MPI Tuebingen
Detection of Alternative Splicing Events Using Machine Learning
Abstract:
Eukaryotic pre-mRNAs are spliced to form mature mRNA. Pre-mRNA
alternative splicing greatly increases the complexity of gene
expression. Estimates show that more than half of the human genes and at
least a third of the genes of less complex organisms such as nematodes
or flies are alternatively spliced. In the talk I will present some
recent results on applying state-of-the-art machine learning techniques
to the problem of in silico prediction of alternative splicing events.
|
James F. Reid
Predicting clinical outcome of breast cancer patients treated with Tamoxifen using gene
expression data
Abstract:
I will report on recent work in our lab that aims at developing predictive models for
tamoxifen treatment of breast cancer. Results using public microarray data and our own will be presented
as well as an attempt at validating a recently published two-gene predictor.
|
Anthony Rossini, Novartis Pharma AG and University of Washington
Machine Learning, Statistical Computing, and Immunology: Mining Flow Cytometry Data
Abstract:
Analytic flow cytometry (aka FACS: fluorescence-activated cell sorting) has been used to
immunologically characterize cell populations in tissue and blood compartments. The data
characteristics (large numbers of observations, medium dimensional readouts) present a different
form of challenge for statistical computing. We describe the family of assays, some issues
around preprocessing, and finish with some work in progress using PRIM to discover populations
for characterizing a clinical trial subject's immune status. The challenges faced include
processing and statistical inference for very large datasets, normalization and comparability,
and finally, discovery.
|
Alessandro Verri, Università di Genova
Functional methods for learning
Abstract:
In this talk I'll review some of the work we have
been doing in our group on the mathematical
foundations of learning in the last few years.
The emphasis is on the use of functional methods to
derive rigorous results on mathematical properties
of a large class of learning algorithms. In the
final part of the talk, I'll argue for the need for
a deeper understanding of the theoretical aspects and
foundations of problems like feature selection
and use of unlabeled data for bioinformatics
applications.
|
David L. Wild, Keck Graduate Institute of Applied Life Sciences
Graphical Models and Bayesian Methods in Bioinformatics: From Structural to Systems Biology
Abstract:
Graphical models and Bayesian methods can be used for a variety of
modeling problems in bioinformatics. They allow robust statistical
models to be learned and sources of noise and uncertainty to be
included in a principled manner. I will describe the application of
these methods in two on-going projects which span the fields of
structural and systems biology: the inference of gene regulatory
networks from microarray data and protein structure prediction.
|