Core Group Extension
Stefan Bentink,
Dennis Kostka
Background: Given microarray data and starting from
a small core group of highly similar samples, our objective is to
find a set of signature genes that distinguishes them from the
majority of other cases. At the same time, we want to identify
additional cases that have expression levels coherent with the
core group (across the a priori unknown set of signature genes).
This problem is related (but not identical) to the well-studied
problem of supervised classification.
Our approach: In a supervised classification setting,
classification rules (signatures) are derived from labeled data. The
procedure involves three steps: training, model
selection and model evaluation.
To approach our objective, we propose some modifications to the supervised classification
procedure: Since we are unsure about the labels of the cases not in the core
group, their misclassification is not perceived the same way as the misclassification
of a core group member.
Training and model selection are performed with an unusual objective:
we do not take the number of misclassifications as a performance measure.
Given the original label configuration, we derive a signature yielding an estimate of the
probability that a sample belongs to the core group. Deriving the signature includes gene selection, and new core group members can
be assigned based on a cutoff.
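As a rough illustration of a signature that yields a core group probability and a cutoff-based assignment, here is a minimal Python sketch. It is not the authors' method: the gene ranking by standardized mean difference, the logistic transformation of the score, the function names (fit_signature, core_probability, call_core), and the toy expression values are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def fit_signature(expr, is_core, n_genes=2):
    """Pick the n_genes with the largest standardized core-vs-rest difference.

    expr is a list of samples, each a list of expression values per gene.
    Returns tuples (gene index, weight, midpoint, spread). Illustrative only.
    """
    stats = []
    for g in range(len(expr[0])):
        core = [row[g] for row, c in zip(expr, is_core) if c]
        rest = [row[g] for row, c in zip(expr, is_core) if not c]
        spread = pstdev(core + rest) or 1.0  # guard against constant genes
        weight = (mean(core) - mean(rest)) / spread
        midpoint = (mean(core) + mean(rest)) / 2
        stats.append((g, weight, midpoint, spread))
    stats.sort(key=lambda s: abs(s[1]), reverse=True)
    return stats[:n_genes]

def core_probability(signature, sample):
    """Map the weighted signature score to (0, 1) with a logistic function.

    This is a heuristic score, not a calibrated posterior probability.
    """
    z = sum(w * (sample[g] - mid) / spread for g, w, mid, spread in signature)
    return 1.0 / (1.0 + math.exp(-z))

def call_core(signature, samples, cutoff=0.5):
    """Assign samples to the core group when their score reaches the cutoff."""
    return [core_probability(signature, s) >= cutoff for s in samples]

# Toy data: 6 samples x 4 genes; only the first two samples are labeled core.
expr = [
    [5.0, 5.2, 1.0, 2.0],
    [5.1, 4.9, 2.0, 1.0],
    [1.0, 1.1, 1.5, 2.0],
    [0.9, 1.0, 2.0, 1.5],
    [5.0, 5.0, 1.0, 1.0],  # labeled non-core, but coherent with the core group
    [1.1, 0.9, 1.2, 1.8],
]
is_core = [True, True, False, False, False, False]

signature = fit_signature(expr, is_core, n_genes=2)
calls = call_core(signature, expr, cutoff=0.5)
```

In this toy example the fifth sample is labeled as a non-member but has core-like expression on the selected genes, so it is the kind of case the cutoff would add to the core group.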
Model selection is then based on three criteria: high sensitivity (all core group members should be
correctly identified), generalization ability (the high sensitivity should also be achieved on independent
test sets), and posterior probabilities that clearly distinguish between core group members and non-members.
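The model selection criteria can be sketched as a simple ranking of candidate models. The following Python example is a hypothetical illustration, not the authors' procedure: the way sensitivity and separation are combined, the definition of separation as the smallest distance of any probability to the cutoff, and the candidate structure (dictionaries with "name", "train", and "test" probability lists) are assumptions.

```python
def sensitivity(probs, is_core, cutoff=0.5):
    """Fraction of labeled core group members called core at the cutoff."""
    core = [p for p, c in zip(probs, is_core) if c]
    return sum(p >= cutoff for p in core) / len(core)

def separation(probs, cutoff=0.5):
    """Smallest distance of any probability to the cutoff.

    Larger values mean that no sample sits close to the decision boundary.
    """
    return min(abs(p - cutoff) for p in probs)

def select_model(candidates, train_core, test_core, cutoff=0.5):
    """Prefer sensitivity on both training and test data, then clear separation."""
    def key(cand):
        sens = min(
            sensitivity(cand["train"], train_core, cutoff),
            sensitivity(cand["test"], test_core, cutoff),
        )
        return (sens, separation(cand["train"] + cand["test"], cutoff))
    return max(candidates, key=key)

# Made-up probabilities for three hypothetical candidate models.
train_core = [True, True, False, False]
test_core = [True, False]
candidates = [
    {"name": "A", "train": [0.9, 0.8, 0.1, 0.2], "test": [0.4, 0.1]},
    {"name": "B", "train": [0.95, 0.9, 0.05, 0.1], "test": [0.85, 0.1]},
    {"name": "C", "train": [0.6, 0.55, 0.45, 0.4], "test": [0.6, 0.4]},
]
best = select_model(candidates, train_core, test_core)
```

Candidate A misses the core member in the test set, and candidate C identifies all core members but with probabilities close to the cutoff, so this ranking prefers candidate B.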
Evaluation is performed by assessing the robustness of core group / non-core group calls of the signature using the bootstrap
on an independent test set.
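One possible reading of this evaluation step is sketched below in Python: the training cases are resampled with replacement, the signature is refit on each resample, and for every test sample we record how often it is called a core group member. This is an assumption about how the bootstrap could be applied, not a description of the authors' implementation. The stratified resampling, the function names, and the single-gene midpoint classifier used as a stand-in signature are hypothetical.

```python
import random
from statistics import mean

def bootstrap_call_frequency(train, is_core, test, fit, predict,
                             n_boot=200, seed=0):
    """Fraction of bootstrap refits in which each test sample is called core.

    Core and non-core cases are resampled separately (an assumption) so that
    every resample contains both groups, even for a very small core group.
    """
    rng = random.Random(seed)
    core_idx = [i for i, c in enumerate(is_core) if c]
    rest_idx = [i for i, c in enumerate(is_core) if not c]
    counts = [0] * len(test)
    for _ in range(n_boot):
        idx = (rng.choices(core_idx, k=len(core_idx))
               + rng.choices(rest_idx, k=len(rest_idx)))
        model = fit([train[i] for i in idx], [is_core[i] for i in idx])
        for j, sample in enumerate(test):
            counts[j] += predict(model, sample)
    return [c / n_boot for c in counts]

# Stand-in signature: one gene, threshold at the midpoint of the group means.
def fit_midpoint(values, labels):
    core = [v for v, c in zip(values, labels) if c]
    rest = [v for v, c in zip(values, labels) if not c]
    return (mean(core) + mean(rest)) / 2

def predict_midpoint(threshold, value):
    # Assumes the core group has higher expression than the rest (toy data).
    return value > threshold

train = [5.0, 5.2, 4.8, 1.0, 1.2, 0.8, 1.1]
is_core = [True, True, True, False, False, False, False]
test = [5.0, 1.0, 3.0]  # clearly core-like, clearly not, and borderline

freqs = bootstrap_call_frequency(train, is_core, test,
                                 fit_midpoint, predict_midpoint)
```

Frequencies near 1 or 0 indicate calls that are stable across resamples, while intermediate values, as expected for the borderline third sample, flag calls that depend on which training cases happened to be drawn.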
Publications
A biologic definition of Burkitt's lymphoma from transcriptional and genomic profiling
Hummel, Bentink*, Berger*, Klapper*, Wessendorf*, Barth, Bernd, Cogliatti, Dierlamm, Feller, Hansmann, Haralambieva, Harder, Hasenclever, Kühn, Lenze, Lichter, Martin-Subero, Möller, Müller-Hermelink, Ott, Parwaresch, Pott, Rosenwald, Rosolowski, Schwaenen, Stürzenhofecker, Szczepanowski, Trautmann, Wacker, Spang, Löffler, Trümper, Stein, Siebert, for the Molecular Mechanisms in Malignant Lymphomas Network Project of the Deutsche Krebshilfe
New England Journal of Medicine 2006;354:2419-2430. (*These authors contributed equally.)