Abstract
We propose a novel co-clustering algorithm based on self-organizing maps (SOMs). The method is applied to group yeast (Saccharomyces cerevisiae) genes according to both their expression profiles and their Gene Ontology (GO) annotations. Combining multiple databases is expected to yield gene clusters that are better defined and better separated biologically. We compare different levels of genome-wide co-clustering by varying the weights assigned to the sources of information involved. Clustering quality is assessed with both general and SOM-specific validation measures. Because co-clustering relies on sufficient correlation between the datasets, we investigate in various experiments how much GO information is contained in the gene expression dataset and vice versa. The second major contribution is a visualization technique that uses the cluster structure of SOMs to improve the biological interpretation of gene (expression) clusterings. Our GO term maps reveal functional neighborhoods between clusters, which form biologically meaningful functional SOM regions. To cope with the high variety and specificity of GO terms, gene and cluster annotations are mapped to a reduced vocabulary of more general GO terms. In particular, this improves the ability of SOMs to act as gene function predictors.
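The idea of weighting two information sources in a SOM can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes that each gene carries an expression vector and a binary GO annotation vector, and that a single hypothetical parameter `alpha` blends the two squared distances when the best-matching unit is chosen. The gene names, the toy values, and the decay schedules for the learning rate and neighborhood radius are likewise illustrative assumptions.

```python
import math
import random


def combined_distance(gene, unit, alpha):
    """Weighted squared distance over both feature blocks (assumed form).

    alpha = 0 uses expression only, alpha = 1 uses GO annotations only.
    """
    d_expr = sum((a - b) ** 2 for a, b in zip(gene["expr"], unit["expr"]))
    d_go = sum((a - b) ** 2 for a, b in zip(gene["go"], unit["go"]))
    return (1.0 - alpha) * d_expr + alpha * d_go


def best_matching_unit(gene, som, alpha):
    """Return the grid coordinate of the unit closest to the gene."""
    return min(som, key=lambda pos: combined_distance(gene, som[pos], alpha))


def train_som(genes, rows, cols, alpha, epochs=20, seed=0):
    """Train a small online SOM on the combined feature space."""
    rng = random.Random(seed)
    # Initialise each unit with a copy of a randomly chosen gene.
    som = {}
    for r in range(rows):
        for c in range(cols):
            g = rng.choice(genes)
            som[(r, c)] = {"expr": [float(v) for v in g["expr"]],
                           "go": [float(v) for v in g["go"]]}
    total_steps = epochs * len(genes)
    step = 0
    for _ in range(epochs):
        order = genes[:]
        rng.shuffle(order)
        for gene in order:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # linearly decaying learning rate
            # Neighborhood radius shrinks over time but never below 0.5.
            sigma = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))
            bmu = best_matching_unit(gene, som, alpha)
            for pos, unit in som.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull both blocks of the unit towards the gene.
                for key in ("expr", "go"):
                    unit[key] = [w + lr * h * (x - w)
                                 for w, x in zip(unit[key], gene[key])]
            step += 1
    return som


# Hypothetical toy data: three expression values and two GO flags per gene.
genes = [
    {"name": "gene_a", "expr": [1.0, 0.9, 0.1], "go": [1, 0]},
    {"name": "gene_b", "expr": [0.9, 1.0, 0.0], "go": [1, 0]},
    {"name": "gene_c", "expr": [0.1, 0.0, 1.0], "go": [0, 1]},
    {"name": "gene_d", "expr": [0.0, 0.2, 0.9], "go": [0, 1]},
]

alpha = 0.5  # equal weight for expression and GO information
som = train_som(genes, rows=2, cols=2, alpha=alpha)
clusters = {g["name"]: best_matching_unit(g, som, alpha) for g in genes}
print(clusters)
```

Rerunning the sketch with `alpha` set to 0.0, 0.5, and 1.0 gives a rough feel for how the cluster assignment moves from purely expression-driven to purely annotation-driven.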
Original language | English |
---|---|
Journal | Journal of Biomedical Informatics |
Volume | 40 |
Issue number | 2 |
Pages (from-to) | 160-173 |
Number of pages | 14 |
ISSN | 1532-0464 |
Publication status | Published - 1 Apr 2007 |
Externally published | Yes |
Keywords
- Clustering validation
- Clustering visualization
- Co-clustering
- Gene expression data
- Gene function prediction
- Gene ontology
- Saccharomyces cerevisiae yeast
- Self-organizing maps