Bounded coordinate-descent for biological sequence classification in high dimensional predictor space

Georgiana Ifrim*, Carsten Wiuf

*Corresponding author for this work
20 Citations (Scopus)

Abstract

We present a framework for discriminative sequence classification where linear classifiers work directly in the explicit high-dimensional predictor space of all subsequences in the training set (as opposed to kernel-induced spaces). This is made feasible by employing a gradient-bounded coordinate-descent algorithm for efficiently selecting discriminative subsequences without having to expand the whole space. Our framework can be applied to a wide range of loss functions, including the binomial log-likelihood loss of logistic regression and the squared hinge loss of support vector machines. When applied to protein remote homology detection and remote fold recognition, our framework achieves performance comparable to the state of the art (e.g., kernel support vector machines). In contrast to state-of-the-art sequence classifiers, our models are simply lists of weighted discriminative subsequences and can thus be interpreted and related to the biological problem, a crucial requirement for the bioinformatics and medical communities.
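To make the idea of greedy coordinate descent over an explicit subsequence space concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it enumerates every contiguous substring up to a small length by brute force instead of using the gradient bound to prune the search, it uses a fixed step size rather than a line search, and it covers only the logistic (binomial log-likelihood) loss. The names `substrings`, `score`, `logistic_loss`, and `greedy_coordinate_descent`, and the parameters `max_len`, `iters`, and `step`, are hypothetical.

```python
import math


def substrings(s, max_len):
    """Return the set of contiguous substrings of s with length 1..max_len."""
    return {s[i:i + n] for n in range(1, max_len + 1)
            for i in range(len(s) - n + 1)}


def score(weights, s, max_len):
    """Linear score: sum of the weights of the substrings present in s."""
    return sum(weights.get(f, 0.0) for f in substrings(s, max_len))


def logistic_loss(weights, seqs, labels, max_len):
    """Sum over examples of log(1 + exp(-y * score)); labels are +1 or -1."""
    return sum(math.log1p(math.exp(-y * score(weights, s, max_len)))
               for s, y in zip(seqs, labels))


def greedy_coordinate_descent(seqs, labels, max_len=3, iters=50, step=0.5):
    """Toy greedy coordinate descent on binary substring features.

    Assumption: brute-force enumeration of the feature space. The paper
    avoids this expansion with a gradient bound, which is NOT implemented here.
    """
    feats = [substrings(s, max_len) for s in seqs]
    vocab = sorted(set().union(*feats))
    weights = {}  # sparse model: only selected substrings get an entry
    for _ in range(iters):
        # Gradient of the logistic loss with respect to every feature weight.
        grad = dict.fromkeys(vocab, 0.0)
        for fs, y in zip(feats, labels):
            margin = y * sum(weights.get(f, 0.0) for f in fs)
            g = -y / (1.0 + math.exp(margin))
            for f in fs:
                grad[f] += g
        # Greedy choice: the coordinate with the largest gradient magnitude.
        best = max(vocab, key=lambda f: abs(grad[f]))
        weights[best] = weights.get(best, 0.0) - step * grad[best]
    return weights


# Toy usage: positives contain "AC", negatives do not.
toy_seqs = ["GACT", "TACG", "GGTT", "TTGG"]
toy_labels = [1, 1, -1, -1]
model = greedy_coordinate_descent(toy_seqs, toy_labels)
for feature, weight in sorted(model.items(), key=lambda kv: -abs(kv[1])):
    print(feature, round(weight, 3))
```

The fitted model is a short dictionary of weighted substrings, which mirrors the interpretability claim in the abstract: each selected subsequence carries a signed weight that can be inspected directly.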

Original language: English
Title of host publication: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11
Number of pages: 9
Publication date: 16 Sept 2011
Pages: 708-716
ISBN (Print): 9781450308137
Publication status: Published - 16 Sept 2011
Externally published: Yes
Event: 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11 - San Diego, CA, United States
Duration: 21 Aug 2011 to 24 Aug 2011

Conference

Conference: 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11
Country/Territory: United States
City: San Diego, CA
Period: 21/08/2011 to 24/08/2011
Sponsor: ACM Spec. Interest Group Knowl. Discov. Data (SIGKDD), ACM SIGMOD

Keywords

  • Greedy coordinate-descent
  • Logistic regression
  • Sequence classification
  • String classification
  • Support vector machines
