Fast large-scale clustering of protein structures using Gauss integrals

Tim Philipp Harder; Mikael Borg; Wouter Krogh Boomsma; Peter Røgen; Thomas Wim Hamelryck

doi:10.1093/bioinformatics/btr692

Bioinformatics and RNA Biology

21 Citations (Scopus)

Abstract

Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories.

Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors, which were introduced by Røgen and co-workers, and subsequently performing K-means clustering.

Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes.
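The second stage named in the abstract, K-means clustering of fixed-length vectors, can be illustrated with a minimal sketch. This is not the Pleiades implementation: it is plain Lloyd's algorithm in standard-library Python, and the function names, the parameters and the toy `descriptors` list are assumptions made for illustration. Computing actual Gauss integral vectors from protein structures is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical descriptors: two well-separated groups of 3-component vectors.
# In the setting the abstract describes, each vector would instead be the
# Gauss integral vector of one structure.
descriptors = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.0],
]
labels, centroids = kmeans(descriptors, k=2)
print(labels)
```

Because every structure is reduced to a vector of the same length, each iteration costs time proportional to the number of structures times the number of clusters, and no pairwise structural superposition is needed.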

Original language	English
Journal	Bioinformatics
Volume	28
Issue number	4
Pages (from-to)	510-515
Number of pages	6
ISSN	1367-4803
DOI	https://doi.org/10.1093/bioinformatics/btr692
Status	Published - Feb. 2012

Access to document

10.1093/bioinformatics/btr692

Fast large-scale clustering of protein structures using Gauss integrals: publisher's published version, 455 KB
btr692.pdf: publisher's published version, 436 KB

Citation formats

@article{3e57d58d5b7d4afe8eebae09ad1937c8,
  title = "Fast large-scale clustering of protein structures using Gauss integrals",
  abstract = "Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors, which were introduced by R{\o}gen and co-workers, and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes.",
  author = "Harder, {Tim Philipp} and Borg, Mikael and Boomsma, {Wouter Krogh} and R{\o}gen, Peter and Hamelryck, {Thomas Wim}",
  year = "2012",
  month = feb,
  doi = "10.1093/bioinformatics/btr692",
  language = "English",
  volume = "28",
  pages = "510--515",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "4",
}

TY  - JOUR
T1  - Fast large-scale clustering of protein structures using Gauss integrals
AU  - Harder, Tim Philipp
AU  - Borg, Mikael
AU  - Boomsma, Wouter Krogh
AU  - Røgen, Peter
AU  - Hamelryck, Thomas Wim
PY  - 2012/2
Y1  - 2012/2
N2  - Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors, which were introduced by Røgen and co-workers, and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes.
AB  - Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors, which were introduced by Røgen and co-workers, and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes.
U2  - 10.1093/bioinformatics/btr692
DO  - 10.1093/bioinformatics/btr692
M3  - Journal article
C2  - 22199383
SN  - 1367-4803
VL  - 28
SP  - 510
EP  - 515
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 4
ER  - 
