RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs

Milad Miladi, Alexander Junge, Fabrizio Costa, Stefan E. Seemann, Jakob Hull Havgaard, Jan Gorodkin, Rolf Backofen*

*Corresponding author for this work
20 Citations (Scopus)
56 Downloads (Pure)

Abstract

Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take into account the compensatory base pair changes obtained from structure conservation in orthologous sequences.

Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.
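
The notion of a compensatory base pair change mentioned in the abstract can be illustrated with a small sketch. This is not RNAscClust's implementation; it is a toy example that assumes an ungapped alignment with a consensus structure in dot-bracket notation, and the function names `parse_pairs` and `compensatory_pairs` are hypothetical. A consensus pair is reported when two aligned sequences both form a canonical pair there but differ at both positions (for example G-C versus A-U).

```python
# Toy illustration only: names and the simplified definition are assumptions,
# not taken from RNAscClust.

# Canonical Watson-Crick and wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(dot_bracket):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return sorted(pairs)


def compensatory_pairs(alignment, consensus):
    """Return consensus pairs supported by at least one compensatory change.

    A change counts as compensatory when two sequences both show a canonical
    pair at the same two columns and the pairs differ at both positions.
    """
    result = []
    for i, j in parse_pairs(consensus):
        # Distinct canonical pair types observed in this pair of columns.
        observed = {(seq[i], seq[j]) for seq in alignment} & CANONICAL
        if any(a != c and b != d for a, b in observed for c, d in observed):
            result.append((i, j))
    return result


alignment = [
    "GGGAAACCC",  # reference: three G-C pairs
    "GAGAAACUC",  # G-C -> A-U at columns (1, 7): both positions changed
    "GGGAAAUCC",  # G-C -> G-U at columns (2, 6): only one position changed
]
consensus = "(((...)))"

print(compensatory_pairs(alignment, consensus))  # [(1, 7)]
```

Only the pair at columns (1, 7) is reported, because the G-U variant at (2, 6) changes a single position and is therefore a consistent rather than a compensatory change under this simplified definition.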

Original language: English
Journal: Bioinformatics
Volume: 33
Issue number: 14
Pages (from-to): 2089-2096
Number of pages: 8
ISSN: 1367-4803
Status: Published - 2017
