Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures

Radhakrishnan Sabarinathan; Christian Anthon; Jan Gorodkin; Stefan E Seemann

doi:10.3390/genes9120604

1 Citation (Scopus)

35 Downloads (Pure)

Abstract

Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes, because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without a structure prediction, it is of interest to search for structured domains, for example to find common RNA motifs in RNA-protein binding assays. The precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far focused on single sequences; here we present a comparison of boundary definition between single sequences and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding boundaries based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences, independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of -6 and -11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts).

Original language	English
Article number	604
Journal	Genes
Volume	9
Issue number	12
Number of pages	17
ISSN	2073-4425
DOI	https://doi.org/10.3390/genes9120604
Status	Published - 4 Dec 2018

Access to Document

10.3390/genes9120604 (License: CC BY)

Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures: publisher's published version, 1.93 MB (License: CC BY)

Citation formats

@article{f7927f7941e048f9a4c81feabc5b1892,
title = "Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures",
abstract = "Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes, because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without a structure prediction, it is of interest to search for structured domains, for example to find common RNA motifs in RNA-protein binding assays. The precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far focused on single sequences; here we present a comparison of boundary definition between single sequences and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding boundaries based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences, independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of -6 and -11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts).",
author = "Radhakrishnan Sabarinathan and Christian Anthon and Jan Gorodkin and Seemann, {Stefan E}",
year = "2018",
month = dec,
day = "4",
doi = "10.3390/genes9120604",
language = "English",
volume = "9",
journal = "Genes",
issn = "2073-4425",
publisher = "MDPI AG",
number = "12",
}

TY  - JOUR
T1  - Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures
AU  - Sabarinathan, Radhakrishnan
AU  - Anthon, Christian
AU  - Gorodkin, Jan
AU  - Seemann, Stefan E
PY  - 2018/12/4
Y1  - 2018/12/4
AB  - Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes, because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without a structure prediction, it is of interest to search for structured domains, for example to find common RNA motifs in RNA-protein binding assays. The precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far focused on single sequences; here we present a comparison of boundary definition between single sequences and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding boundaries based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences, independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of -6 and -11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts).
U2  - 10.3390/genes9120604
DO  - 10.3390/genes9120604
M3  - Journal article
C2  - 30518121
SN  - 2073-4425
VL  - 9
JO  - Genes
JF  - Genes
IS  - 12
M1  - 604
ER  -
