FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome

Valentin Wucher; Fabrice Legeai; Benoît Hédan; Guillaume Rizk; Laetitia Lagoutte; Tosso Leeb; Vidhya Jagannathan; Edouard Cadieu; Audrey David; Hannes Lohi; Susanna Cirera Salicio; Merete Fredholm; Nadine Botherel; Peter A.J. Leegwater; Céline Le Béguec; Hille Fieten; Jeremy J Johnson; Jessica Alföldi; Catherine André; Kerstin Lindblad-Toh; Christophe Hitte; Thomas Derrien

doi:10.1093/nar/gkw1306

Corresponding author: Thomas Derrien

117 Citations (Scopus)

63 Downloads (Pure)

Abstract

Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly to distinguish those that will be translated (mRNAs) from long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
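The abstract names two kinds of input features for the Random Forest model: multi k-mer frequencies and relaxed open reading frames (ORFs). The Python sketch below illustrates what such features can look like; it is not FEELnc's implementation. Everything specific in it is an assumption not taken from the paper: the k values (1, 2, 3), the use of raw forward-strand k-mer frequencies, the hypothetical helper names (`kmer_frequencies`, `longest_relaxed_orf`, `feature_vector`), and the reading of "relaxed" as an ORF that may lack a start codon and may run off the end of the transcript without a stop codon.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Relative frequency of every DNA k-mer in seq (forward strand only)."""
    seq = seq.upper()
    # Count only k-mers made of A/C/G/T; windows containing N etc. are skipped.
    counts = Counter(
        w for w in (seq[i:i + k] for i in range(len(seq) - k + 1))
        if set(w) <= set("ACGT")
    )
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    # Fixed ordering over all 4**k k-mers so vectors from different transcripts align.
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def longest_relaxed_orf(seq, require_start=False):
    """Length (nt) of the longest forward-strand ORF, stop codon included if present.

    Hypothetical "relaxed" definition for this sketch: an ORF may start at the
    transcript's first codon or right after a stop codon without an ATG (unless
    require_start is True), and may end at the transcript's 3' end without a stop.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None  # index of the first codon of the currently open ORF
        for idx, codon in enumerate(codons):
            if codon in STOP_CODONS:
                if start is not None:
                    best = max(best, (idx - start + 1) * 3)  # closed ORF, incl. stop
                start = None
            elif start is None and (not require_start or codon == "ATG"):
                start = idx
        if start is not None:  # ORF still open at the 3' end (no stop codon)
            best = max(best, (len(codons) - start) * 3)
    return best


def feature_vector(seq, ks=(1, 2, 3)):
    """Hypothetical combined features: multi k-mer frequencies plus ORF coverage."""
    feats = []
    for k in ks:
        feats.extend(kmer_frequencies(seq, k))
    # Fraction of the transcript covered by its longest relaxed ORF.
    feats.append(longest_relaxed_orf(seq) / len(seq) if seq else 0.0)
    return feats
```

For example, `longest_relaxed_orf("ATGAAACCCGGGTTTTAA")` is 18, because the whole sequence is a single ATG-to-TAA ORF, and `feature_vector` returns 4 + 16 + 64 + 1 = 85 values with the default `ks`. In a complete classifier, vectors like these would be computed for known mRNAs and lncRNAs and used to train a Random Forest; the sketch deliberately stops at feature extraction.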

Original language	English
Article number	e57
Journal	Nucleic Acids Research
Volume	45
Issue number	8
Number of pages	12
ISSN	0305-1048
DOI	https://doi.org/10.1093/nar/gkw1306
Status	Published - 2017

Access to Document

10.1093/nar/gkw1306 (License: CC BY-NC)

FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome: publisher's published version, 1.75 MB (License: CC BY-NC)

Citation formats

Wucher, V., Legeai, F., Hédan, B., Rizk, G., Lagoutte, L., Leeb, T., Jagannathan, V., Cadieu, E., David, A., Lohi, H., Cirera Salicio, S., Fredholm, M., Botherel, N., Leegwater, P. A. J., Le Béguec, C., Fieten, H., Johnson, J. J., Alföldi, J., André, C., ... Derrien, T. (2017). FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Research, 45(8), Article e57. https://doi.org/10.1093/nar/gkw1306

Wucher, V, Legeai, F, Hédan, B, Rizk, G, Lagoutte, L, Leeb, T, Jagannathan, V, Cadieu, E, David, A, Lohi, H, Cirera Salicio, S, Fredholm, M, Botherel, N, Leegwater, PAJ, Le Béguec, C, Fieten, H, Johnson, JJ, Alföldi, J, André, C, Lindblad-Toh, K, Hitte, C & Derrien, T 2017, 'FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome', Nucleic Acids Research, vol. 45, no. 8, e57. https://doi.org/10.1093/nar/gkw1306

@article{fabbf35569284b49a77d563f00f0a096,
  title = "{FEELnc}: A tool for long non-coding {RNA} annotation and its application to the dog transcriptome",
  abstract = "Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly to distinguish those that will be translated (mRNAs) from long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.",
  author = "Valentin Wucher and Fabrice Legeai and Beno{\^i}t H{\'e}dan and Guillaume Rizk and Laetitia Lagoutte and Tosso Leeb and Vidhya Jagannathan and Edouard Cadieu and Audrey David and Hannes Lohi and {Cirera Salicio}, Susanna and Merete Fredholm and Nadine Botherel and Leegwater, {Peter A.J.} and {Le B{\'e}guec}, C{\'e}line and Hille Fieten and Johnson, {Jeremy J} and Jessica Alf{\"o}ldi and Catherine Andr{\'e} and Kerstin Lindblad-Toh and Christophe Hitte and Thomas Derrien",
  year = "2017",
  doi = "10.1093/nar/gkw1306",
  language = "English",
  volume = "45",
  journal = "Nucleic Acids Research",
  issn = "0305-1048",
  publisher = "Oxford University Press",
  number = "8",
}

TY  - JOUR
T1  - FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome
AU  - Wucher, Valentin
AU  - Legeai, Fabrice
AU  - Hédan, Benoît
AU  - Rizk, Guillaume
AU  - Lagoutte, Laetitia
AU  - Leeb, Tosso
AU  - Jagannathan, Vidhya
AU  - Cadieu, Edouard
AU  - David, Audrey
AU  - Lohi, Hannes
AU  - Cirera Salicio, Susanna
AU  - Fredholm, Merete
AU  - Botherel, Nadine
AU  - Leegwater, Peter A.J.
AU  - Le Béguec, Céline
AU  - Fieten, Hille
AU  - Johnson, Jeremy J
AU  - Alföldi, Jessica
AU  - André, Catherine
AU  - Lindblad-Toh, Kerstin
AU  - Hitte, Christophe
AU  - Derrien, Thomas
PY  - 2017
Y1  - 2017
AB  - Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly to distinguish those that will be translated (mRNAs) from long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
U2  - 10.1093/nar/gkw1306
DO  - 10.1093/nar/gkw1306
M3  - Journal article
C2  - 28053114
AN  - SCOPUS:85020226797
SN  - 0305-1048
VL  - 45
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
IS  - 8
M1  - e57
ER  -
