Abstract
The use of RNA sequencing (RNA-Seq) technologies is increasing mainly due to the development of new next-generation sequencing machines that have reduced the costs and the time needed for data generation.
Nevertheless, microarrays are still the more common choice, in part because of the complexity of RNA-Seq data analysis. Furthermore, RNA-Seq technology can introduce numerous biases, which have to be identified and removed properly in order to obtain accurate results.
Many tools have now been developed that allow each step to be performed without high-level programming skills. However, each step of the pipeline needs to be performed carefully and requires a good knowledge of both the technology and the algorithms.
In this comprehensive review, we describe the fundamental steps of the RNA-Seq analysis pipeline for identifying differentially expressed genes: raw data quality control, trimming and filtering, alignment, post-mapping quality control, counting, normalization, and differential expression testing.
For each step, we present the most common tools and describe their main characteristics and advantages, focusing on the statistics that they compute and the assumptions that they make about the data.
The choice of tool can have a large impact on the final results, and no gold standard has yet been established for this type of analysis.
In conclusion, this review can be useful both for educational purposes and for less experienced practitioners of animal genomic research. In the absence of a commonly accepted standard procedure, the general overview presented here can help readers make informed choices when implementing an RNA-Seq pipeline.
Original language | English |
---|---|
Title | Systems Biology in Animal Production and Health |
Editors | Haja N. Kadarmideen |
Number of pages | 17 |
Volume | 2 |
Place of publication | Switzerland |
Publisher | Springer |
Publication date | 1 Jan 2016 |
Pages | 61-77 |
Chapter | 3 |
ISBN (Print) | 978-3-319-43330-1 |
ISBN (Electronic) | 978-3-319-43332-5 |
DOI | |
Status | Published - 1 Jan 2016 |