DeepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns

David Langenberger; Sachin Pundhir; Claus Thorn Ekstrøm; Peter F. Stadler; Steve Hoffmann; Jan Gorodkin

doi:10.1093/bioinformatics/btr598

DeepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns

David Langenberger, Sachin Pundhir, Claus Thorn Ekstrøm, Peter F. Stadler, Steve Hoffmann, Jan Gorodkin

17 Citations (Scopus)

Abstract

Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example.Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads.

Original language	English
Journal	Bioinformatics
Volume	28
Issue number	1
Pages (from-to)	17-24
Number of pages	8
ISSN	1367-4803
DOIs	https://doi.org/10.1093/bioinformatics/btr598
Publication status	Published - Jan 2012

Access to Document

10.1093/bioinformatics/btr598

Cite this

@article{9871beaf65424f0d8339b07387d88660,

title = "DeepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns",

abstract = "Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example.Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads.",

author = "David Langenberger and Sachin Pundhir and Ekstr{\o}m, {Claus Thorn} and Stadler, {Peter F.} and Steve Hoffmann and Jan Gorodkin",

year = "2012",

month = jan,

doi = "10.1093/bioinformatics/btr598",

language = "English",

volume = "28",

pages = "17--24",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "1",

}

TY - JOUR

T1 - DeepBlockAlign

T2 - a tool for aligning RNA-seq profiles of read block patterns

AU - Langenberger, David

AU - Pundhir, Sachin

AU - Ekstrøm, Claus Thorn

AU - Stadler, Peter F.

AU - Hoffmann, Steve

AU - Gorodkin, Jan

PY - 2012/1

Y1 - 2012/1

N2 - Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example.Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads.

AB - Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example.Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads.

U2 - 10.1093/bioinformatics/btr598

DO - 10.1093/bioinformatics/btr598

M3 - Journal article

C2 - 22053076

SN - 1367-4803

VL - 28

SP - 17

EP - 24

JO - Bioinformatics

JF - Bioinformatics

IS - 1

ER -

DeepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns

Abstract

Access to Document

Fingerprint

Cite this