SICTIN: Rapid footprinting of massively parallel sequencing data

Stefan Enroth; Robin Andersson; Claes Wadelius; Jan Komorowski

doi:10.1186/1756-0381-3-4

9 citations (Scopus)

Abstract

Background. Massively parallel sequencing allows for genome-wide, hypothesis-free investigation of, for instance, transcription factor binding sites or histone modifications. Although detailed nucleotide-resolution information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is, however, a time-consuming task.

Methods. The presented software generates a binary representation of the signals, enabling fast and scalable lookup. This representation allows for footprint generation in mere minutes on a desktop computer. Several different input formats are accepted, e.g. the SAM format, BED files and the UCSC wiggle track.

Conclusions. Hypothesis-free investigation of genome-wide interactions allows for biological data mining at a scale never before seen. Until recently, the analysis of sequencing data has focused mainly on signal patterns around transcription start sites, which are manageable in number. Today, the focus is shifting to a wider perspective, and numerous genomic features are being studied. To this end, we provide a system allowing for fast querying on the order of hundreds of thousands of features.
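The abstract does not describe the on-disk layout SICTIN uses, so the following is only a minimal sketch of the general idea behind a binary signal representation with fast lookup. It assumes one raw 32-bit float per base for a single chromosome; the function names `write_signal`, `read_window` and `footprint` are hypothetical and are not part of SICTIN. Because every value has a fixed size, the signal around any feature can be read with a single seek rather than by scanning a text file.

```python
import os
import tempfile
from array import array


def write_signal(path, signal):
    """Store a per-base signal as raw 32-bit floats (assumed layout, not SICTIN's)."""
    with open(path, "wb") as fh:
        array("f", signal).tofile(fh)


def read_window(fh, start, length):
    """Read `length` values beginning at 0-based position `start` via a direct seek."""
    itemsize = array("f").itemsize
    fh.seek(start * itemsize)
    window = array("f")
    window.frombytes(fh.read(length * itemsize))
    return window


def footprint(path, centers, flank):
    """Average the signal in a +/- `flank` window around each feature center."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    with open(path, "rb") as fh:
        for center in centers:
            start = center - flank
            if start < 0:
                continue  # window runs off the start of the chromosome
            window = read_window(fh, start, width)
            if len(window) < width:
                continue  # window runs off the end of the chromosome
            for i, value in enumerate(window):
                totals[i] += value
            used += 1
    return [t / used for t in totals] if used else totals


# Toy signal with two small peaks standing in for features of interest.
signal = [0.0] * 100
signal[19], signal[20] = 1.0, 4.0
signal[60], signal[61] = 2.0, 1.0

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chr_toy.bin")
    write_signal(path, signal)
    profile = footprint(path, centers=[20, 60], flank=2)

print(profile)
```

In this sketch the cost of a footprint grows with the number of features and the window width, not with the size of the genome-wide signal file, which is the property that makes querying very large feature sets practical.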

Original language	English
Journal	BioData Mining
Volume	3
Issue number	1
Pages (from-to)	4
ISSN	1756-0381
DOI	https://doi.org/10.1186/1756-0381-3-4
Status	Published - 2010
Published externally	Yes

Citation formats

@article{0f998069c307420c90df1ce0e6017f5a,
title = "SICTIN: Rapid footprinting of massively parallel sequencing data",
abstract = "Background. Massively parallel sequencing allows for genome-wide hypothesis-free investigation of for instance transcription factor binding sites or histone modifications. Although nucleotide resolution detailed information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is however a time consuming task. Methods. The presented software generates a binary representation of the signals enabling fast and scalable lookup. This representation allows for footprint generation in mere minutes on a desktop computer. Several different input formats are accepted, e.g. the SAM format, bed-files and the UCSC wiggle track. Conclusions. Hypothesis-free investigation of genome wide interactions allows for biological data mining at a scale never before seen. Until recently, the main focus of analysis of sequencing data has been targeted on signal patterns around transcriptional start sites which are in manageable numbers. Today, focus is shifting to a wider perspective and numerous genomic features are being studied. To this end, we provide a system allowing for fast querying in the order of hundreds of thousands of features.",
author = "Stefan Enroth and Robin Andersson and Claes Wadelius and Jan Komorowski",
year = "2010",
doi = "10.1186/1756-0381-3-4",
language = "English",
volume = "3",
pages = "4",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central Ltd.",
number = "1",
}

TY  - JOUR
T1  - SICTIN
T2  - Rapid footprinting of massively parallel sequencing data
AU  - Enroth, Stefan
AU  - Andersson, Robin
AU  - Wadelius, Claes
AU  - Komorowski, Jan
PY  - 2010
Y1  - 2010
N2  - Background. Massively parallel sequencing allows for genome-wide hypothesis-free investigation of for instance transcription factor binding sites or histone modifications. Although nucleotide resolution detailed information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is however a time consuming task. Methods. The presented software generates a binary representation of the signals enabling fast and scalable lookup. This representation allows for footprint generation in mere minutes on a desktop computer. Several different input formats are accepted, e.g. the SAM format, bed-files and the UCSC wiggle track. Conclusions. Hypothesis-free investigation of genome wide interactions allows for biological data mining at a scale never before seen. Until recently, the main focus of analysis of sequencing data has been targeted on signal patterns around transcriptional start sites which are in manageable numbers. Today, focus is shifting to a wider perspective and numerous genomic features are being studied. To this end, we provide a system allowing for fast querying in the order of hundreds of thousands of features.
AB  - Background. Massively parallel sequencing allows for genome-wide hypothesis-free investigation of for instance transcription factor binding sites or histone modifications. Although nucleotide resolution detailed information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is however a time consuming task. Methods. The presented software generates a binary representation of the signals enabling fast and scalable lookup. This representation allows for footprint generation in mere minutes on a desktop computer. Several different input formats are accepted, e.g. the SAM format, bed-files and the UCSC wiggle track. Conclusions. Hypothesis-free investigation of genome wide interactions allows for biological data mining at a scale never before seen. Until recently, the main focus of analysis of sequencing data has been targeted on signal patterns around transcriptional start sites which are in manageable numbers. Today, focus is shifting to a wider perspective and numerous genomic features are being studied. To this end, we provide a system allowing for fast querying in the order of hundreds of thousands of features.
U2  - 10.1186/1756-0381-3-4
DO  - 10.1186/1756-0381-3-4
M3  - Journal article
C2  - 20707885
SN  - 1756-0381
VL  - 3
SP  - 4
JO  - BioData Mining
JF  - BioData Mining
IS  - 1
ER  - 
