Abstract
The introduction of second-generation sequencing has in recent years allowed the biological
community to determine the genomes and transcriptomes of organisms and individuals
at an unprecedented rate. However, almost every step in the sequencing protocol introduces
uncertainties in how the resulting sequencing data should be interpreted. Over the years,
this has spurred the development of many probabilistic methods capable of
modelling different aspects of the sequencing process. Here, I present two such methods,
each developed to tackle a different problem in bioinformatics, together with an
application of the latter method to a large Danish sequencing project.
The first is a probabilistic method for transcriptome assembly that is based on a novel
generative model of the RNA sequencing process and provides confidence estimates on the
assembled transcripts. We show that this approach outperforms existing state-of-the-art
methods, as measured by sensitivity and precision, on both simulated and real data.
The second is a novel probabilistic method that uses exact alignment of k-mers to a set
of variant graphs to provide unbiased estimates of genotypes in a population of individuals.
Using simulations, we show that this method markedly increases sensitivity without
sacrificing precision when compared to mapping-based approaches, especially in
variant-dense regions. We further demonstrate, using high-coverage real genome sequencing data
from parent-offspring trios, that our method is accurate even for larger structural variants,
as measured by trio concordance.
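Trio concordance can be illustrated with a minimal sketch. It assumes the common definition: a trio is concordant at a variant if the child's genotype can be formed from one allele of each parent. The function names and the tuple encoding of genotypes (allele indices, 0 for the reference allele) are hypothetical choices made for this illustration; the sketch is not the evaluation code used in the thesis.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one from the father.

    Genotypes are unordered pairs of allele indices, e.g. (0, 1).
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def trio_concordance(trios):
    """Fraction of (child, mother, father) genotype triples that are
    Mendelian consistent. Illustrative only; assumes diploid genotypes
    with no missing calls.
    """
    if not trios:
        raise ValueError("need at least one trio genotype triple")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)
```

For example, a heterozygous child `(0, 1)` with parents `(0, 0)` and `(1, 1)` is consistent, whereas a child `(1, 1)` with a mother `(0, 0)` is not, since the mother cannot contribute allele 1.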
Finally, we applied the second method to genotype variants, predicted using both a mapping-based
approach and de novo assemblies, in a population of 50 Danish parent-offspring trios
in the GenomeDenmark project. Using this hybrid approach, we not only created a variant
set that was more complete in terms of structural variants than those of previous similar
studies, but also significantly reduced the bias towards deletions normally observed in such
studies.
| Original language | English |
|---|---|
| Publisher | Department of Biology, Faculty of Science, University of Copenhagen |
| Number of pages | 119 |
| Status | Published - 2016 |