Abstract
Owing to recent advances in DNA sequencing technology, genomic data are being generated at an unprecedented rate, and we are gaining access to entire genomes at the population level. The technology does not, however, give direct access to the genetic variation, and the many levels of preprocessing that are required before inferences can be made from the data introduce multiple levels of uncertainty, especially for low-depth data. Methods that take this inherent uncertainty into account are therefore needed to make robust inferences in the downstream analysis of such data. This poses a problem for a range of key summary statistics within population genetics, where existing methods are based on the assumption that the true genotypes are known.
Motivated by this, I present: 1) a new method for estimating relatedness between pairs of individuals, 2) a new method for estimating neutrality test statistics, which are commonly used for finding genomic regions that have been under natural selection, 3) a new method for estimating individual admixture proportions, which can be used for finding population structure, and 4) a general framework for the analysis of high-throughput sequencing data.
These methods are all based on the concept of genotype likelihoods, which provide a measure of the uncertainty in the data, and we show, both through simulations and with real high-throughput sequencing data, that for low-depth data our methods outperform existing approaches, which are based on the assumption of known genotypes. The new methods are implemented in fast multi-threaded programs, which have been made freely available to the scientific community and have already been successfully applied in many different studies.
Original language | English |
---|---|
Publisher | Natural History Museum of Denmark, Faculty of Science, University of Copenhagen |
Number of pages | 109 |
Publication status | Published - 2015 |