Statistical Methods for Population Genetic Inference Based on Low-Depth Sequencing Data from Modern and Ancient DNA

Thorfinn Sand Korneliussen

SCIENCE PhD theses

Abstract

Owing to recent advances in DNA sequencing technology, genomic data are being generated at an unprecedented rate, and we are gaining access to entire genomes at the population level. The technology does not, however, give direct access to the genetic variation, and the many levels of preprocessing required before inferences can be made from the data introduce multiple levels of uncertainty, especially for low-depth data. Methods that take this inherent uncertainty into account are therefore needed to make robust inferences in downstream analyses of such data. This poses a problem for a range of key summary statistics in population genetics, where existing methods assume that the true genotypes are known.

Motivated by this, I present: 1) a new method for estimating relatedness between pairs of individuals; 2) a new method for estimating neutrality test statistics, which are commonly used to find genomic regions that have been under natural selection; 3) a new method for estimating individual admixture proportions, which can be used to detect population structure; and 4) a general framework for the analysis of high-throughput sequencing data.

These methods are all based on genotype likelihoods, which quantify the uncertainty in the data. We show, both through simulations and with real high-throughput sequencing data, that for low-depth data our methods outperform existing approaches that assume known genotypes. The new methods are implemented in fast multi-threaded programs, which have been made freely available to the scientific community and have already been applied successfully in many different studies.

Original language	English

Publisher	Natural History Museum of Denmark, Faculty of Science, University of Copenhagen
Number of pages	109
Publication status	Published - 2015

Access to Document

PHD-Thorfinn Sand Korneliussen (Final published version, 6.55 MB)

https://rex.kb.dk/primo-explore/fulldisplay?docid=KGL01010158161&context=L&vid=NUI&search_scope=KGL&tab=default_tab&lang=da_DK
