Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data

Jonas Meisner; Anders Albrechtsen

doi:10.1534/genetics.118.301336


Computational and RNA Biology

56 Citations (Scopus)

Abstract

We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
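The core loop described in the abstract can be illustrated with a small NumPy sketch. This is not PCAngsd's code, and every function name below (`posterior_expected_genotypes`, `estimate_population_freqs`, `iterative_pca`, `nmf_admixture`) is hypothetical. The sketch assumes genotype likelihoods are stored as an array of shape (individuals, sites, 3) for genotypes 0, 1 and 2, and a Hardy-Weinberg prior built from individual allele frequencies. The procedure alternates between posterior expected genotypes and a rank-k SVD reconstruction of the individual allele frequencies, then runs PCA on the final genotypes. For the admixture step it swaps in plain multiplicative-update NMF with a heuristic projection (F clipped to [0, 1], rows of Q rescaled to sum to 1). The published method's choice of the number of components, its covariance estimator and its factorization solver are not reproduced here.

```python
import numpy as np

GENOTYPES = np.array([0.0, 1.0, 2.0])  # copies of the counted allele


def posterior_expected_genotypes(gl, pi):
    """Posterior mean genotype per individual and site.

    gl: (n, m, 3) genotype likelihoods P(reads | g) for g = 0, 1, 2.
    pi: (n, m) allele frequencies used as a Hardy-Weinberg prior.
    """
    prior = np.stack([(1 - pi) ** 2, 2 * pi * (1 - pi), pi ** 2], axis=-1)
    post = gl * prior
    post /= post.sum(axis=-1, keepdims=True)
    return post @ GENOTYPES


def estimate_population_freqs(gl, n_iter=50, eps=1e-4):
    """EM estimate of a single allele frequency per site across all individuals."""
    f = np.full(gl.shape[1], 0.25)
    for _ in range(n_iter):
        e = posterior_expected_genotypes(gl, np.broadcast_to(f, gl.shape[:2]))
        f = np.clip(e.mean(axis=0) / 2, eps, 1 - eps)
    return f


def iterative_pca(gl, k, n_iter=100, tol=1e-5, eps=1e-4):
    """Alternate posterior genotypes and a rank-k SVD reconstruction of the
    individual allele frequencies, then run PCA on the final genotypes."""
    f = estimate_population_freqs(gl)
    pi = np.broadcast_to(f, gl.shape[:2]).copy()
    for _ in range(n_iter):
        centered = posterior_expected_genotypes(gl, pi) - 2 * f
        u, s, vt = np.linalg.svd(centered, full_matrices=False)
        low_rank = (u[:, :k] * s[:k]) @ vt[:k]
        new_pi = np.clip((low_rank + 2 * f) / 2, eps, 1 - eps)
        delta = np.sqrt(np.mean((new_pi - pi) ** 2))
        pi = new_pi
        if delta < tol:
            break
    # Standardised genotypes and a plain covariance (no diagonal correction).
    x = (posterior_expected_genotypes(gl, pi) - 2 * f) / np.sqrt(2 * f * (1 - f))
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
    order = np.argsort(eigvals)[::-1]
    return pi, eigvals[order], eigvecs[:, order]


def nmf_admixture(pi, k, n_iter=500, seed=0):
    """Heuristic pi ~ Q @ F: multiplicative updates, then projection of F into
    [0, 1] and of each row of Q onto proportions summing to 1."""
    rng = np.random.default_rng(seed)
    n, m = pi.shape
    q = rng.dirichlet(np.ones(k), size=n)
    f = rng.uniform(0.1, 0.9, size=(k, m))
    tiny = 1e-12
    for _ in range(n_iter):
        f *= (q.T @ pi) / (q.T @ q @ f + tiny)
        np.clip(f, tiny, 1.0, out=f)
        q *= (pi @ f.T) / (q @ f @ f.T + tiny)
        q /= q.sum(axis=1, keepdims=True)
    return q, f


# Toy data: two unrelated populations, about 2x depth, 1% sequencing error.
rng = np.random.default_rng(1)
n, m, err = 40, 500, 0.01
labels = np.repeat([0, 1], n // 2)
pop_freqs = rng.uniform(0.05, 0.95, size=(2, m))
genos = rng.binomial(2, pop_freqs[labels])
depth = rng.poisson(2.0, size=(n, m))
q_read = np.array([err, 0.5, 1 - err])  # P(read shows the allele | g)
alt = rng.binomial(depth, q_read[genos])
gl = q_read ** alt[..., None] * (1 - q_read) ** (depth - alt)[..., None]

pi, eigvals, eigvecs = iterative_pca(gl, k=1)
q, f_admix = nmf_admixture(pi, k=2)
```

Sites with zero depth get flat likelihoods, so their posterior falls back entirely on the prior. This is why the individual allele frequencies, rather than a single population frequency, carry the structure signal at low depth.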

Original language	English
Journal	Genetics
Volume	210
Issue number	2
Pages (from-to)	719-731
ISSN	1943-2631
DOIs	https://doi.org/10.1534/genetics.118.301336
Publication status	Published - Oct 2018

Access to Document

10.1534/genetics.118.301336

719.full.pdf: Final published version, 1.14 MB

Cite this

@article{76f5c575c5054c0fa1ceb5d3e3af51ea,

title = "Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data",

abstract = "We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.",

author = "Jonas Meisner and Anders Albrechtsen",

note = "Copyright {\textcopyright} 2018, Genetics.",

year = "2018",

month = oct,

doi = "10.1534/genetics.118.301336",

language = "English",

volume = "210",

pages = "719--731",

journal = "Genetics",

issn = "1943-2631",

publisher = "The Genetics Society of America (GSA)",

number = "2",

}

TY  - JOUR
T1  - Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data
AU  - Meisner, Jonas
AU  - Albrechtsen, Anders
PY  - 2018/10
Y1  - 2018/10
N2  - We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
AB  - We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
U2  - 10.1534/genetics.118.301336
DO  - 10.1534/genetics.118.301336
M3  - Journal article
C2  - 30131346
SN  - 1943-2631
VL  - 210
SP  - 719
EP  - 731
JO  - Genetics
JF  - Genetics
IS  - 2
ER  -