Estimating individual admixture proportions from next generation sequencing data

Line Skotte; Thorfinn Sand Korneliussen; Anders Albrechtsen

doi:10.1534/genetics.113.154138

Bioinformatics and RNA Biology

175 citations (Scopus)

Abstract

Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
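The idea of working with genotype likelihoods instead of called genotypes can be illustrated with a minimal sketch. It is not the NGSadmix implementation. It assumes, beyond what the abstract states, that each individual's allele frequency at a site is the admixture-weighted mean of the population frequencies and that genotypes follow Hardy-Weinberg proportions given that frequency. The names `genotype_prior`, `log_likelihood`, `gl`, `q` and `f` are hypothetical.

```python
import math


def genotype_prior(h):
    """P(G = g) for g = 0, 1, 2 allele copies, assuming Hardy-Weinberg
    proportions at individual allele frequency h (an assumption)."""
    return [(1 - h) ** 2, 2 * h * (1 - h), h ** 2]


def log_likelihood(gl, q, f):
    """Log-likelihood of admixture proportions q and frequencies f.

    gl[i][j] -- three genotype likelihoods P(reads | G = g) for
                individual i at site j
    q[i][k]  -- admixture proportion of individual i from population k
    f[j][k]  -- allele frequency at site j in population k
    """
    total = 0.0
    for i, q_i in enumerate(q):
        for j, f_j in enumerate(f):
            # Individual-specific allele frequency (assumed mixture model).
            h = sum(q_ik * f_jk for q_ik, f_jk in zip(q_i, f_j))
            prior = genotype_prior(h)
            # Sum over the unobserved genotype instead of calling it.
            site = sum(l * p for l, p in zip(gl[i][j], prior))
            total += math.log(site)
    return total
```

With uninformative likelihoods such as `[1.0, 1.0, 1.0]` a site contributes nothing to the total, which is how very low-depth sites avoid being forced into a possibly wrong genotype call.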

Original language	English
Journal	Genetics
Volume	195
Issue number	3
Pages (from-to)	693-702
Number of pages	10
ISSN	0016-6731
DOI	https://doi.org/10.1534/genetics.113.154138
Status	Published - Nov. 2013

Access to document

10.1534/genetics.113.154138

693.full.pdf: publisher's published version, 1.34 MB

Citation formats

@article{1aefb55762194b50b5137a29ac03f638,

title = "Estimating individual admixture proportions from next generation sequencing data",

abstract = "Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.",

author = "Line Skotte and Korneliussen, {Thorfinn Sand} and Anders Albrechtsen",

year = "2013",

month = nov,

doi = "10.1534/genetics.113.154138",

language = "English",

volume = "195",

pages = "693--702",

journal = "Genetics",

issn = "0016-6731",

publisher = "The Genetics Society of America (GSA)",

number = "3",

}

TY - JOUR

T1 - Estimating individual admixture proportions from next generation sequencing data

AU - Skotte, Line

AU - Korneliussen, Thorfinn Sand

AU - Albrechtsen, Anders

PY - 2013/11

Y1 - 2013/11

N2 - Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.

AB - Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.

U2 - 10.1534/genetics.113.154138

DO - 10.1534/genetics.113.154138

M3 - Journal article

C2 - 24026093

SN - 0016-6731

VL - 195

SP - 693

EP - 702

JO - Genetics

JF - Genetics

IS - 3

ER -
