Fast admixture analysis and population tree estimation for SNP and NGS data

Jade Yu Cheng; Thomas Mailund; Rasmus Nielsen

doi:10.1093/bioinformatics/btx098

Corresponding author: Jade Yu Cheng

13 citations (Scopus)

Abstract

Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components. Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, that can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios e.g. with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana.

Original language	English
Journal	Bioinformatics
Volume	33
Issue number	14
Pages (from-to)	2148-2155
Number of pages	8
ISSN	1367-4803
DOI	https://doi.org/10.1093/bioinformatics/btx098
Status	Published - 15 Jul 2017

Citation formats

@article{5bd25523318743ff80fa74cc2651b247,
title = "Fast admixture analysis and population tree estimation for SNP and NGS data",

abstract = "Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components. Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, that can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios e.g. with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana.",

author = "Cheng, {Jade Yu} and Thomas Mailund and Rasmus Nielsen",
year = "2017",
month = jul,
day = "15",
doi = "10.1093/bioinformatics/btx098",
language = "English",
volume = "33",
pages = "2148--2155",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",
}

TY - JOUR
T1 - Fast admixture analysis and population tree estimation for SNP and NGS data
AU - Cheng, Jade Yu
AU - Mailund, Thomas
AU - Nielsen, Rasmus
PY - 2017/7/15
Y1 - 2017/7/15

AB - Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components. Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, that can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios e.g. with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana.

UR - http://www.scopus.com/inward/record.url?scp=85024488622&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx098
DO - 10.1093/bioinformatics/btx098
M3 - Journal article
C2 - 28334108
AN - SCOPUS:85024488622
SN - 1367-4803
VL - 33
SP - 2148
EP - 2155
JO - Bioinformatics
JF - Bioinformatics
IS - 14
ER -
