Association testing for next-generation sequencing data using score statistics

Line Skotte; Thorfinn Sand Korneliussen; Anders Albrechtsen

doi:10.1002/gepi.21636

Bioinformatics and RNA Biology

22 Citations (Scopus)

Abstract

The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization, which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects. Genet. Epidemiol. 36:430-437, 2012. © 2012 Wiley Periodicals, Inc.
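To make the idea of a score test on uncertain genotypes concrete, the following is a minimal sketch and not the authors' statistic. It assumes a quantitative phenotype, an intercept-only linear null model, and it replaces each unknown genotype by its posterior expectation (a "dosage"), which is a simplification of the joint-likelihood score described in the abstract. The function names `expected_genotype` and `score_test` and the toy data are hypothetical.

```python
import math


def expected_genotype(post):
    """Posterior mean allele count from (P(G=0), P(G=1), P(G=2))."""
    return post[1] + 2.0 * post[2]


def score_test(phenotypes, posteriors):
    """Simplified dosage-based score test (assumption: intercept-only
    linear null model; NOT the joint-likelihood statistic of the paper).

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    n = len(phenotypes)
    dosage = [expected_genotype(p) for p in posteriors]
    y_bar = sum(phenotypes) / n
    d_bar = sum(dosage) / n
    resid = [y - y_bar for y in phenotypes]

    # Residual variance estimated under the null (no genotype effect).
    sigma2 = sum(r * r for r in resid) / n
    centered_ss = sum((d - d_bar) ** 2 for d in dosage)
    if sigma2 == 0.0 or centered_ss == 0.0:
        raise ValueError("phenotype or dosage has zero variance")

    # Score and information for the genotype effect, evaluated at effect = 0.
    score = sum(r * d for r, d in zip(resid, dosage)) / sigma2
    info = centered_ss / sigma2
    stat = score * score / info

    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy data: genotype posteriors for five individuals at one site.
    posteriors = [
        (0.90, 0.09, 0.01),
        (0.10, 0.80, 0.10),
        (0.05, 0.25, 0.70),
        (0.60, 0.35, 0.05),
        (0.02, 0.18, 0.80),
    ]
    phenotypes = [1.1, 2.0, 3.2, 1.4, 2.9]
    stat, p = score_test(phenotypes, posteriors)
    print(f"score statistic = {stat:.3f}, p-value = {p:.4f}")
```

Because only the null model is fitted, no per-site numerical optimization is needed, which is the computational advantage the abstract attributes to score statistics. In this intercept-only setting the statistic reduces to n times the squared correlation between phenotype and dosage.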

Original language	English
Journal	Genetic Epidemiology
Volume	36
Issue number	5
Pages (from-to)	430-437
Number of pages	8
ISSN	0741-0395
DOI	https://doi.org/10.1002/gepi.21636
Publication status	Published - Jul 2012

Access to document

10.1002/gepi.21636

Association testing for next-generation sequencing data using score statistics (final published version, 798 KB)

Citation formats

@article{6c254d7b75514fafae8557c448e7dc15,

title = "Association testing for next-generation sequencing data using score statistics",

abstract = "The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects.",

author = "Line Skotte and Korneliussen, {Thorfinn Sand} and Anders Albrechtsen",

year = "2012",

month = jul,

doi = "10.1002/gepi.21636",

language = "English",

volume = "36",

pages = "430--437",

journal = "Genetic Epidemiology",

issn = "0741-0395",

publisher = "John Wiley \& Sons, Inc.",

number = "5",

}

TY - JOUR

T1 - Association testing for next-generation sequencing data using score statistics

AU - Skotte, Line

AU - Korneliussen, Thorfinn Sand

AU - Albrechtsen, Anders

PY - 2012/7

Y1 - 2012/7

N2 - The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects.

AB - The advances in sequencing technology have made large-scale sequencing studies for large cohorts feasible. Often, the primary goal for large-scale studies is to identify genetic variants associated with a disease or other phenotypes. Even when deep sequencing is performed, there will be many sites where there is not enough data to call genotypes accurately. Ignoring the genotype classification uncertainty by basing subsequent analyses on called genotypes leads to a loss in power. Additionally, using called genotypes can lead to spurious association signals. Some methods taking the uncertainty of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies as well as mapping of quantitative traits. The model allows additional covariates that enable correction for confounding factors such as population stratification or cohort effects.

U2 - 10.1002/gepi.21636

DO - 10.1002/gepi.21636

M3 - Journal article

SN - 0741-0395

VL - 36

SP - 430

EP - 437

JO - Genetic Epidemiology

JF - Genetic Epidemiology

IS - 5

ER -
