Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process

Anna Ramírez-Soriano; Rasmus Nielsen

doi:10.1534/genetics.108.094060

Computational and RNA Biology

24 Citations (Scopus)

Abstract

Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter θ = 4Nₑμ (Nₑ, effective population size; μ, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
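
For orientation, the following is a minimal Python sketch of the classical, uncorrected quantities the abstract refers to: the estimator of θ from the number of segregating sites (Watterson's estimator), the estimator from the average number of pairwise differences, and Tajima's D in its standard form. It does not implement the ascertainment-bias corrections derived in the paper. The function names and the input convention (one allele count per segregating site, in a sample of n chromosomes) are assumptions made for illustration.

```python
from math import sqrt


def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta(num_segregating, n):
    """Estimate theta from the number of segregating sites S: S / a1."""
    a1, _ = harmonic_sums(n)
    return num_segregating / a1


def pairwise_theta(allele_counts, n):
    """Estimate theta as the average number of pairwise differences.

    allele_counts holds, for each segregating site, the number of sampled
    chromosomes (between 1 and n - 1) carrying one of the two alleles.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def tajimas_d(allele_counts, n):
    """Standard (uncorrected) Tajima's D for a sample of n chromosomes."""
    if n < 4:
        # The variance term is zero for n <= 3, so D is undefined there.
        raise ValueError("Tajima's D needs a sample of at least 4 chromosomes")
    s = len(allele_counts)
    if s == 0:
        return float("nan")  # no segregating sites, D is undefined
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference of the two theta estimates, scaled by its estimated
    # standard deviation under the standard neutral model.
    diff = pairwise_theta(allele_counts, n) - s / a1
    return diff / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of common alleles, as the abstract describes for ascertained SNP data, inflates the pairwise-difference estimate relative to the segregating-sites estimate and therefore pushes this uncorrected D upward. For example, `tajimas_d([2, 2, 2], 4)` is positive, while `tajimas_d([1, 1, 1], 4)` (singletons only) is negative.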

Original language	English
Journal	Genetics
Volume	181
Issue number	2
Pages (from-to)	701-710
Number of pages	10
ISSN	0016-6731
DOIs	https://doi.org/10.1534/genetics.108.094060
Publication status	Published - 2009

Cite this

@article{e08bc4f0a52a11df928f000ea68e967b,

title = "Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process",

abstract = "Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter $\theta = 4N_e\mu$ ($N_e$, effective population size; $\mu$, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.",

author = "Anna Ram{\'i}rez-Soriano and Rasmus Nielsen",

note = "Keywords: Alleles; Analysis of Variance; Bias (Epidemiology); Biometry; Computer Simulation; Databases, Genetic; Genetics, Population; Genome, Human; Genotype; Humans; Models, Genetic; Mutation; Polymorphism, Single Nucleotide",

year = "2009",

doi = "10.1534/genetics.108.094060",

language = "English",

volume = "181",

pages = "701--710",

journal = "Genetics",

issn = "1943-2631",

publisher = "The Genetics Society of America (GSA)",

number = "2",

}

TY - JOUR

T1 - Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process

AU - Ramírez-Soriano, Anna

AU - Nielsen, Rasmus

N1 - Keywords: Alleles; Analysis of Variance; Bias (Epidemiology); Biometry; Computer Simulation; Databases, Genetic; Genetics, Population; Genome, Human; Genotype; Humans; Models, Genetic; Mutation; Polymorphism, Single Nucleotide

PY - 2009

Y1 - 2009

N2 - Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter theta = 4N(e)mu (N(e), effective population size; mu, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.

AB - Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter theta = 4N(e)mu (N(e), effective population size; mu, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.

U2 - 10.1534/genetics.108.094060

DO - 10.1534/genetics.108.094060

M3 - Journal article

C2 - 19087964

SN - 1943-2631

VL - 181

SP - 701

EP - 710

JO - Genetics

JF - Genetics

IS - 2

ER -
