TY - JOUR
T1 - Estimating inbreeding coefficients from NGS data
T2 - impact on genotype calling and allele frequency estimation
AU - Garrett Vieira, Filipe Jorge
AU - Fumagalli, Matteo
AU - Albrechtsen, Anders
AU - Nielsen, Rasmus
PY - 2013/11
Y1 - 2013/11
N2 - Most methods for next-generation sequencing (NGS) data analyses incorporate information regarding allele frequencies using the assumption of Hardy-Weinberg equilibrium (HWE) as a prior. However, many organisms including those that are domesticated, partially selfing, or with asexual life cycles show strong deviations from HWE. For such species, and specially for low-coverage data, it is necessary to obtain estimates of inbreeding coefficients (F ) for each individual before calling genotypes. Here, we present two methods for estimating inbreeding coefficients from NGS data based on an expectation-maximization (EM) algorithm. We assess the impact of taking inbreeding into account when calling genotypes or estimating the site frequency spectrum (SFS), and demonstrate a marked increase in accuracy on low-coverage highly inbred samples. We demonstrate the applicability and efficacy of these methods in both simulated and real data sets.
AB - Most methods for next-generation sequencing (NGS) data analyses incorporate information regarding allele frequencies using the assumption of Hardy-Weinberg equilibrium (HWE) as a prior. However, many organisms including those that are domesticated, partially selfing, or with asexual life cycles show strong deviations from HWE. For such species, and specially for low-coverage data, it is necessary to obtain estimates of inbreeding coefficients (F ) for each individual before calling genotypes. Here, we present two methods for estimating inbreeding coefficients from NGS data based on an expectation-maximization (EM) algorithm. We assess the impact of taking inbreeding into account when calling genotypes or estimating the site frequency spectrum (SFS), and demonstrate a marked increase in accuracy on low-coverage highly inbred samples. We demonstrate the applicability and efficacy of these methods in both simulated and real data sets.
U2 - 10.1101/gr.157388.113
DO - 10.1101/gr.157388.113
M3 - Journal article
C2 - 23950147
SN - 1088-9051
VL - 23
SP - 1852
EP - 1861
JO - Genome Research
JF - Genome Research
ER -