TY - JOUR
T1 - Accurate genotyping across variant classes and lengths using variant graphs
AU - Sibbesen, Jonas Andreas
AU - Maretty, Lasse
AU - Krogh, Anders
AU - Petersen, Bent
AU - Liu, Siyang
AU - Have, Christian Theil
AU - Bork-Jensen, Jette
AU - Guo, Xiaosen
AU - Hansen, Torben
AU - Sørensen, Thorkild I.A.
AU - Pedersen, Oluf Borbye
AU - Wang, Jun
AU - Brunak, Søren
PY - 2018
Y1 - 2018/07/01/
N2 - Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.
U2 - 10.1038/s41588-018-0145-5
DO - 10.1038/s41588-018-0145-5
M3 - Journal article
C2 - 29915429
AN - SCOPUS:85048689124
SN - 1061-4036
VL - 50
SP - 1054
EP - 1059
JO - Nature Genetics
JF - Nature Genetics
ER -