Accurate genotyping across variant classes and lengths using variant graphs

Jonas Andreas Sibbesen; Lasse Maretty; Anders Krogh; Bent Petersen; Siyang Liu; Christian Theil Have; Jette Bork-Jensen; Xiaosen Guo; Torben Hansen; Thorkild I.A. Sørensen; Oluf Borbye Pedersen; Jun Wang; Søren Brunak

doi:10.1038/s41588-018-0145-5

Corresponding author for this work: Anders Krogh

15 Citations (Scopus)

Abstract

Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.

Original language	English
Journal	Nature Genetics
Volume	50
Pages (from-to)	1054-1059
Number of pages	6
ISSN	1061-4036
DOIs	https://doi.org/10.1038/s41588-018-0145-5
Publication status	Published - 1 Jul 2018

Cite this

@article{388b0498530b4f2bacb35861f5e1057c,
  title = "Accurate genotyping across variant classes and lengths using variant graphs",
  abstract = "Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a {\textquoteleft}variation-prior{\textquoteright} database containing already known variants significantly improves sensitivity.",
  author = "Sibbesen, {Jonas Andreas} and Lasse Maretty and Anders Krogh and Bent Petersen and Siyang Liu and Have, {Christian Theil} and Jette Bork-Jensen and Xiaosen Guo and Torben Hansen and S{\o}rensen, {Thorkild I.A.} and Pedersen, {Oluf Borbye} and Jun Wang and S{\o}ren Brunak",
  year = "2018",
  month = jul,
  day = "1",
  doi = "10.1038/s41588-018-0145-5",
  language = "English",
  volume = "50",
  pages = "1054--1059",
  journal = "Nature Genetics",
  issn = "1061-4036",
  publisher = "Nature Publishing Group",
}

TY  - JOUR
T1  - Accurate genotyping across variant classes and lengths using variant graphs
AU  - Sibbesen, Jonas Andreas
AU  - Maretty, Lasse
AU  - Krogh, Anders
AU  - Petersen, Bent
AU  - Liu, Siyang
AU  - Have, Christian Theil
AU  - Bork-Jensen, Jette
AU  - Guo, Xiaosen
AU  - Hansen, Torben
AU  - Sørensen, Thorkild I.A.
AU  - Pedersen, Oluf Borbye
AU  - Wang, Jun
AU  - Brunak, Søren
PY  - 2018/7/1
Y1  - 2018/7/1
AB  - Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.
U2  - 10.1038/s41588-018-0145-5
DO  - 10.1038/s41588-018-0145-5
M3  - Journal article
C2  - 29915429
AN  - SCOPUS:85048689124
SN  - 1061-4036
VL  - 50
SP  - 1054
EP  - 1059
JO  - Nature Genetics
JF  - Nature Genetics
ER  -
