Modelling noise in second generation sequencing forensic genetics STR data using a one-inflated (zero-truncated) negative binomial model

Søren B. Vilsen; Torben Tvedebrink; Helle Smidt Mogensen; Niels Morling

doi:10.1016/j.fsigss.2015.09.165

3 citations (Scopus)

Abstract

We present a model for the distribution of non-systematic errors, i.e. the noise, in STR second generation sequencing (SGS) analysis. The noise is fitted with a one-inflated, zero-truncated negative binomial model. This is a two-component model: the first component models the excess of singleton reads, while the second component models the remainder of the errors according to a zero-truncated negative binomial distribution.
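As a rough illustration of the two-component structure described above, the following Python sketch evaluates the probability mass function of a one-inflated, zero-truncated negative binomial. The parametrisation (inflation weight `pi`, dispersion `size`, mean `mu`) and all function names are assumptions made for this example; the abstract does not state the parametrisation the authors used.

```python
import math


def nb_pmf(k, size, mu):
    """Negative binomial pmf with dispersion `size` and mean `mu` (assumed parametrisation)."""
    p = size / (size + mu)  # success probability implied by size and mean
    log_pmf = (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def ztnb_pmf(k, size, mu):
    """Zero-truncated negative binomial: renormalise the mass on k >= 1."""
    if k < 1:
        return 0.0
    return nb_pmf(k, size, mu) / (1.0 - nb_pmf(0, size, mu))


def oiztnb_pmf(k, pi, size, mu):
    """One-inflated mixture: extra point mass `pi` at k = 1 plus the truncated component."""
    point_mass = 1.0 if k == 1 else 0.0
    return pi * point_mass + (1.0 - pi) * ztnb_pmf(k, size, mu)
```

With this construction a coverage of zero has probability zero, and the probability of a singleton read is always at least `pi`.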

We estimated the parameters of the model in two ways: (1) we maximised the likelihood using an explicitly calculated gradient function and (2) we used the expectation-maximisation (EM) algorithm. The estimated parameters were used to create dynamic, sample-specific thresholds for noise removal using marker-specific proportions of the negative binomial distribution.
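The sketch below illustrates two of the ideas in this paragraph under simplifying assumptions. It is not the authors' implementation. First, it runs the EM update for the inflation weight only, holding the negative binomial parameters fixed (a full EM would also re-estimate them numerically in each M-step). Second, it reads "proportions of the negative binomial distribution" as a quantile `q` of the zero-truncated component, which is an interpretation on our part. All names and the `size`/`mu` parametrisation are hypothetical.

```python
import math


def nb_pmf(k, size, mu):
    """Negative binomial pmf with dispersion `size` and mean `mu` (assumed parametrisation)."""
    p = size / (size + mu)
    return math.exp(math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
                    + size * math.log(p) + k * math.log1p(-p))


def ztnb_pmf(k, size, mu):
    """Zero-truncated negative binomial pmf on k >= 1."""
    if k < 1:
        return 0.0
    return nb_pmf(k, size, mu) / (1.0 - nb_pmf(0, size, mu))


def em_inflation_weight(counts, size, mu, pi=0.5, n_iter=200):
    """EM for the inflation weight only; `size` and `mu` are held fixed (simplification)."""
    n = len(counts)
    n_ones = sum(1 for c in counts if c == 1)
    f1 = ztnb_pmf(1, size, mu)
    for _ in range(n_iter):
        # E-step: probability that an observed singleton comes from the point mass.
        z = pi / (pi + (1.0 - pi) * f1)
        # M-step: expected share of all observations assigned to the point mass.
        pi = n_ones * z / n
    return pi


def noise_threshold(size, mu, q=0.99, max_count=100000):
    """Smallest coverage t whose truncated-NB cumulative probability reaches q (assumed rule)."""
    cdf = 0.0
    for k in range(1, max_count + 1):
        cdf += ztnb_pmf(k, size, mu)
        if cdf >= q:
            return k
    raise ValueError("quantile not reached within max_count")
```

Reads with a coverage at or below the returned value would then be treated as noise for that marker; both `q` and this cut-off rule are illustrative choices, not values from the paper.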

Based on data from dilution series experiments (amounts of DNA ranging from 100 pg to 2 ng) conducted at the Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark, the method was compared with a naïve approach that removes reads with a coverage of less than 5–10% of the total marker coverage. Our method resulted in three allelic drop-outs (true alleles below the threshold), whereas the 10% threshold induced 12 drop-outs. The error reads that are not filtered out (e.g. stutters, shoulders and reads with miscalled bases) will subsequently be modelled by different statistical methodologies.
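The naïve baseline can be sketched in a few lines of Python. The function name and the coverage numbers are invented for illustration and are not data from the study.

```python
def naive_filter(coverage, fraction=0.10):
    """Keep sequences whose coverage is at least `fraction` of the total marker coverage."""
    total = sum(coverage.values())
    cutoff = fraction * total
    return {seq: n for seq, n in coverage.items() if n >= cutoff}


# Hypothetical read counts for one marker (made-up numbers).
marker = {"allele_A": 480, "allele_B": 60, "stutter_A": 40, "noise_1": 1, "noise_2": 1}

kept_10 = naive_filter(marker, fraction=0.10)  # cut-off at 10% of the total coverage
kept_05 = naive_filter(marker, fraction=0.05)  # cut-off at 5% of the total coverage
```

In this made-up marker the weaker allele sits just above the 10% cut-off, which shows how a fixed percentage can push a true low-coverage allele below the threshold when coverage is unbalanced.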

Original language	English
Journal	Forensic Science International: Genetics. Supplement Series
Volume	5
Pages (from-to)	e416–e417
Number of pages	2
ISSN	1875-1768
DOI	https://doi.org/10.1016/j.fsigss.2015.09.165
Status	Published - 1 Dec 2015

Access to the document

10.1016/j.fsigss.2015.09.165

http://www.fsigeneticssup.com/article/S1875176815302122/pdf

Citation formats

Modelling noise in second generation sequencing forensic genetics STR data using a one-inflated (zero-truncated) negative binomial model. / Vilsen, Søren B.; Tvedebrink, Torben; Mogensen, Helle Smidt et al.
In: Forensic Science International: Genetics. Supplement Series, Vol. 5, 01.12.2015, pp. e416–e417.

Publication: Contribution to journal › Journal article › Research › peer-review

@article{eacf752205734849b2bd02ce273f82a0,

title = "Modelling noise in second generation sequencing forensic genetics STR data using a one-inflated (zero-truncated) negative binomial model",

abstract = "We present a model fitting the distribution of non-systematic errors in STR second generation sequencing, SGS, analysis. The model fits the distribution of non-systematic errors, i.e. the noise, using a one-inflated, zero-truncated, negative binomial model. The model is a two component model. The first component models the excess of singleton reads, while the second component models the remainder of the errors according to a truncated negative binomial distribution. We estimated the parameters of the model in two ways: (1) we maximised the likelihood using an explicitly calculated gradient function and (2) we used the expectation-maximisation, EM, algorithm. The estimated parameters were used to create dynamic, sample specific thresholds for noise removal using marker specific proportions of the negative binomial distribution. Based on data from dilution series experiments (amounts of DNA ranging from 100 pg to 2 ng) conducted at The Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark, the method was compared to that of a na{\"i}ve model that implies the removal of reads with a coverage of less than 5--10\% of the total marker coverage. In comparison, our method resulted in three allelic drop-outs (true alleles below threshold), whereas the 10\%-threshold induced 12 drop-outs. The non-filtered error reads (e.g. stutters, shoulders and reads with miscalled bases) will subsequently be modelled by different statistical methodologies.",

author = "Vilsen, {S{\o}ren B.} and Torben Tvedebrink and Mogensen, {Helle Smidt} and Niels Morling",

year = "2015",

month = dec,

day = "1",

doi = "10.1016/j.fsigss.2015.09.165",

language = "English",

volume = "5",

pages = "e416--e417",

journal = "Forensic Science International: Genetics. Supplement Series",

issn = "1875-1768",

publisher = "Elsevier Ireland Ltd",

}

TY - JOUR

T1 - Modelling noise in second generation sequencing forensic genetics STR data using a one-inflated (zero-truncated) negative binomial model

AU - Vilsen, Søren B.

AU - Tvedebrink, Torben

AU - Mogensen, Helle Smidt

AU - Morling, Niels

PY - 2015/12/1

Y1 - 2015/12/1

N2 - We present a model fitting the distribution of non-systematic errors in STR second generation sequencing, SGS, analysis. The model fits the distribution of non-systematic errors, i.e. the noise, using a one-inflated, zero-truncated, negative binomial model. The model is a two component model. The first component models the excess of singleton reads, while the second component models the remainder of the errors according to a truncated negative binomial distribution. We estimated the parameters of the model in two ways: (1) we maximised the likelihood using an explicitly calculated gradient function and (2) we used the expectation-maximisation, EM, algorithm. The estimated parameters were used to create dynamic, sample specific thresholds for noise removal using marker specific proportions of the negative binomial distribution. Based on data from dilution series experiments (amounts of DNA ranging from 100 pg to 2 ng) conducted at The Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark, the method was compared to that of a naïve model that implies the removal of reads with a coverage of less than 5–10% of the total marker coverage. In comparison, our method resulted in three allelic drop-outs (true alleles below threshold), whereas the 10%-threshold induced 12 drop-outs. The non-filtered error reads (e.g. stutters, shoulders and reads with miscalled bases) will subsequently be modelled by different statistical methodologies.

AB - We present a model fitting the distribution of non-systematic errors in STR second generation sequencing, SGS, analysis. The model fits the distribution of non-systematic errors, i.e. the noise, using a one-inflated, zero-truncated, negative binomial model. The model is a two component model. The first component models the excess of singleton reads, while the second component models the remainder of the errors according to a truncated negative binomial distribution. We estimated the parameters of the model in two ways: (1) we maximised the likelihood using an explicitly calculated gradient function and (2) we used the expectation-maximisation, EM, algorithm. The estimated parameters were used to create dynamic, sample specific thresholds for noise removal using marker specific proportions of the negative binomial distribution. Based on data from dilution series experiments (amounts of DNA ranging from 100 pg to 2 ng) conducted at The Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark, the method was compared to that of a naïve model that implies the removal of reads with a coverage of less than 5–10% of the total marker coverage. In comparison, our method resulted in three allelic drop-outs (true alleles below threshold), whereas the 10%-threshold induced 12 drop-outs. The non-filtered error reads (e.g. stutters, shoulders and reads with miscalled bases) will subsequently be modelled by different statistical methodologies.

U2 - 10.1016/j.fsigss.2015.09.165

DO - 10.1016/j.fsigss.2015.09.165

M3 - Journal article

SN - 1875-1768

VL - 5

SP - e416

EP - e417

JO - Forensic Science International: Genetics. Supplement Series

JF - Forensic Science International: Genetics. Supplement Series

ER -