Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes

Christian Theil Have, Emil Vincent Rosenbaum Appel, Niels Grarup, Torben Hansen, Jette Bork-Jensen

    Abstract

    Undetected mislabeled samples may affect the
    results of genotype studies, particularly when rare genetic
    variants are investigated. Mislabeled samples are often not
    detected during quality control, and when they are detected they
    are normally discarded because no reliable method exists to
    recover the correct labels.
    Here we describe a statistical method which, given a few extra
    independent genotypes (barcode genotypes), detects mislabeled
    samples and recovers the correct labels for sample mix-ups. We
    have implemented the method in a program named
    Wunderbar and evaluate its reliability on
    simulated data. We find that, even with only a small number of
    barcode genotypes, Wunderbar identifies
    mislabeled samples and sample mix-ups with high sensitivity
    and specificity, even at a high genotyping error rate and
    in the presence of dependency between the individual barcode
    genotypes.
    To detect mislabeled samples, we calculate the probability
    that the discordance between the genotypes in the data and the
    independent barcode genotypes can be attributed to random
    (non-mislabeling) genotyping errors. To identify mix-ups, we
    calculate the probability that the set of genotypes shared by
    sample x and sample y is identical by chance. Based on
    this, we calculate a mix-up confidence score that penalizes
    mismatches introduced by the proposed new label and
    adjusts for dependency among the genotypes. This
    confidence score is used to identify probable mix-ups.
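
The two probabilities described above can be illustrated with a minimal sketch. This is not Wunderbar's implementation, and the abstract does not give the exact formulas. The sketch assumes that barcode genotypes are independent, that each genotype is miscalled with a fixed probability `error_rate`, and that per-marker genotype frequencies are known. All function names and the 0/1/2 genotype coding are hypothetical. The penalized, dependency-adjusted confidence score is not reproduced here.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_mismatches(a, b):
    """Return (mismatches, compared) for two genotype vectors.

    Positions where either call is missing (None) are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x != y for x, y in pairs), len(pairs)


def mislabel_p_value(data_gt, barcode_gt, error_rate):
    """Probability of at least the observed discordance arising from
    random genotyping error alone (assumed model: independent markers,
    one shared error rate). A small value suggests a mislabeled sample.
    """
    k, n = count_mismatches(data_gt, barcode_gt)
    return binom_tail(k, n, error_rate)


def chance_match_probability(barcode_gt, genotype_freqs):
    """Probability that an unrelated sample carries exactly these barcode
    genotypes by chance, assuming independent markers.

    genotype_freqs is a list of dicts, one per marker, mapping each
    genotype code to its population frequency (hypothetical input).
    """
    p = 1.0
    for g, freqs in zip(barcode_gt, genotype_freqs):
        p *= freqs[g]
    return p
```

Under these assumptions, a sample whose data genotypes disagree with its barcode genotypes at most markers gets a very small `mislabel_p_value`, while a low `chance_match_probability` for another sample's barcode indicates that a full match with that sample is unlikely to be coincidental.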
    Original language: English
    Article number: 370
    Journal: International Journal of Bioscience, Biochemistry and Bioinformatics
    Volume: 4
    Issue number: 5
    Pages (from-to): 355-360
    Number of pages: 5
    ISSN: 2010-3638
    DOI
    Status: Published - 2014
