Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes

Christian Theil Have, Emil Vincent Rosenbaum Appel, Niels Grarup, Torben Hansen, Jette Bork-Jensen

    Abstract

    Undetected mislabeled samples may affect the
    results of genotype studies, particularly when rare genetic
    variants are investigated. Mislabeled samples often go
    undetected during quality control, and when they are detected,
    they are normally discarded because there is no reliable method
    to recover the correct labels.
    Here we describe a statistical method that, given a few extra
    independent genotypes (barcode genotypes), detects mislabeled
    samples and recovers the correct labels for sample mix-ups. We
    have implemented the method in a program named Wunderbar and
    evaluate its reliability on simulated data. We find that even
    with only a small number of barcode genotypes, Wunderbar
    identifies mislabeled samples and sample mix-ups with high
    sensitivity and specificity, even with a high genotyping error
    rate and in the presence of dependency between the individual
    barcode genotypes.
    To detect mislabeled samples, we calculate the probability
    that the discordance between the genotypes in the data and the
    independent barcode genotypes can be attributed to random
    (non-mislabeling) genotyping errors. To identify mix-ups, we
    calculate the probability of observing by chance the set of
    identical genotypes shared by sample x and sample y. Based on
    this, we calculate a mix-up confidence score that penalizes
    mismatches introduced by the proposed new label and adjusts for
    dependency among the genotypes. This confidence score is used
    to identify probable mix-ups.
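    The two probabilities described above can be illustrated with a
    minimal sketch. It is not Wunderbar's implementation, and several
    assumptions are ours: genotypes are coded 0/1/2, a correctly
    labeled sample shows a discordant barcode comparison independently
    with probability error_rate (so the mislabeling test is a binomial
    upper tail), and chance matches are computed from Hardy-Weinberg
    genotype frequencies at independent markers. The function
    mixup_score and its mismatch_penalty parameter are hypothetical,
    chosen only to show the shape of a score that rewards improbable
    matches and penalizes new mismatches; the paper's adjustment for
    dependency between genotypes is not modelled.

```python
import math
from typing import Sequence

# Genotypes are coded as the number of alternative alleles: 0, 1 or 2.


def discordance_pvalue(n_compared: int, n_mismatch: int, error_rate: float) -> float:
    """P(X >= n_mismatch) for X ~ Binomial(n_compared, error_rate).

    Assumption: for a correctly labeled sample, each barcode comparison
    is discordant independently with probability error_rate. A small
    value suggests the discordance is not explained by random
    genotyping error alone, i.e. the sample may be mislabeled.
    """
    return sum(
        math.comb(n_compared, k)
        * error_rate**k
        * (1 - error_rate) ** (n_compared - k)
        for k in range(n_mismatch, n_compared + 1)
    )


def hwe_genotype_freq(genotype: int, alt_freq: float) -> float:
    """Population frequency of a 0/1/2 genotype under Hardy-Weinberg equilibrium."""
    p, q = 1 - alt_freq, alt_freq
    return (p * p, 2 * p * q, q * q)[genotype]


def chance_match_probability(genotypes: Sequence[int], alt_freqs: Sequence[float]) -> float:
    """Probability that an unrelated sample carries exactly these genotypes.

    Assumption: markers are independent, so per-marker frequencies multiply.
    """
    prob = 1.0
    for g, f in zip(genotypes, alt_freqs):
        prob *= hwe_genotype_freq(g, f)
    return prob


def mixup_score(
    barcode: Sequence[int],
    candidate: Sequence[int],
    alt_freqs: Sequence[float],
    mismatch_penalty: float = 2.0,
) -> float:
    """Hypothetical confidence score for relabeling a barcode as `candidate`.

    -log10 of the chance probability of the matching genotypes, minus a
    fixed (hypothetical) penalty for every mismatch the new label introduces.
    """
    matched_genotypes, matched_freqs, n_mismatch = [], [], 0
    for b, c, f in zip(barcode, candidate, alt_freqs):
        if b == c:
            matched_genotypes.append(b)
            matched_freqs.append(f)
        else:
            n_mismatch += 1
    p_chance = chance_match_probability(matched_genotypes, matched_freqs)
    return -math.log10(p_chance) - mismatch_penalty * n_mismatch
```

    For example, with 20 barcode genotypes, an assumed per-comparison
    error rate of 0.01 and 3 observed mismatches,
    discordance_pvalue(20, 3, 0.01) is about 0.001, so random
    genotyping error alone is an unlikely explanation.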
    Original language: English
    Article number: 370
    Journal: International Journal of Bioscience, Biochemistry and Bioinformatics
    Volume: 4
    Issue number: 5
    Pages (from-to): 355-360
    Number of pages: 5
    ISSN: 2010-3638
    Publication status: Published - 2014
