Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection

Ziheng Yang, Wendy Shuk Wan Wong, Rasmus Nielsen


    Abstract

    Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
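
The contrast between NEB and BEB described in the abstract can be illustrated with a deliberately simplified toy sketch. It is not the paper's implementation: it assumes only two site classes, treats the class-conditional site likelihoods as given numbers (in a real analysis they come from a codon model on a tree and depend on ω), and lets only the proportion of the ω > 1 class be uncertain, with a uniform prior on a small grid. The function names, the `lik` values, and the grid are all hypothetical.

```python
import numpy as np

def neb_posterior(lik, p_hat):
    """Naive EB: plug point estimates of the class proportions into Bayes' rule.

    lik   : (n_sites, n_classes) likelihood of each site's data given each class
    p_hat : (n_classes,) estimated class proportions, treated as if known exactly
    """
    joint = lik * p_hat
    return joint / joint.sum(axis=1, keepdims=True)

def beb_posterior(lik, grid):
    """Toy BEB for two classes: average the per-site posterior over the
    uncertainty in p1 (proportion of the second class), using a uniform
    prior on the grid points (an assumption of this sketch).
    """
    props = np.stack([1.0 - grid, grid], axis=1)      # (G, 2) class proportions
    site_marg = lik @ props.T                         # (n_sites, G) site likelihoods
    log_data = np.log(site_marg).sum(axis=0)          # log-likelihood of all sites
    w = np.exp(log_data - log_data.max())
    w /= w.sum()                                      # posterior weight of each grid point
    # Per-site class posterior conditional on each grid value of p1.
    post_given_p = lik[:, None, :] * props[None, :, :] / site_marg[:, :, None]
    # Integrate (here: sum) over the grid, weighted by its posterior.
    return np.einsum("ngk,g->nk", post_given_p, w)

# Hypothetical class-conditional likelihoods for four sites
# (column 0: class with ω < 1, column 1: class with ω > 1).
lik = np.array([[0.9, 0.1],
                [0.8, 0.3],
                [0.2, 0.7],
                [0.6, 0.5]])

grid = (np.arange(10) + 0.5) / 10                     # midpoints of 10 equal bins
neb = neb_posterior(lik, np.array([0.5, 0.5]))        # hypothetical plug-in estimates
beb = beb_posterior(lik, grid)
```

When the grid collapses to a single value, the toy BEB result coincides with NEB at that value; the two differ only to the extent that the data leave the proportion uncertain, which is the small-data situation the abstract is concerned with.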
    Original language: English
    Journal: Molecular Biology and Evolution
    Volume: 22
    Issue number: 4
    Pages (from-to): 1107-1118
    ISSN: 0737-4038
    Publication status: Published - 2005
