Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods

Weiwei Zhai; Rasmus Nielsen; Nick Goldman; Ziheng Yang

doi:10.1093/molbev/mss104

23 Citations (Scopus)

Abstract

The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.

Original language	English
Journal	Molecular Biology and Evolution
Volume	29
Issue number	10
Pages (from-to)	2889-2893
Number of pages	5
ISSN	0737-4038
DOI	https://doi.org/10.1093/molbev/mss104
Status	Published - Oct. 2012
Published externally	Yes

Keywords

codon model
Darwinian selection
likelihood-ratio test

Access to document

10.1093/molbev/mss104

Citation formats

@article{6874ef2b339e494188634ae4757dbf88,
title = "Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods",
abstract = "The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.",
keywords = "codon model, Darwinian selection, likelihood-ratio test",
author = "Weiwei Zhai and Rasmus Nielsen and Nick Goldman and Ziheng Yang",
year = "2012",
month = oct,
doi = "10.1093/molbev/mss104",
language = "English",
volume = "29",
pages = "2889--2893",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "10",
}

TY  - JOUR
T1  - Looking for Darwin in Genomic Sequences-Validity and Success of Statistical Methods
AU  - Zhai, Weiwei
AU  - Nielsen, Rasmus
AU  - Goldman, Nick
AU  - Yang, Ziheng
PY  - 2012/10
Y1  - 2012/10
N2  - The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.
AB  - The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate d(S), but not by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.
KW  - codon model
KW  - Darwinian selection
KW  - likelihood-ratio test
U2  - 10.1093/molbev/mss104
DO  - 10.1093/molbev/mss104
M3  - Journal article
C2  - 22490825
SN  - 0737-4038
VL  - 29
SP  - 2889
EP  - 2893
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
IS  - 10
ER  - 