TY - JOUR
T1 - Obtaining estimates for the ages of all the protein-coding genes and most of the ontology-identified noncoding genes of the human genome, assigned to 19 phylostrata
AU - Litman, Thomas
AU - Stein, Wilfred D
N1 - Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
PY - 2019/2
Y1 - 2019/2
N2 - Following Liebeskind et al. [1], we have attempted to find consensus ages for the protein-coding and the noncoding genes of the human genome, using publicly available ortholog databases. For each database separately, we determined its age estimate for each gene it listed by identifying the earliest ortholog of that gene. We assigned these ages to one of the 19 major phylostrata defined by Domazet-Lošo and Tautz [2], two of which were further subdivided. From these various estimates, we took the modal value, if one was present, as the consensus age for the gene. For the genes where no consensus value could be found, we recorded the median value of the age estimates across the databases interrogated. We present a resource that lists the age, as so defined, of every one of the 19,660 protein-coding genes and of 5,981 of the 16,528 non-protein-coding genes of the human genome, the age being the time when the gene was accreted to the evolving human genome. We calculate the number of genes that accreted to the genome, epoch by epoch, and consider the rate at which they accreted.
AB - Following Liebeskind et al. [1], we have attempted to find consensus ages for the protein-coding and the noncoding genes of the human genome, using publicly available ortholog databases. For each database separately, we determined its age estimate for each gene it listed by identifying the earliest ortholog of that gene. We assigned these ages to one of the 19 major phylostrata defined by Domazet-Lošo and Tautz [2], two of which were further subdivided. From these various estimates, we took the modal value, if one was present, as the consensus age for the gene. For the genes where no consensus value could be found, we recorded the median value of the age estimates across the databases interrogated. We present a resource that lists the age, as so defined, of every one of the 19,660 protein-coding genes and of 5,981 of the 16,528 non-protein-coding genes of the human genome, the age being the time when the gene was accreted to the evolving human genome. We calculate the number of genes that accreted to the genome, epoch by epoch, and consider the rate at which they accreted.
KW - Computational Biology
KW - Databases, Genetic
KW - Evolution, Molecular
KW - Genome, Human/genetics
KW - Humans
KW - Open Reading Frames/genetics
KW - Sequence Homology
U2 - 10.1053/j.seminoncol.2018.11.002
DO - 10.1053/j.seminoncol.2018.11.002
M3 - Journal article
C2 - 30558821
SN - 0093-7754
VL - 46
SP - 3
EP - 9
JO - Seminars in Oncology
JF - Seminars in Oncology
IS - 1
ER -