On the total number of genes and their length distribution in complete microbial genomes

M Skovgaard, L J Jensen, S Brunak, David Ussery, A Krogh

154 Citations (Scopus)

Abstract

In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.
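The comparison described in the abstract can be illustrated with a rough sketch. This is not necessarily the authors' procedure: it assumes that every sufficiently long annotated gene is real, so the fraction of long genes matching a known protein approximates the match rate for real genes of any length. The function name, the `long_cutoff` value, and the toy data are all hypothetical.

```python
from typing import Sequence


def estimate_true_gene_count(
    lengths: Sequence[int],
    has_match: Sequence[bool],
    long_cutoff: int = 300,  # hypothetical cutoff, in amino acids
) -> float:
    """Estimate the number of real genes among annotated ORFs.

    Assumption (not taken from the paper): all ORFs at or above
    `long_cutoff` are real, so their match rate applies to real
    genes of every length.
    """
    if len(lengths) != len(has_match):
        raise ValueError("lengths and has_match must have the same size")

    long_total = sum(1 for n in lengths if n >= long_cutoff)
    long_matched = sum(
        1 for n, m in zip(lengths, has_match) if m and n >= long_cutoff
    )
    if long_matched == 0:
        raise ValueError("no long ORF matches a known protein")

    # Fraction of real genes expected to match a known protein.
    match_rate = long_matched / long_total

    # Every matched ORF is treated as real; scale up by the match rate
    # to account for real genes that have no known homologue.
    total_matched = sum(1 for m in has_match if m)
    return total_matched / match_rate


# Toy data (fabricated for illustration only): 40 short and 60 long ORFs.
toy_lengths = [100] * 40 + [500] * 60
toy_matches = [True] * 10 + [False] * 30 + [True] * 45 + [False] * 15
toy_estimate = estimate_true_gene_count(toy_lengths, toy_matches)
```

In this toy set, 100 ORFs are annotated but the estimate is lower, because the short ORFs match known proteins far less often than the long ones do. That is the kind of excess of short annotated genes the abstract describes.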
Original language: English
Journal: Trends in Genetics
Volume: 17
Issue number: 8
Pages (from-to): 425-8
Number of pages: 4
ISSN: 0168-9525
Publication status: Published - 2001

Keywords

  • Databases, Factual
  • Escherichia coli
  • Genome
  • Genome, Bacterial
  • Models, Statistical
  • Open Reading Frames
  • Saccharomyces cerevisiae
