Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions

Chaochun Wei; Philippe Lamesch; Manimozhiyan Arumugam; Jennifer Rosenberg; Ping Hu; Marc Vidal; Michael R Brent

doi:10.1101/gr.3329005

31 Citations (Scopus)

Abstract

The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase "predicted" genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the "parts list" for even the best-studied model organisms.

Original language	English
Journal	Genome Research
Volume	15
Issue number	4
Pages (from-to)	577-582
Number of pages	6
ISSN	1088-9051
DOI	https://doi.org/10.1101/gr.3329005
Status	Published - 2005

Citation formats

@article{568e9408b1924762a081590f7b38f979,

title = "Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions",

abstract = "The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase {"}predicted{"} genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the {"}parts list{"} for even the best-studied model organisms.",

keywords = "Animals, Caenorhabditis elegans, Cloning, Molecular, Computational Biology, Databases, Genetic, Genes, Helminth, Genome, Genomics, Introns, Open Reading Frames, Sensitivity and Specificity",

author = "Chaochun Wei and Philippe Lamesch and Manimozhiyan Arumugam and Jennifer Rosenberg and Ping Hu and Marc Vidal and Brent, {Michael R}",

year = "2005",

doi = "10.1101/gr.3329005",

language = "English",

volume = "15",

pages = "577--582",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory Press",

number = "4",

}

TY - JOUR

T1 - Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions

AU - Wei, Chaochun

AU - Lamesch, Philippe

AU - Arumugam, Manimozhiyan

AU - Rosenberg, Jennifer

AU - Hu, Ping

AU - Vidal, Marc

AU - Brent, Michael R

PY - 2005

Y1 - 2005

N2 - The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase "predicted" genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the "parts list" for even the best-studied model organisms.

AB - The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase "predicted" genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the "parts list" for even the best-studied model organisms.

KW - Animals

KW - Caenorhabditis elegans

KW - Cloning, Molecular

KW - Computational Biology

KW - Databases, Genetic

KW - Genes, Helminth

KW - Genome

KW - Genomics

KW - Introns

KW - Open Reading Frames

KW - Sensitivity and Specificity

U2 - 10.1101/gr.3329005

DO - 10.1101/gr.3329005

M3 - Journal article

C2 - 15805498

SN - 1088-9051

VL - 15

SP - 577

EP - 582

JO - Genome Research

JF - Genome Research

IS - 4

ER -
