TY - JOUR
T1 - EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation
AU - Pafilis, Evangelos
AU - Buttigieg, Pier Luigi
AU - Ferrell, Barbra
AU - Pereira, Emiliano
AU - Schnetzer, Julia
AU - Arvanitidis, Christos
AU - Jensen, Lars Juhl
N1 - © The Author(s) 2016. Published by Oxford University Press.
PY - 2016
Y1 - 2016
N2 - The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed.
U2 - 10.1093/database/baw005
DO - 10.1093/database/baw005
M3 - Journal article
C2 - 26896844
SN - 1758-0463
VL - 2016
JO - Database: The Journal of Biological Databases and Curation
JF - Database: The Journal of Biological Databases and Curation
M1 - baw005
ER -