One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Lars Juhl Jensen

Disease Systems Biology Program

Abstract

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
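
The abstract does not describe the tagger's internals, so the following is only a minimal sketch of what dictionary-based NER can look like, not the author's implementation. It assumes a toy dictionary that maps lowercase names to placeholder identifiers (the `EX:` identifiers, the function names `tag` and `precision_recall`, and the `max_words` parameter are all hypothetical), uses simple `\w+` tokenization, and prefers the longest match at each position. The second function shows how precision and recall, the two measures quoted above, could be computed from exact span matches against a gold standard.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: lowercase surface name -> ontology identifier.
# The "EX:" identifiers are illustrative placeholders, not real ontology terms.
DICTIONARY: Dict[str, str] = {
    "breast cancer": "EX:0001",
    "cancer": "EX:0002",
    "tp53": "EX:0003",
}

Match = Tuple[int, int, str]  # (start offset, end offset, identifier)


def tag(text: str, dictionary: Dict[str, str], max_words: int = 3) -> List[Match]:
    """Scan left to right and keep the longest dictionary match at each word."""
    # Character spans of the word tokens (a deliberate simplification).
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches: List[Match] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            identifier = dictionary.get(text[start:end].lower())
            if identifier is not None:
                matches.append((start, end, identifier))
                i += n  # skip the words covered by this match
                break
        else:
            i += 1  # no name starts at this word
    return matches


def precision_recall(predicted: Set[Match], gold: Set[Match]) -> Tuple[float, float]:
    """Precision and recall over exact (start, end, identifier) matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Because every match carries an identifier rather than only a text span, results from different text sources can be joined on that identifier, which is the kind of integration with other evidence that the abstract attributes to ontology-derived dictionaries.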

Original language	English
Journal	CEUR Workshop Proceedings
Volume	1747
Number of pages	2
ISSN	1613-0073
Publication status	Published - 2016

Keywords

Dictionaries
Named entity recognition
Software

Access to Document

BIT102_ICBO2016
Final published version, 145 KB

http://ceur-ws.org/Vol-1747/BIT102_ICBO2016.pdf
Licence: Unspecified

Cite this

@inproceedings{ea8cb3b98e4544c0b71774c9ca1d7170,

title = "One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition",

abstract = "Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.",

keywords = "Dictionaries, Named entity recognition, Software",

author = "Jensen, {Lars Juhl}",

year = "2016",

language = "English",

volume = "1747",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - One tagger, many uses

T2 - Illustrating the power of ontologies in dictionary-based named entity recognition

AU - Jensen, Lars Juhl

PY - 2016

Y1 - 2016

N2 - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.

AB - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.

KW - Dictionaries

KW - Named entity recognition

KW - Software

M3 - Conference article

AN - SCOPUS:85018753947

SN - 1613-0073

VL - 1747

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

ER -
