One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Lars Juhl Jensen


Disease Systems Biology Program


Abstract

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
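The abstract describes the tagger only at a high level. The following Python sketch is a rough illustration of what dictionary-based NER involves. It tags text by greedy, longest-match lookup against a small name dictionary and then scores the output with exact-match precision and recall, the two measures quoted above.

This is not the author's tagger. The dictionary, its placeholder identifiers (e.g. DISEASE:0001), and the functions tag and precision_recall are hypothetical. A real system would likely need much larger ontology-derived dictionaries and additional handling of ambiguous names.

```python
import re
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str, str]  # (start, end, entity type, identifier)

# Hypothetical toy dictionary mapping lowercased, space-joined names to
# (entity type, identifier). The identifiers are placeholders, not real
# ontology terms; a real dictionary might be built from ontologies.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "type 2 diabetes": ("disease", "DISEASE:0001"),
    "diabetes": ("disease", "DISEASE:0002"),
    "insulin": ("chemical", "CHEM:0001"),
    "liver": ("tissue", "TISSUE:0001"),
}

# The longest name (in tokens) bounds how far ahead the tagger looks.
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)

# Tokens are runs of word characters, so hyphens and spaces both act as separators.
TOKEN_RE = re.compile(r"\w+")


def tag(text: str) -> List[Span]:
    """Tag text by greedy, left-to-right, longest-match dictionary lookup."""
    tokens = [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    matches: List[Span] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "type 2 diabetes" beats "diabetes".
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            key = " ".join(text[s:e].lower() for s, e in window)
            if key in DICTIONARY:
                etype, eid = DICTIONARY[key]
                matches.append((window[0][0], window[-1][1], etype, eid))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no dictionary name starts at this token
    return matches


def precision_recall(predicted: Sequence[Span], gold: Sequence[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted spans against gold spans."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Insulin resistance in type 2 diabetes affects the liver."
    predicted = tag(sentence)
    for start, end, etype, eid in predicted:
        print(sentence[start:end], etype, eid)
    # Hypothetical gold annotation that omits the tissue mention.
    gold = [(0, 7, "chemical", "CHEM:0001"), (22, 37, "disease", "DISEASE:0001")]
    print(precision_recall(predicted, gold))
```

Trying the longest window first means the nested name "diabetes" is not reported separately inside "type 2 diabetes". Because tokens are rejoined with single spaces, "Type-2 diabetes" resolves to the same entry. In the example, the gold set omits the liver mention, so that prediction counts as a false positive. The result is a precision of 2/3 and a recall of 1.0.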

Original language	English
Journal	CEUR Workshop Proceedings
Volume	1747
Number of pages	2
ISSN	1613-0073
Status	Published - 2016

Access to document

BIT102_ICBO2016: Final published version, 145 KB

http://ceur-ws.org/Vol-1747/BIT102_ICBO2016.pdf
License: Unspecified

Citation formats

@inproceedings{ea8cb3b98e4544c0b71774c9ca1d7170,
  title = "One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition",
  abstract = "Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.",
  keywords = "Dictionaries, Named entity recognition, Software",
  author = "Jensen, {Lars Juhl}",
  year = "2016",
  language = "English",
  volume = "1747",
  journal = "CEUR Workshop Proceedings",
  issn = "1613-0073",
  publisher = "CEUR Workshop Proceedings",
}

TY  - GEN
T1  - One tagger, many uses
T2  - Illustrating the power of ontologies in dictionary-based named entity recognition
AU  - Jensen, Lars Juhl
PY  - 2016
Y1  - 2016
N2  - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
AB  - Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
KW  - Dictionaries
KW  - Named entity recognition
KW  - Software
M3  - Conference article
AN  - SCOPUS:85018753947
SN  - 1613-0073
VL  - 1747
JO  - CEUR Workshop Proceedings
JF  - CEUR Workshop Proceedings
ER  - 
