KorAP Architecture – Diving in the Deep Sea of Corpus Data

Nils Diewald; Michael Hanl; Eliza Margaretha; Joachim Bingel; Marc Kupietz; Piotr Banski; Andreas Witt

Department of Nordic Studies and Linguistics

Abstract

KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP's design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.

Original language	English
Title of host publication	Proceedings of the 10th conference of the Language Resources and Evaluation Conference
Number of pages	6
Publisher	European Language Resources Association
Publication date	2016
Pages	3586-3591
ISBN (Print)	9782951740891
Publication status	Published - 2016
Event	LREC 2016 - Duration: 23 May 2016 → 28 May 2016

Conference

Conference	LREC 2016
Period	23/05/2016 → 28/05/2016

Access to Document

http://www.lrec-conf.org/proceedings/lrec2016/pdf/243_Paper.pdf

Licence: CC BY-NC

Cite this

KorAP Architecture – Diving in the Deep Sea of Corpus Data. / Diewald, Nils; Hanl, Michael; Margaretha, Eliza et al.

Proceedings of the 10th conference of the Language Resources and Evaluation Conference. European Language Resources Association, 2016. p. 3586-3591.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Diewald, N, Hanl, M, Margaretha, E, Bingel, J, Kupietz, M, Banski, P & Witt, A 2016, KorAP Architecture – Diving in the Deep Sea of Corpus Data. in Proceedings of the 10th conference of the Language Resources and Evaluation Conference. European Language Resources Association, pp. 3586-3591, LREC 2016, 23/05/2016. <http://www.lrec-conf.org/proceedings/lrec2016/pdf/243_Paper.pdf>

@inproceedings{a433cd8a8802459087b7c068119419a1,
  title = "KorAP Architecture – Diving in the Deep Sea of Corpus Data",
  abstract = "KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP's design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.",
  author = "Nils Diewald and Michael Hanl and Eliza Margaretha and Joachim Bingel and Marc Kupietz and Piotr Banski and Andreas Witt",
  year = "2016",
  language = "English",
  isbn = "9782951740891",
  pages = "3586--3591",
  booktitle = "Proceedings of the 10th conference of the Language Resources and Evaluation Conference",
  publisher = "European Language Resources Association",
  note = "LREC 2016 ; Conference date: 23-05-2016 Through 28-05-2016",
}

TY  - GEN
T1  - KorAP Architecture – Diving in the Deep Sea of Corpus Data
AU  - Diewald, Nils
AU  - Hanl, Michael
AU  - Margaretha, Eliza
AU  - Bingel, Joachim
AU  - Kupietz, Marc
AU  - Banski, Piotr
AU  - Witt, Andreas
PY  - 2016
Y1  - 2016
AB  - KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP's design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.
M3  - Article in proceedings
SN  - 9782951740891
SP  - 3586
EP  - 3591
BT  - Proceedings of the 10th conference of the Language Resources and Evaluation Conference
PB  - European Language Resources Association
T2  - LREC 2016
Y2  - 23 May 2016 through 28 May 2016
ER  -
