Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees

Christian Rishøj, Taneth Ruangrajitpakorn, Prachya Boonkwan, Thepchai Supnithi

Abstract

A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method uses a lexical dictionary to assign dependency directions to the CG types of the grammatical entities in the CG bank, falling back on a generic mapping of CG types for unknown words. Currently, all but a handful of the trees in the Thai CG bank can be transformed unambiguously into directed dependency trees. Dependency labels can optionally be assigned with a learned classifier, which in a preliminary evaluation with a very small training set achieves 76.5% label accuracy. In the process, a number of annotation errors in the CG bank were identified and corrected. Although its coverage is rather limited (it excludes, e.g., long-distance dependencies, topicalisations and longer sentences), the resulting treebank is believed to be structurally consistent in its annotation and a valuable complement to the scarce Thai language resources in existence.
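The abstract does not spell out the conversion procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each CG derivation is assumed to be a binary tree built with forward (X/Y Y → X) and backward (Y X\Y → X) application only. At each step the functor is assumed to head its argument unless a lexical dictionary entry says otherwise, and for words missing from the dictionary a generic fallback treats modifier types X/X and X\X as dependents. The names Node, Lexicon, split_cat, functor_side, generic_direction and to_dependencies, these head rules, and the English glosses standing in for Thai tokens are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of CG-derivation-to-dependency conversion; not the authors' code.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical lexical dictionary: (head word of functor, functor category) ->
# "functor" if the functor heads the construction, "argument" if its argument does.
Lexicon = Dict[Tuple[str, str], str]


@dataclass
class Node:
    cat: str                          # CG category, e.g. "np" or "(s\\np)/np"
    word: Optional[str] = None        # leaves only
    index: Optional[int] = None       # token position, leaves only
    children: List["Node"] = field(default_factory=list)


def strip_outer(cat: str) -> str:
    """Remove parentheses that wrap the whole category."""
    while cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, c in enumerate(cat):
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            if depth == 0 and i < len(cat) - 1:
                return cat  # first "(" closes early, so it is not a full wrap
        cat = cat[1:-1]
    return cat


def split_cat(cat: str) -> Optional[Tuple[str, str, str]]:
    """Split a complex category at its main (rightmost top-level) slash."""
    cat = strip_outer(cat)
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ")":
            depth += 1
        elif c == "(":
            depth -= 1
        elif c in "/\\" and depth == 0:
            return strip_outer(cat[:i]), c, strip_outer(cat[i + 1:])
    return None  # atomic category


def functor_side(left: Node, right: Node) -> str:
    """'left' for forward application, 'right' for backward application."""
    ls, rs = split_cat(left.cat), split_cat(right.cat)
    fwd = ls is not None and ls[1] == "/" and ls[2] == strip_outer(right.cat)
    bwd = rs is not None and rs[1] == "\\" and rs[2] == strip_outer(left.cat)
    if fwd == bwd:  # neither or both rules fit: no unambiguous reading
        raise ValueError(f"cannot resolve {left.cat!r} + {right.cat!r}")
    return "left" if fwd else "right"


def generic_direction(functor_cat: str) -> str:
    """Assumed fallback: modifiers X/X and X\\X depend on their argument."""
    parts = split_cat(functor_cat)
    return "argument" if parts and parts[0] == parts[2] else "functor"


def to_dependencies(tree: Node, lexicon: Lexicon) -> List[int]:
    """Return heads[i] = index of token i's head, or -1 for the root."""
    heads: Dict[int, int] = {}

    def visit(node: Node) -> Tuple[int, str]:
        # Returns (token index, word) of the subtree's lexical head.
        if not node.children:
            return node.index, node.word
        if len(node.children) == 1:  # unary rule: head passes through
            return visit(node.children[0])
        left, right = node.children
        lhead, rhead = visit(left), visit(right)
        if functor_side(left, right) == "left":
            fnode, fhead, ahead = left, lhead, rhead
        else:
            fnode, fhead, ahead = right, rhead, lhead
        key = (fhead[1], strip_outer(fnode.cat))
        direction = lexicon.get(key) or generic_direction(fnode.cat)
        head, dep = (fhead, ahead) if direction == "functor" else (ahead, fhead)
        heads[dep[0]] = head[0]
        return head

    root = visit(tree)
    heads[root[0]] = -1
    return [heads[i] for i in range(len(heads))]


# Toy derivation, English glosses in place of Thai tokens:
# he eat rice fast  ->  np  (s\np)/np  np  (s\np)\(s\np)
he, eat = Node("np", "he", 0), Node("(s\\np)/np", "eat", 1)
rice, fast = Node("np", "rice", 2), Node("(s\\np)\\(s\\np)", "fast", 3)
vp = Node("s\\np", children=[eat, rice])
vp2 = Node("s\\np", children=[vp, fast])
s = Node("s", children=[he, vp2])
print(to_dependencies(s, {}))  # [1, -1, 1, 1]
```

With an empty dictionary the verb heads the sentence and the modifier "fast" attaches to it; a dictionary entry such as ("fast", "(s\\np)\\(s\\np)") → "functor" would instead make the modifier the head. A derivation in which neither or both application rules fit raises ValueError, loosely analogous to the handful of trees that cannot be transformed unambiguously.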
Original language: English
Host publication: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP)
Publisher: Association for Computational Linguistics
Publication date: Nov. 2011
Status: Published, Nov. 2011
