Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees

Christian Rishøj, Taneth Ruangrajitpakorn, Prachya Boonkwan, Thepchai Supnithi

Abstract

A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method involves a lexical dictionary for assigning dependency directions to the CG types associated with the grammatical entities in the CG bank, falling back on a generic mapping of CG types in case of unknown words. Currently, all but a handful of the trees in the Thai CG bank can unambiguously be transformed into directed dependency trees. Dependency labels can optionally be assigned with a learned classifier, which in a preliminary evaluation with a very small training set achieves 76.5% label accuracy. In the process, a number of annotation errors in the CG bank were identified and corrected. Although rather limited in its coverage, excluding e.g. long-distance dependencies, topicalisations and longer sentences, the resulting treebank is believed to be sound in terms of structural annotational consistency and a valuable complement to the scarce Thai language resources in existence.
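The dictionary-with-fallback lookup described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LEXICON` entries, the function names, and in particular the generic rule (a type of the form X/X or X\X is treated as a modifier whose argument is the head; any other functor type heads its argument) are assumptions made for the example, since the abstract does not specify them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexical dictionary: (word, CG type) -> which side is the head.
# "functor": the word carrying the CG type governs its argument.
# "argument": the argument governs the word (assumed typical for modifiers).
LEXICON: Dict[Tuple[str, str], str] = {
    ("eat", r"(s\np)/np"): "functor",
    ("very", "adj/adj"): "argument",
}


def strip_parens(t: str) -> str:
    """Remove parentheses that enclose the whole type, e.g. '(s\\np)' -> 's\\np'."""
    while t.startswith("(") and t.endswith(")"):
        depth = 0
        for i, ch in enumerate(t):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(t) - 1:
                    return t  # the first parenthesis closes early; nothing to strip
        t = t[1:-1]
    return t


def split_type(cg_type: str) -> Optional[Tuple[str, str, str]]:
    """Split a CG type at its outermost slash into (result, slash, argument)."""
    t = strip_parens(cg_type)
    depth = 0
    # Scan right to left: with left-associative slashes, the rightmost
    # slash outside any parentheses is the outermost one.
    for i in range(len(t) - 1, -1, -1):
        ch = t[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return strip_parens(t[:i]), ch, strip_parens(t[i + 1:])
    return None  # atomic type such as 'np'


def head_side(word: str, cg_type: str) -> Optional[str]:
    """Look the word up in the dictionary; fall back on a generic type-based rule."""
    if (word, cg_type) in LEXICON:
        return LEXICON[(word, cg_type)]
    parts = split_type(cg_type)
    if parts is None:
        return None  # atomic types take no argument, so no direction is assigned
    result, _, argument = parts
    # Assumed generic mapping: X/X and X\X are modifiers, headed by their argument.
    return "argument" if result == argument else "functor"
```

With this sketch, a known word is resolved from the dictionary, while an unknown word with type `(s\np)\(s\np)` falls through to the generic rule and is treated as a modifier.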
Original language: English
Title of host publication: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP)
Publisher: Association for Computational Linguistics
Publication date: Nov 2011
Publication status: Published - Nov 2011
