Shared Task System Description: Frustratingly Hard Compositionality Prediction

Anders Trærup Johannsen, Hector Martinez Alonso, Christian Rishøj, Anders Søgaard

Abstract

We considered a wide range of features for the DiSCo 2011 shared task on compositionality prediction for word pairs, including COALS-based endocentricity scores, compositionality scores based on distributional clusters, statistics about WordNet-induced paraphrases, hyphenation, and the likelihood of long translation equivalents in other languages. Many of the features we considered correlated significantly with human compositionality scores, but in support vector regression experiments we obtained the best results using only COALS-based endocentricity scores. Our system was nevertheless the best-performing system in the shared task, and average error reductions over a simple baseline in cross-validation were 13.7% for English and 50.1% for German.
Original language: English
Title of host publication: Proceedings of the Workshop on Distributional Semantics and Compositionality (DiSCo'2011)
Number of pages: 4
Place of publication: Portland, Oregon
Publisher: Association for Computational Linguistics
Publication date: Jun 2011
Pages: 29-32
ISBN (Print): 9781937284022
Status: Published - Jun 2011
