A study of metrics of distance and correlation between ranked lists for compositionality detection

Christina Lioma; Niels Dalum Hansen

doi:10.1016/j.cogsys.2017.03.001

Corresponding author: Christina Lioma

Department of Computer Science

3 citations (Scopus)

Abstract

Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their synonyms. Different ways of representing such phrases exist (e.g., vectors (Kiela and Clark, 2013) or language models (Lioma, Simonsen, Larsen, and Hansen, 2015)), and the choice of representation affects the measurement of semantic similarity. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches in compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides superior measurement of compositionality using any of the distance and correlation metrics considered.
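
As a rough illustration of comparing two ranked lists of term weights, the following minimal sketch computes two well-known rank correlation measures. The abstract does not name the specific metrics used, so Spearman's rho and Kendall's tau are chosen here only as examples; the terms, the weights, and the function names are hypothetical, and the sketch assumes both lists cover the same terms with no tied weights.

```python
from itertools import combinations


def ranks(weights):
    """Map each term to its rank (1 = highest weight); assumes no tied weights."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return {term: i + 1 for i, term in enumerate(ordered)}


def spearman_rho(a, b):
    """Spearman's rank correlation for two rankings of the same terms (no ties)."""
    n = len(a)
    d2 = sum((a[t] - b[t]) ** 2 for t in a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) pairs divided by all pairs."""
    score = 0
    for s, t in combinations(a, 2):
        prod = (a[s] - a[t]) * (b[s] - b[t])
        score += (prod > 0) - (prod < 0)
    n = len(a)
    return score / (n * (n - 1) / 2)


# Hypothetical term weights for a phrase and for its synonym-substituted variant.
phrase = {"red": 0.9, "car": 0.7, "fast": 0.4, "road": 0.2}
variant = {"red": 0.8, "car": 0.3, "fast": 0.5, "road": 0.1}

ra, rb = ranks(phrase), ranks(variant)
rho = spearman_rho(ra, rb)
tau = kendall_tau(ra, rb)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

Under the premise stated in the abstract, a higher correlation between the two rankings would be read as a sign that synonym substitution preserved the meaning, that is, as evidence of compositionality.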

Original language	English
Journal	Cognitive Systems Research
Volume	44
Pages (from-to)	40-49
Number of pages	10
ISSN	2214-4366
DOI	https://doi.org/10.1016/j.cogsys.2017.03.001
Status	Published - Aug. 2017

Access to document

10.1016/j.cogsys.2017.03.001

http://arxiv.org/pdf/1703.03640 (License: Other)

Other files and links

Link to publication in Scopus

Citation formats

@article{989df65291c04279954791bbf98da341,

title = "A study of metrics of distance and correlation between ranked lists for compositionality detection",

abstract = "Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their synonyms. Different ways of representing such phrases exist (e.g., vectors (Kiela and Clark, 2013) or language models (Lioma, Simonsen, Larsen, and Hansen, 2015)), and the choice of representation affects the measurement of semantic similarity. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches in compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides superior measurement of compositionality using any of the distance and correlation metrics considered.",

keywords = "Compositionality detection, Metrics of distance and correlation",

author = "Lioma, Christina and Hansen, {Niels Dalum}",

year = "2017",

month = aug,

doi = "10.1016/j.cogsys.2017.03.001",

language = "English",

volume = "44",

pages = "40--49",

journal = "Cognitive Systems Research",

issn = "2214-4366",

publisher = "Elsevier",

}

TY - JOUR

T1 - A study of metrics of distance and correlation between ranked lists for compositionality detection

AU - Lioma, Christina

AU - Hansen, Niels Dalum

PY - 2017/8

Y1 - 2017/8

N2 - Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their synonyms. Different ways of representing such phrases exist (e.g., vectors (Kiela and Clark, 2013) or language models (Lioma, Simonsen, Larsen, and Hansen, 2015)), and the choice of representation affects the measurement of semantic similarity. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches in compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides superior measurement of compositionality using any of the distance and correlation metrics considered.

AB - Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their synonyms. Different ways of representing such phrases exist (e.g., vectors (Kiela and Clark, 2013) or language models (Lioma, Simonsen, Larsen, and Hansen, 2015)), and the choice of representation affects the measurement of semantic similarity. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches in compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides superior measurement of compositionality using any of the distance and correlation metrics considered.

KW - Compositionality detection

KW - Metrics of distance and correlation

UR - http://www.scopus.com/inward/record.url?scp=85017228067&partnerID=8YFLogxK

U2 - 10.1016/j.cogsys.2017.03.001

DO - 10.1016/j.cogsys.2017.03.001

M3 - Journal article

AN - SCOPUS:85017228067

SN - 2214-4366

VL - 44

SP - 40

EP - 49

JO - Cognitive Systems Research

JF - Cognitive Systems Research

ER -
