Non-compositional term dependence for information retrieval

Christina Lioma, Jakob Grue Simonsen, Birger Larsen, Niels Dalum Hansen

Abstract

Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent, and (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency with which terms co-occur can be unrelated to the strength of their semantic dependence. For example, red tape might be less frequent overall than tape measure in some corpus, but this does not mean that red+tape are less dependent than tape+measure. This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase red tape meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

Original language: English
Title of host publication: SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 9 Aug 2015
Pages: 595-604
ISBN (Electronic): 978-1-4503-3621-5
Publication status: Published - 9 Aug 2015
Event: International ACM SIGIR Conference on Research and Development in Information Retrieval, Chile
Duration: 9 Aug 2015 to 13 Aug 2015

Conference

Conference: International ACM SIGIR Conference on Research and Development in Information Retrieval
Country/Territory: Chile
Period: 09/08/2015 to 13/08/2015
