Entropy and graph based modelling of document coherence using discourse entities: an application to information retrieval

Casper Petersen, Christina Lioma, Jakob Grue Simonsen, Birger Larsen

7 Citations (Scopus)
1 Downloads (Pure)

Abstract

We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, the application of which to IR complements recent work in the integration of document cohesiveness or comprehensibility to ranking [5, 56].
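As a rough illustration of the first model's intuition, the sketch below computes the Shannon entropy of the empirical distribution of entity n-grams in a sequence of discourse entities. It is only one simple reading of the abstract, not the paper's estimator: the function name `entity_ngram_entropy`, the choice of bigrams, and the two toy entity sequences are all assumptions made for illustration, and entity extraction (e.g. finding subjects and objects) is assumed to have happened already.

```python
import math
from collections import Counter


def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical entity n-gram distribution.

    `entities` is a list of discourse-entity strings in document order.
    Hypothetical helper: the paper's exact formulation may differ.
    """
    # Slide a window of size n over the entity sequence.
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = sum over n-grams of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Toy entity sequences (invented for illustration, not data from the paper).
focused = ["court", "judge", "court", "judge", "court", "judge"]
drifting = ["court", "judge", "weather", "recipe", "football", "tax"]

focused_entropy = entity_ngram_entropy(focused)
drifting_entropy = entity_ngram_entropy(drifting)
print(f"focused:  {focused_entropy:.3f} bits")
print(f"drifting: {drifting_entropy:.3f} bits")
```

Under this reading, the sequence that keeps returning to the same entities yields lower entropy than the one that introduces a new entity at every step, matching the abstract's reasoning that more new information signals topic drift and lower coherence.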

Original language: English
Title: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 27 Sep 2015
Pages: 191-200
ISBN (Print): 978-1-4503-3833-2
DOI
Status: Published - 27 Sep 2015
Event: ACM SIGIR International Conference on the Theory of Information Retrieval - Amherst, USA
Duration: 27 Sep 2015 to 30 Sep 2015

Conference

Conference: ACM SIGIR International Conference on the Theory of Information Retrieval
Country/Territory: USA
City: Amherst
Period: 27/09/2015 to 30/09/2015
