Adaptive distributional extensions to DFR ranking

Casper Petersen; Jakob Grue Simonsen; Kalervo Järvelin; Christina Lioma

doi:10.1145/2983323.2983895

Department of Computer Science

4 Citations (Scopus)

Abstract

Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).
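The two steps named in the abstract (detect the best-fitting distribution of non-informative terms, then adapt the ranking computation to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate set (Poisson and geometric only), the maximum-likelihood fitting, the raw log-likelihood comparison, and the function names `best_fit` and `informativeness` are all assumptions chosen for illustration. The score shown is the generic DFR-style information content, -log2 P(tf), computed under whichever candidate fits best.

```python
import math


def poisson_logpmf(k, lam):
    # log P(k) for a Poisson distribution with mean lam
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def geometric_logpmf(k, p):
    # log P(k) for a geometric distribution on k = 0, 1, 2, ...
    return k * math.log(1.0 - p) + math.log(p)


def best_fit(counts):
    """Pick the candidate with the highest log-likelihood on `counts`.

    `counts` is a hypothetical sample of term frequencies observed for
    non-informative terms. Both candidates have one parameter, so the
    raw log-likelihoods are compared directly (an assumption of this sketch).
    """
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one non-zero count")
    mean = sum(counts) / len(counts)
    # Maximum-likelihood parameter estimates for each candidate.
    candidates = {
        "poisson": (poisson_logpmf, mean),
        "geometric": (geometric_logpmf, 1.0 / (1.0 + mean)),
    }

    def log_likelihood(item):
        _, (logpmf, param) = item
        return sum(logpmf(k, param) for k in counts)

    name, (logpmf, param) = max(candidates.items(), key=log_likelihood)
    return name, logpmf, param


def informativeness(tf, logpmf, param):
    # Information content in bits: -log2 P(tf) under the fitted model.
    return -logpmf(tf, param) / math.log(2)


if __name__ == "__main__":
    sample = [0, 0, 0, 0, 0, 0, 1, 1, 2, 9]  # made-up, overdispersed counts
    name, logpmf, param = best_fit(sample)
    print(name, informativeness(9, logpmf, param))
```

A term whose observed frequency is unlikely under the fitted model receives a high score, which is the sense in which it diverges from the non-informative distribution. A fuller treatment would consider more candidate distributions and a proper goodness-of-fit or model-selection test.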

Original language	English
Title of host publication	Proceedings of the 25th ACM International Conference on Information and Knowledge Management
Number of pages	4
Publisher	Association for Computing Machinery
Publication date	24 Oct 2016
Pages	2005-2008
ISBN (Electronic)	978-1-4503-4073-1
DOIs	https://doi.org/10.1145/2983323.2983895
Publication status	Published - 24 Oct 2016
Event	25th ACM International Conference on Information and Knowledge Management, Indianapolis, United States (24 Oct 2016 → 28 Oct 2016; conference number 25)

Conference

Conference	25th ACM International Conference on Information and Knowledge Management
Number	25
Country/Territory	United States
City	Indianapolis
Period	24/10/2016 → 28/10/2016

Access to Document

10.1145/2983323.2983895

Cite this

Adaptive distributional extensions to DFR ranking. / Petersen, Casper; Simonsen, Jakob Grue; Järvelin, Kalervo et al.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, 2016. p. 2005-2008.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Petersen, C, Simonsen, JG, Järvelin, K & Lioma, C 2016, Adaptive distributional extensions to DFR ranking. in Proceedings of the 25th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, pp. 2005-2008, 25th ACM International Conference on Information and Knowledge Management, Indianapolis, United States, 24/10/2016. https://doi.org/10.1145/2983323.2983895

@inproceedings{80093e8369084ef3b0db97d01d0c317f,

title = "Adaptive distributional extensions to DFR ranking",

abstract = "Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).",

keywords = "cs.IR",

author = "Petersen, Casper and Simonsen, Jakob Grue and J{\"a}rvelin, Kalervo and Lioma, Christina",

year = "2016",

month = oct,

day = "24",

doi = "10.1145/2983323.2983895",

language = "English",

pages = "2005--2008",

booktitle = "Proceedings of the 25th ACM International Conference on Information and Knowledge Management",

publisher = "Association for Computing Machinery",

note = "25th ACM International Conference on Information and Knowledge Management ; Conference date: 24-10-2016 Through 28-10-2016",

}

TY - GEN

T1 - Adaptive distributional extensions to DFR ranking

AU - Petersen, Casper

AU - Simonsen, Jakob Grue

AU - Järvelin, Kalervo

AU - Lioma, Christina

N1 - Conference code: 25

PY - 2016/10/24

Y1 - 2016/10/24

AB - Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).

KW - cs.IR

U2 - 10.1145/2983323.2983895

DO - 10.1145/2983323.2983895

M3 - Article in proceedings

SP - 2005

EP - 2008

BT - Proceedings of the 25th ACM International Conference on Information and Knowledge Management

PB - Association for Computing Machinery

T2 - 25th ACM International Conference on Information and Knowledge Management

Y2 - 24 October 2016 through 28 October 2016

ER -
