Improving query correctness using centralized probably approximately correct (pac) search

Ingemar Cox; Jianhan Zhu; Ruoxun Fu; Lars Kai Hansen

3 Citations (Scopus)

Abstract

A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling using the query as a seed. Then, for queries that occur frequently, this pseudo-random sample is iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations, and continues to improve with each iteration. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.
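
The query-seeded sampling step described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function name `sample_computers`, the SHA-256 seed derivation, and the case and whitespace normalisation are all assumptions made here for illustration. It shows only how using the query as a seed makes the chosen set of computers repeatable, which is what lets a coordination node refine that set over repeated occurrences of the same query.

```python
import hashlib
import random


def sample_computers(query: str, num_computers: int, sample_size: int) -> list[int]:
    """Return a repeatable pseudo-random subset of computer indices for a query.

    Hypothetical helper: the name, the normalisation, and the hashing scheme
    are illustrative assumptions, not details taken from the chapter.
    """
    # Normalise case and whitespace so trivially different spellings of the
    # same query map to the same seed (an assumption of this sketch).
    normalized = " ".join(query.lower().split())
    # Derive a stable integer seed from the query text. hashlib is used because
    # Python's built-in hash() for strings is salted per process.
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # A dedicated generator keeps the sample independent of global random state.
    rng = random.Random(seed)
    # Sample without replacement; requires sample_size <= num_computers.
    return sorted(rng.sample(range(num_computers), sample_size))
```

Calling `sample_computers("pac search", 1000, 10)` twice yields the same ten indices, whereas a different query will generally select a different subset.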

Original language	English
Title of host publication	Advances in Information Retrieval
Number of pages	16
Publisher	Springer Science+Business Media
Publication date	2010
Pages	265-280
Publication status	Published - 2010
Externally published	Yes

Cite this

@inbook{fdc53490f8634c51bacf7e47edfe61cf,
  title = "Improving query correctness using centralized probably approximately correct (pac) search",
  abstract = "A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling using the query as a seed. Then, for queries that occur frequently, this pseudo-random sample is iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations, and continues to improve with each iteration. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.",
  author = "Ingemar Cox and Jianhan Zhu and Ruoxun Fu and Hansen, {Lars Kai}",
  year = "2010",
  language = "English",
  pages = "265--280",
  booktitle = "Advances in Information Retrieval",
  publisher = "Springer Science+Business Media",
  address = "Singapore",
}

TY - CHAP
T1 - Improving query correctness using centralized probably approximately correct (pac) search
AU - Cox, Ingemar
AU - Zhu, Jianhan
AU - Fu, Ruoxun
AU - Hansen, Lars Kai
PY - 2010
Y1 - 2010
N2 - A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling using the query as a seed. Then, for queries that occur frequently, this pseudo-random sample is iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations, and continues to improve with each iteration. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.
AB - A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling using the query as a seed. Then, for queries that occur frequently, this pseudo-random sample is iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations, and continues to improve with each iteration. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.
M3 - Book chapter
SP - 265
EP - 280
BT - Advances in Information Retrieval
PB - Springer Science+Business Media
ER -
