Increasing ranked accuracy for unstructured distributed search with dynamic replication

Sami Richardson; Ingemar J Cox

Increasing ranked accuracy for unstructured distributed search with dynamic replication

1 Citation (Scopus)

Abstract

In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.

Original language	English
Title of host publication	Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on
Number of pages	5
Publication date	2013
Publication status	Published - 2013
Externally published	Yes

Cite this

@inbook{47668d7fafa0404eaa6975ba3b9f556d,

title = "Increasing ranked accuracy for unstructured distributed search with dynamic replication",

abstract = "In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.",

author = "Sami Richardson and Cox, {Ingemar J}",

year = "2013",

language = "English",

booktitle = "Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on",

}

TY - CHAP

T1 - Increasing ranked accuracy for unstructured distributed search with dynamic replication

AU - Richardson, Sami

AU - Cox, Ingemar J

PY - 2013

Y1 - 2013

N2 - In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.

AB - In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.

M3 - Book chapter

BT - Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on

ER -

Increasing ranked accuracy for unstructured distributed search with dynamic replication

Abstract

Fingerprint

Cite this