Abstract
In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.
Original language | English |
---|---|
Title of host publication | Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on |
Number of pages | 5 |
Publication date | 2013 |
Publication status | Published - 2013 |
Externally published | Yes |