Abstract
A common problem in unstructured peer-to-peer (P2P) information retrieval is the need to compute global statistics of the full collection, when only a small subset of the collection is visible to a peer. Without accurate estimates of these statistics, the effectiveness of modern retrieval models can be reduced. We show that for the case of a probably approximately correct P2P architecture, and using either the BM25 retrieval model or a language model with Dirichlet smoothing, very close approximations of the required global statistics can be estimated with very little overhead and a small extension to the protocol. However, through theoretical modeling and simulations we demonstrate this technique also greatly increases the ability for adversarial peers to manipulate search results. We show an adversary controlling fewer than 10% of peers can censor or increase the rank of documents, or disrupt overall search results. As a defense, we propose a simple modification to the extension, and show global statistics estimation is viable even when up to 40% of peers are adversarial.
Original language | Undefined/Unknown |
---|---|
Title of host publication | Proceedings of the 37th International ACM SIGIR Conference on Research 38; Development in Information Retrieval |
Number of pages | 10 |
Publisher | IEEE/ACM |
Publication date | 2014 |
Pages | 203-212 |
DOIs | |
Publication status | Published - 2014 |
Externally published | Yes |
Event | International ACM SIGIR conference on Research & development in information retrieval (2014) - Gold Coast, Australia Duration: 6 Jul 2014 → 11 Jul 2014 |
Conference
Conference | International ACM SIGIR conference on Research & development in information retrieval (2014) |
---|---|
Country/Territory | Australia |
City | Gold Coast |
Period | 06/07/2014 → 11/07/2014 |