Abstract
We study the r-near neighbors reporting problem (rNNR) (or spherical range reporting), i.e., reporting all points in a high-dimensional point set S that lie within a radius r of a given query point. This problem has played building block roles in finding near-duplicate web pages, solving k-diverse near neighbor search and content-based image retrieval problems. Our approach builds upon the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sub-linear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving rNNR is that its performance is sensitive to data and query-dependent parameters. On data sets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for rNNR in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world data sets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.
Originalsprog | Engelsk |
---|---|
Titel | Advances in Database Technology — EDBT 2017 : proceedings of the 20th International Conference on Extending Database Technology |
Redaktører | Volker Markl, Salvatore Orlando, Bernhard Mitschang, Periklis Andritsos, Kai-Uwe Sattler, Sebastian Breß |
Antal sider | 4 |
Forlag | OpenProceedings.org |
Publikationsdato | 2017 |
Sider | 454-457 |
ISBN (Elektronisk) | 978-3-89318-073-8 |
DOI | |
Status | Udgivet - 2017 |
Begivenhed | 20th International Conference on Extending Database Technology - Venedig, Italien Varighed: 21 mar. 2017 → 24 mar. 2017 Konferencens nummer: 20 |
Konference
Konference | 20th International Conference on Extending Database Technology |
---|---|
Nummer | 20 |
Land/Område | Italien |
By | Venedig |
Periode | 21/03/2017 → 24/03/2017 |