Hybrid LSH: faster near neighbors reporting in high-dimensional space

Dang Ninh Pham

4 Citationer (Scopus)
14 Downloads (Pure)

Abstract

We study the r-near neighbors reporting problem (rNNR) (or spherical range reporting), i.e., reporting all points in a high-dimensional point set S that lie within a radius r of a given query point. This problem has played building block roles in finding near-duplicate web pages, solving k-diverse near neighbor search and content-based image retrieval problems. Our approach builds upon the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sub-linear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving rNNR is that its performance is sensitive to data and query-dependent parameters. On data sets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for rNNR in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world data sets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.

OriginalsprogEngelsk
TitelAdvances in Database Technology — EDBT 2017 : proceedings of the 20th International Conference on Extending Database Technology
RedaktørerVolker Markl, Salvatore Orlando, Bernhard Mitschang, Periklis Andritsos, Kai-Uwe Sattler, Sebastian Breß
Antal sider4
ForlagOpenProceedings.org
Publikationsdato2017
Sider454-457
ISBN (Elektronisk)978-3-89318-073-8
DOI
StatusUdgivet - 2017
Begivenhed20th International Conference on Extending Database Technology - Venedig, Italien
Varighed: 21 mar. 201724 mar. 2017
Konferencens nummer: 20

Konference

Konference20th International Conference on Extending Database Technology
Nummer20
Land/OmrådeItalien
ByVenedig
Periode21/03/201724/03/2017

Fingeraftryk

Dyk ned i forskningsemnerne om 'Hybrid LSH: faster near neighbors reporting in high-dimensional space'. Sammen danner de et unikt fingeraftryk.

Citationsformater