Abstract
A random hash function h is ε-minwise if for any set S, |S| = n, and element x ∈ S, Pr[h(x) = min h (S)] = (1 ± ε)/n. Minwise hash functions with low bias ε have widespread applications within similarity estimation. Hashing from a universe [u], the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA'13] makes c = O(1) lookups in tables of size u1/c. Twisted tabulation was invented to get good concentration for hashing based sampling. Here we show that twisted tabulation yields Õ(1/u1/c-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79] Õ(1/u1/c-minwise hashing requires Ω(log u)-independence [Indyk SODA'99]. Pǎtraşcu and Thorup [STOC'11] had shown that simple tabulation, using same space and lookups yields Õ(1/n1/c-minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner bypassing a complicated induction argument.
Original language | English |
---|---|
Title of host publication | Algorithm Theory – SWAT 2014 : 14th Scandinavian Symposium and Workshops, Copenhagen, Denmark, July 2-4, 2014. Proceedings |
Editors | R. Ravi, Inge Li Gørtz |
Number of pages | 12 |
Publisher | Springer |
Publication date | 2014 |
Pages | 134-145 |
Chapter | 12 |
ISBN (Print) | 978-3-319-08403-9 |
ISBN (Electronic) | 978-3-319-08404-6 |
DOIs | |
Publication status | Published - 2014 |
Event | 14th Scandinavian Symposium and Workshops on Algorithm Theory - København, Denmark Duration: 2 Jun 2014 → 4 Jun 2014 Conference number: 14 |
Conference
Conference | 14th Scandinavian Symposium and Workshops on Algorithm Theory |
---|---|
Number | 14 |
Country/Territory | Denmark |
City | København |
Period | 02/06/2014 → 04/06/2014 |
Series | Lecture notes in computer science |
---|---|
Volume | 8503 |
ISSN | 0302-9743 |