Approximately minwise independence with twisted tabulation

Søren Dahlgaard, Mikkel Thorup

6 Citations (Scopus)

Abstract

A random hash function h is ε-minwise if for any set S, |S| = n, and element x ∈ S, Pr[h(x) = min h (S)] = (1 ± ε)/n. Minwise hash functions with low bias ε have widespread applications within similarity estimation. Hashing from a universe [u], the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA'13] makes c = O(1) lookups in tables of size u1/c. Twisted tabulation was invented to get good concentration for hashing based sampling. Here we show that twisted tabulation yields Õ(1/u1/c-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79] Õ(1/u1/c-minwise hashing requires Ω(log u)-independence [Indyk SODA'99]. Pǎtraşcu and Thorup [STOC'11] had shown that simple tabulation, using same space and lookups yields Õ(1/n1/c-minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner bypassing a complicated induction argument.

Original languageEnglish
Title of host publicationAlgorithm Theory – SWAT 2014 : 14th Scandinavian Symposium and Workshops, Copenhagen, Denmark, July 2-4, 2014. Proceedings
EditorsR. Ravi, Inge Li Gørtz
Number of pages12
PublisherSpringer
Publication date2014
Pages134-145
Chapter12
ISBN (Print)978-3-319-08403-9
ISBN (Electronic)978-3-319-08404-6
DOIs
Publication statusPublished - 2014
Event14th Scandinavian Symposium and Workshops on Algorithm Theory - København, Denmark
Duration: 2 Jun 20144 Jun 2014
Conference number: 14

Conference

Conference14th Scandinavian Symposium and Workshops on Algorithm Theory
Number14
Country/TerritoryDenmark
CityKøbenhavn
Period02/06/201404/06/2014
SeriesLecture notes in computer science
Volume8503
ISSN0302-9743

Fingerprint

Dive into the research topics of 'Approximately minwise independence with twisted tabulation'. Together they form a unique fingerprint.

Cite this