Fast and powerful hashing using tabulation

Mikkel Thorup*

*Corresponding author for this work

Abstract

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters, and we have precomputed character tables h1, ⋯, hc mapping characters to random hash values. A key x = (x1, ⋯, xc) is hashed to h1[x1] ⊕ h2[x2] ⊕ ⋯ ⊕ hc[xc]. This scheme is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, for linear probing and Cuckoo hashing. Next, we consider twisted tabulation, where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-style tail bounds and a very small bias for minwise hashing. This also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data structures. Finally, we consider double tabulation, where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter [26]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.
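
The simple tabulation scheme described in the abstract, and the composition used by double tabulation, can be illustrated with a minimal Python sketch. This is not the author's code: the 8-bit character size, the 32-bit output width, the table sizes, the seed, and all function names (`make_tables`, `simple_tabulation`, `double_tabulation`) are assumptions chosen for illustration. In particular, the number of derived characters used for double tabulation here is arbitrary and is not the parameter choice required by the paper's analysis.

```python
import random

CHAR_BITS = 8  # assumption: keys are split into 8-bit characters


def make_tables(num_chars, out_bits, rng):
    """Build one table per character position, each holding
    2**CHAR_BITS independent random out_bits-bit values."""
    return [[rng.getrandbits(out_bits) for _ in range(1 << CHAR_BITS)]
            for _ in range(num_chars)]


def simple_tabulation(key, tables):
    """Hash key = (x1, ..., xc) to h1[x1] xor h2[x2] xor ... xor hc[xc].
    The characters are read from the low-order end of the integer key."""
    mask = (1 << CHAR_BITS) - 1
    h = 0
    for table in tables:
        h ^= table[key & mask]  # one lookup per character
        key >>= CHAR_BITS
    return h


def double_tabulation(key, first, second):
    """Compose two simple tabulation functions: the output of the first
    is read as a string of derived characters and hashed by the second."""
    return simple_tabulation(simple_tabulation(key, first), second)


rng = random.Random(2017)            # fixed seed only for reproducibility
tables = make_tables(4, 32, rng)     # c = 4 characters, 32-bit hash values
first = make_tables(4, 48, rng)      # 4 characters -> 48 bits (6 derived characters)
second = make_tables(6, 32, rng)     # 6 derived characters -> 32-bit hash values

h = simple_tabulation(0xDEADBEEF, tables)
d = double_tabulation(0xDEADBEEF, first, second)

# The four keys below form a "rectangle" in the first two characters,
# so every table entry is used an even number of times and the XOR of
# the four hash values is always 0. This dependence is why simple
# tabulation is not four-independent.
rectangle = [0x0000, 0x0001, 0x0100, 0x0101]
xor_of_hashes = 0
for k in rectangle:
    xor_of_hashes ^= simple_tabulation(k, tables)
```

With real workloads the tables would be filled once from a good source of randomness and kept small enough to stay in cache, which is where the speed mentioned in the abstract comes from.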

Original language: English
Journal: Communications of the ACM
Volume: 60
Issue number: 7
Pages (from-to): 94-101
Number of pages: 8
ISSN: 0001-0782
Publication status: Published - Jul 2017
