Fast and powerful hashing using tabulation

Mikkel Thorup*

*Corresponding author for this work
5 Citations (Scopus)

Abstract

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters, and we have precomputed character tables h1, ⋯, hc mapping characters to random hash values. A key x = (x1, ⋯, xc) is hashed to h1[x1] ⊕ h2[x2] ⊕ ⋯ ⊕ hc[xc]. This scheme is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, for linear probing and Cuckoo hashing. Next, we consider twisted tabulation, where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-style tail bounds and a very small bias for minwise hashing. This also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data structures. Finally, we consider double tabulation, where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter [26]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not.
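
The simple tabulation scheme described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `make_simple_tabulation`, the 8-bit character width, the 32-bit output width, and the use of Python's `random` module to fill the tables are all assumptions chosen for illustration. A practical implementation would typically use a compiled language and a stronger source of randomness for the tables.

```python
import random


def make_simple_tabulation(c, char_bits=8, out_bits=32, seed=None):
    """Build a simple tabulation hash for integer keys of c characters.

    Illustrative sketch only: the name, the parameter defaults and the
    random source are assumptions, not details given in the abstract.
    """
    rng = random.Random(seed)
    alphabet_size = 1 << char_bits
    # One table per character position, each mapping a character
    # to a random out_bits-bit hash value.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(alphabet_size)]
        for _ in range(c)
    ]
    char_mask = alphabet_size - 1
    key_limit = 1 << (c * char_bits)

    def h(x):
        if not 0 <= x < key_limit:
            raise ValueError("key does not fit in c characters")
        result = 0
        for i in range(c):
            # Extract character i of the key and XOR in its table entry.
            ch = (x >> (i * char_bits)) & char_mask
            result ^= tables[i][ch]
        return result

    return h, tables


if __name__ == "__main__":
    # 32-bit keys viewed as c = 4 characters of 8 bits each.
    h, _ = make_simple_tabulation(c=4, seed=2017)
    print(hex(h(0x12345678)))
```

The sketch also makes the remark about four-independence concrete. With two characters, the four keys (a, b), (a, b'), (a', b), (a', b') use each table entry exactly twice, so their hash values always XOR to zero, whatever the tables contain. The fourth hash value is therefore determined by the other three.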

Original language: English
Journal: Communications of the ACM
Volume: 60
Issue number: 7
Pages (from-to): 94-101
Number of pages: 8
ISSN: 0001-0782
Status: Published - Jul 2017
