Simple tabulation, fast expanders, double tabulation, and high independence

20 Citations (Scopus)

Abstract

Simple tabulation dates back to Zobrist in 1970 who used it for game playing programs. Keys are viewed as consisting of c characters from some alphabet Φ. We initialize c tables h0, ... , hc-1 mapping characters to random hash values. A key x = (x0, ... , x c-1) is hashed to h0[x0]⊕...⊕hc-1[xc-1], where ⊕ denotes bit-wise exclusive-or. The scheme is extremely fast when the character hash tables hi are in cache. Simple tabulation hashing is not even 4-independent, but we show here that if we apply it twice, then we do get high independence. First we hash to some intermediate keys that are 6 times longer than the original keys, and then we hash the intermediate keys to the final hash values. The intermediate keys have d = 6c characters from Φ. We can then view the hash function as a highly unbalanced bipartite graph with keys on one side, each with edges to d output characters on the other side. We show that this graph has nice expansion properties, and from that it follows that if we perform another level of simple tabulation on the intermediate keys, then the composition is a highly independent hash function. More precisely, the independence we get is ΦΩ(1/c). In our O-notation, we view both Φ and c is going to infinity, but with c much smaller than Φ. Our space is O(cΦ) and the hash function is evaluated in O(c) time. Siegel [FOCS'89, SICOMP'04] has proved that with this space, if the hash function is evaluated in o(c) time, then the independence can only be o(c), so our evaluation time is best possible for Ω(c) independence-our independence is much higher if c = Φo(1/c). Siegel used O(c)c evaluation time to get the same independence with similar space. Siegel's main focus was c = O(1), but we are exponentially faster when c = ω(1). Applying our scheme recursively, we can increase our independence to ΦΩ(1) with o(clog c) evaluation time. Compared with Siegel's scheme this is both faster and higher independence. Siegel states about his scheme that it is "far too slow for any practical application". Our scheme is trivial to implement, and it does provide realistic implementations of 100-independent hashing for, say, 32-bit and 64-bit keys.

Original languageEnglish
Title of host publication2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS)
Number of pages10
PublisherIEEE
Publication date2013
Pages90-99
ISBN (Electronic)978-0-7695-5135-7
DOIs
Publication statusPublished - 2013
Event2003 IEEE 54th Annual Symposium on Foundations of Computer Science - Berkeley, United States
Duration: 27 Oct 201329 Oct 2013
Conference number: 54

Conference

Conference2003 IEEE 54th Annual Symposium on Foundations of Computer Science
Number54
Country/TerritoryUnited States
CityBerkeley
Period27/10/201329/10/2013

Fingerprint

Dive into the research topics of 'Simple tabulation, fast expanders, double tabulation, and high independence'. Together they form a unique fingerprint.

Cite this