Abstract
Simple tabulation dates back to Zobrist in 1970 who used it for game playing programs. Keys are viewed as consisting of c characters from some alphabet Φ. We initialize c tables h0, ... , hc-1 mapping characters to random hash values. A key x = (x0, ... , x c-1) is hashed to h0[x0]⊕...⊕hc-1[xc-1], where ⊕ denotes bit-wise exclusive-or. The scheme is extremely fast when the character hash tables hi are in cache. Simple tabulation hashing is not even 4-independent, but we show here that if we apply it twice, then we do get high independence. First we hash to some intermediate keys that are 6 times longer than the original keys, and then we hash the intermediate keys to the final hash values. The intermediate keys have d = 6c characters from Φ. We can then view the hash function as a highly unbalanced bipartite graph with keys on one side, each with edges to d output characters on the other side. We show that this graph has nice expansion properties, and from that it follows that if we perform another level of simple tabulation on the intermediate keys, then the composition is a highly independent hash function. More precisely, the independence we get is ΦΩ(1/c). In our O-notation, we view both Φ and c is going to infinity, but with c much smaller than Φ. Our space is O(cΦ) and the hash function is evaluated in O(c) time. Siegel [FOCS'89, SICOMP'04] has proved that with this space, if the hash function is evaluated in o(c) time, then the independence can only be o(c), so our evaluation time is best possible for Ω(c) independence-our independence is much higher if c = Φo(1/c). Siegel used O(c)c evaluation time to get the same independence with similar space. Siegel's main focus was c = O(1), but we are exponentially faster when c = ω(1). Applying our scheme recursively, we can increase our independence to ΦΩ(1) with o(clog c) evaluation time. Compared with Siegel's scheme this is both faster and higher independence. Siegel states about his scheme that it is "far too slow for any practical application". Our scheme is trivial to implement, and it does provide realistic implementations of 100-independent hashing for, say, 32-bit and 64-bit keys.
Original language | English |
---|---|
Title of host publication | 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS) |
Number of pages | 10 |
Publisher | IEEE |
Publication date | 2013 |
Pages | 90-99 |
ISBN (Electronic) | 978-0-7695-5135-7 |
DOIs | |
Publication status | Published - 2013 |
Event | 2003 IEEE 54th Annual Symposium on Foundations of Computer Science - Berkeley, United States Duration: 27 Oct 2013 → 29 Oct 2013 Conference number: 54 |
Conference
Conference | 2003 IEEE 54th Annual Symposium on Foundations of Computer Science |
---|---|
Number | 54 |
Country/Territory | United States |
City | Berkeley |
Period | 27/10/2013 → 29/10/2013 |