Abstract
We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0,n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.
Original language | English |
---|---|
Title of host publication | Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms |
Editors | Timothy M. Chan |
Publisher | Society for Industrial and Applied Mathematics |
Publication date | 2 Jan 2019 |
Pages | 2498-2512 |
ISBN (Electronic) | 978-1-61197-548-2 |
DOIs | |
Publication status | Published - 2 Jan 2019 |
Event | 30th Annual ACM-SIAM Symposium on Discrete Algorithms : SODA19 - San Diego, United States Duration: 6 Jan 2019 → 9 Jan 2019 |
Conference
Conference | 30th Annual ACM-SIAM Symposium on Discrete Algorithms |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 06/01/2019 → 09/01/2019 |