Non-empty Bins with Simple Tabulation Hashing

Abstract

We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0,n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.

Original languageEnglish
Title of host publicationProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
EditorsTimothy M. Chan
PublisherSociety for Industrial and Applied Mathematics
Publication date2 Jan 2019
Pages2498-2512
ISBN (Electronic)978-1-61197-548-2
DOIs
Publication statusPublished - 2 Jan 2019
Event30th Annual ACM-SIAM Symposium on Discrete Algorithms
: SODA19
- San Diego, United States
Duration: 6 Jan 20199 Jan 2019

Conference

Conference30th Annual ACM-SIAM Symposium on Discrete Algorithms
Country/TerritoryUnited States
CitySan Diego
Period06/01/201909/01/2019

Fingerprint

Dive into the research topics of 'Non-empty Bins with Simple Tabulation Hashing'. Together they form a unique fingerprint.

Cite this