Non-empty Bins with Simple Tabulation Hashing

Anders Aamand; Mikkel Thorup

doi:10.1137/1.9781611975482.153

Department of Computer Science

Abstract

We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0, ..., n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.
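
As an informal illustration of the quantity studied in the abstract, the following minimal Python sketch builds a simple tabulation hash function and counts the non-empty bins for one key set. It is not taken from the paper: the function names, the parameters (4 characters of 8 bits, n = 2^10 bins, m = 1000 keys) and the fixed seed are all assumptions chosen for the example. The baseline n(1 − (1 − 1/n)^m) is the standard expected number of non-empty bins under fully random hashing.

```python
import random


def make_simple_tabulation(num_chars, char_bits, out_bits, rng):
    """Build h: key -> [0, 2**out_bits) by simple tabulation.

    The key is split into num_chars characters of char_bits bits each.
    Each character position has its own table of random out_bits-bit
    values, and the hash is the XOR of the looked-up entries.
    (All parameter names here are hypothetical, chosen for this sketch.)
    """
    table_size = 1 << char_bits
    tables = [
        [rng.getrandbits(out_bits) for _ in range(table_size)]
        for _ in range(num_chars)
    ]
    mask = table_size - 1

    def h(key):
        value = 0
        for i in range(num_chars):
            char = (key >> (i * char_bits)) & mask  # i-th character of the key
            value ^= tables[i][char]
        return value

    return h


def count_non_empty_bins(h, keys):
    """Size of h(X): the number of distinct hash values hit by the keys."""
    return len({h(x) for x in keys})


def expected_non_empty_fully_random(m, n):
    """Expected number of non-empty bins for m balls thrown fully at random into n bins."""
    return n * (1.0 - (1.0 - 1.0 / n) ** m)


rng = random.Random(2019)  # fixed seed so the run is repeatable
h = make_simple_tabulation(num_chars=4, char_bits=8, out_bits=10, rng=rng)
keys = range(1000)          # m = 1000 keys, an arbitrary example set
observed = count_non_empty_bins(h, keys)
baseline = expected_non_empty_fully_random(m=1000, n=1 << 10)
print(observed, round(baseline, 1))
```

A single run like this only shows one sample next to the fully random expectation; it says nothing about the concentration bounds, which concern the distribution over the random tables.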

Original language	English
Title of host publication	Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
Editors	Timothy M. Chan
Publisher	Society for Industrial and Applied Mathematics
Publication date	2 Jan 2019
Pages	2498-2512
ISBN (Electronic)	978-1-61197-548-2
DOIs	https://doi.org/10.1137/1.9781611975482.153
Publication status	Published - 2 Jan 2019
Event	30th Annual ACM-SIAM Symposium on Discrete Algorithms : SODA19 - San Diego, United States Duration: 6 Jan 2019 → 9 Jan 2019

Conference

Conference	30th Annual ACM-SIAM Symposium on Discrete Algorithms
Country/Territory	United States
City	San Diego
Period	06/01/2019 → 09/01/2019

Access to Document

10.1137/1.9781611975482.153

1.9781611975482.153 (Final published version, 657 KB)

https://epubs.siam.org/doi/10.1137/1.9781611975482.153

Cite this

Aamand, A & Thorup, M 2019, Non-empty Bins with Simple Tabulation Hashing. in TM Chan (ed.), Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, pp. 2498-2512, 30th Annual ACM-SIAM Symposium on Discrete Algorithms, San Diego, United States, 06/01/2019. https://doi.org/10.1137/1.9781611975482.153

@inproceedings{ad5e74abfdb841f6ac9e72f20eaf6bca,

title = "Non-empty Bins with Simple Tabulation Hashing",

abstract = "We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0, ..., n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.",

author = "Anders Aamand and Mikkel Thorup",

year = "2019",

month = jan,

day = "2",

doi = "10.1137/1.9781611975482.153",

language = "English",

pages = "2498--2512",

editor = "Chan, {Timothy M.}",

booktitle = "Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms",

publisher = "Society for Industrial and Applied Mathematics",

address = "United States",

note = "30th Annual ACM-SIAM Symposium on Discrete Algorithms : SODA19 ; Conference date: 06-01-2019 Through 09-01-2019",

}

TY - GEN

T1 - Non-empty Bins with Simple Tabulation Hashing

AU - Aamand, Anders

AU - Thorup, Mikkel

PY - 2019/1/2

Y1 - 2019/1/2

N2 - We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0, ..., n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.

AB - We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h : U → [n] = {0, ..., n − 1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing. Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.

U2 - 10.1137/1.9781611975482.153

DO - 10.1137/1.9781611975482.153

M3 - Article in proceedings

SP - 2498

EP - 2512

BT - Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms

A2 - Chan, Timothy M.

PB - Society for Industrial and Applied Mathematics

T2 - 30th Annual ACM-SIAM Symposium on Discrete Algorithms

Y2 - 6 January 2019 through 9 January 2019

ER -
