Zipf's law unzipped

Seung Ki Baek; Petter Minnhagen; Per Johan Sebastian Bernhardsson

doi:10.1088/1367-2630/13/4/043004

84 citations (Scopus)

Abstract

Why does Zipf's law give a good description of data from seemingly completely unrelated phenomena? Here it is argued that the reason is that they can all be described as outcomes of a ubiquitous random group division: the elements can be citizens of a country and the groups family names, or the elements can be all the words making up a novel and the groups the unique words, or the elements could be inhabitants and the groups the cities in a country and so on. A random group formation (RGF) is presented from which a Bayesian estimate is obtained based on minimal information: it provides the best prediction for the number of groups with k elements, given the total number of elements, groups and the number of elements in the largest group. For each specification of these three values, the RGF predicts a unique group distribution N(k) ∝ exp(-bk)/k^γ, where the power-law index γ is a unique function of the same three values. The universality of the result is made possible by the fact that no system-specific assumptions are made about the mechanism responsible for the group division. The direct relation between γ and the total number of elements, groups and the number of elements in the largest group is calculated. The predictive power of the RGF model is demonstrated by direct comparison with data from a variety of systems. It is shown that γ usually takes values in the interval 1 ≤ γ ≤ 2 and that the value for a given phenomenon depends in a systematic way on the total size of the dataset. The results are put in the context of earlier discussions on Zipf's and Gibrat's laws, N(k) ∝ k^-2, and the connection between growth models and RGF is elucidated.
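The functional form quoted in the abstract, N(k) ∝ exp(-bk)/k^γ, can be evaluated numerically. The following is a minimal sketch only. It does not reproduce the paper's procedure for determining γ and b from the three input values. The function name `rgf_shape` and all parameter values are hypothetical and chosen purely for illustration.

```python
import math


def rgf_shape(k_max, gamma, b, n_groups):
    """Return [N(1), ..., N(k_max)] with N(k) proportional to exp(-b*k) / k**gamma.

    The values are rescaled so that they sum to n_groups (the total number of groups).
    gamma and b are taken as given; this sketch does not derive them.
    """
    weights = [math.exp(-b * k) / k ** gamma for k in range(1, k_max + 1)]
    scale = n_groups / sum(weights)
    return [scale * w for w in weights]


# Hypothetical inputs: largest group size 1000, gamma = 1.5, b = 0.001, 5000 groups.
counts = rgf_shape(k_max=1000, gamma=1.5, b=0.001, n_groups=5000)

# Total number of groups (sum of N(k)) and total number of elements (sum of k * N(k)).
total_groups = sum(counts)
total_elements = sum(k * n for k, n in enumerate(counts, start=1))
print(round(total_groups), round(total_elements))
```

With γ and b fixed by hand like this, the total number of elements is an output rather than an input. In the abstract the logic runs the other way: the totals and the largest group size are given, and γ follows from them.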

Original language	English
Journal	New Journal of Physics
Volume	13
Pages (from-to)	043004
ISSN	1367-2630
DOI	https://doi.org/10.1088/1367-2630/13/4/043004
Status	Published - 7 Apr 2011

Access to document

10.1088/1367-2630/13/4/043004

Citation formats

@article{fe431e56b3384a1683d1f8aacc756051,

title = "Zipf's law unzipped",

abstract = "Why does Zipf's law give a good description of data from seemingly completely unrelated phenomena? Here it is argued that the reason is that they can all be described as outcomes of a ubiquitous random group division: the elements can be citizens of a country and the groups family names, or the elements can be all the words making up a novel and the groups the unique words, or the elements could be inhabitants and the groups the cities in a country and so on. A random group formation (RGF) is presented from which a Bayesian estimate is obtained based on minimal information: it provides the best prediction for the number of groups with k elements, given the total number of elements, groups and the number of elements in the largest group. For each specification of these three values, the RGF predicts a unique group distribution N(k) ∝ exp(-bk)/k^γ, where the power-law index γ is a unique function of the same three values. The universality of the result is made possible by the fact that no system-specific assumptions are made about the mechanism responsible for the group division. The direct relation between γ and the total number of elements, groups and the number of elements in the largest group is calculated. The predictive power of the RGF model is demonstrated by direct comparison with data from a variety of systems. It is shown that γ usually takes values in the interval 1 ≤ γ ≤ 2 and that the value for a given phenomenon depends in a systematic way on the total size of the dataset. The results are put in the context of earlier discussions on Zipf's and Gibrat's laws, N(k) ∝ k^-2, and the connection between growth models and RGF is elucidated.",

author = "Baek, {Seung Ki} and Minnhagen, Petter and Bernhardsson, {Per Johan Sebastian}",

year = "2011",

month = apr,

day = "7",

doi = "10.1088/1367-2630/13/4/043004",

language = "English",

volume = "13",

pages = "043004",

journal = "New Journal of Physics",

issn = "1367-2630",

publisher = "IOP Publishing",

}

TY - JOUR

T1 - Zipf's law unzipped

AU - Baek, Seung Ki

AU - Minnhagen, Petter

AU - Bernhardsson, Per Johan Sebastian

PY - 2011/4/7

Y1 - 2011/4/7

N2 - Why does Zipf's law give a good description of data from seemingly completely unrelated phenomena? Here it is argued that the reason is that they can all be described as outcomes of a ubiquitous random group division: the elements can be citizens of a country and the groups family names, or the elements can be all the words making up a novel and the groups the unique words, or the elements could be inhabitants and the groups the cities in a country and so on. A random group formation (RGF) is presented from which a Bayesian estimate is obtained based on minimal information: it provides the best prediction for the number of groups with k elements, given the total number of elements, groups and the number of elements in the largest group. For each specification of these three values, the RGF predicts a unique group distribution N(k) ∝ exp(-bk)/k^γ, where the power-law index γ is a unique function of the same three values. The universality of the result is made possible by the fact that no system-specific assumptions are made about the mechanism responsible for the group division. The direct relation between γ and the total number of elements, groups and the number of elements in the largest group is calculated. The predictive power of the RGF model is demonstrated by direct comparison with data from a variety of systems. It is shown that γ usually takes values in the interval 1 ≤ γ ≤ 2 and that the value for a given phenomenon depends in a systematic way on the total size of the dataset. The results are put in the context of earlier discussions on Zipf's and Gibrat's laws, N(k) ∝ k^-2, and the connection between growth models and RGF is elucidated.

U2 - 10.1088/1367-2630/13/4/043004

DO - 10.1088/1367-2630/13/4/043004

M3 - Journal article

SN - 1367-2630

VL - 13

SP - 043004

JO - New Journal of Physics

JF - New Journal of Physics

ER -
