500 items collected (not including preprints, though 45 papers written in languages other than English are included), up from 255 at the previous location at Rockefeller University (a 96% increase!)

Zipf’s law, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), is the observation that the frequency of occurrence of some event (P), as a function of its rank (i), where the rank is determined by that same frequency of occurrence, is a power-law function $P_i \sim 1/i^a$ with the exponent a close to unity (1).
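As a minimal sketch of how the rank-frequency relation can be checked on data, the Python below ranks events by frequency and estimates the exponent a from a straight-line fit of log frequency against log rank. The function names `rank_frequencies` and `fit_exponent` are illustrative, not taken from any paper in this bibliography, and a least-squares fit on a log-log plot is only a rough estimator, assumed here for simplicity.

```python
import math
from collections import Counter

def rank_frequencies(tokens):
    """Relative frequencies sorted from most to least common (rank 1 first)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]

def fit_exponent(freqs):
    """Estimate a in P_i ~ 1/i^a as minus the least-squares slope of
    log(frequency) against log(rank). Needs at least two ranks."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic check: frequencies exactly proportional to 1/i should give a = 1.
ideal = [1 / i for i in range(1, 101)]
norm = sum(ideal)
a_hat = fit_exponent([p / norm for p in ideal])
```

On real text, `rank_frequencies(text.split())` would supply the input to `fit_exponent`; the estimate depends on how many ranks are included in the fit.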

Comments on this bibliography:

  • Items written in English are arranged by year and, within each year, alphabetically by the first author’s name
  • Main items collected are papers, magazine or newspaper articles, and books that have “Zipf’s law” in their title or abstract
  • Because of the “second form” of Zipf’s law, as well as the relationship between Zipf’s law and other empirical laws, papers that do not explicitly contain the keyword “Zipf’s law” are also occasionally included. These papers may be about “Pareto’s principle”, “Lotka’s law”, “Bradford’s law”, “Benford’s law”, “Heaps’ law”, etc., or about power laws in general.
  • In some fields where Zipf’s law plays an important role (quantitative linguistics, urban growth, the internet, etc.), some papers that focus more on specific knowledge in that area than on Zipf’s law may also be included.
The year 2002 marked the 100th anniversary of the birth of George Zipf. To honor this occasion, the online journal Glottometrics devoted several volumes to Zipf and Zipf’s law: vol 3, vol 4, vol 5.

This page is created and maintained by Wentian Li of North Shore LIJ Institute for Medical Research.

I would like to thank Gabriel Altmann, Rob Axtell, Claudio Cioffi-Revilla, Allen Downey, Xavier Gabaix, Jurek Kolasa, Vladimir Kuznetsov, Bill Reed, Jeff Robbins, Ronald Rousseau, Flemming Topsoe, and Carlos Urzua for recommending papers and for various other help.

Some other laws

Benford’s law: In a wide variety of statistical data, the first digit is d with probability $\log_{10}(1 + 1/d)$. This is also referred to as “the first-digit phenomenon.” The general significant-digit law is that the first significant digits ddd…d occur with probability $\log_{10}(1 + 1/ddd \ldots d)$. This law was first published by Simon Newcomb in 1881. It went unnoticed until Frank Benford, apparently unaware of Newcomb’s paper, arrived at the same law and published it in 1938, supported by huge amounts of data. [source: http://www.nist.gov/dads/HTML/benfordslaw.html]
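The formula is easy to tabulate. The sketch below, with the illustrative function name `benford_probability`, evaluates it for the nine possible first digits; the same function covers the general significant-digit law if the leading digits are passed as one integer (for example 12 for numbers beginning “12”).

```python
import math

def benford_probability(leading_digits):
    """Probability log10(1 + 1/d) that a number starts with the given
    digits, where leading_digits is a positive integer such as 1 or 12."""
    return math.log10(1 + 1 / leading_digits)

# First-digit probabilities for d = 1..9.
first_digit = {d: benford_probability(d) for d in range(1, 10)}

# Two-digit version of the law: leading pairs 10..99.
first_two_digits_total = sum(benford_probability(d) for d in range(10, 100))
```

The nine first-digit probabilities sum to 1, and a leading 1 is the most likely case at about 30.1%.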

Bradford’s law: Journals in a field can be divided into three parts, each with about one-third of all articles: (1) a core of a few journals; (2) a second zone, with more journals; and (3) a third zone, with the bulk of journals. The numbers of journals in the three zones are in the ratio $1 : n : n^2$. Bradford formulated his law after studying a bibliography of geophysics, covering 326 journals in the field. He discovered that 9 journals contained 429 articles, 59 contained 499 articles, and 258 contained 404 articles. Although Bradford’s law is not statistically accurate, librarians commonly use it as a guideline. [source: http://www.nist.gov/dads/HTML/bradfordsLaw.html]
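The geophysics figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the six numbers from the text; the variable names are illustrative.

```python
# Journals and articles per zone, as quoted for Bradford's geophysics data.
journals = [9, 59, 258]
articles = [429, 499, 404]

total_journals = sum(journals)
total_articles = sum(articles)

# Share of all articles in each zone (the law expects about one-third each).
article_shares = [count / total_articles for count in articles]

# Growth factor in journal count from one zone to the next
# (the law expects the same factor for both steps).
zone_multipliers = [journals[i + 1] / journals[i] for i in range(len(journals) - 1)]
```

The shares come out at roughly 32%, 37%, and 30%, and the multipliers at about 6.6 and 4.4, which illustrates why the law is treated as a guideline rather than an exact relation.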

Heaps’ law: An empirical rule that describes vocabulary growth as a function of text size. It states that a text of n words has a vocabulary of size $V = K n^b$, where $0 < b < 1$. [source: http://encyclopedia.thefreedictionary.com/Heaps’+law]
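A small sketch of both sides of the rule: counting the observed vocabulary as a text is read, and evaluating the predicted size for given parameters. The function names are illustrative, and the values of K and b in the example call are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def vocabulary_growth(tokens):
    """Number of distinct words seen after each token, in reading order."""
    seen = set()
    sizes = []
    for token in tokens:
        seen.add(token)
        sizes.append(len(seen))
    return sizes

def heaps_vocabulary(n, K, b):
    """Vocabulary size predicted by V = K * n**b for a text of n words."""
    return K * n ** b

growth = vocabulary_growth("the cat saw the dog and the cat ran".split())

# Hypothetical parameters: K = 10, b = 0.5, for a text of 10,000 words.
predicted = heaps_vocabulary(10_000, K=10, b=0.5)
```

In practice K and b would be fitted to the observed growth curve of a particular corpus.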

Lotka’s law: The number of authors making n contributions is about $1/n^a$ of those making one contribution, where a is often nearly 2. [source: http://www.nist.gov/dads/HTML/lotkaslaw.html]
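As a worked example, the function below (an illustrative name, not from the cited source) gives the expected number of authors with n contributions, given the number of single-contribution authors and the exponent a, defaulting to 2.

```python
def lotka_expected_authors(n, single_contribution_authors, a=2.0):
    """Expected number of authors with n contributions under Lotka's law."""
    return single_contribution_authors / n ** a

# With 100 single-paper authors and a = 2:
two_papers = lotka_expected_authors(2, 100)
ten_papers = lotka_expected_authors(10, 100)
```

So for every 100 authors with one paper, the law predicts about 25 with two papers and about 1 with ten.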

Pareto’s principle: The tail (complementary cumulative) distribution of incomes, i.e. the proportion of people whose income is more than x, is an inverse power of x: $P[X > x] \sim x^{-k}$. It is often stated as the rule of thumb that 20% of a population earns 80% of its income. [source: http://beginnersinvest.about.com/cs/economics/g/paretoslaw.htm] There were also proposals to name it the “Juran principle” [source: http://www.juran.com/lower_2.cfm?article_id=21]
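The power-law tail and the 80/20 rule of thumb can be connected under one extra assumption that is not in the text above: incomes follow an exact Pareto distribution with minimum income `x_min` and exponent k > 1. In that case the richest fraction p of the population earns a share p^(1 - 1/k) of total income. The sketch below uses illustrative function names.

```python
import math

def tail_probability(x, x_min, k):
    """P[X > x] for an exact Pareto distribution (assumed, not from the source)."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** k

def top_share(p, k):
    """Share of total income earned by the richest fraction p; requires k > 1."""
    return p ** (1 - 1 / k)

# Exponent at which the top 20% earn exactly 80% under this assumption.
k_80_20 = math.log(5) / math.log(4)
share = top_share(0.2, k_80_20)
```

Under this assumption the 80/20 split corresponds to k of about 1.16; larger exponents give a less unequal split.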


