Wednesday, August 13, 2008

I finally understand Benford's Law

Benford's Law states that the first digits in various lists of real-world quantities -- the weights of rocks, the lengths of rivers, death rates, populations of settlements, or figures on income tax returns -- are distributed nonuniformly. The first digit is 1 around 30% of the time, 2 about 18% of the time (and cumulatively 1, 2, or 3 about 60% of the time) -- and 9 less than 5% of the time. The practical significance of this is that if you were to forge a list -- say on an income tax return -- you would do well to distribute the first digit according to Benford's Law rather than evenly. (I was reminded of Benford's Law by the Chinese medal tally -- their methods are, as I said, transparent from the medal count alone.)
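For reference, these percentages match the usual closed form for Benford's Law, P(d) = log10(1 + 1/d), which isn't spelled out in the post itself. A minimal Python sketch that tabulates it (the function name is just illustrative):

```python
import math

def benford_probability(d: int) -> float:
    """Share of values whose leading digit is d, per the standard Benford formula."""
    return math.log10(1 + 1 / d)

# Tabulate the nine leading-digit probabilities.
probs = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"{d}: {p:.1%}")

# Cumulative share for leading digits 1, 2, or 3.
print(f"1-3 combined: {probs[1] + probs[2] + probs[3]:.1%}")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10).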

Why is Benford's Law true? Suppose not: suppose the first digits of some list, say the lengths of rivers in meters, were distributed evenly (which is intuitive). Now let's express them in feet (let's approximate 1 m = 3 ft). If the first digit in meters was four or five (or in some cases three or six) then the first digit in feet is 1. On the other hand, for the first digit in feet to be 9, the first two digits in meters must be between 30 and 33. If we started with even distributions of first and second digits in meters, it follows that, in feet, the lengths begin in 1 about 37% of the time, and in 9 about 3.7% of the time. So an even distribution in meters turns into a very uneven one in feet. This makes no sense at all because there's nothing special about meters -- the distribution of first digits ought to be the same no matter which units you count in.
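The back-of-envelope argument can be checked numerically. The sketch below makes two assumptions: "evenly distributed digits" is modelled as leading significands spread evenly over [1, 10) on a fine grid, and the post's rough 1 m = 3 ft conversion is used. The helper name and grid size are illustrative choices.

```python
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

# Assumption: evenly spread first (and later) digits means the significand
# is uniform on [1, 10). Use grid midpoints to avoid sampling noise.
N = 90_000
meters = [1 + 9 * (i + 0.5) / N for i in range(N)]

# Convert with the post's approximation 1 m = 3 ft and tally leading digits.
counts = Counter(leading_digit(3 * m) for m in meters)
shares_ft = {d: counts[d] / N for d in range(1, 10)}

for d in range(1, 10):
    print(f"leading digit {d} in feet: {shares_ft[d]:.1%}")
```

Under these assumptions the shares for 1 and 9 come out near 10/27 and 1/27 respectively, nowhere near an even 1/9 each.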

Benford's distribution is a sort of fractal: it's the only distribution of first digits that is invariant under conversion from meters to feet, or pounds to kilograms, or people to families, or geese to gaggles. What's special about Benford's distribution is that the fractional parts of the base-10 logarithms are distributed evenly between 0 and 1, and the first digit depends only on that fractional part. Since log (ab) = log a + log b, rescaling by a constant just adds a constant to every logarithm, and an even distribution of fractional parts stays even whatever you add to it (wrapping around at 1). It follows that a logarithmically even distribution (i.e. Benford's) stays logarithmically even upon rescaling.
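A quick way to see the invariance is to build a list whose logarithms are spread evenly, rescale it, and compare leading digits before and after. This is only a sketch: the grid of values, the helper names, and the conversion factor 3.28084 ft per meter are choices made for illustration.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_shares(values):
    """Fraction of values starting with each digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Values whose base-10 logarithms are spread evenly over [0, 1).
N = 100_000
log_uniform = [10 ** ((i + 0.5) / N) for i in range(N)]

before = digit_shares(log_uniform)                       # "meters"
after = digit_shares([3.28084 * v for v in log_uniform]) # "feet"
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Largest change in any digit's share caused by the unit conversion.
max_shift = max(abs(before[d] - after[d]) for d in range(1, 10))
print(f"largest change in any digit's share: {max_shift:.5f}")
```

Swapping 3.28084 for any other positive factor should leave the shares essentially unchanged, which is the point of the argument.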

It follows that we shouldn't expect Benford's Law to hold whenever we don't expect our descriptions to be scale-invariant: e.g. telephone numbers, last digits in random lists of integers, or any variable with a normal distribution, which implies a characteristic size -- e.g. heights.
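To illustrate the heights example, here is a small simulation. The numbers are assumptions made up for the sketch, not data: adult heights are taken as normally distributed with mean 170 cm and standard deviation 10 cm, and the seed is fixed so the run is repeatable.

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_shares(values):
    """Fraction of values starting with each digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Assumed, illustrative parameters: mean 170 cm, standard deviation 10 cm.
rng = random.Random(0)
heights_cm = [rng.gauss(170.0, 10.0) for _ in range(100_000)]
heights_ft = [h / 30.48 for h in heights_cm]  # 1 ft = 30.48 cm

shares_cm = digit_shares(heights_cm)
shares_ft = digit_shares(heights_ft)

print(f"share starting with 1 in centimeters: {shares_cm[1]:.1%}")
print(f"share starting with 5 in feet:        {shares_ft[5]:.1%}")
```

With a characteristic size like this, the leading digit is dominated by whichever digit the typical value happens to start with in the chosen unit, so changing units changes the whole distribution.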
