## The First Digit Phenomenon

Posted in Bad Statistics, The Universe and Stuff on March 11, 2009 by telescoper

I thought it would be fun to put up this quirky example of how sometimes things that really ought to be random turn out not to be. It’s also an excuse to mention a strange connection between astronomy and statistics.

The astronomer Simon Newcomb was born in 1835 in Nova Scotia (Canada). He had no real formal education at all, but since there wasn’t much else to do in Nova Scotia, he taught himself mathematics and astronomy and became very adept at performing astronomical calculations with great diligence. He began work in a lowly position at the US Nautical Almanac Office in 1857, and by 1877 he was director. He was professor of Mathematics and Astronomy at Johns Hopkins University from 1884 until 1893 and was made the first ever president of the American Astronomical Society in 1899; he died in 1909.

Newcomb was performing lengthy numerical calculations in an era long before the invention of the pocket calculator or desktop computer. In those days many such calculations, including virtually anything involving multiplication, had to be done using logarithms. The logarithm (to the base ten) of a number $x$ is defined to be the number $a$ such that $x = 10^{a}$. To multiply two numbers whose logarithms are $a$ and $b$ respectively involves simply adding the logarithms: $10^{a} \times 10^{b} = 10^{a+b}$, which helps a lot because adding is a lot easier than multiplying if you have no calculator. The initial logarithms are simply looked up in a table; to find the answer you use different tables to find the “inverse” logarithm.
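For anyone who has never had to do this, here is a minimal Python sketch of the procedure: look up two logarithms, add them, then take the inverse logarithm. The function name `multiply_via_logs` and the `places` argument are my own inventions for illustration, and rounding to four decimal places is only an assumption standing in for a printed table of limited precision.

```python
import math


def multiply_via_logs(x, y, places=4):
    """Multiply two positive numbers by adding their base-ten logarithms.

    `places` is a hypothetical stand-in for the precision of a printed
    log table; each "lookup" is rounded to that many decimal places.
    """
    a = round(math.log10(x), places)  # look up log of x in the "table"
    b = round(math.log10(y), places)  # look up log of y in the "table"
    total = a + b                     # the only arithmetic done by hand
    return 10 ** total                # "inverse" logarithm of the sum


approx = multiply_via_logs(123.0, 4567.0)
exact = 123.0 * 4567.0
print(f"via logs: {approx:.0f}, exact: {exact:.0f}")
```

The answer is only as good as the table: with four-figure logarithms the product comes out right to about three or four significant figures, which was usually what such tables were expected to deliver.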

Newcomb was a heavy user of his book of mathematical tables for this type of calculation, and it became very grubby and worn. But he also noticed that the first pages of the logarithms seemed to have been used much more than the others. This puzzled him greatly. Logarithm tables are presented in order of the first digit of the number required: the first pages therefore contain logarithms for numbers beginning with the digit 1. Newcomb used the tables for a vast range of different calculations of different things. He expected the first digit of each number he had to look up to be equally likely to be any of the digits 1 to 9. Shouldn’t they be randomly distributed? Shouldn’t all the pages be equally used?

Once raised, this puzzle faded away until it was re-discovered in 1938 and acquired the name of Benford’s law, or the first digit phenomenon. In virtually any list you can think of – street addresses, city populations, lengths of rivers, and so on – there are more entries beginning with the digit “1” than any other digit.
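If you want to try this on a list of your own, tallying first digits takes only a few lines of Python. The sketch below is purely illustrative: the helper `first_digit_counts` is a name I have made up, and the first hundred powers of two are my choice of test list (not one of the examples above), picked only because it needs no data file.

```python
from collections import Counter


def first_digit_counts(values):
    """Tally the leading decimal digit of each positive integer in `values`."""
    return Counter(int(str(v)[0]) for v in values)


# Illustrative list only: 2**0, 2**1, ..., 2**99.
powers = [2 ** n for n in range(100)]
counts = first_digit_counts(powers)

for digit in range(1, 10):
    print(digit, counts[digit])
```

For this particular list the digit 1 leads 30 of the 100 numbers, whereas an even spread over the nine possible digits would give about 11 each.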

To give another example, although I admit this one is much harder to explain, in the American Physical Society’s list of fundamental constants, or at least the last version I happened to look at, no less than 40% begin with the digit 1. If you’ve been writing physics examination papers recently like I have, you will notice a similar behaviour. Out of the 16 physical constants listed in the rubric of a physics examination paper lying on my desk right now, 6 begin with the digit 1.

So what is going on?

There is a (relatively) simple answer, and a more complicated one. I’ll take the simple one first.