Bayes and his Theorem

My earlier post on Bayesian probability seems to have generated quite a lot of readers, so this lunchtime I thought I’d add a little bit of background. The previous discussion started from the result

P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)



Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down, not by Bayes but by Laplace. What Bayes’ did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of x successes in n independent “trials” each having the same probability of success, p; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Daniel Bernoulli. If we ask the question “what is the probability of exactly x successes from the possible n?”, the answer is given by the binomial distribution:

P_n(x|n,p)= C(n,x) p^x (1-p)^{n-x}


C(n,x)= n!/x!(n-x)!

is the number of distinct combinations of x objects that can be drawn from a pool of n.

You can probably see immediately how this arises. The probability of x consecutive successes is p multiplied by itself x times, or px. The probability of (n-x) successive failures is similarly (1-p)n-x. The last two terms basically therefore tell us the probability that we have exactly x successes (since there must be n-x failures). The combinatorial factor in front takes account of the fact that the ordering of successes and failures doesn’t matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X)=np. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is np(1-p).

So this gives us the probability of x given a fixed value of p. Bayes was interested in the inverse of this result, the probability of p given x. In other words, Bayes was interested in the answer to the question “If I perform n independent trials and get x successes, what is the probability distribution of p?”. This is a classic example of inverse reasoning. He got the correct answer, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

This is not the only example in science where the wrong person’s name is attached to a result or discovery. In fact, it is almost a law of Nature that any theorem that has a name has the wrong name. I propose that this observation should henceforth be known as Coles’ Law.

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but despite this was elected a Fellow of the Royal Society (FRS) in 1742. Presumably he had Friends of the Right Sort. He did however write a paper on fluxions in 1736, which was published anonymously. This was probably the grounds on which he was elected an FRS.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1764.

P.S. I understand that the authenticity of the picture is open to question. Whoever it actually is, he looks  to me a bit like Laurence Olivier…


8 Responses to “Bayes and his Theorem”

  1. Bryn Jones Says:

    The Royal Society is providing free access to electronic versions of its journals until the end of this month. Readers of this blog might like to look at Thomas Bayes’s two posthumous publications in the Philosophical Transactions.

    The first is a short paper about series. The other is the paper about statistics communicated by Richard Price. (The statistics paper may be accessible on a long-term basis because it is one of the Royal Society’s Trailblazing papers the society provides access to as part of its 350th anniversary celebrations.)

    Incidentally, both Thomas Bayes and Richard Price were buried in the Bunhill Fields Cemetery in London and their tombs can be seen there today.

  2. Steve Warren Says:

    You may be remembered in history as the discoverer of coleslaw, but you weren’t the first.

  3. telescoper Says:

    My surname, in Spanish, means “Cabbages”. So it was probably one of my ancestors who invented the chopped variety.

  4. Anton Garrett Says:

    Thomas Bayes is now known to have gone to Edinburgh University, where his name appears in the records. He was barred from English universities because his nonconformist family did not have him baptised in the Church of England. (Charles Darwin’s nonconformist family covered their bets by having baby Charles baptised in the CoE, although perhaps they believed it didn’t count as a baptism since Charles had no say in it. Tist is why he was able to go to Christ’s College, Cambridge.)

  5. telescoper Says:

    The noun “cole” can be found in English dictionaries as a generic name for plants of the cabbage family. It is related to the German kohl and scottish kail or kale. These are all derived from the latin word colis (or caulis) meaning a stem, which is also the root of the word cauliflower.

    The surname “Cole” and the variant “Coles” are fairly common in England and Wales, but are not related to the latin word for cabbage. Both are diminutives of the name “Nicholas”.

  6. […] I posted a little piece about Bayesian probability. That one and the others that followed it (here and here) proved to be surprisingly popular so I’ve been planning to add a few more posts […]

  7. It already has a popular name: Stigler’s law of eponymy.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: