My earlier post on Bayesian probability seems to have generated quite a lot of readers, so this lunchtime I thought I’d add a little bit of background. The previous discussion started from the result

where

Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down, not by Bayes but by Laplace. What Bayes’ did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of *x* successes in *n* independent “trials” each having the same probability of success, *p*; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Daniel Bernoulli. If we ask the question “what is the probability of exactly *x* successes from the possible *n*?”, the answer is given by the binomial distribution:

where

is the number of distinct combinations of *x* objects that can be drawn from a pool of *n*.

You can probably see immediately how this arises. The probability of *x* consecutive successes is *p* multiplied by itself *x* times, or *p ^{x}*. The probability of

*(n-x)*successive failures is similarly

*(1-p)*. The last two terms basically therefore tell us the probability that we have exactly

^{n-x}*x*successes (since there must be

*n-x*failures). The combinatorial factor in front takes account of the fact that the ordering of successes and failures doesn’t matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case *p* is taken to be 0.5 for a fair coin. A biased coin might have a different value of *p*, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: *E(X)=np*. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is *np(1-p)*.

So this gives us the probability of *x* given a fixed value of *p*. Bayes was interested in the inverse of this result, the probability of *p* given *x*. In other words, Bayes was interested in the answer to the question “If I perform *n* independent trials and get *x* successes, what is the probability distribution of *p*?”. This is a classic example of inverse reasoning. He got the correct answer, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

This is not the only example in science where the wrong person’s name is attached to a result or discovery. In fact, it is almost a law of Nature that any theorem that has a name has the wrong name. I propose that this observation should henceforth be known as Coles’ Law.

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but despite this was elected a Fellow of the Royal Society (FRS) in 1742. Presumably he had Friends of the Right Sort. He did however write a paper on fluxions in 1736, which was published anonymously. This was probably the grounds on which he was elected an FRS.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1764.

P.S. I understand that the authenticity of the picture is open to question. Whoever it actually is, he looks to me a bit like Laurence Olivier…