## Bayes, Laplace and Bayes’ Theorem

A couple of interesting pieces have appeared which discuss Bayesian reasoning in the popular media. One is by Jon Butterworth in his *Grauniad* science blog and the other is a feature article in the New York Times. I’m in early today because I have an all-day *Teaching and Learning Strategy Meeting* so before I disappear for that I thought I’d post a quick bit of background.

One way to get to Bayes' Theorem is by starting with

$$P(AB|C) = P(A|BC)\,P(B|C) = P(B|AC)\,P(A|C),$$

where I refer to three logical propositions A, B and C and the vertical bar "|" denotes conditioning, i.e. P(A|B) means the probability of A being true given the assumed truth of B; "AB" means "A and B", etc. This basically follows from the fact that "A and B" must always be equivalent to "B and A". Bayes' theorem then follows straightforwardly as

$$P(B|AC) = \frac{P(B|C)\,P(A|BC)}{P(A|C)},$$

where

$$P(A|C) = \sum_k P(A|B_k C)\,P(B_k|C),$$

the sum being taken over an exhaustive set of mutually exclusive propositions $B_k$.
Many versions of this, including the one in Jon Butterworth’s blog, exclude the third proposition and refer to A and B only. I prefer to keep an extra one in there to remind us that every statement about probability depends on information either known or assumed to be known; any proper statement of probability requires this information to be stated clearly and used appropriately but sadly this requirement is frequently ignored.
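The product rule and the theorem that follows from it can be checked numerically. Here is a minimal sketch with exact rational arithmetic; the propositions and numbers (a toy "test versus condition" setup) are illustrative assumptions of mine, not taken from the post:

```python
from fractions import Fraction

# Toy example: C is the background information defining the model,
# B = "has condition", A = "test is positive".  All numbers are made up.
p_B = Fraction(1, 100)             # P(B|C): prior probability
p_A_given_B = Fraction(9, 10)      # P(A|BC)
p_A_given_notB = Fraction(5, 100)  # P(A|~B C)

# Marginalise over B and ~B to get P(A|C), the "other" probability.
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' theorem: P(B|AC) = P(A|BC) P(B|C) / P(A|C)
p_B_given_A = p_A_given_B * p_B / p_A

# Consistency check: "A and B" is the same proposition as "B and A",
# so P(A|BC)P(B|C) must equal P(B|AC)P(A|C).
assert p_A_given_B * p_B == p_B_given_A * p_A
print(p_B_given_A, float(p_B_given_A))
```

The assertion is exactly the equivalence of "A and B" and "B and A" from which the theorem was derived above.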

Although this is called Bayes' theorem, the general form of it as stated here was actually first written down not by Bayes, but by Laplace. What Bayes did was derive the special case of this formula for "inverting" the binomial distribution. This distribution gives the probability of *x* successes in *n* independent "trials" each having the same probability of success, *p*; each "trial" has only two possible outcomes ("success" or "failure"). Trials like this are usually called Bernoulli trials, after Jacob Bernoulli. If we ask the question "what is the probability of exactly *x* successes from the possible *n*?", the answer is given by the binomial distribution:

$$P(x|n,p) = \binom{n}{x} p^{x} (1-p)^{n-x},$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

is the number of distinct combinations of *x* objects that can be drawn from a pool of *n*.

You can probably see immediately how this arises. The probability of *x* consecutive successes is *p* multiplied by itself *x* times, or *p^{x}*. The probability of *(n-x)* successive failures is similarly *(1-p)^{n-x}*. The last two terms therefore tell us the probability that we have exactly *x* successes (since there must be *n-x* failures). The combinatorial factor in front takes account of the fact that the ordering of successes and failures doesn't matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case *p* is taken to be 0.5 for a fair coin. A biased coin might have a different value of *p*, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: *E(X)=np*. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is *np(1-p)*.
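The coin-tossing numbers above are easy to verify directly. This is a short sketch of the binomial pmf using the standard library, checking that it sums to one and that its mean and variance come out as *np* and *np(1-p)*:

```python
import math

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Ten tosses of a fair coin, as in the example above.
n, p = 10, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * q for x, q in zip(range(n + 1), pmf))        # should be np = 5
var = sum((x - mean)**2 * q for x, q in zip(range(n + 1), pmf))  # np(1-p) = 2.5
print(total, mean, var)
```

Changing `p` to something other than 0.5 models a biased coin; the same checks still hold.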

So this gives us the probability of *x* given a fixed value of *p*. Bayes was interested in the inverse of this result, the probability of *p* given *x*. In other words, Bayes was interested in the answer to the question “If I perform *n* independent trials and get *x* successes, what is the probability distribution of *p*?”. This is a classic example of inverse reasoning, in that it involved turning something like P(A|BC) into something like P(B|AC), which is what is achieved by the theorem stated at the start of this post.

Bayes got the correct answer for his problem, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

This is not the only example in science where the wrong person’s name is attached to a result or discovery. *Stigler’s Law of Eponymy* strikes again!

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but was elected a Fellow of the Royal Society (FRS) in 1742.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1763. In his great *Philosophical Essay on Probabilities*, Laplace wrote:

> Bayes, in the *Transactions Philosophiques* of the Year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing.

The reasoning in the 1763 paper is indeed perplexing, and I remain convinced that the general form we now refer to as Bayes' Theorem should really be called Laplace's Theorem. Nevertheless, Bayes did establish an extremely important principle that is reflected in the title of the New York Times article I referred to at the start of this post. In a nutshell this is that probabilities of future events can be updated on the basis of past measurements or, as I prefer to put it, "one person's posterior is another's prior".


October 2, 2014 at 12:19 am

There’s doubt as to whether Bayes actually wrote that article. Bellhouse (2004, Statistical Science, vol. 19, p. 3) suggests that it was his friend Richard Price who wrote it – it’s a nice article. The theorem has also been attributed to the blind mathematician Nicholas Saunderson, who became the 4th Lucasian professor at Cambridge. (Newton had been the second Lucasian professor). However, as you remark, it was Laplace who gave it to us in the form we recognise today.

Fine blog!

October 2, 2014 at 11:23 am

Peter: it recently occurred to me that everyone in physics does Bayes an apostrophical mis-service:

(1) Bayes’s theorem: a theorem due to a guy called Bayes

(2) Bayes’ theorem: a theorem that is the collaborative work of a number of people, all of whom are called Baye

Everyone writes (2) when they mean (1). Not as bad as the grocery store banana’s, perhaps, but still wrong. Interestingly, Wikipedia gets it right.

October 2, 2014 at 5:14 pm

John,

There is no court of final authority in matters of language. The Académie Française tries it for French but it is (rightly) ignored. Language is one of the very few things that really does belong to “the people”.

October 2, 2014 at 6:40 pm

Actually I was aware that it should be Bayes’s Theorem but went with the Wikipedia version I linked to, which is Bayes’ theorem. Also, to follow the correct path would ruin the joke about Coles’ Law.

In order to clarify the apostrophe situation I thought I’d add this useful style guide.

http://www.grammaruntied.com/?p=816

October 3, 2014 at 1:22 pm

Shouldn’t it at least try to match pronunciation? Should we therefore speak of “Bayzuz theorem” rather than “Bayz theorem”? (This sort of thing sounds even worse in theology with Jesus’s life…)

October 3, 2014 at 1:38 pm

I have felt for some time that we should abandon English and go back to using Latin. Then the genitive case would take care of everything.

October 2, 2014 at 5:38 pm

Yes, it clearly should be called Laplace’s theorem of probability.

Bayes’ theorem is actually one line on from the formula given above, when you write the posterior as the prior multiplied by the likelihood and then normalise it – the normalisation factor being the (reciprocal of) the ‘other’ probability in the 4-probability formula above, expressed via marginalisation.

Only in recent decades has it been found that Thomas Bayes was educated at Edinburgh University. Unless you had been baptised in the Church of England you could not attend the English universities, so those nonconformists who could afford a university education for their children sent their sons north.

You say that Joshua Bayes was one of the first nonconformists to be ordained, but “nonconformist” simply means a believer outside the State-recognised church, which in England was Roman Catholic up to the 1530s and became the Church of England thereafter thanks to Henry VIII’s ruthlessness. But in the 17th century various factions contended for the Church of England, and you could be a vicar in it one day and out of it the next (unless you took a vow conforming to the theology of the newly empowered high-ups). The song “The Vicar of Bray” is a gloriously cynical history lesson about a man who blew with the wind of every change and retained his position. John Bunyan didn’t, and he wrote Pilgrim’s Progress while in jail for unlicensed preaching in the 1660s after the Restoration.

Furthermore, ordination is an invention of the church rather than something found in the Bible; the New Testament is explicit that all Christians are priests by default, and although there must be leadership of congregations there was no ordained officer class.

There were nonconformists in England before the Reformation too (although they were slandered as heretics); look up “Lollards”.

Thomas Bayes is buried in Bunhill Fields, the nonconformist burial ground near Moorgate in central London. His grave was restored by the Royal Statistical Society at the behest of DV Lindley, a Bayesian, some decades ago.

October 2, 2014 at 6:49 pm

Anton,

I’ve edited the piece to show how Bayes’s Theorem actually follows from the expression I wrote; of course it follows quite straightforwardly once you realise that AB and BA are the same thing!

Peter