Bayes, Laplace and Bayes’ Theorem

A couple of interesting pieces discussing Bayesian reasoning have appeared in the popular media. One is by Jon Butterworth in his Grauniad science blog and the other is a feature article in the New York Times. I’m in early today because I have an all-day Teaching and Learning Strategy Meeting, so before I disappear for that I thought I’d post a quick bit of background.

One way to get to Bayes’ Theorem is by starting with

P(A|C)P(B|AC)=P(B|C)P(A|BC)=P(AB|C)

where I refer to three logical propositions A, B and C and the vertical bar “|” denotes conditioning, i.e. P(A|B) means the probability of A being true given the assumed truth of B; “AB” means “A and B”, etc. This basically follows from the fact that “A and B” must always be equivalent to “B and A”. Bayes’ theorem then follows straightforwardly as

P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)

where

K=P(A|C).
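As a quick sanity check, here is a minimal Python sketch that verifies the product rule and the resulting theorem numerically. The joint probabilities are made-up illustrative numbers, not taken from either article, and the background information C is left implicit in them.

```python
import math

# Hypothetical joint probabilities P(A, B | C) for two binary propositions.
# The numbers are illustrative only; C is implicit in the choice of values.
joint = {
    (True, True): 0.10,
    (True, False): 0.20,
    (False, True): 0.30,
    (False, False): 0.40,
}

# Marginal probabilities P(A|C) and P(B|C), obtained by summing the joint.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)

# Conditional probabilities from the definition P(X|YC) = P(XY|C) / P(Y|C).
p_ab = joint[(True, True)]
p_b_given_a = p_ab / p_a
p_a_given_b = p_ab / p_b

# Product rule written both ways: each side should equal P(AB|C).
lhs = p_a * p_b_given_a
rhs = p_b * p_a_given_b

# Bayes' theorem: P(B|AC) = P(B|C) P(A|BC) / P(A|C).
bayes = p_b * p_a_given_b / p_a

print(lhs, rhs, bayes, p_b_given_a)
```

The two sides of the product rule agree with the joint probability of “A and B”, and dividing through by K=P(A|C) reproduces P(B|AC).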

Many versions of this, including the one in Jon Butterworth’s blog, exclude the third proposition and refer to A and B only. I prefer to keep an extra one in there to remind us that every statement about probability depends on information either known or assumed to be known; any proper statement of probability requires this information to be stated clearly and used appropriately but sadly this requirement is frequently ignored.

Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down not by Bayes, but by Laplace. What Bayes did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of x successes in n independent “trials” each having the same probability of success, p; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Jacob Bernoulli. If we ask the question “what is the probability of exactly x successes from the possible n?”, the answer is given by the binomial distribution:

P_n(x|n,p)= C(n,x) p^x (1-p)^{n-x}

where

C(n,x)= \frac{n!}{x!(n-x)!}

is the number of distinct combinations of x objects that can be drawn from a pool of n.

You can probably see immediately how this arises. The probability of x consecutive successes is p multiplied by itself x times, or p^x. The probability of (n-x) successive failures is similarly (1-p)^{n-x}. The product of these two factors is therefore the probability of one particular sequence containing exactly x successes (since there must then be n-x failures). The combinatorial factor in front takes account of the fact that the ordering of successes and failures doesn’t matter.
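The formula is easy to evaluate directly. The following is a minimal Python sketch; the function name binomial_pmf and the example of 3 heads in 10 tosses are my own choices for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    # comb(n, x) counts the orderings; p**x * (1-p)**(n-x) is the
    # probability of any one particular ordering.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example: exactly 3 heads in 10 tosses of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities over all possible x should sum to one.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))

print(prob_three_heads, total)
```

For the fair-coin example the combinatorial factor is C(10,3)=120 and each ordering has probability 1/1024, so the result is 120/1024.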

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X)=np. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is np(1-p).
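If you would rather check the expectation value and variance numerically than do the exercise, a short sketch like the one below will do it by brute-force summation over the distribution. The function name binomial_moments and the biased-coin value p=0.3 are illustrative assumptions of mine.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and variance of the binomial distribution by direct summation."""
    # Probability of each possible number of successes x = 0, 1, ..., n.
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    # E(X) is the probability-weighted sum of x.
    mean = sum(x * w for x, w in enumerate(pmf))
    # Var(X) is the probability-weighted sum of squared deviations from the mean.
    variance = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))
    return mean, variance


# Fair coin tossed ten times: compare with np and np(1-p).
mean_fair, var_fair = binomial_moments(10, 0.5)

# A hypothetical biased coin with p = 0.3.
mean_biased, var_biased = binomial_moments(10, 0.3)

print(mean_fair, var_fair, mean_biased, var_biased)
```

The sums should agree with np and np(1-p) up to floating-point rounding.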

So this gives us the probability of x given a fixed value of p. Bayes was interested in the inverse of this result, the probability of p given x. In other words, Bayes was interested in the answer to the question “If I perform n independent trials and get x successes, what is the probability distribution of p?”. This is a classic example of inverse reasoning, in that it involves turning something like P(A|BC) into something like P(B|AC), which is what is achieved by the theorem stated at the start of this post.
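To make the inverse problem concrete, here is a rough Python sketch that evaluates the posterior distribution of p on a grid. It assumes a flat prior for p between 0 and 1, which is an assumption on my part for the purposes of illustration; the function name posterior_on_grid, the grid size, and the example of 7 successes in 10 trials are likewise hypothetical.

```python
from math import comb


def posterior_on_grid(x, n, grid_size=1001):
    """Posterior density of p on an evenly spaced grid over [0, 1],
    assuming a flat prior (an illustrative assumption)."""
    step = 1.0 / (grid_size - 1)
    grid = [i * step for i in range(grid_size)]
    # Likelihood: binomial probability of x successes for each candidate p.
    likelihood = [comb(n, x) * p**x * (1 - p) ** (n - x) for p in grid]
    # With a flat prior the posterior is proportional to the likelihood;
    # normalise it using the trapezoidal rule so it integrates to one.
    area = step * (sum(likelihood) - 0.5 * (likelihood[0] + likelihood[-1]))
    density = [value / area for value in likelihood]
    return grid, density


# Example: 7 successes in 10 trials.
grid, density = posterior_on_grid(x=7, n=10)
step = grid[1] - grid[0]

# Posterior mean of p (simple sum; the integrand vanishes at both ends here).
posterior_mean = step * sum(p * d for p, d in zip(grid, density))

# Grid value of p at which the posterior density is largest.
posterior_mode = grid[density.index(max(density))]

print(posterior_mean, posterior_mode)
```

Under the flat-prior assumption the posterior peaks at x/n, while its mean is (x+1)/(n+2), so the two differ slightly for small n.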

Bayes got the correct answer for his problem, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

 

This is not the only example in science where the wrong person’s name is attached to a result or discovery. Stigler’s Law of Eponymy strikes again!

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but was elected a Fellow of the Royal Society (FRS) in 1742.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1763. In his great Philosophical Essay on Probabilities Laplace wrote:

Bayes, in the Transactions Philosophiques of the Year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing.

The reasoning in the 1763 paper is indeed perplexing, and I remain convinced that the general form we now refer to as Bayes’ Theorem should really be called Laplace’s Theorem. Nevertheless, Bayes did establish an extremely important principle that is reflected in the title of the New York Times piece I referred to at the start of this piece. In a nutshell this is that probabilities of future events can be updated on the basis of past measurements or, as I prefer to put it, “one person’s posterior is another’s prior”.

12 Responses to “Bayes, Laplace and Bayes’ Theorem”

  1. Bernard Jones Says:

    There’s doubt as to whether Bayes actually wrote that article. Bellhouse (2004, Statistical Science, vol 19 p3) suggests that it was his friend Richard Price who wrote it – it’s a nice article. The theorem has also been attributed to the blind mathematician Nicholas Saunderson, who became the 4th Lucasian professor at Cambridge. (Newton had been the second Lucasian professor). However, as you remark, it was Laplace who gave it to us in the form we recognise today.
    Fine blog!

  2. John Peacock Says:

    Peter: it recently occurred to me that everyone in physics does Bayes an apostrophical mis-service:

    (1) Bayes’s theorem: a theorem due to a guy called Bayes

    (2) Bayes’ theorem: a theorem that is the collaborative work of a number of people, all of whom are called Baye

    Everyone writes (2) when they mean (1). Not as bad as the grocery store banana’s, perhaps, but still wrong. Interestingly, Wikipedia gets it right.

    • One of my “favourite” gripes. People, in English, the singular possessive is always “’s”, whether or not the basic form ends in “s”. Thus: James’s cycle, Thomas’s book etc.

      The plural possessive is also “’s” if the plural is not formed by adding “s”, as in “children’s”. Otherwise, it is “’”, e.g. “astronomers’ telescopes”.

      There are confusing similarities in other languages, of which more below, but native English speakers have no excuse. 🙂

      In German, the singular possessive is formed by adding “s” with no apostrophe, e.g. Wolfgangs Buch, unless the basic form ends in an “s”, in which case the possessive is formed with an apostrophe, e.g. Andreas’ Buch.

      In Dutch, with a few exceptions for which there is a reason, the plural of words ending in a vowel is formed by adding “’s”, e.g. “auto’s”. This has nothing to do with the possessive, of course, though it does superficially look like the grocers’ banana’s. Rather, Dutch vowels are long at the end of a word or if followed by another syllable, unless the following consonant is doubled. Thus, “autos” would be pronounced with a short “o”. The usual rule is to double the vowel in such cases, e.g. “autoos”. The apostrophe denotes the missing “o”.

      We now return to the regular programme on Peter Coles’s blog.

    • Anton Garrett Says:

      John,

      There is no court of final authority in matters of language. The Academie Francaise tries it for French but it is (rightly) ignored. Language is one of the very few things that really does belong to “the people”.

      • In general, I agree. If no-one ever did anything “wrong” then we would all still be speaking the original language. Languages develop by changing, and essentially all changes were at some time wrong usage.

        On the other hand, this does not imply that one should adopt an “anything goes” philosophy. Only a small fraction of “wrong” usage has become accepted and led to change in the language, presumably because the rest didn’t catch on, perhaps because people thought it made too little sense.

        The purpose of language is communication, and this is aided by clarity. If a wrong usage leads to less clarity, it should be avoided.

    • telescoper Says:

      Actually I was aware that it should be Bayes’s Theorem but went with the Wikipedia version I linked to, which is Bayes’ theorem. Also to follow the correct path would ruin the joke about Coles’ Law.

      In order to clarify the apostrophe situation I thought I’d add this useful style guide.

      http://www.grammaruntied.com/?p=816

      • Anton Garrett Says:

        Shouldn’t it at least try to match pronunciation? Should we therefore speak of “Bayzuz theorem” rather than “Bayz theorem”? (This sort of thing sounds even worse in theology with Jesus’s life…)

      • telescoper Says:

        I have felt for some time that we should abandon English and go back to using Latin. Then the genitive case would take care of everything.

      • I see that the style guide also has trouble with “smart” quotes.

        The AP rule about adding an apostrophe only if the word following the possessive starts with an “s”, e.g. “the boss’ sister”, is, literally, one of the stupidest things I have read in my entire life. The AP rule about proper nouns ending in “s” is almost as stupid, as are the Chicago exceptions.

  3. Anton Garrett Says:

    Yes, it clearly should be called Laplace’s theorem of probability.

    Bayes’ theorem is actually one line on from the formula given above, when you write the posterior as the prior multiplied by the likelihood and then normalise it – the normalisation factor being the (reciprocal of) the ‘other’ probability in the 4-probability formula above, expressed via marginalisation.

    Only in recent decades has it been found that Thomas Bayes was educated at Edinburgh University. Unless you had been baptised in the Church of England then you could not attend the English universities, so the sons of those nonconformists who could afford a university education for their children sent them north.

    You say that Thomas Bayes was one of the first nonconformists to be ordained, but “nonconformist” simply means a believer outside the State-recognised church, which in England was Roman Catholic to the 1530s and became the Church of England thereafter thanks to Henry VIII’s ruthlessness. But in the 17th century various factions contended for the Church of England, and you could be a vicar in it one day and out of it the next (unless you took a vow conforming to the theology of the newly empowered high-ups). The song “The Vicar of Bray” is a gloriously cynical history lesson about a man who blew with the wind of every change and retained his position. John Bunyan didn’t, and he wrote Pilgrim’s Progress while in jail for unlicensed preaching in the 1660s after the Restoration.

    Furthermore, ordination is an invention of the church rather than something found in the Bible; the New Testament is explicit that all Christians are priests by default, and although there must be leadership of congregations there was no ordained officer class.

    There were nonconformists in England before the Reformation too (although they were slandered as heretics); look up “Lollards”.

    Thomas Bayes is buried in Bunhill Fields, the nonconformist burial ground near Moorgate in central London. His grave was restored by the Royal Statistical Society at the behest of DV Lindley, a Bayesian, some decades ago.

    • telescoper Says:

      Anton,

      I’ve edited the piece to show how Bayes’s Theorem actually follows from the expression I wrote, of course it follows quite straightforwardly once you realise that AB and BA are the same thing!

      Peter
