## Bayes’ Theorem or Price’s Theorem?

Posted in History, The Universe and Stuff on December 4, 2015 by telescoper

I’m indebted to a fellow blogger for drawing my attention to the person shown in the above picture, Dr Richard Price, who has been described as “the most original thinker ever born in Wales” and who has a Society named after him.

Price was a moral philosopher, nonconformist preacher and also a mathematician of some note. Of particular interest to this blog is the role he played in the development of what is now known as Bayes’ Theorem, after the Reverend Thomas Bayes.

However, the paper in the Philosophical Transactions of the Royal Society that contains the first published form of this theorem did not appear until 1763, over a year after Bayes’ death, and, as you can see if you follow the link, is attributed jointly to “Mr Bayes and Mr Price”. It appears that there was an original manuscript written by Bayes around 1755, which was communicated to Price when Bayes died in 1761 and then presented for publication over a year later; Price had been asked to act as “literary executor” of Bayes’ estate.

Unfortunately the original manuscript has never been found and it is therefore impossible to say for sure how much Price contributed to the final version. However, a relatively recent and very interesting article raises this question, and argues (reasonably convincingly to my mind) that Bayes’ part stops at page 14 of 32 pages. It is therefore quite possible that Price wrote over half the paper himself, although most historical discussions of this matter simply state that Price “edited” Bayes’ work.

It has to be said that the paper is not exactly a model of clarity and pertains only to a particular case of the full theorem. The form in general use today was first published by Laplace in 1812, so it should really be called Laplace’s Theorem, but Laplace did give generous credit to the work of Bayes which is no doubt why the name stuck.

I don’t suppose we will know for sure exactly how much Price contributed to the development of Bayes’ theorem, but this may be yet another example of the law that any result in science or mathematics that has a person’s name attached to it has the wrong name attached to it!

Finally, I will mention that the Richard Price Society has started a petition to the Welsh government. I’m taking the liberty of copying the purpose of this petition here:

We call on the Welsh government to acknowledge the important contribution of Dr Richard Price not only to the eighteenth century Enlightenment, but also to the making of the modern world that we live in today, and develop his birthplace and childhood home into a visitor information centre where people of all nationalities and ages can discover how his significant contributions to theology, mathematics and philosophy have shaped the modern world.

Tynton Farm in Llangeinor, the birthplace and childhood home of Dr Richard Price is for sale. Once derelict, the farm has been sensitively restored and almost all of the original features have been preserved. The Richard Price Society is aware that the house attracts visitors from all corners of the globe and this is attested by the previous owner’s Visitors Book that was signed by visitors to the farm. The position of the farm and its provenance would make it an ideal learning centre where people can find out just what an important person he was and remains. This is an opportunity to buy the property at market value and help celebrate the achievements of Wales’ intellectual giant and apostle of liberty.

I have signed it, and hope you will consider doing likewise!

## Bayes, Laplace and Bayes’ Theorem

Posted in Bad Statistics on October 1, 2014 by telescoper

A  couple of interesting pieces have appeared which discuss Bayesian reasoning in the popular media. One is by Jon Butterworth in his Grauniad science blog and the other is a feature article in the New York Times. I’m in early today because I have an all-day Teaching and Learning Strategy Meeting so before I disappear for that I thought I’d post a quick bit of background.

One way to get to Bayes’ Theorem is by starting with

$P(A|C)P(B|AC)=P(B|C)P(A|BC)=P(AB|C)$

where I refer to three logical propositions A, B and C and the vertical bar “|” denotes conditioning, i.e. $P(A|B)$ means the probability of A being true given the assumed truth of B; “AB” means “A and B”, etc. This basically follows from the fact that “A and B” must always be equivalent to “B and A”.  Bayes’ theorem  then follows straightforwardly as

$P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)$

where

$K=P(A|C).$

Many versions of this, including the one in Jon Butterworth’s blog, exclude the third proposition and refer to A and B only. I prefer to keep an extra one in there to remind us that every statement about probability depends on information either known or assumed to be known; any proper statement of probability requires this information to be stated clearly and used appropriately but sadly this requirement is frequently ignored.
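As a sanity check on the algebra, here is a minimal Python sketch that builds a small joint probability table for three binary propositions and confirms that P(B|AC) computed directly agrees with the right-hand side of the theorem. The table entries and the helper `prob` are made up purely for illustration; nothing here comes from the original posts.

```python
from itertools import product

# Hypothetical joint probabilities P(A, B, C) for three binary propositions.
# The weights are arbitrary; dividing by their sum makes them a valid table.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {abc: w / total
         for abc, w in zip(product([True, False], repeat=3), weights)}


def prob(event, given=lambda a, b, c: True):
    """Return P(event | given) by summing entries of the joint table."""
    num = sum(p for abc, p in joint.items() if event(*abc) and given(*abc))
    den = sum(p for abc, p in joint.items() if given(*abc))
    return num / den


# Left-hand side: P(B|AC) read straight off the table.
direct = prob(lambda a, b, c: b, lambda a, b, c: a and c)

# Right-hand side: K^{-1} P(B|C) P(A|BC), with K = P(A|C).
p_b_given_c = prob(lambda a, b, c: b, lambda a, b, c: c)
p_a_given_bc = prob(lambda a, b, c: a, lambda a, b, c: b and c)
K = prob(lambda a, b, c: a, lambda a, b, c: c)
via_bayes = p_b_given_c * p_a_given_bc / K

print(direct, via_bayes)  # the two numbers should agree up to rounding
```

Every probability in the sketch is conditioned on C, which is the point of keeping the third proposition in view.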

Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down not by Bayes, but by Laplace. What Bayes did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of x successes in n independent “trials” each having the same probability of success, p; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Jacob Bernoulli. If we ask the question “what is the probability of exactly x successes from the possible n?”, the answer is given by the binomial distribution:

$P_n(x|n,p)= C(n,x) p^x (1-p)^{n-x}$

where

$C(n,x)= \frac{n!}{x!(n-x)!}$

is the number of distinct combinations of x objects that can be drawn from a pool of n.

You can probably see immediately how this arises. The probability of x consecutive successes is p multiplied by itself x times, or $p^x$. The probability of (n-x) successive failures is similarly $(1-p)^{n-x}$. The last two factors therefore give the probability of one particular sequence containing exactly x successes (since there must be n-x failures). The combinatorial factor in front counts the number of such sequences, because the ordering of successes and failures doesn’t matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X)=np. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is np(1-p).
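For readers who would rather check the exercise numerically than do the algebra, the following short Python sketch evaluates the binomial distribution for ten tosses of a fair coin and computes the mean and variance by brute-force summation. The function name `binomial_pmf` is just a label chosen for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # ten tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(x * px for x, px in enumerate(pmf))     # compare with n*p
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # n*p*(1-p)

print(total, mean, variance)
```

The printed mean and variance should match np = 5 and np(1-p) = 2.5 up to rounding; changing `p` to mimic a biased coin gives the corresponding check for that case.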

So this gives us the probability of x given a fixed value of p. Bayes was interested in the inverse of this result, the probability of p given x. In other words, Bayes was interested in the answer to the question “If I perform n independent trials and get x successes, what is the probability distribution of p?”. This is a classic example of inverse reasoning, in that it involved turning something like P(A|BC) into something like P(B|AC), which is what is achieved by the theorem stated at the start of this post.
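To make the inversion concrete, here is a rough Python sketch that evaluates the probability distribution of p on a grid, given x successes in n trials. It assumes a flat prior on p between 0 and 1, so the posterior is simply the binomial likelihood renormalised; the flat prior, the grid size and the example numbers (7 successes in 10 trials) are assumptions made for this illustration.

```python
from math import comb


def posterior_on_grid(x, n, grid_size=1001):
    """Posterior density of p on a regular grid, assuming a flat prior."""
    ps = [i / (grid_size - 1) for i in range(grid_size)]
    likelihood = [comb(n, x) * p**x * (1 - p) ** (n - x) for p in ps]
    dp = ps[1] - ps[0]
    # Trapezoidal rule for the normalising integral over p.
    norm = dp * (sum(likelihood) - 0.5 * (likelihood[0] + likelihood[-1]))
    density = [value / norm for value in likelihood]
    return ps, density


ps, density = posterior_on_grid(x=7, n=10)
dp = ps[1] - ps[0]

p_mode = ps[density.index(max(density))]             # most probable value of p
p_mean = dp * sum(p * d for p, d in zip(ps, density))  # posterior mean of p

print(p_mode, p_mean)
```

Under these assumptions the most probable value sits at x/n, while the posterior mean comes out close to (x+1)/(n+2), slightly nearer to one half.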

Bayes got the correct answer for his problem, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

This is not the only example in science where the wrong person’s name is attached to a result or discovery. Stigler’s Law of Eponymy strikes again!

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but was elected a Fellow of the Royal Society (FRS) in 1742.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1763. In his great Philosophical Essay on Probabilities Laplace wrote:

Bayes, in the Transactions Philosophiques of the Year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing.

The reasoning in the 1763 paper is indeed perplexing, and I remain convinced that the general form we now refer to as Bayes’ Theorem should really be called Laplace’s Theorem. Nevertheless, Bayes did establish an extremely important principle that is reflected in the title of the New York Times piece I referred to at the start of this piece. In a nutshell this is that probabilities of future events can be updated on the basis of past measurements or, as I prefer to put it, “one person’s posterior is another’s prior”.

## Bayes in the dock (again)

Posted in Bad Statistics on February 28, 2013 by telescoper

This morning on Twitter there appeared a link to a blog post reporting that the Court of Appeal had rejected the use of Bayesian probability in legal cases. I recommend anyone interested in probability to read it, as it gives a fascinating insight into how poorly the concept is understood.

Although this is a new report about a new case, it’s actually not an entirely new conclusion. I blogged about a similar case a couple of years ago, in fact. The earlier story concerned an erroneous argument given during a trial about the significance of a match found between a footprint found at a crime scene and footwear belonging to a suspect. The judge took exception to the fact that the figures being used were not known sufficiently accurately to make a reliable assessment, and thus decided that Bayes’ theorem shouldn’t be used in court unless the data involved in its application were “firm”.

If you read the Guardian article to which I’ve provided a link you will see that there’s a lot of reaction from the legal establishment and statisticians about this, focussing on the forensic use of probabilistic reasoning. This all reminds me of the tragedy of the Sally Clark case and what a disgrace it is that nothing has been done since then to prevent the misrepresentation of statistical arguments in trials. Some of my Bayesian colleagues have expressed dismay at the judge’s opinion.

My reaction to this affair is more muted than you would probably expect. First thing to say is that this is really not an issue relating to the Bayesian versus frequentist debate at all. It’s about a straightforward application of Bayes’ theorem which, as its name suggests, is a theorem; actually it’s just a straightforward consequence of the sum and product laws of the calculus of probabilities. No-one, not even the most die-hard frequentist, would argue that Bayes’ theorem is false. What happened in this case is that an “expert” applied Bayes’ theorem to unreliable data and by so doing obtained misleading results. The  issue is not Bayes’ theorem per se, but the application of it to inaccurate data. Garbage in, garbage out. There’s no place for garbage in the courtroom, so in my opinion the judge was quite right to throw this particular argument out.

But while I’m on the subject of using Bayesian logic in the courts, let me add a few wider comments. First, I think that Bayesian reasoning provides a rigorous mathematical foundation for the process of assessing quantitatively the extent to which evidence supports a given theory or interpretation. As such it describes accurately how scientific investigations proceed by updating probabilities in the light of new data. It also describes how a criminal investigation works.

What Bayesian inference is not good at is achieving closure in the form of a definite verdict. There are two sides to this. One is that the maxim “innocent until proven guilty” cannot be incorporated in Bayesian reasoning. If one assigns a zero prior probability of guilt then no amount of evidence will be able to change this into a non-zero posterior probability; the required burden is infinite. On the other hand, there is the problem that the jury must decide guilt in a criminal trial “beyond reasonable doubt”. But how much doubt is reasonable, exactly? And will a jury understand a probabilistic argument anyway?
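The point about a zero prior is easy to demonstrate. The Python sketch below updates a prior probability of guilt with several pieces of evidence, each summarised by a likelihood ratio P(evidence | guilty) / P(evidence | innocent). The function name, the value of 1000 for the ratio and the choice of five independent pieces of evidence are all hypothetical numbers picked for illustration, not figures from any real case.

```python
def posterior_guilt(prior, likelihood_ratio):
    """Bayes' theorem for two hypotheses (guilty or innocent).

    likelihood_ratio is P(evidence | guilty) / P(evidence | innocent).
    """
    numerator = prior * likelihood_ratio
    return numerator / (numerator + (1.0 - prior))


for prior in (0.0, 1e-6, 0.01):
    p = prior
    for _ in range(5):                  # five independent pieces of evidence
        p = posterior_guilt(p, 1000.0)  # hypothetical likelihood ratio
    print(prior, p)  # a prior of exactly zero stays at zero
```

Any non-zero prior, however small, can eventually be overwhelmed by strong enough evidence; a prior of exactly zero cannot.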

In pure science we never really need to achieve this kind of closure, collapsing the broad range of probability into a simple “true” or “false”, because this is a process of continual investigation. It’s a reasonable inference, for example, based on Supernovae and other observations that the Universe is accelerating. But is it proven that this is so? I’d say “no”,  and don’t think my doubts are at all unreasonable…

So what I’d say is that while statistical arguments are extremely important for investigating crimes – narrowing down the field of suspects, assessing the reliability of evidence, establishing lines of inquiry, and so on – I don’t think they should ever play a central role once the case has been brought to court unless there’s much clearer guidance given to juries and stricter monitoring of so-called “expert” witnesses.

I’m sure various readers will wish to express diverse opinions on this case so, as usual, please feel free to contribute through the box below!

## Bayes’ Theorem and the Search for Supersymmetry

Posted in The Universe and Stuff on May 13, 2012 by telescoper

Interesting comments about Bayes’ theorem and the prospects for detecting supersymmetry at the Large Hadron Collider. This piece explains how a non-detection isn’t always “absence of evidence” but can indeed be “evidence of absence”. It’s also worth reading the comments if you’re wondering whether what people say about Lubos Motl is actually true…

Here’s a puzzle. There are three cups upside down on a table. Your friend tells you that a pea is hidden under one of them. Based on past experience you estimate that there is a 90% probability that this is true. You turn over two cups and don’t find the pea. What is the probability now that there is a pea underneath? You may want to think about this before reading on.

Naively you might think that two-thirds of the parameter space has been eliminated, so the probability has gone from 90% to 30%, but this is quite wrong. You can use Bayes’ Theorem to get the correct answer but let me give you a more intuitive frequentist answer. The situation can be modelled by imagining that there are thirty initial possibilities with equal probability. Nine of them have a pea under the first cup, nine more under the second and nine more under the third…
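For anyone who wants to check their own answer, the Bayes’ Theorem route can be sketched in a few lines of Python. This is only an illustration: it assumes that, if there is a pea at all, it is equally likely to be under any of the three cups, and it uses exact fractions so that no rounding gets in the way. The variable names are invented for the sketch.

```python
from fractions import Fraction

prior_pea = Fraction(9, 10)  # estimated probability that a pea is hidden at all
cups = 3

# P(the two cups turned over are empty | there is a pea):
# the pea must be under the one remaining cup.
p_empty_given_pea = Fraction(1, cups)
# P(the two cups turned over are empty | there is no pea): certain.
p_empty_given_none = Fraction(1)

evidence = (prior_pea * p_empty_given_pea
            + (1 - prior_pea) * p_empty_given_none)
posterior_pea = prior_pea * p_empty_given_pea / evidence

# The same answer by counting thirty equally likely possibilities:
# nine per cup with a pea, three with no pea anywhere.
remaining_with_pea = 9
remaining_without_pea = 3
by_counting = Fraction(remaining_with_pea,
                       remaining_with_pea + remaining_without_pea)

print(posterior_pea, by_counting)  # prints 3/4 3/4
```

So under these assumptions the probability drops from 90% to 75%, not to 30%: finding nothing under two cups is some evidence of absence, but not very much.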


## Bayes in the Dock

Posted in Bad Statistics on October 6, 2011 by telescoper

A few days ago John Peacock sent me a link to an interesting story about the use of Bayes’ theorem in legal proceedings and I’ve been meaning to post about it but haven’t had the time. I get the distinct feeling that John, who is of the frequentist persuasion,  feels a certain amount of delight that the beastly Bayesians have got their comeuppance at last.

The story in question concerns an erroneous argument given during a trial about the significance of a match found between a footprint found at a crime scene and footwear belonging to a suspect.  The judge took exception to the fact that the figures being used were not known sufficiently accurately to make a reliable assessment, and thus decided that Bayes’ theorem shouldn’t be used in court unless the data involved in its application were “firm”.

If you read the Guardian article you will see that there’s a lot of reaction from the legal establishment and statisticians about this, focussing on the forensic use of probabilistic reasoning. This all reminds me of the tragedy of the Sally Clark case and what a disgrace it is that nothing has been done since then to prevent the misrepresentation of statistical arguments in trials. Some of my Bayesian colleagues have expressed dismay at the judge’s opinion, which no doubt pleases Professor Peacock no end.

My reaction to this affair is more muted than you would probably expect. First thing to say is that this is really not an issue relating to the Bayesian versus frequentist debate at all. It’s about a straightforward application of Bayes’ theorem which, as its name suggests, is a theorem; actually it’s just a straightforward consequence of the sum and product laws of the calculus of probabilities. No-one, not even the most die-hard frequentist, would argue that Bayes’ theorem is false. What happened in this case is that an “expert” applied Bayes’ theorem to unreliable data and by so doing obtained misleading results. The  issue is not Bayes’ theorem per se, but the application of it to inaccurate data. Garbage in, garbage out. There’s no place for garbage in the courtroom, so in my opinion the judge was quite right to throw this particular argument out.

But while I’m on the subject of using Bayesian logic in the courts, let me add a few wider comments. First, I think that Bayesian reasoning provides a rigorous mathematical foundation for the process of assessing quantitatively the extent to which evidence supports a given theory or interpretation. As such it describes accurately how scientific investigations proceed by updating probabilities in the light of new data. It also describes how a criminal investigation works.

What Bayesian inference is not good at is achieving closure in the form of a definite verdict. There are two sides to this. One is that the maxim “innocent until proven guilty” cannot be incorporated in Bayesian reasoning. If one assigns a zero prior probability of guilt then no amount of evidence will be able to change this into a non-zero posterior probability; the required burden is infinite. On the other hand, there is the problem that the jury must decide guilt in a criminal trial “beyond reasonable doubt”. But how much doubt is reasonable, exactly? And will a jury understand a probabilistic argument anyway?

In pure science we never really need to achieve this kind of closure, collapsing the broad range of probability into a simple “true” or “false”, because this is a process of continual investigation. It’s a reasonable inference, for example, based on Supernovae and other observations that the Universe is accelerating. But is it proven that this is so? I’d say “no”,  and don’t think my doubts are at all unreasonable…

So what I’d say is that while statistical arguments are extremely important for investigating crimes – narrowing down the field of suspects, assessing the reliability of evidence, establishing lines of inquiry, and so on – I don’t think they should ever play a central role once the case has been brought to court unless there’s much clearer guidance given to juries on how to use them and stricter monitoring of so-called “expert” witnesses.

I’m sure various readers will wish to express diverse opinions on this case so, as usual, please feel free to contribute through the box below!

## Bayes and his Theorem

Posted in Bad Statistics on November 23, 2010 by telescoper

My earlier post on Bayesian probability seems to have generated quite a lot of readers, so this lunchtime I thought I’d add a little bit of background. The previous discussion started from the result

$P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)$

where

$K=P(A|C).$

Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down not by Bayes, but by Laplace. What Bayes did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of x successes in n independent “trials” each having the same probability of success, p; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Jacob Bernoulli. If we ask the question “what is the probability of exactly x successes from the possible n?”, the answer is given by the binomial distribution:

$P_n(x|n,p)= C(n,x) p^x (1-p)^{n-x}$

where

$C(n,x)= \frac{n!}{x!(n-x)!}$

is the number of distinct combinations of x objects that can be drawn from a pool of n.

You can probably see immediately how this arises. The probability of x consecutive successes is p multiplied by itself x times, or $p^x$. The probability of (n-x) successive failures is similarly $(1-p)^{n-x}$. The last two factors therefore give the probability of one particular sequence containing exactly x successes (since there must be n-x failures). The combinatorial factor in front counts the number of such sequences, because the ordering of successes and failures doesn’t matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X)=np. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is np(1-p).
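The remark about urns can be checked directly. The Python sketch below compares the exact probability of drawing x red balls in n draws without replacement (the hypergeometric distribution, which is my name for it here rather than anything used above) with the binomial approximation, for urns of increasing size. The urn sizes, the half-red composition and the function names are arbitrary choices made for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of x successes in n draws with replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def without_replacement_pmf(x, n, reds, total):
    """Probability of x red balls in n draws without replacement."""
    return comb(reds, x) * comb(total - reds, n - x) / comb(total, n)


n, x = 5, 2  # five draws, two of them red
differences = {}
for total in (20, 200, 20000):
    reds = total // 2  # half the balls in the urn are red
    exact = without_replacement_pmf(x, n, reds, total)
    approx = binomial_pmf(x, n, reds / total)
    differences[total] = abs(exact - approx)
    print(total, exact, approx)
```

The gap between the two columns shrinks as the urn grows, which is just the statement that removing a handful of balls from a very large urn barely changes the odds for the next draw.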

So this gives us the probability of x given a fixed value of p. Bayes was interested in the inverse of this result, the probability of p given x. In other words, Bayes was interested in the answer to the question “If I perform n independent trials and get x successes, what is the probability distribution of p?”. This is a classic example of inverse reasoning. He got the correct answer, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.
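For the record, the inversion can be written compactly in modern notation. Assuming a uniform prior on p between 0 and 1 (an assumption of this sketch; other priors give other answers), applying the theorem to the binomial distribution gives

$P(p|x,n) = \frac{P_n(x|n,p)}{\int_0^1 P_n(x|n,p')\,dp'} = \frac{(n+1)!}{x!(n-x)!}\, p^x (1-p)^{n-x}$

where the last step uses the standard integral

$\int_0^1 p^x (1-p)^{n-x}\,dp = \frac{x!(n-x)!}{(n+1)!}.$

The combinatorial factor C(n,x) cancels between numerator and denominator, so only the dependence on p survives, with a different normalisation.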

This is not the only example in science where the wrong person’s name is attached to a result or discovery. In fact, it is almost a law of Nature that any theorem that has a name has the wrong name. I propose that this observation should henceforth be known as Coles’ Law.

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. In 1720 he was a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but despite this was elected a Fellow of the Royal Society (FRS) in 1742. Presumably he had Friends of the Right Sort. He did however write a paper on fluxions in 1736, which was published anonymously. This was probably the grounds on which he was elected an FRS.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London for 1763, a volume that appeared in print in 1764.

P.S. I understand that the authenticity of the picture is open to question. Whoever it actually is, he looks  to me a bit like Laurence Olivier…