Bayes, Laplace and Bayes’ Theorem

Posted in Bad Statistics on October 1, 2014 by telescoper

A couple of interesting pieces have appeared which discuss Bayesian reasoning in the popular media. One is by Jon Butterworth in his Grauniad science blog and the other is a feature article in the New York Times. I’m in early today because I have an all-day Teaching and Learning Strategy Meeting so before I disappear for that I thought I’d post a quick bit of background.

One way to get to Bayes’ Theorem is by starting with

$P(A|C)P(B|AC)=P(B|C)P(A|BC)=P(AB|C)$

where I refer to three logical propositions A, B and C and the vertical bar “|” denotes conditioning, i.e. $P(A|B)$ means the probability of A being true given the assumed truth of B; “AB” means “A and B”, etc. This basically follows from the fact that “A and B” must always be equivalent to “B and A”. Bayes’ theorem then follows straightforwardly as

$P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)$

where

$K=P(A|C).$

Many versions of this, including the one in Jon Butterworth’s blog, exclude the third proposition and refer to A and B only. I prefer to keep an extra one in there to remind us that every statement about probability depends on information either known or assumed to be known; any proper statement of probability requires this information to be stated clearly and used appropriately but sadly this requirement is frequently ignored.
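To make the bookkeeping concrete, here is a minimal numerical sketch of the theorem. The numbers are made up purely for illustration, and the function name and its arguments are my own invention. It also assumes that K is obtained by summing over the two possibilities “B” and “not B”, which is the usual way of evaluating $P(A|C)$ in practice but is not spelled out above.

```python
from fractions import Fraction


def posterior(prior_b, like_a_given_b, like_a_given_not_b):
    """Return P(B|AC) from P(B|C), P(A|BC) and P(A|not-B, C).

    All three inputs are conditional on the same background information C.
    """
    # K = P(A|C), evaluated by summing over B and not-B (an assumption of
    # this sketch; the text simply writes K = P(A|C)).
    k = prior_b * like_a_given_b + (1 - prior_b) * like_a_given_not_b
    # Bayes' theorem: P(B|AC) = P(B|C) P(A|BC) / K
    return prior_b * like_a_given_b / k


# Hypothetical inputs: P(B|C) = 1/2, P(A|BC) = 3/5, P(A|not-B, C) = 1/5
result = posterior(Fraction(1, 2), Fraction(3, 5), Fraction(1, 5))
print(result)  # 3/4
```

Using exact fractions rather than floats keeps the arithmetic transparent: here K = 2/5, so the posterior is (1/2)(3/5)/(2/5) = 3/4.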

Although this is called Bayes’ theorem, the general form of it as stated here was actually first written down not by Bayes, but by Laplace. What Bayes did was derive the special case of this formula for “inverting” the binomial distribution. This distribution gives the probability of x successes in n independent “trials” each having the same probability of success, p; each “trial” has only two possible outcomes (“success” or “failure”). Trials like this are usually called Bernoulli trials, after Jacob Bernoulli. If we ask the question “what is the probability of exactly x successes from the possible n?”, the answer is given by the binomial distribution:

$P_n(x|n,p)= C(n,x) p^x (1-p)^{n-x}$

where

$C(n,x)= \frac{n!}{x!(n-x)!}$

is the number of distinct combinations of x objects that can be drawn from a pool of n.
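For anyone who wants to play with the numbers, the formula translates almost directly into code. This is just a sketch: the function name is my own, and it uses `math.comb` from the Python standard library (Python 3.8 or later) for the combinatorial factor.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials.

    Implements P_n(x|n,p) = C(n,x) p^x (1-p)^(n-x).
    """
    if not 0 <= x <= n:
        return 0.0  # impossible outcomes have zero probability
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 5 heads in 10 tosses of a fair coin: 252/1024
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```

A useful sanity check is that the probabilities for x = 0, 1, ..., n add up to one for any value of p.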

You can probably see immediately how this arises. The probability of x consecutive successes is p multiplied by itself x times, or $p^x$. The probability of (n-x) successive failures is similarly $(1-p)^{n-x}$. The last two factors therefore give the probability of one particular sequence containing exactly x successes (and hence n-x failures). The combinatorial factor in front counts the number of such sequences, which takes account of the fact that the ordering of successes and failures doesn’t matter.

The binomial distribution applies, for example, to repeated tosses of a coin, in which case p is taken to be 0.5 for a fair coin. A biased coin might have a different value of p, but as long as the tosses are independent the formula still applies. The binomial distribution also applies to problems involving drawing balls from urns: it works exactly if the balls are replaced in the urn after each draw, but it also applies approximately without replacement, as long as the number of draws is much smaller than the number of balls in the urn. I leave it as an exercise to calculate the expectation value of the binomial distribution, but the result is not surprising: E(X)=np. If you toss a fair coin ten times the expectation value for the number of heads is 10 times 0.5, which is five. No surprise there. After another bit of maths, the variance of the distribution can also be found. It is np(1-p).
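If you would rather check the expectation value and variance numerically than do the algebra, a brute-force sum over the distribution will do. The sketch below is my own illustration (the function name is hypothetical); it simply evaluates the defining sums for the mean and variance and can be compared with np and np(1-p).

```python
from math import comb


def binomial_moments(n, p):
    """Return (mean, variance) of the binomial distribution by direct summation."""
    # Probability of each possible number of successes x = 0..n
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))
    variance = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))
    return mean, variance


mean, variance = binomial_moments(10, 0.5)
# Compare with the closed forms np = 5 and np(1-p) = 2.5
print(mean, variance)
```

The same function works for a biased coin; for example, n = 20 and p = 0.3 should give values close to 6 and 4.2.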

So this gives us the probability of x given a fixed value of p. Bayes was interested in the inverse of this result, the probability of p given x. In other words, Bayes was interested in the answer to the question “If I perform n independent trials and get x successes, what is the probability distribution of p?”. This is a classic example of inverse reasoning, in that it involves turning something like P(A|BC) into something like P(B|AC), which is what is achieved by the theorem stated at the start of this post.
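Here is a rough numerical sketch of that inversion. It assumes a uniform prior for p between 0 and 1 (that choice is an assumption of the example, as are the function names and the grid size), multiplies it by the binomial likelihood, and normalises on a grid with the trapezium rule. With a flat prior the posterior mean should come out close to (x+1)/(n+2), the value known as Laplace’s rule of succession.

```python
def posterior_on_grid(x, n, num_points=1001):
    """Posterior density of p after x successes in n trials, on a grid.

    Assumes a flat prior on [0, 1]; this is an assumption of the sketch.
    """
    grid = [i / (num_points - 1) for i in range(num_points)]
    # Prior times likelihood, up to a constant: C(n,x) cancels on normalising
    unnorm = [p**x * (1 - p) ** (n - x) for p in grid]
    step = grid[1] - grid[0]
    # Trapezium rule for the normalising constant (the K of the theorem)
    k = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / k for u in unnorm]


def posterior_mean(grid, density):
    """Mean of p under the gridded posterior, again by the trapezium rule."""
    step = grid[1] - grid[0]
    weighted = [p * d for p, d in zip(grid, density)]
    return step * (sum(weighted) - 0.5 * (weighted[0] + weighted[-1]))


# Hypothetical data: 7 successes in 10 trials
grid, density = posterior_on_grid(7, 10)
# Compare with (x+1)/(n+2) = 8/12
print(posterior_mean(grid, density))
```

A grid is crude but makes the logic visible: likelihood times prior, divided by a normalising constant. A different prior only changes the line that builds the unnormalised density.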

Bayes got the correct answer for his problem, eventually, but by very convoluted reasoning. In my opinion it is quite difficult to justify the name Bayes’ theorem based on what he actually did, although Laplace did specifically acknowledge this contribution when he derived the general result later, which is no doubt why the theorem is always named in Bayes’ honour.

This is not the only example in science where the wrong person’s name is attached to a result or discovery. Stigler’s Law of Eponymy strikes again!

So who was the mysterious mathematician behind this result? Thomas Bayes was born in 1702, son of Joshua Bayes, who was a Fellow of the Royal Society (FRS) and one of the very first nonconformist ministers to be ordained in England. Thomas was himself ordained and for a while worked with his father in the Presbyterian Meeting House in Leather Lane, near Holborn in London. Around 1734 he became a minister in Tunbridge Wells, in Kent. He retired from the church in 1752 and died in 1761. Thomas Bayes didn’t publish a single paper on mathematics in his own name during his lifetime but was elected an FRS in 1742.

The paper containing the theorem that now bears his name was published posthumously in the Philosophical Transactions of the Royal Society of London in 1763. In his great Philosophical Essay on Probabilities Laplace wrote:

“Bayes, in the Transactions Philosophiques of the Year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing.”

The reasoning in the 1763 paper is indeed perplexing, and I remain convinced that the general form we now refer to as Bayes’ Theorem should really be called Laplace’s Theorem. Nevertheless, Bayes did establish an extremely important principle that is reflected in the title of the New York Times piece I referred to at the start of this post. In a nutshell this is that probabilities of future events can be updated on the basis of past measurements or, as I prefer to put it, “one person’s posterior is another’s prior”.

Hubble versus Slipher

Posted in History, The Universe and Stuff on September 15, 2012 by telescoper

Since I’m here at a conference celebrating the scientific achievements of Vesto M. Slipher, I thought I’d take the opportunity to make a few remarks about Slipher’s work and legacy.

I often use this picture in popular talks to illustrate the correlation between distance (x-axis) and apparent recession velocity (y-axis) that has become universally known as Hubble’s Law. This is an early version of such a plot published by Edwin Hubble in 1929.

In public talks I rarely have time to go into the details of this, but it is worth saying that Hubble contributed only half of the above plot, namely the distance measurements on the x-axis, and these turned out to be wrong by a factor of about 10 owing to an incorrect identification of the stars used as standard candles. All the recession velocities on the y-axis – obtained by looking at the displacement of lines in the target galaxy’s spectrum – were in fact obtained by Vesto Slipher at the Lowell Observatory here in Flagstaff, Arizona. Hubble used these data from Slipher with permission, but gave no credit to Slipher in the references to his 1929 work. A later, and more convincing, version of this plot, published in 1931 by Hubble and Humason, was accompanied by a generous acknowledgement of Slipher’s contribution. However, by then, Hubble’s name was firmly associated with the plot and Slipher’s contribution was largely forgotten for many years subsequently.

This episode isn’t at all atypical of Hubble’s behaviour. He was an extremely ambitious man who was an expert in the art of promoting himself and the Mount Wilson Observatory where he worked. Slipher was a very different type of man: quiet, self-effacing, and very much a team player, dedicated to scientific accuracy rather than his own reputation.

It’s worth saying further that the key observation that led to the understanding that the Universe is expanding is the fact that most of the spectra obtained by Slipher, over the years subsequent to his first measurement of the spectrum of the Andromeda Nebula (M31) celebrated by this conference, showed a redshift indicating velocity away from the observer. Even without distance measurements this leads directly to an interpretation in terms of cosmic expansion. Ironically, the first spectrum he obtained, that of M31, shows a blueshift, as do a few others plotted with negative velocities in the above diagram, but the more distant sources exclusively show a redshift.

As a scientist should be, Slipher was very careful about the interpretation of this result. The more distant objects are fainter and thus more difficult to observe. Could it arise from some systematic artifact? Or could there be an unknown physical effect that produces a redshift dependent on the size of the source? These questions could only be answered when accurate distances to the nebulae were established, so Hubble’s contribution was by no means negligible. It’s completely untrue, however, to say that Hubble discovered the expansion of the Universe, so there’s yet another example of Stigler’s Law of Eponymy whenever anyone talks about the Hubble expansion.

One of the great things about coming to this meeting was the chance to meet Alan Slipher, grandson of Vesto Slipher. He and other members of his family refer to Vesto as “VM”, by the way, which I hadn’t realised before. VM lived a long life, dying in 1969 just short of his 94th birthday, so Alan knew him well until age 17 or so. He spoke most warmly and movingly after yesterday’s conference dinner about his memories of his grandfather, whom he clearly looked up to. His words confirmed the impression I’d already formed, that Slipher was an extremely cautious and serious scientist as well as a kindly and humble man.

The contrasting personalities of Slipher and Hubble are further illustrated by correspondence between the two that is archived at the Lowell Observatory. Slipher comes across as kindly and cooperative, Hubble as pompous and self-regarding. I know which of the two I admire more, both as scientist and as human being.