## Bunn on Bayes

Just a quickie to advertise a nice blog post by Ted Bunn in which he takes down an article in Science by Bradley Efron, which is about frequentist statistics. I’ll leave it to you to read his piece, and the offending article, but couldn’t resist nicking his little graphic that sums up the matter for me:

The point is that as scientists we are interested in the probability of a model (or hypothesis) *given the evidence* (or data) arising from an experiment (or observation). This requires inverse, or inductive, reasoning and it is therefore explicitly Bayesian. Frequentists focus on a different question, about the probability of the data *given the model*, which is not the same thing at all, and is not what scientists actually need. There are examples in which a frequentist method accidentally gives the correct (i.e. Bayesian) answer, but they are nevertheless still answering the wrong question.
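The gap between the two questions is easy to see numerically. Here is a minimal sketch (my own toy numbers, not from the post or from Efron's article) in which P(data | model) and P(model | data) come out wildly different:

```python
# Toy example: M = "the coin is double-headed", prior P(M) = 0.01 (assumed).
# Data D: ten heads in a row.

p_M = 0.01                 # prior probability of the trick coin
p_D_given_M = 1.0          # a double-headed coin always gives heads
p_D_given_notM = 0.5**10   # a fair coin gives ten heads with prob ~0.001

# Bayes' theorem: P(M|D) = P(D|M) P(M) / P(D)
p_D = p_D_given_M * p_M + p_D_given_notM * (1 - p_M)
p_M_given_D = p_D_given_M * p_M / p_D

print(p_D_given_notM)   # P(D | fair coin) ~ 0.001: the frequentist quantity
print(p_M_given_D)      # P(trick | D)     ~ 0.91:  the quantity a scientist wants
```

The likelihood of the data under the fair-coin model is about 0.001, but the posterior probability of the trick coin is about 0.91 — the two numbers answer different questions.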

I will make one further comment arising from the following excerpt from the Efron piece.

> Bayes’ 1763 paper was an impeccable exercise in probability theory. The trouble and the subsequent busts came from overenthusiastic application of the theorem in the absence of genuine prior information, with Pierre-Simon Laplace as a prime violator.

I think this is completely wrong. There is *always* prior information, even if it is minimal, but the point is that frequentist methods always ignore it even if it is “genuine” (whatever that means). It’s not always easy to encode this information in a properly defined prior probability of course, but at least a Bayesian will not deliberately answer the wrong question in order to avoid thinking about it.

It is ironic that the pioneers of probability theory, such as Laplace, adopted a Bayesian rather than a frequentist interpretation of their probabilities. Frequentism arose during the nineteenth century and held sway until recently. I recall giving a conference talk about Bayesian reasoning only to be heckled by the audience with comments about “new-fangled, trendy Bayesian methods”. Nothing could have been less apt. Probability theory pre-dates the rise of sampling theory and all the frequentist-inspired techniques that modern-day statisticians like to employ and which, in my opinion, have added nothing but confusion to the scientific analysis of statistical data.

June 18, 2013 at 12:03 am

The key point in this discussion is what probability IS. If you are prepared to persist with that question to a coherent answer then you will be in a good position in the present discussion. Even frequentists don’t say that probability is a proportion (i.e. a relative *frequency*) in a finite number of observations, only in an infinite number which you never actually observe (!) and in which the uncontrolled variables change “randomly” – a word whose definition is ultimately circular.

If, though, you take probability as a measure of how strongly the assumed truth of one proposition implies another, then you get the usual probability calculus, and a coherent theory of how to reason about the physical world in which prior information is regarded as helpful, not a hindrance. In any case, ‘prior’ is relative to an experiment; really there is only information, regardless of whether it is prior to one experiment, posterior from another, or noticed by accident.

If you happen to be certain beforehand what the answer is, then you would be mad to neglect that knowledge: you would assign all discrepancies from that value, in the data, to noise. Unhappily, frequentists do neglect that knowledge. Laplace and his defender Bunn do not attempt to define probability, but they nevertheless get the reasoning absolutely right.

June 18, 2013 at 10:26 am

As a (sort of) pragmatist, I think the question of what probability IS can be answered by thinking about what you want probability to DO. If you follow that logic you end up with the Bayesian interpretation as a measure of rational belief…

June 18, 2013 at 5:02 pm

I prefer the “strength of implication” of one binary proposition by another, for several reasons:

1. You are reminded that a probability always has two arguments, so that nonsense about unconditional probabilities gets strangled at birth.

2. No need to get into arguments about what “rational” means or all the psychological baggage that comes with the word “belief”.

3. This is what you want in every real problem involving reasoning under uncertainty, and RT Cox showed in 1946 that it obeys the sum and product rules.
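For reference, the sum and product rules mentioned in point 3 — and the Bayes’ theorem that follows from them — can be written compactly, reading P(A|C) as the strength with which C implies A:

```latex
% Sum and product rules for the strength of implication P(A|C):
P(A \mid C) + P(\bar{A} \mid C) = 1
P(AB \mid C) = P(A \mid BC)\, P(B \mid C)
% The conjunction AB is symmetric in A and B; equating the two
% factorisations of P(AB|C) and dividing gives Bayes' theorem:
P(A \mid BC) = \frac{P(B \mid AC)\, P(A \mid C)}{P(B \mid C)}
```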

June 18, 2013 at 3:13 pm

Both sides in this ‘debate’ can be frustratingly inflexible. Bunn is right about the virtues of the Bayesian approach in terms of answering questions about the relative probabilities of models. But I can’t see why Bayesians seem incapable of recognizing some of the problems with the approach.

As everyone agrees, Bayes is a recipe for updating your belief. And in a solipsistic world, that’s the end of it. But what to do when different people have different beliefs? This comes down to defining probability, which I would do in the same way as in the origin of the subject: probabilities are numbers you use to generate odds, on the basis of which you would be willing to wager money. Thus you can decide whose beliefs are more nearly correct by seeing who loses money over a sequence of bets (on a number of different experiments, to avoid the issue of repeated trials).
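The wagering test can be sketched numerically. A standard proxy (my choice here, not the commenter’s) is the logarithmic score, which corresponds to the growth rate of wealth under Kelly betting at fair odds: a better-calibrated set of beliefs accumulates a higher score over a long sequence of bets.

```python
import math
import random

random.seed(42)
TRUE_P = 0.5   # assume for this sketch that the coin is actually fair

def log_score(belief_p, heads):
    """Log score of a quoted probability for heads, given the outcome.
    Under Kelly betting at fair odds this is (up to a constant) the
    log growth rate of the bettor's wealth."""
    return math.log(belief_p if heads else 1.0 - belief_p)

calibrated, overconfident = 0.0, 0.0
for _ in range(10_000):
    heads = random.random() < TRUE_P
    calibrated    += log_score(0.5, heads)   # quotes the true probability
    overconfident += log_score(0.9, heads)   # insists heads is near-certain

print(calibrated > overconfident)   # the calibrated bettor loses less money
```

Over ten thousand tosses the bettor whose quoted probabilities match the true frequencies comfortably outperforms the overconfident one, which is exactly the “whose beliefs lose money” test described above.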

It’s easy to generate beliefs that are falsifiable. If you toss a coin, I could tell you that the probability of tails is zero, because I have a prior that you’re going to cheat me with a double-headed coin. You might object that this is an unreasonable prior, but I can believe whatever I want to. One tail shows us both that I was wrong, and I also lose an infinite amount of money if I bet according to the odds dictated by my belief.

The problem here was that I picked a prior without any good reason to do so – but the rules of the game don’t forbid that. What I’d be happier with is the ability to refuse to specify a prior at all where I don’t feel I know it. As an astronomical example, suppose you have a noisy measurement of the radio flux from a gamma-ray source, and you want to put confidence limits on the true radio flux. You need the bivariate luminosity function, which you don’t know. You could take the Jaynes approach and go for a 1/flux prior, in which case you get the bonkers result that the flux is always zero, independent of the data. So what you need is data from, say, 100 objects, which then gives you an idea of the LF, and you can then put sensible confidence limits on object 101. I suspect that this is what Efron is getting at with what he calls ‘empirical Bayes’.
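The pathology of the 1/flux prior can be shown numerically. A sketch (my own hypothetical numbers: a Gaussian likelihood for a 3-sigma “detection”, with the improper prior truncated at a lower cutoff): the posterior mass piled up below any small flux threshold grows without bound as the cutoff shrinks, so the posterior is dragged towards zero regardless of the data.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal integral (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

sigma, measured = 1.0, 3.0   # hypothetical: a 3-sigma "detection" of the flux

def mass_below(threshold, eps):
    # Posterior under a 1/f prior, truncating the improper prior at f = eps.
    # Logarithmic grid so the 1/f spike near zero is properly resolved.
    f = np.logspace(np.log10(eps), 1.0, 100_000)
    post = np.exp(-0.5 * ((measured - f) / sigma) ** 2) / f
    post /= trapz(post, f)
    lo = f < threshold
    return trapz(post[lo], f[lo])

for eps in (1e-3, 1e-6, 1e-12):
    print(eps, mass_below(0.1, eps))   # posterior mass below 0.1 keeps growing
```

The divergence is only logarithmic, but it never stops: there is no cutoff-independent answer, which is one way of seeing why the prior is “bonkers” for this problem.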

I’ve made such comments here before, and Peter labelled me as a frequentist. But I like to use Bayes where I can. Still, it bothers me that different Bayesians can and do get different probabilities in the face of the same data – i.e. that priors are uncertain. Most Bayesians seem to ignore this uncertainty, and I can’t see why.

June 18, 2013 at 3:15 pm

It’s not a problem if you say what your prior is! Garbage in, garbage out is not restricted to Bayesian reasoning. We all start from assumptions – the important thing is to say what they are and reason consistently from them.

June 18, 2013 at 5:23 pm

John,

How to assign a probability from certain types of information remains a research problem – including when you want the probability distribution for a parameter before making noisy measurements of it, i.e. when you want to assign its prior probability distribution from your prior information about it. Since any reasoning inequivalent to Bayes’ theorem violates the sum and/or product rules (from which Bayes’ theorem follows), I take this limitation to mean that we should be working to learn how to assign probabilities in wider cases, not that the methodology is faulty.

Here’s an example where you know how to assign the prior: a bead is located on a horizontal circular wire, but you have no prior idea where. By symmetry, the probability density is constant with respect to angle.
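The bead example can be sketched in a few lines: symmetry fixes the prior, and a (hypothetical, assumed here) noisy angular measurement with von Mises noise then updates it via Bayes’ theorem.

```python
import numpy as np

# Bead on a circular wire, no prior idea where: by symmetry the prior
# density must be constant with respect to angle.
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
prior = np.full_like(theta, 1.0 / (2.0 * np.pi))

# Hypothetical noisy angular measurement (von Mises noise model, assumed):
measured, kappa = 1.0, 4.0
likelihood = np.exp(kappa * np.cos(theta - measured))

posterior = prior * likelihood
posterior /= np.sum(posterior) * (theta[1] - theta[0])   # normalise on the grid

print(theta[np.argmax(posterior)])   # the posterior peaks at the measured angle
```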

As for non-Bayesian methods: suppose your prior information tells you that the parameter of interest is definitely NOT in some window. Since the posterior is equal to the product of the prior and the likelihood, renormalised, this feature carries over into the posterior – exactly as intuition demands. But no method taken out of the sampling-theoretical toolkit can do that. Use of prior information is a virtue, not a vice.
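The carry-over of an excluded window is mechanical, and a short numerical sketch (hypothetical window and measurement of my own choosing) makes it explicit: wherever the prior is zero, the posterior is zero, even when the likelihood peaks inside the forbidden region.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 4001)   # parameter grid
dx = x[1] - x[0]

# Prior information (hypothetical): the parameter is definitely NOT in (2, 4).
prior = np.where((x > 2.0) & (x < 4.0), 0.0, 1.0)
prior /= np.sum(prior) * dx

# A noisy measurement whose likelihood peaks, awkwardly, inside the window:
likelihood = np.exp(-0.5 * (x - 3.0) ** 2)

posterior = prior * likelihood
posterior /= np.sum(posterior) * dx

# The exclusion carries over exactly: zero posterior inside the window.
print(posterior[(x > 2.0) & (x < 4.0)].max())   # 0.0
```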