## Have you got a proper posterior?

There’s an interesting paper on the arXiv today by Tak et al. with the title ‘How proper are Bayesian models in the astronomical literature?’ The title isn’t all that appropriate, because the problem is not really with ‘models’, but with the choice of prior (which should be implied by the model and other information known or assumed to be true). Moreover, I’m not sure whether the word ‘Bayesian’ applies to the *model* in any meaningful way.

Anyway, the abstract is as follows:

> The well-known Bayes theorem assumes that a posterior distribution is a probability distribution. However, the posterior distribution may no longer be a probability distribution if an improper prior distribution (non-probability measure) such as an unbounded uniform prior is used. Improper priors are often used in the astronomical literature to reflect on a lack of prior knowledge, but checking whether the resulting posterior is a probability distribution is sometimes neglected. It turns out that 24 articles out of 75 articles (32%) published online in two renowned astronomy journals (ApJ and MNRAS) between Jan 1, 2017 and Oct 15, 2017 make use of Bayesian analyses without rigorously establishing posterior propriety. A disturbing aspect is that a Gibbs-type Markov chain Monte Carlo (MCMC) method can produce a seemingly reasonable posterior sample even when the posterior is not a probability distribution (Hobert and Casella, 1996). In such cases, researchers may erroneously make probabilistic inferences without noticing that the MCMC sample is from a non-existent probability distribution. We review why checking posterior propriety is fundamental in Bayesian analyses when improper priors are used and discuss how we can set up scientifically motivated proper priors to avoid the pitfalls of using improper priors.

This paper makes a point that I have wondered about on a number of occasions. One of the problems, in my opinion, is that astrophysicists don’t think enough about their choice of prior. An improper prior is basically a statement of ignorance about the result one expects in advance of incoming data. However, very often we know more than we think we do. I’ve lost track of the number of papers I’ve seen in which the authors blithely assume a flat prior when that makes no sense whatsoever on the basis of what information is available and, indeed, on the structure of the model within which the data are to be interpreted. I discuss a simple example here.

In my opinion the prior is not (as some frequentists contend) some kind of aberration. It plays a clear logical role in Bayesian inference. It can build into the analysis constraints that are implied by the choice of model framework. Even if it is used as a subjective statement of prejudice, the Bayesian approach at least requires one to put that prejudice on the table where it can be seen.

There are undoubtedly situations where we don’t know enough to assign a proper prior. That’s not necessarily a problem. Improper priors can – and do – lead to proper posterior distributions if (and it’s an important if) they include, or the likelihood subsequently imposes, a cutoff on the prior space. The onus should be on the authors of a paper to show that their likelihood does this and produces a posterior which is a well-defined probability measure (specifically, that it is normalisable, ie can be made to integrate to unity). It seems that astronomers don’t always do this!
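To see how the same improper prior can give a proper posterior for one dataset and an improper one for another, here is a toy sketch (my own illustration, not an example from the paper). It uses a binomial likelihood with the improper Haldane prior 1/(p(1-p)): with 3 successes out of 10 the posterior normalises, but with 0 successes the unnormalised posterior behaves like 1/p near zero and its integral diverges – the mass grows without bound as you shrink the lower cutoff.

```python
import numpy as np
from scipy.integrate import quad

def unnorm_posterior(p, x, n):
    # Binomial likelihood p^x (1-p)^(n-x) times the improper
    # Haldane prior 1/(p(1-p))
    return p**(x - 1) * (1 - p)**(n - x - 1)

# x = 3 successes out of n = 10: the likelihood tames the prior and the
# integral converges (to the Beta(3, 7) normalising constant, 1/252)
proper_mass, _ = quad(unnorm_posterior, 0, 1, args=(3, 10))
print("x = 3:", proper_mass)

# x = 0 successes: posterior ~ 1/p near p = 0, so the integral diverges.
# Watch the mass keep growing (logarithmically) as the cutoff shrinks:
for eps in [1e-2, 1e-4, 1e-6]:
    mass, _ = quad(unnorm_posterior, eps, 1, args=(0, 10))
    print(f"cutoff {eps:g}: mass {mass:.2f}")
```

An MCMC sampler pointed at the x = 0 case would still happily return samples, which is exactly the trap the abstract warns about.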

December 12, 2017 at 1:06 pm

An improper (ie, non-normalisable) prior is not to be feared provided that, if a cutoff is imposed on it, dependence of the posterior on the cutoff value dwindles as this value becomes arbitrarily large. If, however, your *posterior* is non-normalisable then you are being given a very bright warning light that something is wrong, and you should not dream of publishing until you have fixed it. As Peter says about Bayesian methods, it is better to have a warning light on your dashboard than not.
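The check described above can be done numerically. Here is a minimal sketch (my illustration, with made-up data): a flat prior on the mean of a Gaussian, truncated at ±cutoff, with the posterior mean computed on a grid. Once the cutoff comfortably contains the likelihood, enlarging it further changes the answer negligibly – exactly the dwindling dependence the comment asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=20)  # toy dataset

def posterior_mean(cutoff, grid_size=200001):
    # Flat prior on [-cutoff, cutoff] for the mean of a unit-variance
    # Gaussian; posterior evaluated on a grid, normalised numerically.
    mu = np.linspace(-cutoff, cutoff, grid_size)
    log_like = -0.5 * np.sum((data[:, None] - mu[None, :]) ** 2, axis=0)
    w = np.exp(log_like - log_like.max())  # unnormalised posterior weights
    return np.sum(mu * w) / np.sum(w)

# Dependence on the cutoff dwindles once the likelihood fits inside it
for cutoff in [5, 50, 500]:
    print(f"cutoff {cutoff:4d}: posterior mean {posterior_mean(cutoff):.6f}")
```

If instead the numbers kept drifting as the cutoff grew, that would be the bright warning light: the limiting posterior is not normalisable.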

December 13, 2017 at 8:38 am

One can argue the same thing for frequentist approaches: is the hypothesis test you choose to perform suitable for your problem? For that you need “prior” knowledge of the problem, and a badly chosen test is far worse than a badly chosen Bayesian prior, because the test will always give you a significance regardless of how reasonable it is for your problem. There will be *no* warning lights.

But I agree with the authors that you should check whether what you get as a posterior is really a probability distribution (or in general, that your approach is reasonable).

It’s a shame that scientists (at least at my university) are not rigorously educated in statistics. In physics we have no mandatory statistics class at all… A Master’s student once asked me what sigma means. 🙂