## Throwing a Fit

I’ve just been to a very interesting and stimulating seminar by Subir Sarkar from Oxford, who spoke about *Cosmology Beyond the Standard Model*, a talk into which he packed a huge number of provocative comments and interesting arguments. His abstract is here:

Precision observations of the cosmic microwave background and of the large-scale clustering of galaxies have supposedly confirmed the indication from the Hubble diagram of Type Ia supernovae that the universe is dominated by some form of dark energy which is causing the expansion rate to accelerate. Although hailed as having established a ‘standard model’ for cosmology, this raises a profound problem for fundamental physics. I will discuss whether the observations can be equally well explained in alternative inhomogeneous cosmological models that do not require dark energy and will be tested by forthcoming observations.

He made no attempt to be balanced and objective, but it was a thoroughly enjoyable polemic making the point that it is possible that the dark energy whose presence we infer from cosmological observations might just be an artifact of using an oversimplified model to interpret the data. I actually agreed with quite a lot of what he said, and certainly think the subject needs people willing to question the somewhat shaky foundations on which the standard concordance cosmology is built.

But near the end, Subir almost spoiled the whole thing by making a comment that made me decide to make another entry in my Room 101 of statistical horrors. He was talking about the spectrum of fluctuations in the temperature of the Cosmic Microwave Background as measured by the Wilkinson Microwave Anisotropy Probe (WMAP):

I’ve mentioned the importance of this plot in previous posts. In his talk, Subir wanted to point out that the measured spectrum isn’t actually fit all that well by the concordance cosmology prediction shown by the solid line.

A simple way of measuring goodness-of-fit is to work out the value of chi-squared, which is related to the sum of the squares of the residuals between the data and the fit. If you do this with the WMAP data you will find that the value of chi-squared is actually a bit high: so high, indeed, that there is only a 7 per cent chance of such a value arising in a concordance Universe. The reason is probably the behaviour at low harmonics (i.e. large scales), where there are some points that do appear to lie off the model curve. This means that the best-fit concordance model isn’t a really brilliant fit, but it is acceptable at the usual 5% significance level.
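The mechanics of turning a chi-squared value into that kind of tail probability can be sketched as follows, assuming independent Gaussian errors. The numbers here are invented purely for illustration; they are not the actual WMAP values.

```python
# Illustrative only: converting a chi-squared value into a tail probability
# ("p-value"), assuming independent Gaussian errors. The dof and chi-squared
# values below are made up for illustration, not taken from WMAP.
from scipy.stats import chi2

dof = 1000      # hypothetical number of degrees of freedom
chisq = 1066.0  # hypothetical measured chi-squared

# Probability of obtaining a chi-squared at least this large by chance
# if the model is correct: the survival function of the chi^2 distribution.
p_value = chi2.sf(chisq, dof)
print(f"reduced chi-squared = {chisq / dof:.3f}, p = {p_value:.3f}")
```

For these made-up numbers the reduced chi-squared is only modestly above unity, yet the tail probability comes out around the few-per-cent level, which is the sort of translation being described above. Note that this is the probability of the data given the model, which is exactly the quantity at issue in the post.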

I won’t quibble with this number, although strictly speaking the data points aren’t entirely independent so the translation of chi-squared into a probability is not quite as easy as it may seem. I’d also stress that I think it is valuable to show that the concordance model isn’t by any means perfect. However, in Subir’s talk the chi-squared result morphed into a statement that the probability of the concordance model being right is only 7 per cent.

No! The probability of chi-squared given the model is 7%, but that’s quite different to the probability of the model given the value of chi-squared…

This is a thinly disguised example of the prosecutor’s fallacy which came up in my post about Sir Roy Meadow and his testimony in the case against Sally Clark that resulted in a wrongful conviction for the murder of her two children.

Of course the consequences of this polemicist’s fallacy aren’t so drastic. The Universe won’t go to prison. And it didn’t really spoil what was a fascinating talk. But it did confirm in my mind that statistics is like alcohol. It makes clever people say very silly things.

February 19, 2009 at 2:12 pm

My favourite example of how “the probability of the data given the model” is not the same as “the probability of the model given the data” came from a talk up here last month, by Roberto Trotta.

The model is that the person talking to me is a woman.

The data is that the person talking to me is pregnant.

The probability of the data, given the model, is pretty low – the incidence of pregnancy in the population is about 3%.

But the probability of the model, given this data, is very, very high.
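The example above can be made quantitative with Bayes’ theorem. All the numbers here are illustrative assumptions (a 50/50 prior on the speaker’s sex, and the roughly 3% pregnancy rate quoted above), not real demographics.

```python
# A toy Bayes'-theorem version of the pregnancy example.
# All numbers are illustrative assumptions, not real demographics.
p_woman = 0.5                  # prior: P(model) = P(person is a woman)
p_pregnant_given_woman = 0.03  # likelihood: P(data | model), ~3% as quoted
p_pregnant_given_man = 0.0     # men are never pregnant

# Evidence: total probability of the data, P(pregnant)
p_pregnant = (p_pregnant_given_woman * p_woman
              + p_pregnant_given_man * (1 - p_woman))

# Posterior: P(model | data) via Bayes' theorem
p_woman_given_pregnant = p_pregnant_given_woman * p_woman / p_pregnant
print(p_woman_given_pregnant)  # 1.0, despite the 3% likelihood
```

The posterior is exactly 1 because the competing model (that the speaker is a man) assigns the data zero probability; a small likelihood for the data given the model tells you nothing by itself about the probability of the model given the data.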

February 21, 2009 at 12:16 am

Hi Peter,

Forgive my ignorance, but how does one go about translating a chi-squared into a probability if the data points are not all independent?

I have to admit that I wish folks in my adopted field thought a little more about their data and what it actually says.

Adrian

February 21, 2009 at 11:49 am

Adrian

What you have to do is look at the full covariance matrix describing the errors in all the points. The usual chi-squared uses only the diagonal elements of this (the variances of the individual points), but the cross-terms are important if the errors are correlated: the generalized statistic weights the residuals by the inverse of the full covariance matrix. It’s not pretty to do, but knowledge of this matrix allows one to generalize the usual chi-squared approach.

Peter
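The recipe Peter describes – weighting the residual vector by the inverse of the full covariance matrix, i.e. computing r<sup>T</sup>C<sup>-1</sup>r – can be sketched like this. The residuals and covariance matrix here are invented purely to show the mechanics.

```python
# A minimal sketch of the generalised chi-squared for correlated errors:
# chi^2 = r^T C^{-1} r, where r is the vector of residuals (data - model)
# and C is the full covariance matrix of the errors.
# The numbers are invented purely to show the mechanics.
import numpy as np

residuals = np.array([0.5, -1.2, 0.8])

# Covariance matrix with off-diagonal terms: correlated errors
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.0]])

# Solve C x = r rather than inverting C explicitly (better numerics)
chisq = residuals @ np.linalg.solve(cov, residuals)

# With a diagonal covariance this reduces to the familiar
# sum((data - model)**2 / sigma**2)
chisq_uncorrelated = np.sum(residuals**2 / np.diag(cov))
print(chisq, chisq_uncorrelated)
```

When the off-diagonal elements of C are zero the two expressions agree, which is a useful sanity check; with correlated errors they differ, and the effective number of independent data points is reduced, which is why the translation of chi-squared into a probability becomes less straightforward.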

May 5, 2009 at 8:51 pm

Hi Peter,

I heard on the grapevine that you have featured me on your blog and I see that I have indeed been so honoured although not necessarily as I’d have liked!

I am familiar of course with the idea that “the probability of the data given the model” is not the same as “the probability of the model given the data” – incidentally the amusing illustration of this in Brendan’s post above is due to Louis Lyons (from whom Roberto Trotta must have heard it when he was a postdoc at Oxford). What I actually said in my talk was “There is only a 7% chance that the concordance model is the correct description of the WMAP data”.

May 5, 2009 at 9:24 pm

(sorry hit the submit button in error … message continued below)

I agree that this was not a precise statement, statistically speaking; nevertheless most physicists would agree that if the reduced \chi^2 (per degree of freedom) exceeds unity significantly then the model is unlikely to be correct. So I think you are being a bit polemical yourself in calling this a “polemicist’s fallacy”!

Incidentally the high \chi^2 is not so much due to “the behaviour at low harmonics (i.e. large scales)” but because of the outliers (“glitches”) around \ell ~ 100-300 (see Fig. 17 of astro-ph/0603451). The situation has admittedly improved since the WMAP-1 data release, when the probability was only 3%. The glitch at \ell ~ 208 is now much less prominent – Hinshaw et al. account for this by saying “We believe that this feature was predominantly a noise fluctuation in the first-year data”. I have no idea what that means! I am even more confused by their assertion: “In the absence of an established theoretical framework in which to interpret these glitches (beyond the Gaussian, random phase paradigm), they will likely remain curiosities”. Surely observers should be interested in exploring why their “best-fit” model is not such a great fit, quite independently of whether theorists have a plausible explanation?! (I am glad that by contrast particle physicists energetically pursue all indications of deviations from their Standard Model, rather than simply ignore them and hope they will go away.)

By the same token can one not say: “In the absence of an established theoretical framework in which to accommodate dark energy, the apparent acceleration of the Hubble expansion will remain a curiosity” – you heard it here first!

Regards – Subir

September 23, 2011 at 3:41 am

[…] (Image credit: WMAP team, retrieved from Peter Coles’ site.) […]
