Archive for the Bad Statistics Category

Why we should abandon “statistical significance”

Posted in Bad Statistics on September 27, 2017 by telescoper

So a nice paper by McShane et al. has appeared on the arXiv with the title Abandon Statistical Significance and abstract:

In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.

This piece is in part a reaction to a paper by Benjamin et al. in Nature Human Behaviour that argues for the adoption of a standard threshold of p=0.005 rather than the more usual p=0.05. This latter paper has generated a lot of interest, but I think it misses the point entirely. The fundamental problem is not what number is chosen for the threshold p-value, but what this statistic does (and does not) mean. It seems to me that the p-value is usually an answer to a question quite different from the one a scientist actually wants to ask, namely: what do the data have to say about a given hypothesis? I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis.

While I generally agree with the arguments given in McShane et al., I don’t think the paper goes far enough. I think p-values are so misleading that, if I had my way, I’d ban them altogether!
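To make the contrast concrete, here is a toy numerical sketch (in Python, with entirely made-up numbers): for a coin flipped 100 times and landing Heads 60 times, the p-value answers `how often would a fair coin do at least this well?’, while the Bayesian calculation answers `how probable is the fair-coin hypothesis, given the data?’. The two can disagree sharply:

```python
from math import comb

# Hypothetical experiment: 100 tosses, 60 Heads.
n, k = 100, 60

# One-sided p-value: P(at least 60 Heads | coin is fair).
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Bayesian comparison of two hypotheses with equal prior odds:
# H0: the coin is fair; H1: unknown bias, flat prior on it.
# Marginal likelihood under H1: integral of C(n,k) p^k (1-p)^(n-k) dp = 1/(n+1).
like_fair = comb(n, k) / 2**n
like_alt = 1 / (n + 1)
post_fair = like_fair / (like_fair + like_alt)

print(f"p-value: {p_value:.4f}")           # roughly 0.03: 'significant' at 0.05
print(f"P(fair | data): {post_fair:.2f}")  # roughly 0.5: fair coin still quite plausible
```

So the same data that a significance test would flag as evidence against the fair coin leave that hypothesis with roughly even posterior odds; which answer you get depends on which question you ask.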


On the Edgeworth Series…

Posted in Bad Statistics, The Universe and Stuff on September 12, 2017 by telescoper

There’s a nice paper on the arXiv today by Elena Sellentin, Andrew Jaffe and Alan Heavens about the use of the Edgeworth series in statistical cosmology; it is evidently the first in a series about the Edgeworth series.

Here is the abstract:

Non-linear gravitational collapse introduces non-Gaussian statistics into the matter fields of the late Universe. As the large-scale structure is the target of current and future observational campaigns, one would ideally like to have the full probability density function of these non-Gaussian fields. The only viable way we see to achieve this analytically, at least approximately and in the near future, is via the Edgeworth expansion. We hence rederive this expansion for Fourier modes of non-Gaussian fields and then continue by putting it into a wider statistical context than previously done. We show that in its original form, the Edgeworth expansion only works if the non-Gaussian signal is averaged away. This is counterproductive, since we target the parameter-dependent non-Gaussianities as a signal of interest. We hence alter the analysis at the decisive step and now provide a roadmap towards a controlled and unadulterated analysis of non-Gaussianities in structure formation (with the Edgeworth expansion). Our central result is that, although the Edgeworth expansion has pathological properties, these can be predicted and avoided in a careful manner. We also show that, despite the non-Gaussianity coupling all modes, the Edgeworth series may be applied to any desired subset of modes, since this is equivalent (to the level of the approximation) to marginalising over the excluded modes. In this first paper of a series, we restrict ourselves to the sampling properties of the Edgeworth expansion, i.e. how faithfully it reproduces the distribution of non-Gaussian data. A follow-up paper will detail its Bayesian use, when parameters are to be inferred.

The Edgeworth series – a method of approximating a probability distribution in terms of a series determined by its cumulants – has found a number of cosmological applications over the years, but it does suffer from a number of issues, one of the most important being that it is not guaranteed to be a proper probability distribution, in that the resulting probabilities can be negative…
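A minimal numerical sketch of this pathology (a toy illustration, not the cosmological application): truncate the Edgeworth series for a standardised variable at the skewness term, f(x) ≈ φ(x)[1 + (κ₃/6)He₃(x)] with He₃(x) = x³ − 3x, and the "density" goes negative in the tail for any non-zero κ₃:

```python
import numpy as np

def edgeworth_pdf(x, kappa3):
    """First-order Edgeworth approximation: phi(x) * (1 + (kappa3/6) * He3(x))."""
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    he3 = x**3 - 3 * x  # third probabilists' Hermite polynomial
    return phi * (1 + (kappa3 / 6) * he3)

x = np.linspace(-5, 5, 1001)
f = edgeworth_pdf(x, kappa3=1.0)
print(f.min())  # negative: not a proper probability density
```

For instance, at x = −3 the bracket is 1 + (1/6)(−27 + 9) = −2, so the approximate "density" there is −2φ(−3) < 0.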

I’ve been thinking about how to avoid this issue myself, and mentioned a possibility in the talk I gave at Imperial College in South Kensington earlier this summer. The idea is to represent the cosmological density field (usually denoted δ) in terms of the squared modulus of a (complex) wave function ψ, i.e. ψψ* = |ψ|². It then turns out that the evolution equations for the cosmic fluid can be rewritten as a kind of Schrödinger equation. One powerful advantage of this approach is that whatever you do in terms of approximating ψ, the resulting density ψψ* is bound to be positive. This finesses the problem of negative probabilities, but at the price of introducing more complexity (geddit?) into the fluid equations. On the other hand, it does mean that even first-order perturbative evolution of ψ guarantees a sensible probability distribution, whereas first-order evolution of δ does not.
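As a toy version of the wave-function trick (hypothetical numbers again, not the actual fluid equations): put half of the same first-order correction into ψ instead of into the density. Since (1 + a/2)² = 1 + a + a²/4 agrees with 1 + a to first order, the approximation is unchanged at that order, but the reconstructed density |ψ|² can never be negative:

```python
import numpy as np

def wavefunction_density(x, kappa3):
    """Same first-order Edgeworth-style correction, but applied inside psi:
    psi = sqrt(phi) * (1 + (kappa3/12) * He3(x)), density = |psi|^2."""
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    he3 = x**3 - 3 * x
    psi = np.sqrt(phi) * (1 + (kappa3 / 12) * he3)
    return np.abs(psi)**2

x = np.linspace(-5, 5, 1001)
rho = wavefunction_density(x, kappa3=1.0)
print(rho.min())  # non-negative by construction
```

The price, as noted above, is that the extra (quadratic) term alters the expansion at second order, which is where the additional complexity in the fluid equations comes in.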


Summer’s Ending

Posted in Bad Statistics, Biographical, Cricket on September 11, 2017 by telescoper

There’s no escaping the signs that summer is drawing to a close. The weather took a decidedly autumnal turn  at the end of last week, and though I resisted the temptation to turn the central heating on at Chateau Coles I fear it won’t be long before I have to face reality and take that step. I hope I can hold out at least until the conventional end of summer, the autumnal equinox, which this year happens at 21.02 BST on Friday, 22 September.

Saturday saw the Last Night of the BBC Proms season. I’ve enjoyed a great many of the concerts but I only listened to a bit of the first half of the Last Night as I find the jingoism of the second half rather hard to stomach. I did catch Nina Stemme on the wireless giving it some welly in the Liebestod from Tristan und Isolde, though. Pretty good, but difficult to compare with my favourite version by Kirsten Flagstad.

One of the highlights of the season, just a few days ago, was Sir András Schiff’s late-night performance of Book I of The Well Tempered Clavier which had me captivated for two hours, until well past my usual bedtime…

However, as the Proms season ends in London the music-making continues in Cardiff with a new series of international concerts at St David’s Hall and Welsh National Opera’s new season at the Wales Millennium Centre (which starts on 23rd September). I notice also that, having finished his complete Beethoven cycle,  Llŷr Williams is embarking on a series of recitals of music by Schubert, starting on November 9th at the Royal Welsh College of Music and Drama.

Another sign that summer is over is that the last Test Match of the summer has ended. Excellent bowling by Jimmy Anderson (and, in the first innings, by Ben Stokes) meant that England had only a small total to chase, which they managed comfortably. Victory at Lord’s gives England a 2-1 win for the series over West Indies. That outcome is welcome for England fans, but it doesn’t do much to build confidence for the forthcoming Ashes series in Australia. England’s pace bowlers have shown they can prosper in English conditions, when the Duke ball can be made to swing, but in Australia with the Kookaburra they may find success much harder to come by. More importantly, however, only two of England’s five top-order batsmen are of proven international class, making their batting lineup extremely fragile. So much depends on Cook and Root, as I don’t think it is at all obvious who should take the other three positions, despite a whole summer of experimentation.

There are a few one-day internationals and Twenty20 matches coming up as well as three full weeks of County Championship fixtures. In particular, there are two home games for Glamorgan in the next two weeks (one against Northants, starting tomorrow, and one next week against Gloucestershire). Their last match (away against Derbyshire) was drawn because three of the four days were lost to rain, but weather permitting there should still be a few opportunities to see cricket at Sophia Gardens this year.

And of course it will soon be time for the start of the new academic year, welcoming new students (including the first intake on our MSc courses in Data-Intensive Physics and Astrophysics and new PhD students in Data-Intensive Science who form the first intake of our new Centre for Doctoral Training). All that happens just a couple of weeks from today, and we’re having a big launch event on 25th-26th September to welcome the new intake and introduce them to our industrial and academic partners.

Anyway, that reminds me that I have quite a lot to do before term starts so I’d better get on with it, especially if I’m going to make time to watch a few days of cricket between now and the end of the month!

Random Image

Posted in Bad Statistics on September 10, 2017 by telescoper

No time for a proper post today so here’s a random* picture made by my student, Will…

*In some sense…

More Worthless University Rankings

Posted in Bad Statistics, Education on September 6, 2017 by telescoper

The Times Higher World University Rankings were released this week. The main table can be found here and the methodology used to concoct them here.

Here I wish to reiterate the objection I made last year and the year before that to the way these tables are manipulated year on year to create an artificial “churn” that renders them unreliable and impossible to interpret in any objective way. In other words, they’re worthless. This year the narrative text includes:

This year’s list of the best universities in the world is led by two UK universities for the first time. The University of Oxford has held on to the number one spot for the second year in a row, while the University of Cambridge has jumped from fourth to second place.

Overall, European institutions occupy half of the top 200 places, with the Netherlands and Germany joining the UK as the most-represented countries. Italy, Spain and the Netherlands each have new number ones.

Another notable trend is the continued rise of China. The Asian giant is now home to two universities in the top 30: Peking and Tsinghua. The Beijing duo now outrank several prestigious institutions in Europe and the US. Meanwhile, almost all Chinese universities have improved, signalling that the country’s commitments to investment has bolstered results year-on-year.

In contrast, two-fifths of the US institutions in the top 200 (29 out of 62) have dropped places. In total, 77 countries feature in the table.

These comments are all predicated on the assumption that any changes since the last tables represent changes in data (which in turn are assumed to be relevant to how good a university is) rather than changes in the methodology used to analyse that data. Unfortunately, every single year the Times Higher changes its methodology. This time we are told:

This year, we have made a slight improvement to how we handle our papers per academic staff calculation, and expanded the number of broad subject areas that we use.

What has been the effect of these changes? We are not told. The question that must be asked is how can we be sure that any change in league table position for an institution from year to year represents a change in “performance”, rather than a change in the way metrics are constructed and/or combined? Would you trust the outcome of a medical trial in which the responses of two groups of patients (e.g. one given medication and the other a placebo) were assessed with two different measurement techniques?

There is an obvious and easy way to test for the size of this effect: construct a parallel set of league tables using this year’s (2017/18) input data but last year’s (2016/17) methodology, for comparison with the tables constructed using this year’s methodology on the same data. Any differences between these two tables would isolate changes in methodology from changes in the performance indicators, and so give a clear indication of the reliability (or otherwise) of the rankings. The Times Higher – along with other purveyors of similar statistical twaddle – refuses to do this. No scientifically literate person would accept the result of this kind of study unless the systematic effects can be shown to be under control.
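The point can be made with a deliberately artificial example (hypothetical institutions, indicators and weights): hold the input data fixed and change only the weighting scheme, and the rankings reshuffle all by themselves:

```python
import numpy as np

# Fixed input data: three hypothetical institutions scored on three indicators
# (say teaching, research, citations). Nothing about them changes between "years".
scores = np.array([[90.0, 50.0, 60.0],
                   [50.0, 90.0, 60.0],
                   [70.0, 70.0, 60.0]])

def rank(weights):
    """Rank institutions by weighted composite score; 0 = top of the table."""
    composite = scores @ np.asarray(weights)
    return np.argsort(np.argsort(-composite))

old = rank([0.50, 0.25, 0.25])  # "last year's" methodology
new = rank([0.25, 0.50, 0.25])  # "this year's" methodology
print(old.tolist(), new.tolist())  # [0, 2, 1] then [2, 0, 1]
```

On identical data, institution 0 falls from first to last and institution 1 rises from last to first; reporting that kind of churn as a change in "performance" is exactly the objection above.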

I challenged the Times Higher to do this last year, and they refused. You can draw your own conclusions about why.

P.S. For the record, Cardiff University is 162nd in this year’s table, a rise of 20 places on last year. My former institution, the University of Sussex, is up two places to joint 147th. Whether these changes are anything other than artifacts of the data analysis I very much doubt.

On the Time Lags of the LIGO signals

Posted in Bad Statistics, The Universe and Stuff on August 10, 2017 by telescoper

It seems that a lot of rumours are flying around on social media and elsewhere about the discussions that have been going on here in Copenhagen between members of the Niels Bohr Institute and of the LIGO Scientific Collaboration concerning matters arising from the `Danish Paper’. The most prominent among these appears to be that the LIGO team and the Danish team have agreed on everything and that the Danish authors have conceded that they were mistaken in their claims. I have even been told that my recent blog posts gave the impression that this was the case. I’m not sure how, as all I’ve said is that the discussions reached agreement on some matters. I did not say what matters or whose position had changed.

I feel, therefore, that some clarification is necessary. Since I am a member of neither party to this controversy I have to tread carefully, and there are some things which I feel I should not discuss at all. I was invited to participate in the discussions as a neutral observer as a courtesy and I certainly don’t want to betray any confidences. On one thing, however, I can be perfectly clear. The Danish team (Cresswell et al.) have not retracted their claims and they reject the suggestion that their paper was wrong.

To reinforce this, I draw your attention to the fact that a revised version of `The Danish Paper’ has now been accepted for publication (in the Journal of Cosmology and Astroparticle Physics) and is now available on the arXiv. The referees raised a large number of queries, and in response to them all, the revised version is almost double the length of the original.


The main body of the paper has not been significantly modified and their main result – of an unexplained 7ms correlation in the background signal (referred to in the abstract as `noise’) – has not “gone away”. If you want to understand more, read the paper!

I’m sure there will be much more discussion of this and I will comment as appropriate when appropriate. In the meantime this remains very much a live issue.

P.S. In the interest of full disclosure I should mention that I did read over part of the revised version of the Danish paper and made some suggestions with regard to style and flow. I therefore have a mention in the acknowledgments of the final version. I was warned that I might expect some trouble for agreeing to be associated with the paper in this way but, as  Sam Spade says in The Maltese Falcon `I don’t mind a reasonable amount of trouble’…

Yellow Stars, Red Stars and Bayesian Inference

Posted in Bad Statistics, The Universe and Stuff on May 25, 2017 by telescoper

I came across a paper on the arXiv yesterday with the title `Why do we find ourselves around a yellow star instead of a red star?’.  Here’s the abstract:

M-dwarf stars are more abundant than G-dwarf stars, so our position as observers on a planet orbiting a G-dwarf raises questions about the suitability of other stellar types for supporting life. If we consider ourselves as typical, in the anthropic sense that our environment is probably a typical one for conscious observers, then we are led to the conclusion that planets orbiting in the habitable zone of G-dwarf stars should be the best place for conscious life to develop. But such a conclusion neglects the possibility that K-dwarfs or M-dwarfs could provide more numerous sites for life to develop, both now and in the future. In this paper we analyze this problem through Bayesian inference to demonstrate that our occurrence around a G-dwarf might be a slight statistical anomaly, but only the sort of chance event that we expect to occur regularly. Even if M-dwarfs provide more numerous habitable planets today and in the future, we still expect mid G- to early K-dwarfs stars to be the most likely place for observers like ourselves. This suggests that observers with similar cognitive capabilities as us are most likely to be found at the present time and place, rather than in the future or around much smaller stars.

Although astrobiology is not really my province, I was intrigued enough to read on, until I came to the following paragraph in which the authors attempt to explain how Bayesian Inference works:

We approach this problem through the framework of Bayesian inference. As an example, consider a fair coin that is tossed three times in a row. Suppose that all three tosses turn up Heads. Can we conclude from this experiment that the coin must be weighted? In fact, we can still maintain our hypothesis that the coin is fair because the chances of getting three Heads in a row is 1/8. Many events with a probability of 1/8 occur every day, and so we should not be concerned about an event like this indicating that our initial assumptions are flawed. However, if we were to flip the same coin 70 times in a row with all 70 turning up Heads, we would readily conclude that the experiment is fixed. This is because the probability of flipping 70 Heads in a row is about 10⁻²², which is an exceedingly unlikely event that has probably never happened in the history of the universe. This informal description of Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence.

Obviously I agree with the statement right at the end that `Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence’. That’s certainly what Bayesian inference does, but this `informal description’ is really a frequentist rather than a Bayesian argument, in that it only mentions the probability of given outcomes, not the probability of different hypotheses…
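For contrast, here is what an actually Bayesian treatment of the coin example looks like (a toy sketch with made-up prior odds): specify an alternative hypothesis, here a two-headed coin, give it prior odds of 1:100 against, and compute the posterior probability of fairness after each run of Heads:

```python
from fractions import Fraction

def posterior_fair(n_heads, prior_fair=Fraction(100, 101)):
    """P(coin is fair | n_heads Heads in a row), comparing 'fair' against a
    hypothetical two-headed alternative with the given prior probability of fairness."""
    like_fair = Fraction(1, 2) ** n_heads  # P(data | fair)
    like_biased = Fraction(1)              # P(data | two-headed)
    num = like_fair * prior_fair
    return num / (num + like_biased * (1 - prior_fair))

print(float(posterior_fair(3)))   # about 0.93: three Heads barely dent the prior
print(float(posterior_fair(10)))  # about 0.09: ten Heads overwhelm it
```

The calculation assigns a probability to the hypothesis itself, updated by the data, rather than a tail-area probability to the data under a single null; that is what distinguishes it from the argument quoted above.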

Anyway, I was so unconvinced by this `description’ that I stopped reading at that point and went and did something else. Since I didn’t finish the paper I won’t comment on the conclusions, although I am more than usually sceptical. You might disagree of course, so read the paper yourself and form your own opinion! For me, it goes in the file marked Bad Statistics!