Archive for the Bad Statistics Category

On the Edgeworth Series…

Posted in Bad Statistics, The Universe and Stuff with tags , , on September 12, 2017 by telescoper

There’s a nice paper on the arXiv today by Elena Sellentin, Andrew Jaffe and Alan Heavens about the use of the Edgeworth series in statistical cosmology; it is evidently the first in a series about the Edgeworth series.

Here is the abstract:

Non-linear gravitational collapse introduces non-Gaussian statistics into the matter fields of the late Universe. As the large-scale structure is the target of current and future observational campaigns, one would ideally like to have the full probability density function of these non-Gaussian fields. The only viable way we see to achieve this analytically, at least approximately and in the near future, is via the Edgeworth expansion. We hence rederive this expansion for Fourier modes of non-Gaussian fields and then continue by putting it into a wider statistical context than previously done. We show that in its original form, the Edgeworth expansion only works if the non-Gaussian signal is averaged away. This is counterproductive, since we target the parameter-dependent non-Gaussianities as a signal of interest. We hence alter the analysis at the decisive step and now provide a roadmap towards a controlled and unadulterated analysis of non-Gaussianities in structure formation (with the Edgeworth expansion). Our central result is that, although the Edgeworth expansion has pathological properties, these can be predicted and avoided in a careful manner. We also show that, despite the non-Gaussianity coupling all modes, the Edgeworth series may be applied to any desired subset of modes, since this is equivalent (to the level of the approximation) to marginalising over the excluded modes. In this first paper of a series, we restrict ourselves to the sampling properties of the Edgeworth expansion, i.e. how faithfully it reproduces the distribution of non-Gaussian data. A follow-up paper will detail its Bayesian use, when parameters are to be inferred.

The Edgeworth series – a method of approximating a probability distribution in terms of a series determined by its cumulants – has found a number of cosmological applications over the years, but it does suffer from a number of issues, one of the most important being that it is not guaranteed to be a proper probability distribution, in that the resulting probabilities can be negative…
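The negativity problem is easy to demonstrate numerically. The sketch below (a toy illustration, not anything from the Sellentin et al. paper; the skewness value is invented) evaluates the first-order Edgeworth correction to a Gaussian and shows it dipping below zero in the tail:

```python
import numpy as np

def edgeworth_pdf(x, skew):
    """First-order Edgeworth approximation to a zero-mean, unit-variance
    density with skewness `skew`: phi(x) * (1 + skew/6 * He3(x)),
    where He3(x) = x**3 - 3*x is the third Hermite polynomial."""
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return phi * (1 + (skew / 6) * (x**3 - 3 * x))

x = np.linspace(-4, 4, 801)
f = edgeworth_pdf(x, skew=1.0)
print("minimum of the Edgeworth 'pdf':", f.min())  # negative in the tail
```

The bracket 1 + (γ₁/6)(x³ − 3x) inevitably goes negative for large enough |x| whenever the skewness γ₁ is non-zero, which is exactly the pathology described above.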

I’ve been thinking about how to avoid this issue myself, and mentioned a possibility in the talk I gave at Imperial College in South Kensington earlier this summer. The idea is to represent the cosmological density field (usually denoted δ) in terms of the squared modulus of a (complex) wave function ψ, i.e. ψψ* = |ψ|². It then turns out that the evolution equations for the cosmic fluid can be rewritten as a kind of Schrödinger equation. One powerful advantage of this approach is that whatever you do in terms of approximating ψ, the resulting density ψψ* is bound to be positive. This finesses the problem of negative probabilities, but at the price of introducing more complexity (geddit?) into the fluid equations. On the other hand, it does mean that even first-order perturbative evolution of ψ guarantees a sensible probability distribution, whereas first-order evolution of δ does not.
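The positivity argument can be illustrated with a toy calculation (this is only a sketch; the mode amplitudes and phase below are invented, not the actual cosmological dynamics). A linear density contrast δ can push the density 1 + δ negative, whereas any density constructed as ψψ* cannot be:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 256)

# A mock "first-order" density contrast: a few Fourier modes whose combined
# amplitude is large enough that delta dips below -1 somewhere.
delta_linear = (0.8 * np.cos(x + 0.3)
                + 0.6 * np.cos(3 * x + 1.1)
                + 0.4 * np.cos(5 * x + 2.0))
rho_linear = 1.0 + delta_linear
print("min density from linear delta:", rho_linear.min())  # negative: unphysical

# Wave-function representation: rho = psi * conj(psi) = |psi|^2.
# However psi is approximated, the implied density cannot go negative.
psi = np.sqrt(np.clip(rho_linear, 0, None)) * np.exp(1j * 0.1 * x)  # toy phase
rho_psi = (psi * np.conj(psi)).real
print("min density from psi psi*:", rho_psi.min())  # always >= 0
```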

 


Summer’s Ending

Posted in Bad Statistics, Biographical, Cricket with tags , , , , , on September 11, 2017 by telescoper

There’s no escaping the signs that summer is drawing to a close. The weather took a decidedly autumnal turn  at the end of last week, and though I resisted the temptation to turn the central heating on at Chateau Coles I fear it won’t be long before I have to face reality and take that step. I hope I can hold out at least until the conventional end of summer, the autumnal equinox, which this year happens at 21.02 BST on Friday, 22 September.

Saturday saw the Last Night of the BBC Proms season. I’ve enjoyed a great many of the concerts but I only listened to a bit of the first half of the Last Night as I find the jingoism of the second half rather hard to stomach. I did catch Nina Stemme on the wireless giving it some welly in the Liebestod from Tristan und Isolde, though. Pretty good, but difficult to compare with my favourite version by Kirsten Flagstad.

One of the highlights of the season, just a few days ago, was Sir András Schiff’s late-night performance of Book I of The Well Tempered Clavier which had me captivated for two hours, until well past my usual bedtime…

However, as the Proms season ends in London the music-making continues in Cardiff with a new series of international concerts at St David’s Hall and Welsh National Opera’s new season at the Wales Millennium Centre (which starts on 23rd September). I notice also that, having finished his complete Beethoven cycle,  Llŷr Williams is embarking on a series of recitals of music by Schubert, starting on November 9th at the Royal Welsh College of Music and Drama.

Another sign that summer is over is that the last Test Match of the summer has ended. Excellent bowling by Jimmy Anderson (and, in the first innings, by Ben Stokes) meant that England had only a small total to chase, which they managed comfortably. Victory at Lord’s gives England a 2-1 win for the series over West Indies. That outcome is welcome for England fans, but it doesn’t do much to build confidence for the forthcoming Ashes series in Australia. England’s pace bowlers have shown they can prosper in English conditions, when the Duke ball can be made to swing, but in Australia with the Kookaburra they may find success much harder to come by. More importantly, however, only two of England’s five top-order batsmen are of proven international class, making their batting lineup extremely fragile. So much depends on Cook and Root, as I don’t think it is at all obvious who should take the other three positions, despite a whole summer of experimentation.

There are a few one-day internationals and Twenty20 matches coming up as well as three full weeks of County Championship fixtures. In particular, there are two home games for Glamorgan in the next two weeks (one against Northants, starting tomorrow, and one next week against Gloucestershire). Their last match (away against Derbyshire) was drawn because three of the four days were lost to rain, but weather permitting there should still be a few opportunities to see cricket at Sophia Gardens this year.

And of course it will soon be time for the start of the new academic year, welcoming new students (including the first intake on our MSc courses in Data-Intensive Physics and Astrophysics and new PhD students in Data-Intensive Science who form the first intake of our new Centre for Doctoral Training). All that happens just a couple of weeks from today, and we’re having a big launch event on 25th-26th September to welcome the new intake and introduce them to our industrial and academic partners.

Anyway, that reminds me that I have quite a lot to do before term starts so I’d better get on with it, especially if I’m going to make time to watch a few days of cricket between now and the end of the month!

Random Image

Posted in Bad Statistics on September 10, 2017 by telescoper

No time for a proper post today so here’s a random* picture made by my student, Will…

*In some sense…

More Worthless University Rankings

Posted in Bad Statistics, Education with tags , , , on September 6, 2017 by telescoper

The Times Higher World University Rankings were released this week. The main table can be found here and the methodology used to concoct them here.

Here I wish to reiterate the objection I made last year and the year before that to the way these tables are manipulated year on year to create an artificial “churn” that renders them unreliable and impossible to interpret in any objective way. In other words, they’re worthless. This year the narrative text includes:

This year’s list of the best universities in the world is led by two UK universities for the first time. The University of Oxford has held on to the number one spot for the second year in a row, while the University of Cambridge has jumped from fourth to second place.

Overall, European institutions occupy half of the top 200 places, with the Netherlands and Germany joining the UK as the most-represented countries. Italy, Spain and the Netherlands each have new number ones.

Another notable trend is the continued rise of China. The Asian giant is now home to two universities in the top 30: Peking and Tsinghua. The Beijing duo now outrank several prestigious institutions in Europe and the US. Meanwhile, almost all Chinese universities have improved, signalling that the country’s commitments to investment has bolstered results year-on-year.

In contrast, two-fifths of the US institutions in the top 200 (29 out of 62) have dropped places. In total, 77 countries feature in the table.

These comments are all predicated on the assumption that any changes since the last tables represent changes in data (which in turn are assumed to be relevant to how good a university is) rather than changes in the methodology used to analyse that data. Unfortunately, every single year the Times Higher changes its methodology. This time we are told:

This year, we have made a slight improvement to how we handle our papers per academic staff calculation, and expanded the number of broad subject areas that we use.

What has been the effect of these changes? We are not told. The question that must be asked is how can we be sure that any change in league table position for an institution from year to year represents a change in “performance”, rather than a change in the way metrics are constructed and/or combined? Would you trust the outcome of a medical trial in which the responses of two groups of patients (e.g. one given medication and the other placebo) were assessed with two different measurement techniques?

There is an obvious and easy way to test the size of this effect: publish a parallel set of league tables constructed from this year’s input data using last year’s methodology – say, the 2016/17 methodology applied to the 2017/18 data – alongside the tables produced by this year’s methodology on the same data. Any differences between the two tables would isolate changes in methodology from changes in the performance indicators, giving a clear indication of the reliability (or otherwise) of the rankings. The Times Higher – along with other purveyors of similar statistical twaddle – refuses to do this. No scientifically literate person would accept the results of this kind of study unless the systematic effects could be shown to be under control.
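To see how a methodology tweak alone can generate churn, here is a toy simulation (the number of institutions, the metrics and both sets of weights are invented for illustration, not the Times Higher’s actual scheme): the very same underlying data are ranked under two slightly different weightings, and plenty of “movement” appears anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # hypothetical institutions
metrics = rng.uniform(size=(n, 5))       # the same underlying data "both years"

w_old = np.array([0.30, 0.30, 0.30, 0.075, 0.025])  # invented "last year" weights
w_new = np.array([0.33, 0.30, 0.27, 0.060, 0.040])  # invented "this year" weights

# Rank institutions under each weighting (rank 0 = top of the table).
rank_old = np.argsort(np.argsort(-(metrics @ w_old)))
rank_new = np.argsort(np.argsort(-(metrics @ w_new)))

moved = np.abs(rank_new - rank_old)
print(f"{(moved > 0).sum()} of {n} institutions change position")
print(f"largest jump caused by the weights alone: {moved.max()} places")
```

None of the simulated “performance” changed between the two rankings; every place gained or lost is an artifact of the weighting.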

I challenged the Times Higher to do this last year, and they refused. You can draw your own conclusions about why.

P.S. For the record, Cardiff University is 162nd in this year’s table, a rise of 20 places on last year. My former institution, the University of Sussex, is up two places to joint 147th. Whether these changes are anything other than artifacts of the data analysis I very much doubt.

On the Time Lags of the LIGO signals

Posted in Bad Statistics, The Universe and Stuff with tags , , , on August 10, 2017 by telescoper

It seems that a lot of rumours are flying around on social media and elsewhere about the discussions that have been going on here in Copenhagen between members of the Niels Bohr Institute and of the LIGO Scientific Collaboration concerning matters arising from the `Danish Paper‘. The most prominent among these appears to be that the LIGO team and the Danish team have agreed on everything and that the Danish authors have conceded that they were mistaken in their claims. I have even been told that my recent blog posts gave the impression that this was the case. I’m not sure how, as all I’ve said is that the discussions reached agreement on some matters. I did not say what matters, or whose position had changed.

I feel, therefore, that some clarification is necessary. Since I am a member of neither party to this controversy I have to tread carefully, and there are some things which I feel I should not discuss at all. I was invited to participate in the discussions as a neutral observer as a courtesy and I certainly don’t want to betray any confidences. On one thing, however, I can be perfectly clear. The Danish team (Cresswell et al.) have not retracted their claims and they reject the suggestion that their paper was wrong.

To reinforce this, I draw your attention to the fact that a revised version of `The Danish Paper’ has now been accepted for publication (in the Journal of Cosmology and Astroparticle Physics) and that this paper is now available on the arXiv. The referees raised a large number of queries, and in response to them all the revised version is almost double the length of the original.

Here is the arXiv entry page:

The main body of the paper has not been significantly modified and their main result – of an unexplained 7ms correlation in the background signal (referred to in the abstract as `noise’) – has not “gone away”. If you want to understand more, read the paper!
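For readers who want a feel for what finding such a lag involves, here is a minimal sketch (using synthetic data, not the actual LIGO residuals) of locating the delay that maximises the cross-correlation between two noisy time series that share a delayed common component:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 4096                                  # sample rate in Hz, as in public LIGO releases
n = fs                                     # one second of synthetic data
lag_samples = round(0.007 * fs)            # inject a 7 ms delay (~29 samples)

common = rng.normal(size=n + lag_samples)  # component shared by both "detectors"
a = common[lag_samples:] + 0.5 * rng.normal(size=n)  # detector A
b = common[:n] + 0.5 * rng.normal(size=n)            # detector B: delayed copy

def corr_at(k):
    """Correlation coefficient of a[i] against b[i + k]."""
    if k >= 0:
        return np.corrcoef(a[:n - k], b[k:])[0, 1]
    return np.corrcoef(a[-k:], b[:n + k])[0, 1]

lags = list(range(-60, 61))
cc = [corr_at(k) for k in lags]
best = lags[int(np.argmax(np.abs(cc)))]
print(f"best-fit lag: {1000 * best / fs:.2f} ms")  # recovers the injected ~7 ms
```

In the real analysis the question is, of course, far subtler: whether such a peak in the detector residuals is significant, not merely how to locate it.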

I’m sure there will be much more discussion of this and I will comment as appropriate when appropriate. In the meantime this remains very much a live issue.

P.S. In the interest of full disclosure I should mention that I did read over part of the revised version of the Danish paper and made some suggestions with regard to style and flow. I therefore have a mention in the acknowledgments of the final version. I was warned that I might expect some trouble for agreeing to be associated with the paper in this way but, as  Sam Spade says in The Maltese Falcon `I don’t mind a reasonable amount of trouble’…

Yellow Stars, Red Stars and Bayesian Inference

Posted in Bad Statistics, The Universe and Stuff with tags , , , , , , on May 25, 2017 by telescoper

I came across a paper on the arXiv yesterday with the title `Why do we find ourselves around a yellow star instead of a red star?’.  Here’s the abstract:

M-dwarf stars are more abundant than G-dwarf stars, so our position as observers on a planet orbiting a G-dwarf raises questions about the suitability of other stellar types for supporting life. If we consider ourselves as typical, in the anthropic sense that our environment is probably a typical one for conscious observers, then we are led to the conclusion that planets orbiting in the habitable zone of G-dwarf stars should be the best place for conscious life to develop. But such a conclusion neglects the possibility that K-dwarfs or M-dwarfs could provide more numerous sites for life to develop, both now and in the future. In this paper we analyze this problem through Bayesian inference to demonstrate that our occurrence around a G-dwarf might be a slight statistical anomaly, but only the sort of chance event that we expect to occur regularly. Even if M-dwarfs provide more numerous habitable planets today and in the future, we still expect mid G- to early K-dwarf stars to be the most likely place for observers like ourselves. This suggests that observers with similar cognitive capabilities as us are most likely to be found at the present time and place, rather than in the future or around much smaller stars.

Although astrobiology is not really my province, I was intrigued enough to read on, until I came to the following paragraph in which the authors attempt to explain how Bayesian inference works:

We approach this problem through the framework of Bayesian inference. As an example, consider a fair coin that is tossed three times in a row. Suppose that all three tosses turn up Heads. Can we conclude from this experiment that the coin must be weighted? In fact, we can still maintain our hypothesis that the coin is fair because the chances of getting three Heads in a row is 1/8. Many events with a probability of 1/8 occur every day, and so we should not be concerned about an event like this indicating that our initial assumptions are flawed. However, if we were to flip the same coin 70 times in a row with all 70 turning up Heads, we would readily conclude that the experiment is fixed. This is because the probability of flipping 70 Heads in a row is about 10^-22, which is an exceedingly unlikely event that has probably never happened in the history of the universe. This informal description of Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence.

Obviously I agree with the statement right at the end that `Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence’. That’s certainly what Bayesian inference does, but this `informal description’ is really a frequentist rather than a Bayesian argument, in that it only mentions the probability of given outcomes not the probability of different hypotheses…
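A genuinely Bayesian treatment of the coin example would compare the probabilities of the hypotheses themselves. The sketch below does exactly that for the simplest possible case; the prior odds and the alternative bias (p = 9/10) are my own assumptions for illustration, not taken from the paper:

```python
from fractions import Fraction

def posterior_fair(n_heads, prior_fair=Fraction(1, 2), p_biased=Fraction(9, 10)):
    """Posterior probability that the coin is fair after n_heads heads in a row,
    comparing 'fair' (p = 1/2) against a single alternative 'biased' (p = 9/10)."""
    like_fair = Fraction(1, 2) ** n_heads
    like_biased = p_biased ** n_heads
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_biased)

print(float(posterior_fair(3)))   # fair coin disfavoured, but far from ruled out
print(float(posterior_fair(70)))  # fair coin overwhelmingly disfavoured
```

The point is that the posterior depends on the probabilities of the data under *each* hypothesis and on the priors, not merely on how small the probability of the observed outcome is under one of them.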

Anyway, I was so unconvinced by this `description’ that I stopped reading at that point and went and did something else. Since I didn’t finish the paper I won’t comment on the conclusions, although I am more than usually sceptical. You might disagree of course, so read the paper yourself and form your own opinion! For me, it goes in the file marked Bad Statistics!

Polls Apart

Posted in Bad Statistics, Politics with tags , , , , , , , on May 9, 2017 by telescoper

Time for some random thoughts about political opinion polls, in the light of Sunday’s French Presidential Election result.

We all know that Emmanuel Macron beat Marine Le Pen in the second round ballot: he won 66.1% of the votes cast to Le Pen’s 33.9%. That doesn’t count the very large number of spoilt ballots or abstentions (25.8% in total). The turnout was down on previous elections, but at 74.2% it’s still a lot higher than we can expect in the UK at the forthcoming General Election.

The French opinion polls were very accurate in predicting the first-round results, getting the percentages for the four top candidates right to within a point or two, which is as good as it gets for typical survey sizes.

Harry Enten has written a post on Nate Silver’s FiveThirtyEight site claiming that the French opinion polls for the second-round “runoff” were inaccurate. He bases this on the observation that the “average poll” in between the two rounds of voting gave Macron a lead of about 22% (61%-39%). That’s true, but it assumes that opinions did not shift in the latter stages of the campaign. In particular it ignores Marine Le Pen’s terrible performance in the one-on-one TV debate against Macron on 4th May. Polls conducted after that debate (especially a big one with a sample of 5331 by IPSOS) gave a figure more like 63-37, i.e. a 26-point lead.

In any case it can be a bit misleading to focus on the difference between the two vote shares. In a two-horse race, if you’re off by +3 for one candidate you will be off by -3 for the other. In other words, underestimating Macron’s vote automatically means over-estimating Le Pen’s. A ‘normal’ sampling error looks twice as bad if you frame it in terms of differences like this.  The last polls giving Macron at 63% are only off by 3%, which is a normal sampling error…
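The arithmetic behind that doubling is simple to check. In this sketch the poll size is an assumed round number, chosen only for illustration:

```python
from math import sqrt

def margin_95(p, n):
    """Approximate 95% margin of error on a vote share p
    from a simple random sample of size n."""
    return 1.96 * sqrt(p * (1 - p) / n)

n = 1000                    # assumed poll size, for illustration only
m = margin_95(0.63, n)
print(f"Macron share: 63% +/- {100 * m:.1f} points")
# In a two-horse race the lead is 2p - 1, so its margin of error
# is exactly twice that of the share itself:
print(f"Macron lead:  26% +/- {200 * m:.1f} points")
```

So a perfectly ordinary ±3-point error on a single candidate’s share becomes a scary-looking ±6-point error when quoted as a lead.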

The polls were off by more than they have been in previous years (where they have typically predicted the spread within 4%). There’s also the question of how the big gap between the two candidates may have influenced voter behaviour, increasing the number of no-shows.

So I don’t think the French opinion polls did as badly as all that. What still worries me, though, is that the different polls consistently gave results that agreed with each other to within 1% or so, when there really should be visible sampling fluctuations. Fishy.
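How much disagreement *should* honest, independent polls show? A quick simulation gives a feel for it (the true share and poll sizes are assumed, for illustration only):

```python
import random

random.seed(0)

def poll_spread(p_true=0.63, n=1000, n_polls=10):
    """Max minus min share across independent polls of size n
    drawn from the same electorate."""
    shares = [sum(random.random() < p_true for _ in range(n)) / n
              for _ in range(n_polls)]
    return max(shares) - min(shares)

avg_spread = sum(poll_spread() for _ in range(200)) / 200
print(f"typical spread across 10 honest polls: {100 * avg_spread:.1f} points")
# Independent samples of ~1000 voters should disagree by several points;
# agreement to within 1 point suggests herding, not better sampling.
```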

By way of a contrast, consider a couple of recent opinion polls conducted by YouGov in Wales. The first, conducted in April, gave the following breakdown of likely votes:

[Image: table of YouGov Wales voting-intention figures]

The apparent ten-point lead for the Conservatives over Labour (which is traditionally dominant in Wales) created a lot of noise in the media as it showed the Tories up 12% on the previous such poll taken in January (and Labour down 3%); much of the Conservative increase was due to a collapse in the UKIP share. Here’s the long-term picture from YouGov:

[Image: long-term YouGov Wales polling trend]

As an aside I’ll mention that ‘barometer’ surveys like this are sometimes influenced by changes in weightings and other methodological factors that can artificially produce different outcomes. I don’t know if anything changed in this regard between January 2017 and May 2017 that might have contributed to the large swing to the Tories, so let’s just assume that it’s “real”.

This “sensational” result gave pundits (e.g. Cardiff’s own Roger Scully) the opportunity to construct various narratives about its implications for the forthcoming General Election.

Note, however, the small sample size (1029), which implies an uncertainty of ±3% or so in the result. It came as no surprise to me, then, to see that the next poll by YouGov was a bit different: Conservatives on 41% (+1), but Labour on 35% (+5). That’s still grim for Labour, of course, but not quite as grim as being 10 points behind.

So what happened in the two weeks between these two polls? Well, one thing is that many places had local elections which resulted in lots of campaigning. In my ward, at least, that made a big difference: Labour increased its share of the vote compared to the 2012 elections (on a 45% turnout, which is high for local elections). Maybe then it’s true that Labour has been “fighting back” since the end of April.

Alternatively, and to my mind more probably, what we’re seeing is just the consequence of very large sampling errors. I think it’s likely that the Conservatives are in the lead, but by an extremely uncertain margin.

But why didn’t we see fluctuations of this magnitude in the French opinion polls of similar size?

Answers on a postcard, or through the comments box, please.