Archive for Sampling

The Insignificance of ORB

Posted in Bad Statistics with tags , , , on April 5, 2016 by telescoper

A piece about opinion polls ahead of the EU Referendum which appeared in today’s Daily Torygraph has spurred me on to make a quick contribution to my bad statistics folder.

The piece concerned includes the following statement:

David Cameron’s campaign to warn voters about the dangers of leaving the European Union is beginning to win the argument ahead of the referendum, a new Telegraph poll has found.

The exclusive poll found that the “Remain” campaign now has a narrow lead after trailing last month, in a sign that Downing Street’s tactic – which has been described as “Project Fear” by its critics – is working.

The piece goes on to explain

The poll finds that 51 per cent of voters now support Remain – an increase of 4 per cent from last month. Leave’s support has decreased five points to 44 per cent.

This conclusion is based on the results of a survey by ORB in which the number of participants was 800. Yes, eight hundred.

How much can we trust this result on statistical grounds?

Suppose the fraction of the population having the intention to vote in a particular way in the EU referendum is p. For a sample of size n with x respondents indicating that they hen one can straightforwardly estimate p \simeq x/n. So far so good, as long as there is no bias induced by the form of the question asked nor in the selection of the sample which, given the fact that such polls have been all over the place seems rather unlikely.

A little bit of mathematics involving the binomial distribution yields an answer for the uncertainty in this estimate of p in terms of the sampling error:

\sigma = \sqrt{\frac{p(1-p)}{n}}

For the sample size of 800 given, and an actual value p \simeq 0.5 this amounts to a standard error of about 2%. About 95% of samples drawn from a population in which the true fraction is p will yield an estimate within p \pm 2\sigma, i.e. within about 4% of the true figure. In other words the typical variation between two samples drawn from the same underlying population is about 4%. In other other words, the change reported between the two ORB polls mentioned above can be entirely explained by sampling variation and does not at all imply any systematic change of public opinion between the two surveys.

I need hardly point out that in a two-horse race (between “Remain” and “Leave”) an increase of 4% in the Remain vote corresponds to a decrease in the Leave vote by the same 4% so a 50-50 population vote can easily generate a margin as large as  54-46 in such a small sample.

Why do pollsters bother with such tiny samples? With such a large margin error they are basically meaningless.

I object to the characterization of the Remain campaign as “Project Fear” in any case. I think it’s entirely sensible to point out the serious risks that an exit from the European Union would generate for the UK in loss of trade, science funding, financial instability, and indeed the near-inevitable secession of Scotland. But in any case this poll doesn’t indicate that anything is succeeding in changing anything other than statistical noise.

Statistical illiteracy is as widespread amongst politicians as it is amongst journalists, but the fact that silly reports like this are commonplace doesn’t make them any less annoying. After all, the idea of sampling uncertainty isn’t all that difficult to understand. Is it?

And with so many more important things going on in the world that deserve better press coverage than they are getting, why does a “quality” newspaper waste its valuable column inches on this sort of twaddle?