Archive for statistics

Galaxies, Glow-worms and Chicken Eyes

Posted in Bad Statistics, The Universe and Stuff on February 26, 2014 by telescoper

I just came across a news item based on a research article in Physical Review E by Jiao et al. with the abstract:

Optimal spatial sampling of light rigorously requires that identical photoreceptors be arranged in perfectly regular arrays in two dimensions. Examples of such perfect arrays in nature include the compound eyes of insects and the nearly crystalline photoreceptor patterns of some fish and reptiles. Birds are highly visual animals with five different cone photoreceptor subtypes, yet their photoreceptor patterns are not perfectly regular. By analyzing the chicken cone photoreceptor system consisting of five different cell types using a variety of sensitive microstructural descriptors, we find that the disordered photoreceptor patterns are “hyperuniform” (exhibiting vanishing infinite-wavelength density fluctuations), a property that had heretofore been identified in a unique subset of physical systems, but had never been observed in any living organism. Remarkably, the patterns of both the total population and the individual cell types are simultaneously hyperuniform. We term such patterns “multihyperuniform” because multiple distinct subsets of the overall point pattern are themselves hyperuniform. We have devised a unique multiscale cell packing model in two dimensions that suggests that photoreceptor types interact with both short- and long-ranged repulsive forces and that the resultant competition between the types gives rise to the aforementioned singular spatial features characterizing the system, including multihyperuniformity. These findings suggest that a disordered hyperuniform pattern may represent the most uniform sampling arrangement attainable in the avian system, given intrinsic packing constraints within the photoreceptor epithelium. In addition, they show how fundamental physical constraints can change the course of a biological optimization process. 
Our results suggest that multihyperuniform disordered structures have implications for the design of materials with novel physical properties and therefore may represent a fruitful area for future research.

The point made in the paper is that the photoreceptors found in the eyes of chickens possess a property called disordered hyperuniformity, which means that they appear disordered on small scales but exhibit order over large distances. Here’s an illustration:


It’s an interesting paper, but I’d like to quibble about something it says in the accompanying news story. The caption with the above diagram states

Left: visual cell distribution in chickens; right: a computer-simulation model showing pretty much the exact same thing. The colored dots represent the centers of the chicken’s eye cells.

Well, as someone who has spent much of his research career trying to discern and quantify patterns in collections of points – in my case they tend to be galaxies rather than photoreceptors – I find it difficult to defend the use of the phrase “pretty much the exact same thing”. It’s notoriously difficult to look at realizations of stochastic point processes and decide whether they are statistically similar or not. For that you generally need quite sophisticated mathematical analysis. In fact, to my eye, the two images above don’t look at all like “pretty much the exact same thing”. I’m not at all sure that the model works as well as is claimed, as the statistical analysis presented in the paper is relatively simple: I’d need to see some more quantitative measures of pattern morphology and clustering, especially higher-order correlation functions, before I’m convinced.

Anyway, all this reminded me of a very old post of mine about the difficulty of discerning patterns in distributions of points. Take the two (not very well scanned) images here as examples:


You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process (which is, in a well-defined sense, completely “random”) and the other contains spatial correlations between the points. One of them therefore has a real pattern to it, and the other is a realization of a completely unstructured random process.

I sometimes show this example in popular talks and get the audience to vote on which one is the random one. The vast majority usually think that the one on the right is the random one and the left one is the one with structure to it. It is not hard to see why. The right-hand pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the left one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the left picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The right-hand pattern was also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in its vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result is that the pattern is much smoother than a truly random process would produce. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms (a kind of beetle), which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the random pattern. The tendency displayed in this image of the points to spread themselves out more smoothly than a random distribution is in some ways reminiscent of the chicken-eye problem.
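If you want to play with this yourself, here is a minimal Python sketch of the two kinds of process just described. It is not the code used to make the images above, and the exclusion radius is an assumed value chosen purely for illustration: a uniform Poisson scattering, and a “dart-throwing” process in which each accepted point carries a zone of avoidance around it.

```python
import numpy as np

rng = np.random.default_rng(42)

def poisson_points(n, size=1.0):
    # Completely "random": each point placed independently and uniformly,
    # so clusters and filamentary features appear purely by chance.
    return rng.uniform(0.0, size, (n, 2))

def hardcore_points(n, r_min, size=1.0, max_tries=200000):
    # Dart-throwing with a zone of avoidance: reject any candidate point
    # closer than r_min to a previously accepted one. The points are
    # anticorrelated, so the pattern looks smoother than a random one.
    pts = []
    tries = 0
    while len(pts) < n and tries < max_tries:
        candidate = rng.uniform(0.0, size, 2)
        if all(np.hypot(*(candidate - p)) > r_min for p in pts):
            pts.append(candidate)
        tries += 1
    return np.array(pts)

random_pattern = poisson_points(200)
smooth_pattern = hardcore_points(200, r_min=0.03)
```

Scatter-plotting the two side by side reproduces the effect described above: the Poisson pattern usually looks more “structured” to the eye than the anticorrelated one.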

The moral of all this is that people are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this. The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose. By the same token, people are also pretty hopeless at figuring out whether two distributions of points resemble each other in some kind of statistical sense, because that can only be made precise if one defines some specific quantitative measure of clustering pattern, which is not easy to do.

Double Indemnity – Statistics Noir

Posted in Film on February 20, 2014 by telescoper

The other day I decided to treat myself by watching a DVD of the film Double Indemnity. It’s a great movie for many reasons, not least because when it was released in 1944 it immediately established much of the language and iconography of the genre that has come to be known as film noir, which I’ve written about on a number of occasions on this blog; see here for example. Like many noir movies the plot revolves around the destructive relationship between a femme fatale and a male anti-hero and, as usual for the genre, the narrative strategy involves the use of flashbacks and a first-person voice-over. The photography is done in such a way as to surround the protagonists with dark, threatening shadows. In fact almost every interior in the film (including the one shown in the clip below) has Venetian blinds for this purpose. These chiaroscuro lighting effects charge even the most mundane encounters with psychological tension or erotic suspense.


To the left is an example still from Double Indemnity which shows a number of trademark features. The shadows cast by Venetian blinds on the wall, the cigarette being smoked by Barbara Stanwyck and the curious construction of the mise-en-scène are all very characteristic of the style. What is even more wonderful about this particular shot, however, is the way the shadow of Fred MacMurray’s character enters the scene before he does. The Barbara Stanwyck character is just about to shoot him with a pearl-handled revolver; this image suggests that he is already on his way to the underworld as he enters the room.

I won’t repeat any more of the things I’ve already said about this great movie, but I will say a couple of things that struck me watching it again at the weekend. The first is that even after having seen it dozens of times over the years I still found it intense and gripping. The other is that I think one of the contributing factors to its greatness, which is not often discussed, is a wonderful cameo by Edward G. Robinson, who steals every scene he appears in as the insurance investigator Barton Keyes. Here’s an example, which I’ve chosen because it provides an interesting illustration of the scientific use of statistical information, another theme I’ve visited frequently on this blog:

Statistical Challenges in 21st Century Cosmology

Posted in The Universe and Stuff on December 2, 2013 by telescoper

I received the following email about a forthcoming conference which is probably of interest to a (statistically) significant number of readers of this blog so I thought I’d share it here with an encouragement to attend:


IAUS306 – Statistical Challenges in 21st Century Cosmology

We are pleased to announce the IAU Symposium 306 on Statistical Challenges in 21st Century Cosmology, which will take place in Lisbon, Portugal from 26-29 May 2014, with a tutorial day on 25 May.  Apologies if you receive this more than once.

Full exploitation of the very large surveys of the Cosmic Microwave Background, Large-Scale Structure, weak gravitational lensing and future 21cm surveys will require use of the best statistical techniques to answer the major cosmological questions of the 21st century, such as the nature of Dark Energy and gravity.

Thus it is timely to emphasise the importance of inference in cosmology, and to promote dialogue between astronomers and statisticians. This has been recognized by the creation of the IAU Working Group in Astrostatistics and Astroinformatics in 2012.

IAU Symposium 306 will be devoted to problems of inference in cosmology, from data processing to methods and model selection, and will have an important element of cross-disciplinary involvement from the statistics communities.

Keynote speakers

• Cosmic Microwave Background :: Graca Rocha (USA / Portugal)

• Weak Gravitational Lensing :: Masahiro Takada (Japan)

• Combining probes :: Anais Rassat (Switzerland)

• Statistics of Fields :: Sabino Matarrese (Italy)

• Large-scale structure :: Licia Verde (Spain)

• Bayesian methods :: David van Dyk (UK)

• 21cm cosmology :: Mario Santos (South Africa / Portugal)

• Massive parameter estimation :: Ben Wandelt (France)

• Overwhelmingly large datasets :: Alex Szalay (USA)

• Errors and nonparametric estimation :: Aurore Delaigle (Australia)

You are invited to submit an abstract for a contributed talk or poster for the meeting, via the meeting website. The deadline for abstract submission is 21st March 2014. Full information on the scientific rationale, programme, proceedings, critical dates, and local arrangements will be on the symposium web site here.


13 January 2014 – Grant requests

21 March 2014 – Abstract submission

4 April 2014 – Notification of abstract acceptance

11 April 2014 – Close of registration

30 June 2014 – Manuscript submission

Australia: Cyclones go up to Eleven!

Posted in Bad Statistics on October 14, 2013 by telescoper

I saw a story on the web this morning which points out that Australians can expect 11 cyclones this season.

It’s not a very good headline, because it’s a bit misleading about what the word “expected” means. In fact the number eleven is the average number of cyclones, which is not necessarily the number expected, despite what the phrase “expected value” (or “expectation value”) might suggest. If you don’t understand this criticism, ask yourself how many legs you’d expect a randomly-chosen person to have. You’d probably settle on the answer “two”, but that is the most probable number, i.e. the mode, which in this case exceeds the average. If one person in a thousand has only one leg then a group of a thousand has 1999 legs between them, so the average (or arithmetic mean) is 1.999. Most people therefore have more than the average number of legs…

I’ve always found it quite annoying that physicists use the term “expectation value” to mean “average”, because it implies that the average is the value you would expect. In the example given above you wouldn’t expect a person to have the average number of legs – if you assume that the actual number is an integer, it’s actually impossible to find a person with 1.999 legs! In other words, the probability of finding someone in that group with the average number of legs in the group is exactly zero.
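The arithmetic is easy to check. Here is a throwaway Python snippet for the hypothetical thousand-person group above (the numbers are the ones in the example, not real data):

```python
# Illustrative population: 999 people with two legs, one person with one leg.
legs = [2] * 999 + [1]

mean = sum(legs) / len(legs)            # the "expectation value"
mode = max(set(legs), key=legs.count)   # the most probable value

print(mean)          # 1.999
print(mode)          # 2
print(mean in legs)  # False: nobody has the average number of legs
```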

The same confusion happens when newspapers talk about the “average wage” which is considerably higher than the wage most people receive.

In any case the point is that there is undoubtedly a considerable uncertainty in the prediction of eleven cyclones per season, and one would like to have some idea how large an error bar is associated with that value.

Anyway, statistical pedantry notwithstanding, it is indeed impressive that the number of cyclones in a season goes all the way up to eleven…

Physics and Statistics

Posted in Bad Statistics, Education on August 16, 2013 by telescoper

Predictably, yesterday’s newspapers and other media were full of feeble articles about the A-level results, and I don’t just mean the gratuitous pictures of pretty girls opening envelopes and/or jumping in the air. I’ve never met a journalist who understood the concept of statistical significance, which seems to account for the way they feel able to write whatever they like about any numbers that happen to be newsworthy without feeling constrained by mathematical common sense. Sometimes it’s the ridiculous over-interpretation of opinion polls (which usually have a sampling uncertainty of ±3%), sometimes it’s league tables. This time it’s the number of students getting the top grades at A-level.

The BBC, for example, made a lot of fuss about the fall in the percentage of A and A* A-level grades, to 26.3% this year from 26.6% last year. Anyone with a modicum of statistical knowledge would know, however, that whether this drop means anything at all depends on how many results were involved: the sampling fluctuation on a count of size N is approximately √N. For a cohort of 300,000 this turns into a percentage uncertainty of about 0.57%, which is about twice as large as the reported fall. The result is therefore “in the noise” – in the sense that there’s no evidence that it was actually harder to get a high grade this year compared with last year – but that didn’t prove a barrier to those editors intent on filling their newspapers and websites with meaningless guff.

Almost hidden among the bilge was an interesting snippet about Physics. It seems that the number of students taking Physics A-level exceeded 35,000 in 2013. That figure was set as a government target for 2014, so it has been reached a year early. The difference between the number that took Physics this year (35,569) and the number who took it in 2006 (27,368) is certainly significant. Whether this is the so-called Brian Cox effect or something else, it’s very good news for the future health of the subject.

On the other hand, the proportion of female Physics students remains around 20%. Over the last three years the proportion has been 20.8%, 21.3% and 20.6% so numerically this year is down on last year, but the real message in these figures is that despite strenuous efforts to increase this fraction, there is no significant change.

As I write I’m formally still on Clearing business, sitting beside the telephone in case anyone needs to talk to me. However, at close of play yesterday the School of Mathematical and Physical Sciences had exceeded its recruitment target by quite a healthy margin.  We’re still open for Clearing, though, as our recent expansion means we can take a few more suitably qualified students. Physics and Astronomy did particularly well, and we’re set to welcome our biggest-ever intake into the first year in September 2013. I’m really looking forward to meeting them all.

While I’m on about statistics, here’s another thing. When I was poring over this year’s NSS results, I noticed that only 39 Physics departments appeared in the survey. When I last counted them there were 115 universities in the UK. This number doesn’t include about 50 colleges and other forms of higher education institutions which are also sometimes included in lists of universities. Anyway, my point is that at most about a third of British universities have a physics department.

Now that is a shocking statistic…

(Lack of) Diversity in STEM Subjects

Posted in Science Politics on May 10, 2013 by telescoper

Among the things I learnt over the last few days was some interesting information about the diversity (or, rather, lack of diversity) of students taking undergraduate degrees in STEM subjects at UK universities. For those of you not up on the lingo, `STEM’ is short for Science, Technology, Engineering and Mathematics. Last year the Institute of Physics produced a report that contains a wealth of statistical information about the demographics of the undergraduate population, from which the following numbers are only a small component.

For completeness I should point out that these numbers refer to first-year undergraduates in 2010-11; I have no particular reason to suppose there has been a qualitative change since then. “BME” stands for “Black and Minority Ethnic”, and “Socio-Economic” refers to students whose parents are not employed in managerial or professional positions.

Overall, the figures here at the University of Sussex are roughly in line with, but slightly better than, these national statistics; the proportion of female students in our Physics intake for 2010/11, for example, was 27%.

There are some interesting (and rather disappointing) things to remark. First is that the proportion of Physics students who are female remains low; Physics scores very badly on ethnic diversity too. Mathematics on the other hand seems a much more attractive subject for female students.  Notice also how Physics and Chemistry attract a very small proportion of overseas students compared to Engineering.

In summary, therefore, we can see that Physics is a subject largely studied by white, middle-class European males. What are we doing wrong?

Despite considerable efforts to promote Physics to a more diverse constituency,  the proportion of, e.g., female physics students seems to have been bumping along at around 20% for ages.  Interestingly, all the anecdotal evidence suggests that those women who do Physics at University do disproportionately well, in the sense that female students constitute a  much larger fraction of First-class graduates than 20%. This strongly suggests that the problem lies at school level; some additional IOP information and discussion on this can be found here.

I’m just passing these figures on for information, as I’m quite often asked about them during, e.g., admissions-related activities. I don’t have any really compelling suggestions, but I would like to invite the blogosphere to comment and/or make suggestions as to how to promote diversity in STEM disciplines.

Never mind the table, look at the sample size!

Posted in Bad Statistics on April 29, 2013 by telescoper

This morning I was just thinking that it’s been a while since I’ve filed anything in the category marked bad statistics when I glanced at today’s copy of the Times Higher and found something that’s given me an excuse to rectify my lapse. Last week saw the publication of said organ’s new Student Experience Survey, which ranks British universities in order of the responses given by students to questions about various aspects of teaching, social life and so on. I had a go at this table a few years ago, but they still keep trotting it out. Here are the main results, sorted in decreasing order:

Rank University Score Resp.
1 University of East Anglia 84.8 119
2 University of Oxford 84.2 259
3 University of Sheffield 83.9 192
3 University of Cambridge 83.9 245
5 Loughborough University 82.8 102
6 University of Bath 82.7 159
7 University of Leeds 82.5 219
8 University of Dundee 82.4 103
9 York St John University 81.2 88
10 Lancaster University 81.1 100
11 University of Southampton 80.9 191
11 University of Birmingham 80.9 198
11 University of Nottingham 80.9 270
14 Cardiff University 80.8 113
14 Newcastle University 80.8 125
16 Durham University 80.3 188
17 University of Warwick 80.2 205
18 University of St Andrews 79.8 109
18 University of Glasgow 79.8 131
20 Queen’s University Belfast 79.2 101
21 University of Hull 79.1 106
22 University of Winchester 79 106
23 Northumbria University 78.9 100
23 University of Lincoln 78.9 103
23 University of Strathclyde 78.9 107
26 University of Surrey 78.8 102
26 University of Leicester 78.8 105
26 University of Exeter 78.8 130
29 University of Chester 78.7 102
30 Heriot-Watt University 78.6 101
31 Keele University 78.5 102
32 University of Kent 78.4 110
33 University of Reading 78.1 101
33 Bangor University 78.1 101
35 University of Huddersfield 78 104
36 University of Central Lancashire 77.9 121
37 Queen Mary, University of London 77.8 103
37 University of York 77.8 106
39 University of Edinburgh 77.7 170
40 University of Manchester 77.4 252
41 Imperial College London 77.3 148
42 Swansea University 77.1 103
43 Sheffield Hallam University 77 102
43 Teesside University 77 103
45 Brunel University 76.6 110
46 University of Portsmouth 76.4 107
47 University of Gloucestershire 76.3 53
47 Robert Gordon University 76.3 103
47 Aberystwyth University 76.3 104
50 University of Essex 76 103
50 University of Glamorgan 76 108
50 Plymouth University 76 112
53 University of Sunderland 75.9 100
54 Canterbury Christ Church University 75.8 102
55 De Montfort University 75.7 103
56 University of Bradford 75.5 52
56 University of Sussex 75.5 102
58 Nottingham Trent University 75.4 103
59 University of Roehampton 75.1 102
60 University of Ulster 75 101
60 Staffordshire University 75 102
62 Royal Veterinary College 74.8 50
62 Liverpool John Moores University 74.8 102
64 University of Bristol 74.7 137
65 University of Worcester 74.4 101
66 University of Derby 74.2 101
67 University College London 74.1 102
68 University of Aberdeen 73.9 105
69 University of the West of England 73.8 101
69 Coventry University 73.8 102
71 University of Hertfordshire 73.7 105
72 London School of Economics 73.5 51
73 Royal Holloway, University of London 73.4 104
74 University of Stirling 73.3 54
75 King’s College London 73.2 105
76 Bournemouth University 73.1 103
77 Southampton Solent University 72.7 102
78 Goldsmiths, University of London 72.5 52
78 Leeds Metropolitan University 72.5 106
80 Manchester Metropolitan University 72.2 104
81 University of Liverpool 72 104
82 Birmingham City University 71.8 101
83 Anglia Ruskin University 71.7 102
84 Glasgow Caledonian University 71.1 100
84 Kingston University 71.1 102
86 Aston University 71 52
86 University of Brighton 71 106
88 University of Wolverhampton 70.9 103
89 Oxford Brookes University 70.5 106
90 University of Salford 70.2 102
91 University of Cumbria 69.2 51
92 Napier University 68.8 101
93 University of Greenwich 68.5 102
94 University of Westminster 68.1 101
95 University of Bedfordshire 67.9 100
96 University of the Arts London 66 54
97 City University London 65.4 102
97 London Metropolitan University 65.4 103
97 The University of the West of Scotland 65.4 103
100 Middlesex University 65.1 104
101 University of East London 61.7 51
102 London South Bank University 61.2 50
Average scores 75.5 11459
YouthSight is the source of the data that have been used to compile the table of results for the Times Higher Education Student Experience Survey, and it retains the ownership of those data. Each higher education institution’s score has been indexed to give a percentage of the maximum score attainable. For each of the 21 attributes, students were given a seven-point scale and asked how strongly they agreed or disagreed with a number of statements based on their university experience.

My current employer, the University of Sussex, comes out right on the average (75.5) and is consequently in the middle of this league table. However, let’s look at this in a bit more detail. The number of students whose responses produced the score of 75.5 was just 102. That’s by no means the smallest sample in the survey, either. The University of Sussex has over 13,000 students, so the score in this table was obtained from less than 1% of the relevant student population. How representative can the results be, given that the sample is so incredibly small?

What is conspicuous by its absence from this table is any measure of the “margin-of-error” of the estimated score. What I mean by this is how much the sample score would change for Sussex if a different set of 102 students were involved. Unless every Sussex student scores exactly 75.5 then the score will vary from sample to sample. The smaller the sample, the larger the resulting uncertainty.

Given a survey of this type it should be quite straightforward to calculate the spread of scores from student to student within a sample from a given university in terms of the standard deviation, σ, as well as the mean score. Unfortunately, this survey does not include this information. However, let’s suppose for the sake of argument that the standard deviation for Sussex is quite small, say 10% of the mean value, i.e. 7.55. I imagine that it’s much larger than that, in fact, but this is just meant to be by way of an illustration.

If you have a sample size of N then the standard error of the mean is roughly σ/√N which, for Sussex, is about 0.75. Assuming everything has a normal distribution, this would mean that the “true” score for the full population of Sussex students has a 95% chance of being within two standard errors of the sample mean, i.e. between 74 and 77. This means Sussex could really be as high as 43rd place or as low as 67th, and that’s making very conservative assumptions about how much one student differs from another within each institution.
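Putting the numbers in is a one-liner; here is a short Python check using the illustrative σ assumed above (remember, the true spread is unknown because the survey doesn’t publish it):

```python
import math

mean_score = 75.5  # Sussex's score in the published table
sigma = 7.55       # assumed student-to-student spread: 10% of the mean
n = 102            # number of respondents

se = sigma / math.sqrt(n)                              # standard error of the mean
low, high = mean_score - 2 * se, mean_score + 2 * se   # rough 95% interval

print(round(se, 2))                   # 0.75
print(round(low, 1), round(high, 1))  # 74.0 77.0
```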

That example is just for illustration, and the figures may well be wrong, but my main gripe is that I don’t understand how these guys can get away with publishing results like this without listing the margin of error at all. Perhaps it’s because that would make it obvious how unreliable the rankings are? Whatever the reason, we’d never get away with publishing results without errors in a serious scientific journal.

This sampling uncertainty almost certainly accounts for the big changes from year to year in these tables. For instance, the University of Lincoln is 23rd in this year’s table, but last year was way down in 66th place. Has something dramatic happened there to account for this meteoric rise? I doubt it. It’s more likely to be just a sampling fluctuation.

In fact I seriously doubt whether any of the scores in this table is significantly different from the mean score; the range from top to bottom is only 61 to 85 showing a considerable uniformity across all 102 institutions listed. What a statistically literate person should take from this table is that (a) it’s a complete waste of time and (b) wherever you go to University you’ll probably have a good experience!

