Archive for statistics

The Dark Matter of Astronomy Hype

Posted in Astrohype, Bad Statistics, The Universe and Stuff on April 16, 2018 by telescoper

Just before Easter (and, perhaps more significantly, just before April Fool’s Day) a paper by van Dokkum et al. was published in Nature with the title A Galaxy Lacking Dark Matter. As is often the case with scientific publications presented in Nature, the press machine kicked into action and stories about this mysterious galaxy appeared in print and online all round the world.

So what was the result? Here’s the abstract of the Nature paper:

 

Studies of galaxy surveys in the context of the cold dark matter paradigm have shown that the mass of the dark matter halo and the total stellar mass are coupled through a function that varies smoothly with mass. Their average ratio M_halo/M_stars has a minimum of about 30 for galaxies with stellar masses near that of the Milky Way (approximately 5 × 10^10 solar masses) and increases both towards lower masses and towards higher masses. The scatter in this relation is not well known; it is generally thought to be less than a factor of two for massive galaxies but much larger for dwarf galaxies. Here we report the radial velocities of ten luminous globular-cluster-like objects in the ultra-diffuse galaxy NGC1052–DF2, which has a stellar mass of approximately 2 × 10^8 solar masses. We infer that its velocity dispersion is less than 10.5 kilometres per second with 90 per cent confidence, and we determine from this that its total mass within a radius of 7.6 kiloparsecs is less than 3.4 × 10^8 solar masses. This implies that the ratio M_halo/M_stars is of order unity (and consistent with zero), a factor of at least 400 lower than expected. NGC1052–DF2 demonstrates that dark matter is not always coupled with baryonic matter on galactic scales.

 

I had a quick look at the paper at the time and wasn’t very impressed by the quality of the data. To see why, look at the main plot, a histogram formed from just ten observations (of globular clusters used as velocity tracers):

I didn’t have time to read the paper thoroughly before the Easter weekend, but I did draft a sceptical blog post about it, only to decide not to publish it as I thought it might be too inflammatory even by my standards! Suffice it to say that I was unconvinced.

Anyway, it turns out I was far from the only astrophysicist to have doubts about this result; you can find a nice summary of the discussion on social media here and here. Fortunately, people more expert than me have found the time to look in more detail at the van Dokkum et al. claim. There’s now a paper on the arXiv by Martin et al.:

It was recently proposed that the globular cluster system of the very low surface-brightness galaxy NGC1052-DF2 is dynamically very cold, leading to the conclusion that this dwarf galaxy has little or no dark matter. Here, we show that a robust statistical measure of the velocity dispersion of the tracer globular clusters implies a mundane velocity dispersion and a poorly constrained mass-to-light ratio. Models that include the possibility that some of the tracers are field contaminants do not yield a more constraining inference. We derive only a weak constraint on the mass-to-light ratio of the system within the half-light radius or within the radius of the furthest tracer (M/L_V<8.1 at the 90-percent confidence level). Typical mass-to-light ratios measured for dwarf galaxies of the same stellar mass as NGC1052-DF2 are well within this limit. With this study, we emphasize the need to properly account for measurement uncertainties and to stay as close as possible to the data when determining dynamical masses from very small data sets of tracers.
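To get an intuitive feel for why ten tracers pin down a velocity dispersion so poorly, here is a minimal simulation sketch in Python. To be clear, this is not the analysis of either paper: the “true” dispersion is an arbitrary illustrative number, there are no measurement errors or interlopers, and the naive sample standard deviation stands in for the more careful estimators the authors actually use.

```python
import numpy as np

# Toy illustration only (not the analysis of van Dokkum et al. or Martin et al.):
# draw many mock samples of ten radial velocities from a Gaussian with an
# assumed "true" dispersion and look at the spread of the naive sample estimate.
rng = np.random.default_rng(1)

true_sigma = 10.0        # km/s, arbitrary illustrative value
n_tracers = 10           # number of globular-cluster velocity tracers
n_trials = 200_000       # number of mock samples

v = rng.normal(0.0, true_sigma, size=(n_trials, n_tracers))
sigma_hat = v.std(axis=1, ddof=1)     # sample dispersion of each mock set

lo, med, hi = np.percentile(sigma_hat, [5, 50, 95])
print(f"5th / 50th / 95th percentiles: {lo:.1f} / {med:.1f} / {hi:.1f} km/s")
# The fractional scatter is roughly 1/sqrt(2*(n_tracers - 1)), i.e. about 24%
# for ten objects, before measurement uncertainties or field contaminants are
# even considered.
```

Any mass inferred from such a dispersion inherits at least twice that fractional uncertainty, since the dynamical mass scales with the square of the dispersion.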

More information about this system has been posted by Pieter van Dokkum on his website here.

Whatever the final analysis of NGC1052-DF2 turns out to show, it is undoubtedly an interesting system. It may indeed turn out to have less dark matter than expected, though I don’t think the evidence available right now warrants such an inference with such confidence. What worries me most, however, is the way this result was presented in the media, with virtually no regard for the manifest statistical uncertainty inherent in the analysis. This kind of hype can be extremely damaging to science in general, and to explain why I’ll go off on a rant that I’ve indulged in a few times before on this blog.

A few years ago there was an interesting piece (in Nature, of all places), the opening paragraph of which reads:

The past few years have seen a slew of announcements of major discoveries in particle astrophysics and cosmology. The list includes faster-than-light neutrinos; dark-matter particles producing γ-rays; X-rays scattering off nuclei underground; and even evidence in the cosmic microwave background for gravitational waves caused by the rapid inflation of the early Universe. Most of these turned out to be false alarms; and in my view, that is the probable fate of the rest.

The piece went on to berate physicists for being too trigger-happy in claiming discoveries, the BICEP2 fiasco being a prime example. I agree that this is a problem, but it goes far beyond physics. In fact it’s endemic throughout science. A major cause of it is the abuse of statistical reasoning.

Anyway, I thought I’d take the opportunity to reiterate why I think statistics and statistical reasoning are so important to science. In fact, I think they lie at the very core of the scientific method, although I am still surprised how few practising scientists are comfortable with even basic statistical language. A more important problem is the popular impression that science is about facts and absolute truths. It isn’t. It’s a process. In order to advance, it has to question itself. Getting this message wrong, whether by error or on purpose, is immensely dangerous.

Statistical reasoning also applies to many facets of everyday life, including business, commerce, transport, the media, and politics. Probability even plays a role in personal relationships, though mostly at a subconscious level. It is a feature of everyday life that science and technology are deeply embedded in every aspect of what we do each day. Science has given us greater levels of comfort, better health care, and a plethora of labour-saving devices. It has also given us unprecedented ability to destroy the environment and each other, whether through accident or design.

Civilized societies face rigorous challenges in this century. We must confront the threat of climate change and forthcoming energy crises. We must find better ways of resolving conflicts peacefully lest nuclear or chemical or even conventional weapons lead us to global catastrophe. We must stop large-scale pollution or systematic destruction of the biosphere that nurtures us. And we must do all of these things without abandoning the many positive things that science has brought us. Abandoning science and rationality by retreating into religious or political fundamentalism would be a catastrophe for humanity.

Unfortunately, recent decades have seen a wholesale breakdown of trust between scientists and the public at large. This is due partly to the deliberate abuse of science for immoral purposes, and partly to the sheer carelessness with which various agencies have exploited scientific discoveries without proper evaluation of the risks involved. The abuse of statistical arguments has undoubtedly contributed to the suspicion with which many individuals view science.

There is an increasing alienation between scientists and the general public. Many fewer students enrol for courses in physics and chemistry than a few decades ago. Fewer graduates mean fewer qualified science teachers in schools. This is a vicious cycle that threatens our future. It must be broken.

The danger is that the decreasing level of understanding of science in society means that knowledge (as well as its consequent power) becomes concentrated in the minds of a few individuals. This could have dire consequences for the future of our democracy. Even as things stand now, very few Members of Parliament are scientifically literate. How can we expect to control the application of science when the necessary understanding rests with an unelected “priesthood” that is hardly understood by, or represented in, our democratic institutions?

Very few journalists or television producers know enough about science to report sensibly on the latest discoveries or controversies. As a result, important matters that the public needs to know about do not appear at all in the media, or if they do it is in such a garbled fashion that they do more harm than good.

Years ago I used to listen to radio interviews with scientists on the Today programme on BBC Radio 4. I even did such an interview once. It is a deeply frustrating experience. The scientist usually starts by explaining what the discovery is about in the way a scientist should, with careful statements of what is assumed, how the data are interpreted, what the other possible interpretations might be, and the likely sources of error. The interviewer then loses patience and asks for a yes or no answer. The scientist tries to continue, but is badgered. Either the interview ends as a row, or the scientist ends up stating a grossly oversimplified version of the story.

Some scientists offer the oversimplified version at the outset, of course, and these are the ones that contribute to the image of scientists as priests. Such individuals often believe in their theories in exactly the same way that some people believe religiously: not with the conditional and possibly temporary belief that characterizes the scientific method, but with the unquestioning fervour of an unthinking zealot. This approach may pay off for the individual in the short term, in popular esteem and media recognition, but when it goes wrong it is science as a whole that suffers. When a result that has been proclaimed certain is later shown to be false, the consequence is widespread disillusionment.

The worst example of this tendency that I can think of is the constant use of the phrase “Mind of God” by theoretical physicists to describe fundamental theories. This is not only meaningless but also damaging. As scientists we should know better than to use it. Our theories do not represent absolute truths: they are just the best we can do with the available data and the limited powers of the human mind. We believe in our theories, but only to the extent that we need to accept working hypotheses in order to make progress. Our approach is pragmatic rather than idealistic. We should be humble and avoid making extravagant claims that can’t be justified either theoretically or experimentally.

The more that people get used to the image of “scientist as priest”, the more dissatisfied they are with real science. Most of the questions asked of scientists simply can’t be answered with “yes” or “no”. This leaves many with the impression that science is very vague and subjective. The public also tend to lose faith in science when it is unable to come up with quick answers. Science is a process, a way of looking at problems, not a list of ready-made answers to impossible problems. Of course it is sometimes vague, but I think it is vague in a rational way and that’s what makes it worthwhile. It is also the reason why science has led to so many objectively measurable advances in our understanding of the World.

I don’t have any easy answers to the question of how to cure this malaise, but I do have a few suggestions. It would be easy for a scientist such as myself to blame everything on the media and the education system, but in fact I think the responsibility lies mainly with ourselves. We are so obsessed with our own research, and with the need to publish specialist papers by the lorry-load in order to advance our own careers, that we spend very little time explaining what we do to the public, or why.

I think every working scientist in the country should be required to spend at least 10% of their time working in schools or with the general media on “outreach”, including writing blogs like this. People in my field – astronomers and cosmologists – do this quite a lot, but these are areas where the public has some empathy with what we do. If only biologists, chemists, nuclear physicists and the rest were viewed in such a friendly light. Doing this sort of thing is not easy, especially when it comes to saying something on the radio that the interviewer does not want to hear. Media training for scientists has been a welcome recent innovation for some branches of science, but most of my colleagues have never had any help at all in this direction.

The second thing that must be done is to improve the dire state of science education in schools. Over the last two decades the national curriculum for British schools has been dumbed down to the point of absurdity. Pupils that leave school at 18 having taken “Advanced Level” physics do so with no useful knowledge of physics at all, even if they have obtained the highest grade. I do not at all blame the students for this; they can only do what they are asked to do. It’s all the fault of the educationalists, who have done their best for a long time to convince our young people that science is too hard for them. Science can be difficult, of course, and not everyone will be able to make a career out of it. But that doesn’t mean that it should not be taught properly to those that can take it in. If some students find it is not for them, then so be it. We don’t need everyone to be a scientist, but we do need many more people to understand how science really works.

I realise I must sound very gloomy about this, but I do think there are good prospects that the gap between science and society may gradually be healed. The fact that the public distrust scientists leads many of them to question us, which is a very good thing. They should question us and we should be prepared to answer them. If they ask us why, we should be prepared to give reasons. If enough scientists engage in this process then what will emerge is an understanding of the enduring value of science. I don’t just mean through the DVD players and computer games science has given us, but through its cultural impact. It is part of human nature to question our place in the Universe, so science is part of what we are. It gives us purpose. But it also shows us a way of living our lives. Except for a few individuals, the scientific community is tolerant, open, internationally-minded, and imbued with a philosophy of cooperation. It values reason and looks to the future rather than the past. Like anyone else, scientists will always make mistakes, but we can always learn from them. The logic of science may not be infallible, but it’s probably the best logic there is in a world so filled with uncertainty.

 

 

 


Isotropic Random Fields in Astrophysics

Posted in The Universe and Stuff on June 29, 2017 by telescoper

So the little workshop on ‘Isotropic Random Fields in Astrophysics’ I announced some time ago, sponsored via a “seedcorn” grant by the Data Innovation Research Institute, has finally arrived, and having spent most of the day at it I’m now catching up with some other stuff in the office before adjourning for the conference dinner.

 

This meeting is part of a series of activities aimed at bringing together world-leading experts in the analysis of big astrophysical data sets, specifically those arising from the (previous) Planck (shown above) and (future) Euclid space missions, with mathematical experts in the spectral theory of scalar-, vector- or tensor-valued isotropic random fields. Our aim is to promote collaboration between mathematicians interested in probability theory and statistical analysis and theoretical and observational astrophysicists, both within Cardiff University and further afield.

 

It’s been a very interesting day of interleaved talks by cosmologists and mathematicians, followed by an open-ended discussion session in which we talked about unsolved problems and lines for future research. It’s clear that there are some language difficulties between the two communities, but I hope this meeting helps to break down a few barriers and stimulate some new joint research projects.

 

 

 

Isotropic Random Fields in Astrophysics – Workshop Announcement!

Posted in The Universe and Stuff on May 8, 2017 by telescoper

We have a little workshop coming up in Cardiff at the end of June, sponsored via a “seedcorn” grant by the Data Innovation Research Institute.

This meeting is part of a series of activities aimed at bringing together world-leading experts in the analysis of big astrophysical data sets, specifically those arising from the (previous) Planck and (future) Euclid space missions, with mathematical experts in the spectral theory of scalar-, vector- or tensor-valued isotropic random fields. Our aim is to promote collaboration between mathematicians interested in probability theory and statistical analysis and theoretical and observational astrophysicists, both within Cardiff University and further afield.

The workshop page can be found here. We have a great list of invited speakers from as far afield as Japan and California (as well as some from much closer to home) and we’re also open for contributed talks. We’ll be publishing the full programme of titles and abstracts soon. Registration is free of charge, but you do need to register so we can be sure we have enough space, enough coffee and enough lunch! That goes whether you want to give a contributed talk, or just come along and listen!

It’s only a short (two-day) meeting and we’re aiming for an informal atmosphere with plenty of time for discussions, with roughly a 50-50 blend of astrophysicists and mathematicians. To achieve that aim we’d particularly welcome a few more contributed talks from the mathematical side of the house, but we still have space for more astrophysics talks too! We’d also welcome more contributions from early career researchers, especially PhD students.

Please feel free to pass this around your colleagues.

 

The Neyman-Scott ‘Paradox’

Posted in Bad Statistics, Cute Problems on November 25, 2016 by telescoper

I just came across this interesting little problem recently and thought I’d share it here. It’s usually called the ‘Neyman-Scott’ paradox. Before going on it’s worth mentioning that Elizabeth Scott (the second half of Neyman-Scott) was an astronomer by background. Her co-author was Jerzy Neyman. As has been the case for many astronomers, she contributed greatly to the development of the field of statistics. Anyway, I think this example provides another good illustration of the superiority of Bayesian methods for estimating parameters, but I’ll let you make your own mind up about what’s going on.

The problem is fairly technical, so I’ve done a quick version in LaTeX that you can download here, but I’ve also copied it into this post so you can read it below:

 

[The write-up is embedded here as two images: neyman-scott1 and neyman-scott2.]
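Since the write-up itself is embedded as images, here is a minimal sketch of the usual Neyman-Scott setup for anyone who can’t view them. The notation and the numbers are mine and may differ from the linked note: each pair of measurements shares an unknown mean (a nuisance parameter) and a common variance, and the maximum-likelihood estimate of that variance converges to half the true value no matter how many pairs you collect.

```python
import numpy as np

# Minimal sketch of the standard Neyman-Scott setup (my notation; the linked
# write-up may differ in detail). Each pair (x_i1, x_i2) is drawn from
# N(mu_i, sigma^2): a separate nuisance mean for every pair, one common variance.
rng = np.random.default_rng(0)

sigma2_true = 4.0
n_pairs = 100_000
mu = rng.uniform(-10.0, 10.0, size=n_pairs)                    # nuisance means
x = rng.normal(mu[:, None], np.sqrt(sigma2_true), size=(n_pairs, 2))

# Maximum-likelihood estimate of sigma^2: average squared deviation of each
# observation about its own pair mean.
xbar = x.mean(axis=1, keepdims=True)
sigma2_mle = ((x - xbar) ** 2).sum() / (2 * n_pairs)

print(f"true sigma^2            : {sigma2_true}")
print(f"maximum-likelihood value: {sigma2_mle:.3f}  (tends to sigma^2/2 = {sigma2_true/2})")
```

The bias does not go away as the number of pairs grows, because every new pair brings a new nuisance parameter along with it; how the two schools of inference deal with those nuisance parameters is the point at issue.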

I look forward to receiving Frequentist Flak or Bayesian Benevolence through the comments box below!

What does “Big Data” mean to you?

Posted in The Universe and Stuff on April 7, 2016 by telescoper

On several occasions recently I’ve had to talk about Big Data for one reason or another. I’m always at a disadvantage when I do that because I really dislike the term. Clearly I’m not the only one who feels this way:

[Image: “Say Big Data one more time” meme]

For one thing, the term “Big Data” seems to me like describing the Ocean as “Big Water”. For another, it’s not really just how big the data set is that matters. Size isn’t everything, after all. There is much truth in Stalin’s comment that “quantity has a quality all its own”, in that very large data sets allow you to do things you wouldn’t even try with smaller ones, but it can also be complexity, rather than sheer size, that requires new methods of analysis.

[Image: Planck map of the temperature pattern in the cosmic microwave background]

The biggest event in my own field of cosmology in the last few years has been the Planck mission. The data set is indeed huge: the above map of the temperature pattern in the cosmic microwave background has no fewer than 167 million pixels. That certainly caused some headaches in the analysis pipeline, but I think I would argue that this wasn’t really a Big Data project. I don’t mean that to be insulting to anyone, just that the main analysis of the Planck data was aimed at doing something very similar to what had been done (by WMAP), i.e. extracting the power spectrum of temperature fluctuations:

[Image: Planck power spectrum of temperature fluctuations]

It’s a wonderful result of course that extends the measurements that WMAP made up to much higher frequencies, but Planck’s goals were phrased in similar terms to those of WMAP – to pin down the parameters of the standard model to as high accuracy as possible. For me, a real “Big Data” approach to cosmic microwave background studies would involve doing something that couldn’t have been done at all with a smaller data set. An example that springs to mind is looking for indications of effects beyond the standard model.

Moreover, what passes for Big Data in some fields would just be called “data” in others. For example, the ATLAS detector on the Large Hadron Collider has about 150 million sensors delivering data 40 million times per second. There are about 600 million collisions per second, out of which perhaps one hundred per second are useful. The issue here is then one of dealing with an enormous rate of data in such a way as to be able to discard most of it very quickly. The same will be true of the Square Kilometre Array, which will acquire exabytes of data every day, out of which perhaps one petabyte will need to be stored. Both these projects involve data sets much bigger and more difficult to handle than what might pass for Big Data in other arenas.

Books you can buy at airports about Big Data generally list the following four or five characteristics:

  1. Volume
  2. Velocity
  3. Variety
  4. Veracity
  5. Variability

The first two are about the size and acquisition rate of the data mentioned above but the others are more about qualitatively different matters. For example, in cosmology nowadays we have to deal with data sets which are indeed quite large, but also very different in form.  We need to be able to do efficient joint analyses of heterogeneous data structures with very different sampling properties and systematic errors in such a way that we get the best science results we can. Now that’s a Big Data challenge!

 

The Insignificance of ORB

Posted in Bad Statistics on April 5, 2016 by telescoper

A piece about opinion polls ahead of the EU Referendum which appeared in today’s Daily Torygraph has spurred me on to make a quick contribution to my bad statistics folder.

The piece concerned includes the following statement:

David Cameron’s campaign to warn voters about the dangers of leaving the European Union is beginning to win the argument ahead of the referendum, a new Telegraph poll has found.

The exclusive poll found that the “Remain” campaign now has a narrow lead after trailing last month, in a sign that Downing Street’s tactic – which has been described as “Project Fear” by its critics – is working.

The piece goes on to explain

The poll finds that 51 per cent of voters now support Remain – an increase of 4 per cent from last month. Leave’s support has decreased five points to 44 per cent.

This conclusion is based on the results of a survey by ORB in which the number of participants was 800. Yes, eight hundred.

How much can we trust this result on statistical grounds?

Suppose the fraction of the population intending to vote in a particular way in the EU referendum is p. For a sample of size n in which x respondents indicate that they will vote that way, one can straightforwardly estimate p \simeq x/n. So far so good, as long as there is no bias induced either by the form of the question asked or by the selection of the sample, which, given the fact that such polls have been all over the place, seems rather unlikely.

A little bit of mathematics involving the binomial distribution yields an answer for the uncertainty in this estimate of p in terms of the sampling error:

\sigma = \sqrt{\frac{p(1-p)}{n}}

For the sample size of 800 given, and an actual value p \simeq 0.5, this amounts to a standard error of about 2%. About 95% of samples drawn from a population in which the true fraction is p will yield an estimate within p \pm 2\sigma, i.e. within about 4% of the true figure. In other words, the typical variation between two samples drawn from the same underlying population is about 4%. Put another way, the change reported between the two ORB polls mentioned above can be entirely explained by sampling variation and does not at all imply any systematic change of public opinion between the two surveys.
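For anyone who wants to check the arithmetic, here it is in a few lines of Python, under the same assumptions as above (simple random sampling, no systematic bias, p close to 0.5):

```python
import math

n = 800          # reported ORB sample size
p = 0.5          # roughly a 50-50 split

sigma = math.sqrt(p * (1 - p) / n)    # standard error on the estimated fraction
print(f"standard error: {100 * sigma:.1f}%")           # ~1.8%, i.e. about 2%
print(f"95% interval  : +/- {100 * 2 * sigma:.1f}%")   # ~3.5%, i.e. about 4%
```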

I need hardly point out that in a two-horse race (between “Remain” and “Leave”) an increase of 4% in the Remain vote corresponds to a decrease in the Leave vote by the same 4%, so a 50-50 population vote can easily generate a margin as large as 54-46 in such a small sample.

Why do pollsters bother with such tiny samples? With such a large margin of error the results are basically meaningless.

I object to the characterization of the Remain campaign as “Project Fear” in any case. I think it’s entirely sensible to point out the serious risks that an exit from the European Union would generate for the UK: loss of trade, loss of science funding, financial instability, and indeed the near-inevitable secession of Scotland. But in any case this poll doesn’t indicate that anything is changing other than statistical noise.

Statistical illiteracy is as widespread amongst politicians as it is amongst journalists, but the fact that silly reports like this are commonplace doesn’t make them any less annoying. After all, the idea of sampling uncertainty isn’t all that difficult to understand. Is it?

And with so many more important things going on in the world that deserve better press coverage than they are getting, why does a “quality” newspaper waste its valuable column inches on this sort of twaddle?