Archive for the Bad Statistics Category

The Dark Matter of Astronomy Hype

Posted in Astrohype, Bad Statistics, The Universe and Stuff on April 16, 2018 by telescoper

Just before Easter (and, perhaps more significantly, just before April Fool’s Day) a paper by van Dokkum et al. was published in Nature with the title A Galaxy Lacking Dark Matter. As is often the case with scientific publications presented in Nature, the press machine kicked into action and stories about this mysterious galaxy appeared in print and online all round the world.

So what was the result? Here’s the abstract of the Nature paper:


Studies of galaxy surveys in the context of the cold dark matter paradigm have shown that the mass of the dark matter halo and the total stellar mass are coupled through a function that varies smoothly with mass. Their average ratio M_halo/M_stars has a minimum of about 30 for galaxies with stellar masses near that of the Milky Way (approximately 5 × 10^10 solar masses) and increases both towards lower masses and towards higher masses. The scatter in this relation is not well known; it is generally thought to be less than a factor of two for massive galaxies but much larger for dwarf galaxies. Here we report the radial velocities of ten luminous globular-cluster-like objects in the ultra-diffuse galaxy NGC1052–DF2, which has a stellar mass of approximately 2 × 10^8 solar masses. We infer that its velocity dispersion is less than 10.5 kilometres per second with 90 per cent confidence, and we determine from this that its total mass within a radius of 7.6 kiloparsecs is less than 3.4 × 10^8 solar masses. This implies that the ratio M_halo/M_stars is of order unity (and consistent with zero), a factor of at least 400 lower than expected. NGC1052–DF2 demonstrates that dark matter is not always coupled with baryonic matter on galactic scales.


I had a quick look at the paper at the time and wasn’t very impressed by the quality of the data. To see why, look at the main plot, a histogram formed from just ten observations (of globular clusters used as velocity tracers):

I didn’t have time to read the paper thoroughly before the Easter weekend, but I did draft a sceptical blog post on it, only to decide not to publish it as I thought it might be too inflammatory even by my standards! Suffice to say that I was unconvinced.

Anyway, it turns out I was far from the only astrophysicist to have doubts about this result; you can find a nice summary of the discussion on social media here and here. Fortunately, people more expert than me have found the time to look in more detail at the van Dokkum et al. claim. There’s now a paper on the arXiv by Martin et al.

It was recently proposed that the globular cluster system of the very low surface-brightness galaxy NGC1052-DF2 is dynamically very cold, leading to the conclusion that this dwarf galaxy has little or no dark matter. Here, we show that a robust statistical measure of the velocity dispersion of the tracer globular clusters implies a mundane velocity dispersion and a poorly constrained mass-to-light ratio. Models that include the possibility that some of the tracers are field contaminants do not yield a more constraining inference. We derive only a weak constraint on the mass-to-light ratio of the system within the half-light radius or within the radius of the furthest tracer (M/L_V<8.1 at the 90-percent confidence level). Typical mass-to-light ratios measured for dwarf galaxies of the same stellar mass as NGC1052-DF2 are well within this limit. With this study, we emphasize the need to properly account for measurement uncertainties and to stay as close as possible to the data when determining dynamical masses from very small data sets of tracers.
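
To get a feel for how little ten tracers can pin down, here is a minimal simulation sketch. It is not the analysis of either paper: the true dispersion of 10 km/s, the Gaussian velocity distribution and the neglect of measurement errors are all illustrative assumptions, and the variable names are made up for this example.

```python
import random
import statistics

random.seed(1)

TRUE_DISPERSION = 10.0  # km/s; illustrative assumption, not a value from either paper
N_TRACERS = 10          # same number of tracers as in the histogram discussed above
N_TRIALS = 20_000       # number of simulated "galaxies"

# For each simulated galaxy, draw ten Gaussian velocities and estimate the
# dispersion with the ordinary sample standard deviation.
estimates = sorted(
    statistics.stdev([random.gauss(0.0, TRUE_DISPERSION) for _ in range(N_TRACERS)])
    for _ in range(N_TRIALS)
)

# Central 90 per cent range of the estimated dispersion.
low = estimates[int(0.05 * N_TRIALS)]
high = estimates[int(0.95 * N_TRIALS)]
print(f"90% of estimates lie between {low:.1f} and {high:.1f} km/s")
```

Under these assumptions the estimates scatter over roughly 6 to 14 km/s around the same true value, and that is before any measurement uncertainty or contamination is added.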

More information about this system has been posted by Pieter van Dokkum on his website here.

Whatever turns out in the final analysis of NGC1052-DF2, it is undoubtedly an interesting system. It may indeed turn out to have less dark matter than expected, though I don’t think the evidence available right now warrants such an inference with such confidence. What worries me most, however, is the way this result was presented in the media, with virtually no regard for the manifest statistical uncertainty inherent in the analysis. This kind of hype can be extremely damaging to science in general, and to explain why I’ll go off on a rant that I’ve indulged in a few times before on this blog.

A few years ago there was an interesting paper (in Nature of all places), the opening paragraph of which reads:

The past few years have seen a slew of announcements of major discoveries in particle astrophysics and cosmology. The list includes faster-than-light neutrinos; dark-matter particles producing γ-rays; X-rays scattering off nuclei underground; and even evidence in the cosmic microwave background for gravitational waves caused by the rapid inflation of the early Universe. Most of these turned out to be false alarms; and in my view, that is the probable fate of the rest.

The piece went on to berate physicists for being too trigger-happy in claiming discoveries, the BICEP2 fiasco being a prime example. I agree that this is a problem, but it goes far beyond physics. In fact it’s endemic throughout science. A major cause of it is abuse of statistical reasoning.

Anyway, I thought I’d take the opportunity to re-iterate why statistics and statistical reasoning are so important to science. In fact, I think they lie at the very core of the scientific method, although I am still surprised how few practising scientists are comfortable with even basic statistical language. A more important problem is the popular impression that science is about facts and absolute truths. It isn’t. It’s a process. In order to advance it has to question itself. Getting this message wrong – whether by error or on purpose – is immensely dangerous.

Statistical reasoning also applies to many facets of everyday life, including business, commerce, transport, the media, and politics. Probability even plays a role in personal relationships, though mostly at a subconscious level. It is a feature of everyday life that science and technology are deeply embedded in every aspect of what we do each day. Science has given us greater levels of comfort, better health care, and a plethora of labour-saving devices. It has also given us unprecedented ability to destroy the environment and each other, whether through accident or design.

Civilized societies face rigorous challenges in this century. We must confront the threat of climate change and forthcoming energy crises. We must find better ways of resolving conflicts peacefully lest nuclear or chemical or even conventional weapons lead us to global catastrophe. We must stop large-scale pollution or systematic destruction of the biosphere that nurtures us. And we must do all of these things without abandoning the many positive things that science has brought us. Abandoning science and rationality by retreating into religious or political fundamentalism would be a catastrophe for humanity.

Unfortunately, recent decades have seen a wholesale breakdown of trust between scientists and the public at large. This is due partly to the deliberate abuse of science for immoral purposes, and partly to the sheer carelessness with which various agencies have exploited scientific discoveries without proper evaluation of the risks involved. The abuse of statistical arguments has undoubtedly contributed to the suspicion with which many individuals view science.

There is an increasing alienation between scientists and the general public. Many fewer students enrol for courses in physics and chemistry than a few decades ago. Fewer graduates mean fewer qualified science teachers in schools. This is a vicious cycle that threatens our future. It must be broken.

The danger is that the decreasing level of understanding of science in society means that knowledge (as well as its consequent power) becomes concentrated in the minds of a few individuals. This could have dire consequences for the future of our democracy. Even as things stand now, very few Members of Parliament are scientifically literate. How can we expect to control the application of science when the necessary understanding rests with an unelected “priesthood” that is hardly understood by, or represented in, our democratic institutions?

Very few journalists or television producers know enough about science to report sensibly on the latest discoveries or controversies. As a result, important matters that the public needs to know about do not appear at all in the media, or if they do it is in such a garbled fashion that they do more harm than good.

Years ago I used to listen to radio interviews with scientists on the Today programme on BBC Radio 4. I even did such an interview once. It is a deeply frustrating experience. The scientist usually starts by explaining what the discovery is about in the way a scientist should, with careful statements of what is assumed, how the data is interpreted, and what other possible interpretations might be and the likely sources of error. The interviewer then loses patience and asks for a yes or no answer. The scientist tries to continue, but is badgered. Either the interview ends as a row, or the scientist ends up stating a grossly oversimplified version of the story.

Some scientists offer the oversimplified version at the outset, of course, and these are the ones that contribute to the image of scientists as priests. Such individuals often believe in their theories in exactly the same way that some people believe religiously. Not with the conditional and possibly temporary belief that characterizes the scientific method, but with the unquestioning fervour of an unthinking zealot. This approach may pay off for the individual in the short term, in popular esteem and media recognition – but when it goes wrong it is science as a whole that suffers. When a result that has been proclaimed certain is later shown to be false, the result is widespread disillusionment.

The worst example of this tendency that I can think of is the constant use of the phrase “Mind of God” by theoretical physicists to describe fundamental theories. This is not only meaningless but also damaging. As scientists we should know better than to use it. Our theories do not represent absolute truths: they are just the best we can do with the available data and the limited powers of the human mind. We believe in our theories, but only to the extent that we need to accept working hypotheses in order to make progress. Our approach is pragmatic rather than idealistic. We should be humble and avoid making extravagant claims that can’t be justified either theoretically or experimentally.

The more that people get used to the image of “scientist as priest” the more dissatisfied they are with real science. Most of the questions asked of scientists simply can’t be answered with “yes” or “no”. This leaves many with the impression that science is very vague and subjective. The public also tend to lose faith in science when it is unable to come up with quick answers. Science is a process, a way of looking at problems not a list of ready-made answers to impossible problems. Of course it is sometimes vague, but I think it is vague in a rational way and that’s what makes it worthwhile. It is also the reason why science has led to so many objectively measurable advances in our understanding of the World.

I don’t have any easy answers to the question of how to cure this malaise, but do have a few suggestions. It would be easy for a scientist such as myself to blame everything on the media and the education system, but in fact I think the responsibility lies mainly with ourselves. We are usually so obsessed with our own research, and the need to publish specialist papers by the lorry-load in order to advance our own careers that we usually spend very little time explaining what we do to the public or why.

I think every working scientist in the country should be required to spend at least 10% of their time working in schools or with the general media on “outreach”, including writing blogs like this. People in my field – astronomers and cosmologists – do this quite a lot, but these are areas where the public has some empathy with what we do. If only biologists, chemists, nuclear physicists and the rest were viewed in such a friendly light. Doing this sort of thing is not easy, especially when it comes to saying something on the radio that the interviewer does not want to hear. Media training for scientists has been a welcome recent innovation for some branches of science, but most of my colleagues have never had any help at all in this direction.

The second thing that must be done is to improve the dire state of science education in schools. Over the last two decades the national curriculum for British schools has been dumbed down to the point of absurdity. Pupils that leave school at 18 having taken “Advanced Level” physics do so with no useful knowledge of physics at all, even if they have obtained the highest grade. I do not at all blame the students for this; they can only do what they are asked to do. It’s all the fault of the educationalists, who have done the best they can for a long time to convince our young people that science is too hard for them. Science can be difficult, of course, and not everyone will be able to make a career out of it. But that doesn’t mean that it should not be taught properly to those that can take it in. If some students find it is not for them, then so be it. We don’t need everyone to be a scientist, but we do need many more people to understand how science really works.

I realise I must sound very gloomy about this, but I do think there are good prospects that the gap between science and society may gradually be healed. The fact that the public distrust scientists leads many of them to question us, which is a very good thing. They should question us and we should be prepared to answer them. If they ask us why, we should be prepared to give reasons. If enough scientists engage in this process then what will emerge is an understanding of the enduring value of science. I don’t just mean through the DVD players and computer games science has given us, but through its cultural impact. It is part of human nature to question our place in the Universe, so science is part of what we are. It gives us purpose. But it also shows us a way of living our lives. Except for a few individuals, the scientific community is tolerant, open, internationally-minded, and imbued with a philosophy of cooperation. It values reason and looks to the future rather than the past. Like anyone else, scientists will always make mistakes, but we can always learn from them. The logic of science may not be infallible, but it’s probably the best logic there is in a world so filled with uncertainty.





What happens if you ask people to pick a number `at random’ between 1 and 100?

Posted in Bad Statistics on April 11, 2018 by telescoper

I saw this circulating on Twitter and thought I would share it here; it was originally posted on reddit.

The graph shows the results obtained when 6750 people were asked to pick an integer `at random’ between 1 and 100. You might naively expect the histogram to be flat (give or take some Poisson errors), consistent with each number having the same probability of being picked, but there are clearly some numbers that are more likely to be chosen than a constant probability would imply. The most popular picks are in fact 69, 77 and 7 (in descending order).
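
For comparison, here is a minimal sketch of what a genuinely uniform set of picks would look like. It uses Python’s pseudo-random generator in place of the 6750 people; the seed and variable names are arbitrary choices for this illustration and have nothing to do with the reddit data.

```python
import math
import random
from collections import Counter

random.seed(2018)

N_PEOPLE = 6750
# Each simulated "person" picks an integer uniformly between 1 and 100 inclusive.
counts = Counter(random.randint(1, 100) for _ in range(N_PEOPLE))

expected = N_PEOPLE / 100            # mean count per number if all are equally likely
poisson_sigma = math.sqrt(expected)  # approximate Poisson scatter on each count

# Largest upward fluctuation across all 100 numbers, in units of that scatter.
largest_excess = max((counts[k] - expected) / poisson_sigma for k in range(1, 101))
print(f"expected {expected:.1f} +/- {poisson_sigma:.1f} picks per number")
print(f"largest upward fluctuation: {largest_excess:.1f} sigma")
```

With 67.5 picks expected per number and a scatter of about 8, a flat distribution leaves little room for any one number to stand far above the rest.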

It’s well known amongst purveyors of conjuring tricks and the like that if you ask people to pick a number between 1 and 10, far more people choose 7 than any other number. And I suppose 77 is an extension of that. More interestingly, however, the top result implies that, given the choice, more people seem to prefer a 69 to anything else…

Anyway, it proves a point that I’ve made more than a few times on this blog, namely that people generally have a very poor idea of what randomness is and are particularly bad at making random choices or generating random sequences.

P.S. Please direct any criticism of the graph (e.g. why the x-axis goes up to 104 or why the x-values are given to two decimal places) to the reddit page…

Metrics for `Academic Reputation’

Posted in Bad Statistics, Science Politics on April 9, 2018 by telescoper

This weekend I came across a provocative paper on the arXiv with the title Measuring the academic reputation through citation records via PageRank. Here is the abstract:

The objective assessment of the prestige of an academic institution is a difficult and hotly debated task. In the last few years, different types of University Rankings have been proposed to quantify the excellence of different research institutions in the world. Albeit met with criticism in some cases, the relevance of university rankings is being increasingly acknowledged: indeed, rankings are having a major impact on the design of research policies, both at the institutional and governmental level. Yet, the debate on what rankings are exactly measuring is enduring. Here, we address the issue by measuring a quantitative and reliable proxy of the academic reputation of a given institution and by evaluating its correlation with different university rankings. Specifically, we study citation patterns among universities in five different Web of Science Subject Categories and use the PageRank algorithm on the five resulting citation networks. The rationale behind our work is that scientific citations are driven by the reputation of the reference so that the PageRank algorithm is expected to yield a rank which reflects the reputation of an academic institution in a specific field. Our results allow to quantifying the prestige of a set of institutions in a certain research field based only on hard bibliometric data. Given the volume of the data analysed, our findings are statistically robust and less prone to bias, at odds with ad hoc surveys often employed by ranking bodies in order to attain similar results. Because our findings are found to correlate extremely well with the ARWU Subject rankings, the approach we propose in our paper may open the door to new, Academic Ranking methodologies that go beyond current methods by reconciling the qualitative evaluation of Academic Prestige with its quantitative measurements via publication impact.

(The link to the description of the PageRank algorithm was added by me; I also corrected a few spelling mistakes in the abstract). You can find the full paper here (PDF).
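
For readers who haven’t met PageRank before, the following is a minimal power-iteration sketch on a toy citation network. It is not the authors’ code or data: the four institutions, the damping factor of 0.85 and the function name are all assumptions made purely for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; `links` maps each node to the nodes it cites."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank sitting on nodes that cite nobody is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its rank on in equal shares to the nodes it cites.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical network: "Uni A" cites "Uni B" and "Uni C", and so on.
toy_citations = {
    "Uni A": ["Uni B", "Uni C"],
    "Uni B": ["Uni C"],
    "Uni C": ["Uni A"],
    "Uni D": ["Uni C"],
}
ranks = pagerank(toy_citations)
for name, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

Note that a citation is worth more when it comes from a highly ranked source with few outgoing citations, which is the kind of built-in assumption worth scrutinising in any metric based on this algorithm.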

For what it’s worth, I think the paper contains some interesting ideas (e.g. treating citations as a `tree’ rather than a simple `list’) but the authors make some assumptions that I find deeply questionable (e.g. that being cited in a short reference list is somehow of higher value than being cited in a long one). The danger is that using such information in a metric could form an incentive to further bad behaviour (such as citation cartels).

I have blogged quite a few times about the uses and abuses of citations (see tag here), and I won’t rehearse these arguments here. I will say, however, that I do agree with the idea of sharing citations among the authors of the paper rather than giving each and every author credit for the total. Many astronomers disagree with this point of view, but surely it is perverse to argue that the 100th author of a paper with 51 citations deserves more credit than the sole author of a paper with 49?
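
The arithmetic behind that example is trivial, but here it is spelled out. Splitting the credit equally among authors is an assumption made for this sketch; other sharing schemes are possible, and the function name is hypothetical.

```python
def credit_per_author(citations, n_authors):
    """Equal-share credit: each author gets citations divided by author count."""
    return citations / n_authors


shared = credit_per_author(51, 100)  # one of 100 authors on a paper with 51 citations
sole = credit_per_author(49, 1)      # sole author of a paper with 49 citations
print(f"shared credit: {shared:.2f}, sole-author credit: {sole:.2f}")
```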

Above all, though, the problem with constructing a metric for `Academic Reputation’ is that the concept is so difficult to define in the first place…

Is the Cosmological Flatness Problem really a problem?

Posted in Bad Statistics, The Universe and Stuff on March 26, 2018 by telescoper

A comment elsewhere on this blog drew my attention to a paper on the arXiv by Marc Holman with the following abstract:

Modern observations based on general relativity indicate that the spatial geometry of the expanding, large-scale Universe is very nearly Euclidean. This basic empirical fact is at the core of the so-called “flatness problem”, which is widely perceived to be a major outstanding problem of modern cosmology and as such forms one of the prime motivations behind inflationary models. An inspection of the literature and some further critical reflection however quickly reveals that the typical formulation of this putative problem is fraught with questionable arguments and misconceptions and that it is moreover imperative to distinguish between different varieties of problem. It is shown that the observational fact that the large-scale Universe is so nearly flat is ultimately no more puzzling than similar “anthropic coincidences”, such as the specific (orders of magnitude of the) values of the gravitational and electromagnetic coupling constants. In particular, there is no fine-tuning problem in connection to flatness of the kind usually argued for. The arguments regarding flatness and particle horizons typically found in cosmological discourses in fact address a mere single issue underlying the standard FLRW cosmologies, namely the extreme improbability of these models with respect to any “reasonable measure” on the “space of all space-times”. This issue may be expressed in different ways and a phase space formulation, due to Penrose, is presented here. A horizon problem only arises when additional assumptions – which are usually kept implicit and at any rate seem rather speculative – are made.

It’s an interesting piece on a topic that I’ve blogged about before. I think it’s well worth reading because many of the discussions of this issue you will find in the literature are very confused and confusing. Apart from mine of course.

Have you got a proper posterior?

Posted in Bad Statistics, The Universe and Stuff on December 12, 2017 by telescoper

There’s an interesting paper on the arXiv today by Tak et al. with the title `How proper are Bayesian models in the astronomical literature?’ The title isn’t all that appropriate, because the problem is not really with `models’, but with the choice of prior (which should be implied by the model and other information known or assumed to be true). Moreover, I’m not sure whether the word `Bayesian’ applies to the model in any meaningful way.

Anyway, the abstract is as follows:

The well-known Bayes theorem assumes that a posterior distribution is a probability distribution. However, the posterior distribution may no longer be a probability distribution if an improper prior distribution (non-probability measure) such as an unbounded uniform prior is used. Improper priors are often used in the astronomical literature to reflect on a lack of prior knowledge, but checking whether the resulting posterior is a probability distribution is sometimes neglected. It turns out that 24 articles out of 75 articles (32%) published online in two renowned astronomy journals (ApJ and MNRAS) between Jan 1, 2017 and Oct 15, 2017 make use of Bayesian analyses without rigorously establishing posterior propriety. A disturbing aspect is that a Gibbs-type Markov chain Monte Carlo (MCMC) method can produce a seemingly reasonable posterior sample even when the posterior is not a probability distribution (Hobert and Casella, 1996). In such cases, researchers may erroneously make probabilistic inferences without noticing that the MCMC sample is from a non-existent probability distribution. We review why checking posterior propriety is fundamental in Bayesian analyses when improper priors are used and discuss how we can set up scientifically motivated proper priors to avoid the pitfalls of using improper priors.

This paper makes a point that I have wondered about on a number of occasions. One of the problems, in my opinion, is that astrophysicists don’t think enough about their choice of prior. An improper prior is basically a statement of ignorance about the result one expects in advance of incoming data. However, very often we know more than we think we do. I’ve lost track of the number of papers I’ve seen in which the authors blithely assume a flat prior when that makes no sense whatsoever on the basis of what information is available and, indeed, on the structure of the model within which the data are to be interpreted. I discuss a simple example here.

In my opinion the prior is not (as some frequentists contend) some kind of aberration. It plays a clear logical role in Bayesian inference. It can build into the analysis constraints that are implied by the choice of model framework. Even if it is used as a subjective statement of prejudice, the Bayesian approach at least requires one to put that prejudice on the table where it can be seen.

There are undoubtedly situations where we don’t know enough to assign a proper prior. That’s not necessarily a problem. Improper priors can – and do – lead to proper posterior distributions if (and it’s an important if) they include, or the likelihood subsequently imposes, a cutoff on the prior space. The onus should be on the authors of a paper to show that their likelihood is such that it does this and produces a posterior which is a well-defined probability measure (specifically that it is normalisable, i.e. can be made to integrate to unity). It seems that astronomers don’t always do this!
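
A minimal numerical sketch of the distinction, with a flat improper prior in both cases. The two likelihoods are toy choices made for this illustration and are not taken from the paper: a Gaussian likelihood for a mean given one hypothetical datum, and a logistic likelihood for a single observed “success”, which tends to a constant rather than falling to zero.

```python
import math


def trapezoid(f, a, b, steps=100_000):
    """Plain trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def gaussian_likelihood(mu, x=1.3):
    """Likelihood of one datum x (hypothetical value) for a unit-variance Gaussian."""
    return math.exp(-0.5 * (x - mu) ** 2)


def logistic_likelihood(theta):
    """Likelihood of one observed success under a logistic model (stable form)."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1.0 + e)


# With a flat prior the unnormalised posterior is just the likelihood, so
# integrate it over ever wider ranges [-T, T] and see whether it settles down.
cutoffs = [10, 100, 1000]
proper = {T: trapezoid(gaussian_likelihood, -T, T) for T in cutoffs}
improper = {T: trapezoid(logistic_likelihood, -T, T) for T in cutoffs}

for T in cutoffs:
    print(f"T={T:5d}  Gaussian: {proper[T]:.4f}  logistic: {improper[T]:.1f}")
```

In the Gaussian case the likelihood itself imposes the cutoff and the integral converges, so the posterior can be normalised. In the logistic case the integral keeps growing with the range, so no normalisation exists, yet a sampler run on it could still return plausible-looking numbers.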

Joseph Bertrand and the Monty Hall Problem

Posted in Bad Statistics, History, mathematics on October 4, 2017 by telescoper

The death a few days ago of Monty Hall reminded me of something I was going to write about the Monty Hall Problem, as it did with another blogger I follow, namely that (unsurprisingly) Stigler’s Law of Eponymy applies to this problem.

The earliest version of the problem now called the Monty Hall Problem dates from a book, first published in 1889, called Calcul des probabilités written by Joseph Bertrand. It’s a very interesting book, containing much of specific interest to astronomers as well as general things for other scientists. You can read it all online here, if you can read French.

As it happens, I have a copy of the book and here is the relevant problem. If you click on the image it should be legible.

It’s actually Problem 2 of Chapter 1, suggesting that it’s one of the easier, introductory questions. Interesting that it has endured so long, even if it has evolved slightly!

I won’t attempt a full translation into English, but the problem is worth describing as it is actually more interesting than the Monty Hall Problem (with the three doors). In the Bertrand version there are three apparently identical boxes (coffrets) each of which has two drawers (tiroirs). In each drawer of each box there is a medal. In the first box there are two gold medals. The second box contains two silver medals. The third box contains one gold and one silver.

The boxes are shuffled, and you pick a box `at random’ and open one drawer `randomly chosen’ from the two. What is the probability that the other drawer of the same box contains a medal that differs from the first?

Now the probability that you select a box with two different medals in the first place is just 1/3, as it has to be the third box: the other two contain identical medals.

However, once you open one drawer and find (say) a silver medal then the probability of the other one being different (i.e. gold) changes because the knowledge gained by opening the drawer eliminates (in this case) the possibility that you selected the first box (which has only gold medals in it). The probability of the two medals being different is therefore 1/2.

That’s a very rough translation of the part of Bertrand’s discussion on the first page. I leave it as an exercise for the reader to translate the second part!
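
Readers who would rather check the argument numerically than translate can try a small Monte Carlo sketch of the set-up described above. The seed, number of trials and variable names are arbitrary choices for this illustration.

```python
import random

random.seed(1889)

# The three boxes, each with two drawers holding one medal apiece.
BOXES = [("gold", "gold"), ("silver", "silver"), ("gold", "silver")]


def trial():
    """Pick a box at random, open one of its two drawers at random."""
    box = random.choice(BOXES)
    first = random.randrange(2)
    return box[first], box[1 - first]  # (medal seen, medal in the other drawer)


N_TRIALS = 200_000
saw_silver = 0
other_is_gold = 0
for _ in range(N_TRIALS):
    seen, hidden = trial()
    if seen == "silver":
        saw_silver += 1
        if hidden == "gold":
            other_is_gold += 1

# Fraction of silver-first trials in which the other drawer holds a different medal.
frequency = other_is_gold / saw_silver
print(f"other medal differs in {frequency:.3f} of the silver-first trials")
```

The frequency should land near 1/3 rather than 1/2, which is a hint that the counting argument above needs a second look: a silver medal is twice as likely to have come from the all-silver box as from the mixed one.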

I just remembered that this is actually the same as the three-card problem I posted about here.

Why we should abandon “statistical significance”

Posted in Bad Statistics on September 27, 2017 by telescoper

So a nice paper by McShane et al. has appeared on the arXiv with the title Abandon Statistical Significance and abstract:

In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.

This piece is in part a reaction to a paper by Benjamin et al. in Nature Human Behaviour that argues for the adoption of a standard threshold of p=0.005 rather than the more usual p=0.05. This latter paper has generated a lot of interest, but I think it misses the point entirely. The fundamental problem is not what number is chosen for the threshold p-value, but what this statistic does (and does not) mean. It seems to me the p-value is usually an answer to a question which is quite different from that which a scientist would want to ask, which is what the data have to say about a given hypothesis. I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis.
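
To make the difference between the two questions concrete, here is a minimal Bayes’ theorem sketch. The numbers are illustrative assumptions only and come from neither paper: suppose 10% of the hypotheses tested describe real effects, tests have 80% power, and the threshold is p=0.05.

```python
def prob_null_given_significant(prior_effect, power, alpha):
    """P(null is true | result is 'significant'), by Bayes' theorem."""
    true_positives = prior_effect * power            # real effects that are detected
    false_positives = (1.0 - prior_effect) * alpha   # nulls that cross the threshold
    return false_positives / (true_positives + false_positives)


# All three inputs are hypothetical values chosen for illustration.
fraction = prob_null_given_significant(prior_effect=0.1, power=0.8, alpha=0.05)
print(f"P(null | significant) = {fraction:.2f}")
```

Under these assumptions about a third of the “significant” results are false alarms (0.045 out of 0.125), even though the threshold on the probability of the data given the null is 0.05. The answer depends on the prior, which is exactly the information a p-value leaves out.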

While I generally agree with the arguments given in McShane et al., I don’t think they go far enough. I think p-values are so misleading that, if I had my way, I’d ban them altogether!