Six key trends in contemporary statistics that really could revolutionise astronomical data analysis …

June 11, 2015

I’ve just come across this very interesting astrostatistics site, and I thought I’d reblog a piece from it. In fact I did make a very crude attempt back in the 90s to do something very like the SPDE analysis described here, but it came to nothing and I dropped the idea. Now it seems that there’s been a great deal of more recent activity in this area which I knew nothing about so it might be worth reviving interest in it.

Now. Where did I put those notes?

One More for the Bad Statistics in Astronomy File…

May 20, 2015

It’s been a while since I last posted anything in the file marked Bad Statistics, but I can remedy that this morning with a comment or two on the following paper by Robertson et al. which I found on the arXiv via the Astrostatistics Facebook page. It’s called Stellar activity mimics a habitable-zone planet around Kapteyn’s star and the abstract is as follows:

Kapteyn’s star is an old M subdwarf believed to be a member of the Galactic halo population of stars. A recent study has claimed the existence of two super-Earth planets around the star based on radial velocity (RV) observations. The innermost of these candidate planets–Kapteyn b (P = 48 days)–resides within the circumstellar habitable zone. Given recent progress in understanding the impact of stellar activity in detecting planetary signals, we have analyzed the observed HARPS data for signatures of stellar activity. We find that while Kapteyn’s star is photometrically very stable, a suite of spectral activity indices reveals a large-amplitude rotation signal, and we determine the stellar rotation period to be 143 days. The spectral activity tracers are strongly correlated with the purported RV signal of “planet b,” and the 48-day period is an integer fraction (1/3) of the stellar rotation period. We conclude that Kapteyn b is not a planet in the Habitable Zone, but an artifact of stellar activity.

It’s not really my area of specialism but it seemed an interesting conclusion so I had a skim through the rest of the paper. Here’s the pertinent figure, Figure 3:


It looks like difficult data to do a correlation analysis on and there are lots of questions to be asked about the form of the errors and how the bunching of the data is handled, to give just two examples. I’d like to have seen a much more comprehensive discussion of this in the paper. In particular the statistic chosen to measure the correlation between variates is the Pearson product-moment correlation coefficient, which is intended to measure linear association between variables. There may indeed be correlations in the plots shown above, but it doesn’t look to me as if a straight-line fit characterizes them very well. It looks to me in some of the cases that there are simply two groups of data points…
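To show why two groups of points can be a problem for this statistic, here is a toy sketch using purely synthetic numbers (nothing here comes from the HARPS data or the paper). It assumes two clumps of points, with x and y drawn independently inside each clump, and uses a hand-written `pearson_r` helper so that nothing beyond the Python standard library is needed. The clump centres, scatter and sample sizes are arbitrary choices of mine.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_group(rng, centre, n):
    """Return n points scattered around (centre, centre).

    x and y are drawn independently, so there is no correlation
    *within* the group by construction.
    """
    xs = [rng.gauss(centre, 1.0) for _ in range(n)]
    ys = [rng.gauss(centre, 1.0) for _ in range(n)]
    return xs, ys


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Two clumps (hypothetical values): one near (0, 0), one near (5, 5).
x1, y1 = make_group(rng, 0.0, 500)
x2, y2 = make_group(rng, 5.0, 500)

r_group1 = pearson_r(x1, y1)
r_group2 = pearson_r(x2, y2)
r_pooled = pearson_r(x1 + x2, y1 + y2)

print(f"r within group 1: {r_group1:+.3f}")
print(f"r within group 2: {r_group2:+.3f}")
print(f"r for pooled data: {r_pooled:+.3f}")
```

In this setup the coefficient within each clump should be close to zero, while the pooled coefficient comes out large simply because the two clumps sit in different places. A large r on data like this tells you the groups differ; it does not tell you that a straight line describes the relationship.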

However, that’s not the real reason for flagging this one up. The real reason is the following statement in the text:



No matter how the p-value is arrived at (see comments above), it says nothing about the “probability of no correlation”. This is an error which is sadly commonplace throughout the scientific literature, not just astronomy. The point is that the p-value relates to the probability that the given value of the test statistic (in this case the Pearson product-moment correlation coefficient, r), or a more extreme one, would arise by chance in the sample if the null hypothesis H (in this case that the two variates are uncorrelated) were true. In other words it relates to P(r|H). It does not tell us anything directly about the probability of H. That would require the use of Bayes’ Theorem. If you want to say anything at all about the probability of a hypothesis being true or not you should use a Bayesian approach. And if you don’t want to say anything about the probability of a hypothesis being true or not then what are you trying to do anyway?
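For anyone who wants to see the difference between P(r|H) and P(H|r) in numbers, here is a rough sketch. Everything in it is an assumption made for illustration: the sample size and observed r are invented (they are not the values in the paper), the sampling distribution of r is approximated using the Fisher z-transformation, the alternative to H is taken to be a single fixed correlation `rho_alt`, and the prior probabilities of H are plucked out of the air.

```python
import math


def fisher_z(r, n):
    """Fisher z-transform of r, scaled so it is roughly N(., 1) for sample size n."""
    return math.atanh(r) * math.sqrt(n - 3)


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p_value_two_sided(r, n):
    """Approximate two-sided p-value for r under H: the variates are uncorrelated."""
    z = fisher_z(r, n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def posterior_null(r, n, rho_alt, prior_null):
    """P(H | r) from Bayes' Theorem, for H (rho = 0) versus a point alternative rho_alt.

    prior_null is the assumed prior probability of H; the alternative
    gets the remaining 1 - prior_null.
    """
    z_obs = fisher_z(r, n)
    like_null = normal_pdf(z_obs)                         # ~ P(r | H)
    like_alt = normal_pdf(z_obs - fisher_z(rho_alt, n))   # ~ P(r | alternative)
    numerator = prior_null * like_null
    return numerator / (numerator + (1.0 - prior_null) * like_alt)


# Invented numbers, for illustration only.
n_points = 30
r_observed = 0.4

p = p_value_two_sided(r_observed, n_points)
post_even = posterior_null(r_observed, n_points, rho_alt=0.4, prior_null=0.5)
post_sceptical = posterior_null(r_observed, n_points, rho_alt=0.4, prior_null=0.9)

print(f"p-value:                 {p:.3f}")
print(f"P(H | r), prior P(H)=0.5: {post_even:.3f}")
print(f"P(H | r), prior P(H)=0.9: {post_sceptical:.3f}")
```

With these made-up inputs the p-value and the posterior probability of H are different numbers, and the posterior moves a lot when the prior changes while the p-value stays put. That is the whole point: the p-value is a statement about the data given H, and it cannot be turned into a statement about H without a prior and an alternative.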

If I had my way I would ban p-values altogether, but if people are going to use them I do wish they would be more careful about the statements they make about them.

Astrostatistics at NAM

March 22, 2011

I’m using the opportunity of my enforced layoff to remind astronomers that this year’s forthcoming Royal Astronomical Society National Astronomy Meeting, incorporating the MIST and UKSP meetings, will be taking place at the splendid Venue Cymru conference centre, Llandudno, North Wales, from Sunday 17 April to Thursday 21 April.

The period for registration has been extended, and you can now also submit abstracts of either oral or poster presentations to be considered for inclusion in the various sessions described in the science programme.

I’m organising a session on Recent Developments in Astro-statistics. I haven’t exactly been overwhelmed with offers to speak and there are still one or two slots available, so if you’d like to give a talk in that session please register and upload an abstract to the website. You can’t do the latter until you have done the former. Astro-statistics will be interpreted widely, so I hope to have a varied programme including as many applications of statistics to astronomy and astrophysics as I can get!

NAM is a particularly good opportunity for younger researchers – PhD students and postdocs – to present their work to a big audience so I particularly encourage such persons to submit abstracts. Would more senior readers please pass this message on to anyone they think might want to give a talk?

If you have any questions please feel free to use the comments box (or contact me privately).

Oh, and I should have mentioned that Andrew Jaffe is also touting for trade for the cosmology sessions he’s organising…