Archive for Software

LIGO and Open Science

Posted in The Universe and Stuff, Science Politics, Open Access on August 8, 2017 by telescoper

I’ve just come from another meeting here at the Niels Bohr Institute between some members of the LIGO Scientific Collaboration and the authors of the `Danish Paper’. As with the other one I attended last week, it was both interesting and informative. I’m not going to divulge any of the details of the discussion, but I anticipate further developments that will put some of them into the public domain fairly soon, and I will comment on them as and when that happens.

I think an important aspect of the way science works is that when a given individual or group publishes a result, it should be possible for others to reproduce it (or not as the case may be). In normal-sized laboratory physics it suffices to explain the experimental set-up in the published paper in sufficient detail for another individual or group to build an equivalent replica experiment if they want to check the results. In `Big Science’, e.g. with LIGO or the Large Hadron Collider, it is not practically possible for other groups to build their own copy, so the best that can be done is to release the data coming from the experiment. A basic problem with reproducibility obviously arises when this does not happen.

In astrophysics and cosmology, results in scientific papers are often based on very complicated analyses of large data sets. This is also the case for gravitational wave experiments. Fortunately in astrophysics these days researchers are generally pretty good at sharing their data, but there are a few exceptions in that field. Particle physicists, by contrast, generally treat all their data as proprietary.

Even allowing open access to data doesn’t always solve the reproducibility problem. Often extensive numerical codes are needed to process the measurements and extract meaningful output. Without access to these pipeline codes it is impossible for a third party to check the path from input to output without writing their own version, assuming that there is sufficient information to do that in the first place. That researchers should publish their software as well as their results is quite a controversial suggestion, but I think it’s the best practice for science. In any case there are often intermediate stages between `raw’ data and scientific results, as well as ancillary data products of various kinds. I think these should all be made public. Doing that could well entail a great deal of effort, but I think in the long run that it is worth it.

I’m not saying that scientific collaborations should not have a proprietary period, just that this period should end when a result is announced, and that any such announcement should be accompanied by a release of the data products and software needed to subject the analysis to independent verification.

Now, if you are interested in trying to reproduce the analysis of data from the first detection of gravitational waves by LIGO, you can go here, where you can not only download the data but also find a helpful tutorial on how to analyse it.

This seems at first sight to be fully in the spirit of open science, but if you visit that page you will find this disclaimer:

 

In other words, one can’t check the LIGO data analysis because not all the data and tools necessary to do that are publicly available. I know for a fact that this is the case because of the meetings going on here at NBI!

Given that the detection of gravitational waves is one of the most important breakthroughs ever made in physics, I think this is a matter of considerable regret. I also find it difficult to understand the reasoning that led the LIGO consortium to think it was a good plan to go only part of the way towards open science, by releasing only part of the information needed to reproduce the processing of the LIGO signals and their subsequent statistical analysis. There may be good reasons that I know nothing about, but at the moment it seems to me to represent a wasted opportunity.

I know I’m an extremist when it comes to open science, and there are probably many who disagree with me, so I thought I’d do a mini-poll on this issue:

Any other comments welcome through the box below!

Software Use in Astronomy

Posted in Education, The Universe and Stuff on July 21, 2015 by telescoper

I just saw an interesting paper which hit the arXiv last week and thought I would share it here. It’s called Software Use in Astronomy: An Informal Survey and the abstract is here:

A couple of things are worth remarking upon. One concerns Python. Although I’m not surprised that Python is Top of the Pops amongst astronomers – like many Physics & Astronomy departments we actually teach it to undergraduates here at the University of Sussex – it is notable that its popularity is a relatively recent phenomenon and it’s quite impressive how rapidly it has caught on.

Another interesting thing is the continuing quite heavy use of Fortran. Most computer scientists would consider this to be an obsolete language, and it is presumably used mainly because of inertia: some important and well-established codes are written in it, and presumably it’s too much effort to rewrite them from scratch in something more modern. I would have thought that Fortran would have been used primarily by older academics, i.e. old dogs who can’t learn new programming tricks. However, that doesn’t really seem to be the case based on the last sentence of the abstract.

Finally, it’s quite surprising that over 40% of astronomers claim to have had no training in software development. We do try to embed that particular skill in graduate programmes nowadays, but it seems that doesn’t always work!

Anyway, do read the paper yourself. It’s very interesting. Any further comments through the box below please, but please ensure they compile before submitting them…

 

Should Open Access Include Open Software?

Posted in Open Access, Science Politics, The Universe and Stuff on February 4, 2013 by telescoper

Very busy today, so just time for a quick post (and associated poll) about Open Science.

As you all know I’ve been using this blog for a while to bang on about Open Access to scientific publications. I’m not going to repeat my position in detail here except to say that I’m in favour of Open Access but not at the immense cost envisaged by the Finch Report.

I thought, however, that it might be useful to float some opinions about wider issues related to open science. In particular, the question that often troubles me is whether open access to scientific results is actually enough, or whether we have to go a lot further.

I think an important aspect of the way science works is that when a given individual or group publishes a result, it should be possible for others to reproduce it (or not as the case may be). Traditional journal publications don’t always allow this. In my own field of astrophysics/cosmology, for example, results in scientific papers are often based on very complicated analyses of large data sets. This is increasingly the case in other fields too. A basic problem obviously arises when data are not made public. Fortunately in astrophysics these days researchers are pretty good at sharing their data, although this hasn’t always been the case.

However, even allowing open access to data doesn’t always solve the reproducibility problem. Often extensive numerical codes are needed to process the measurements and extract meaningful output. Without access to these pipeline codes it is impossible for a third party to check the path from input to output without writing their own version, assuming that there is sufficient information to do that in the first place. That researchers should publish their software as well as their results is quite a controversial suggestion, but I think it’s the best practice for science. There isn’t a uniform policy in astrophysics and cosmology, but I sense that quite a few people out there agree with me. Cosmological numerical simulations, for example, can be performed by anyone with a sufficiently big computer using GADGET, the source code of which is freely available. Likewise, for CMB analysis, there is the excellent CAMB code, which can be downloaded at will; this is in a long tradition of openly available numerical codes, including CMBFAST and HEALPix.

I suspect some researchers might be reluctant to share the codes they have written because they feel they won’t get sufficient credit for work done using them. I don’t think this is true, as researchers are generally very appreciative of such openness and publications describing the corresponding codes are generously cited. In any case I don’t think it’s appropriate to withhold such programs from the wider community, which prevents them being either scrutinized or extended as well as being used to further scientific research. In other words excessively proprietorial attitudes to data analysis software are detrimental to the spirit of open science.

Anyway, my views aren’t guaranteed to be representative of the community, so I’d like to ask for a quick show of hands via a poll…

…and you are of course welcome to comment via the usual box.