Open Science and Open Software

As the regular readers of this blog – both of them – will know, I’ve been banging on from time to time about Open Access to scientific publications. After posting a video featuring Volker Springel and the GADGET-4 code I thought I’d return to an issue that came up briefly in my recent talk about Open Access and the Open Journal of Astrophysics here, which is the question of whether open access to scientific results is enough, or whether we have to go a lot further.

An important aspect of the way science works is that when a given individual or group publishes a result, it should be possible for others to reproduce it (or not as the case may be). Traditional journal publications don’t always allow this. In my own field of astrophysics/cosmology, for example, results in scientific papers are often based on very complicated analyses of large data sets. This is increasingly the case in other fields too. A basic problem obviously arises when data are not made public. Fortunately in astrophysics these days researchers are pretty good at sharing their data, although this hasn’t always been the case.

However, even allowing open access to data doesn’t always solve the reproducibility problem. Often extensive numerical codes are needed to process the measurements and extract meaningful output. Without access to these pipeline codes it is impossible for a third party to check the path from input to output without writing their own version, assuming that there is sufficient information to do that in the first place. That researchers should publish their software as well as their results is quite a controversial suggestion, but I think it’s the best practice for science. There isn’t a uniform policy in astrophysics and cosmology, but I sense that quite a few people out there agree with me. Cosmological numerical simulations, for example, can be performed by anyone with a sufficiently big computer using GADGET, the source code of which is freely available. Likewise, for CMB analysis, there is the excellent CAMB code, which can be downloaded at will; this is in a long tradition of openly available numerical codes, including CMBFAST and HEALPix. Researchers in these and other areas do tend to share their software on open-access repositories, especially GitHub.

I suspect some researchers might be reluctant to share the codes they have written because they feel they won’t get sufficient credit for work done using them. I don’t think this is true, as researchers are generally very appreciative of such openness and publications describing the corresponding codes are generously cited. In any case I don’t think it’s appropriate to withhold such programs from the wider community, which prevents them being either scrutinized or extended as well as being used to further scientific research. In other words excessively proprietorial attitudes to data analysis software are detrimental to the spirit of open science.

Anyway, my views are by no means guaranteed to be representative of the community, so I’d like to ask for a quick show of hands via a poll that I started about 8 years ago.

You are of course welcome to comment via the usual box, as long as you respect my comments policy…

7 Responses to “Open Science and Open Software”

  1. Anton Garrett Says:

    If commercially purchased code was used in the analysis of data in a research paper then the authors of the paper will have committed, as part of purchase, not to make that code freely available.

  2. Hans Kristian Eriksen Says:

    I think the importance of Open Source software and analysis is being recognized more and more widely in the cosmology community. Reproducibility and validation are obviously important issues, but also cost efficiency and general community building are critical elements. Being able to reuse tested software represents a huge cost saving for many experiments. A recent example of this is the BeyondPlanck project (http://beyondplanck.science; https://arxiv.org/abs/2011.05609), which aims to put together a complete Bayesian end-to-end analysis pipeline for Planck LFI, and make it publicly available under a GPL license. Likewise, the Cosmoglobe project (http://cosmoglobe.uio.no) aims to establish a cross-experiment and community-wide model of the microwave sky with the same technology, with an online kick-off meeting taking place in less than three weeks (https://cosmoglobe.uio.com/conference) and (so far) 18 experiments participating. While it is still early days, and it’s by no means clear how things will actually work out, it’s great to see that so many are interested in coming together to talk about these issues in a more structured manner 🙂

  3. Hans Kristian Eriksen Says:

    Correct conference link: https://cosmoglobe.herokuapp.com/conference

  4. A problem is that if one confirms a result, either using the same code or different code, either using the same data or different data, then it can be difficult to get it published, so many people don’t bother. It seems to me that if reproduction/confirmation is a goal, then journals should pledge to publish such confirmation results.

    A related problem is that journals might be too keen to publish results which dispute results which are relatively well established.

    I certainly agree that testing published results is very important (check out the acknowledgements).

  5. I agree that it should be possible to reproduce published results. But there are many reasons why specific software may not be openly available. It may not be owned by the author. It may be commercial software. The author may intend to commercialize it. The calculations should be described in enough detail that they can be repeated. But this does not require that the specific software used by the author is published.

    Instead of software, the research may have used a new instrument, and in that case we would not require that the instrument is provided.

    • Hans Kristian Eriksen Says:

      Clearly, there exist several good reasons why a specific piece of software must be kept closed. However, my claim is that the vast majority of current cosmology codes *can* be made public without major complications. Furthermore, it makes perfect sense that they should be, since most of them are actually produced through taxpayer funding. And, indeed, the same applies to the data collected by the experiments: in my opinion, those shouldn’t be considered the private property of whoever happened to collect them, but rather the public property of everybody. The main reason software is different from instruments is one of practicality and cost: it’s much cheaper and easier to distribute computer codes than it is to distribute an instrument. But, of course, that *is* what’s being done with open calls for observation time at many facilities.
