Should Open Access Include Open Software?

Very busy today, so just time for a quick post (and associated poll) about Open Science.

As you all know I’ve been using this blog for a while to bang on about Open Access to scientific publications. I’m not going to repeat my position in detail here except to say that I’m in favour of Open Access but not at the immense cost envisaged by the Finch Report.

I thought, however, that it might be useful to float some opinions about wider issues related to open science. In particular, the question that often troubles me is whether open access to scientific results is actually enough, or whether we have to go a lot further.

I think an important aspect of the way science works is that when a given individual or group publishes a result, it should be possible for others to reproduce it (or not as the case may be). Traditional journal publications don’t always allow this. In my own field of astrophysics/cosmology, for example, results in scientific papers are often based on very complicated analyses of large data sets. This is increasingly the case in other fields too. A basic problem obviously arises when data are not made public. Fortunately in astrophysics these days researchers are pretty good at sharing their data, although this hasn’t always been the case.

However, even allowing open access to data doesn’t always solve the reproducibility problem. Often extensive numerical codes are needed to process the measurements and extract meaningful output. Without access to these pipeline codes it is impossible for a third party to check the path from input to output without writing their own version, assuming that there is sufficient information to do that in the first place. That researchers should publish their software as well as their results is quite a controversial suggestion, but I think it’s best practice for science. There isn’t a uniform policy in astrophysics and cosmology, but I sense that quite a few people out there agree with me. Cosmological numerical simulations, for example, can be performed by anyone with a sufficiently big computer using GADGET, the source code of which is freely available. Likewise, for CMB analysis, there is the excellent CAMB code, which can be downloaded at will; this is in a long tradition of openly available numerical codes, including CMBFAST and HEALPix.

I suspect some researchers might be reluctant to share the codes they have written because they feel they won’t get sufficient credit for work done using them. I don’t think this is true, as researchers are generally very appreciative of such openness and publications describing the corresponding codes are generously cited. In any case I don’t think it’s appropriate to withhold such programs from the wider community, as doing so prevents them from being scrutinized, extended, or used to further scientific research. In other words, excessively proprietorial attitudes to data analysis software are detrimental to the spirit of open science.

Anyway, my views aren’t guaranteed to be representative of the community, so I’d like to ask for a quick show of hands via a poll…

…and you are of course welcome to comment via the usual box.

14 Responses to “Should Open Access Include Open Software?”

  1. Publishing code is a good idea. But don’t underestimate the work required to publish code… I bet a number of researchers would be unhappy to publish their code due to various crimes against Good Practice, such as
    a) lack of comments/documentation
    b) profanities in variable names/comments
    c) ugly, ugly hacks
    d) special cases
    e) source code no longer available, “but the executable is still fine”
    f) no error checking
    g) comments like “FIX ME” or “DON’T REMOVE” or “shouldn’t work, but does”

    • telescoper Says:

      I suspect this is very true.

    • Adrian Burd Says:

      These have all been raised when I have tried to push for code publication in my field of research. Many large scale codes in oceanography and climate are freely available and have been for many years. However, many “in-house” smaller but still substantial codes are not.

      Another frequently heard objection is that by making code freely available, some users expect support, and scientists are not in the business nor are they necessarily equipped to provide such support.

      My own feeling is that, at least in earth sciences, there needs to be a shift in administration towards giving credit for this type of activity. As Will says, it takes considerable time and effort to get a code (and its associated documentation) into reasonable shape to be released. That time and effort is not always recognized (by either administrators or one’s colleagues).

  2. In general I’m in favour of free code, but it’s often not free to make it in a releasable state, as Will suggests. What works well enough for you and your special case may not be general enough for anyone else.

    You then open yourself up to people grabbing the code and then wanting to be hand-held when it doesn’t work.

    It can also get out of control. With Gadget in particular it is hard to pin down a consistent version – it’s usually Gadget-2 + X’s routines to do Y, and Z’s changes to do A and my own changes to …

    Long-term storage is also an issue; a URL published in a paper may quickly go out of date.

  3. From the perspective of someone now on the outside of academia (but still doing science) I’m against this being a requirement.

    While I’d love all software to be as open as possible, a lot of companies enjoy the ability to publish papers that involve data pipelines, without having to publish the code for the pipeline itself.

    Such code can often be deeply integrated into the internally developed software, rely on external licences, include valuable analysis techniques, and so on. The net result would simply be that companies would stop publishing anything – and that might restrict academics who work with industry. I’d hate to see those sorts of ties severed, even if you’d not mourn the loss of papers coming out of industry (and I would).

    • telescoper Says:

      I was really talking about academic research – I can understand companies wanting to keep commercial software suites private.

  4. Adrian Burd Says:

    There is an interesting case in the biosciences a few years ago where several high-profile papers (5 or 6 if memory serves) in Science and PNAS had to be retracted after a bug was found in data-analysis code that had been developed in-house and was never questioned. The code sorted out protein structures and the bug flipped a sign, resulting in the wrong structures being given to some pretty important molecules.

  5. ‘Always’, no. If I’ve put a lot of effort into writing a code that lets me interpret data that couldn’t previously be interpreted properly (a situation I’ve been in several times), that is effort I’ve spent getting a tool that nobody else has. I want a reward for that effort other than the warm fuzzies of knowing that other people have used the code; i.e. I want to either write papers with it or collaborate with others who will do so, while I still have that edge. That should be my decision to make.

    Of course some people write code as a public service, like Phillip’s; other people want their code to become the gold standard for analysis in a particular area, which means it’s really got to be public; there are plenty of situations where publishing source code makes sense. But it should be up to the author to decide.

    • telescoper Says:

      Fair enough, I suppose, but I’d say that this exposes a flaw in the much-vaunted system of peer review. How can a referee actually decide whether your analysis is correct if you keep your code secret?

      • They can reasonably ask for a full description of what the code does, such that they could in principle replicate it if they want to; they can compare the results to what’s already out there to see if they make sense; and they can ask me to run more tests and put the results in the paper. In other words, they can do all the same things that you would do if you were a referee in an experimental discipline given a paper about a complicated experimental setup that you can’t easily replicate.

        (Of course, it’s not the referee’s job to decide whether the analysis is correct, anyway, but I’ll take that as shorthand for what the referee’s job actually is…)

    I would say this is a no. People who make their code public should be prepared to update and support it, which is a huge effort (e.g. Cloudy). The argument that it allows the code to be checked does not fly. A public code with faults can be hugely more damaging than a private code with faults. A much better way to deal with code errors is to have a set of template problems for which code outputs can be compared.

    I am much more concerned about the move to commercial software. Starlink, Midas, etc. are being replaced with IDL, for which I need to pay. Is this money well spent?

    • telescoper Says:

      That seems to me to be a non sequitur.

      I don’t think any code would have to be updated (although it would be helpful if it were). The point is to make the code used for a particular analysis available so that others can see it and/or use it.

  7. […] Should Open Access Include Open Software? ( […]

  8. […] The software can be downloaded here. It looks a very useful package that includes code to calculate many of the bits and pieces used by cosmologists working on the theory of large-scale structure and galaxy evolution. It is also, I hope, an example of a trend towards greater use of open-source software, for which I congratulate the author! I think this is an important part of the campaign to create truly open science, as I blogged about here. […]
