Archive for NASA/ADS

The H-index is Redundant…

Posted in Bad Statistics, Science Politics on January 28, 2012 by telescoper

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index), which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers, each of which has received at least H citations. It can easily be calculated by looking up all papers by a given author on a database such as NASA/ADS, sorting them by decreasing number of citations, and working down the list: H is the last position in the list at which a paper still has at least as many citations as its position number. An alternative measure can be formed in the same way from normalized citations, obtained by dividing the number of citations each paper receives by the number of authors of that paper.
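The counting procedure just described is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the method used in the paper or by NASA/ADS: the function names are hypothetical, the input is assumed to be a plain list of per-paper citation counts, and the normalized variant simply applies the same rule to citations divided by number of authors, which is one plausible reading of the normalized measure.

```python
def h_index(citations):
    """Largest H such that at least H papers have at least H citations each."""
    # Sort citation counts in decreasing order, as one would on NASA/ADS.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Walk down the list; H is the last position whose paper still has
    # at least as many citations as its position in the list.
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


def normalized_h_index(papers):
    """Hypothetical normalized variant: the same rule applied to
    citations divided by the number of authors.

    `papers` is a list of (citations, n_authors) pairs.
    """
    return h_index([cites / n_authors for cites, n_authors in papers])


print(h_index([10, 8, 5, 4, 3]))  # 4
```

For example, an invented record of four papers with (citations, authors) of (100, 100), (40, 2), (30, 1) and (9, 3) has an ordinary H-index of 4 but a normalized one of 3, because the 100-author paper drops to a single normalized citation.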

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left), the latter being, in my view, a much more sensible measure of individual contributions. The basic problem, of course, is that people don’t get citations; papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as the single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.
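A short sketch makes the difference between the two ways of counting concrete. It assumes, purely for illustration, that credit for a paper is split equally among its authors; the function names and the two example records are hypothetical.

```python
def total_citations(papers):
    """Every author gets full credit for every citation of every paper."""
    return sum(cites for cites, _ in papers)


def normalized_citations(papers):
    """Each paper's citations are shared equally among its authors (an assumption)."""
    return sum(cites / n_authors for cites, n_authors in papers)


# Each record is a list of (citations, n_authors) pairs.
big_collaboration = [(100, 100)]  # one 100-author paper with 100 citations
single_author = [(100, 1)]        # one single-author paper with 100 citations

print(total_citations(big_collaboration), total_citations(single_author))            # 100 100
print(normalized_citations(big_collaboration), normalized_citations(single_author))  # 1.0 100.0
```

Counting totals gives the two authors identical credit; equal splitting gives the collaboration member one hundredth of it. Equal splitting is only the simplest choice, and other weightings (for example by author position) are possible.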

The correlation between the H-index and the square root of the total number of citations has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected, I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using both the H-index and total citation numbers doesn’t add anything, as they are both measuring essentially the same thing.

A very interesting question emerges from this, however, which is why the relationship between total citation numbers and the H-index has the form it does: the latter is always roughly half of the square root of the former. This suggests to me that there might be some sort of scaling law onto which the distribution of citations per paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property…
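As a starting point for such a model, here is a toy sketch in Python. It rests on one exact fact and one assumption. The fact: the H papers at the top of the list each have at least H citations, so the total N satisfies N ≥ H², which means H/√N can never exceed 1. The assumption, which is purely hypothetical and not taken from Spruit’s paper, is that the paper of rank r receives floor(A/r^α) citations for some constants A and α; the function names are likewise illustrative.

```python
import math


def h_index(citations):
    """Largest H such that at least H papers have at least H citations each."""
    h = 0
    for position, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < position:
            break
        h = position
    return h


def h_over_sqrt_total(citations):
    """Ratio H / sqrt(N_total). Always <= 1, since the top H papers
    alone hold at least H * H citations."""
    total = sum(citations)
    return h_index(citations) / math.sqrt(total) if total > 0 else 0.0


def power_law_record(n_papers, top_cites, alpha):
    """Hypothetical toy model: the paper of rank r gets floor(top_cites / r**alpha) citations."""
    return [math.floor(top_cites / r**alpha) for r in range(1, n_papers + 1)]


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 1.5):
        record = power_law_record(n_papers=200, top_cites=500, alpha=alpha)
        print(f"alpha={alpha}: H={h_index(record)}, "
              f"H/sqrt(N)={h_over_sqrt_total(record):.2f}")
```

The bound H/√N = 1 is reached only when all the citations sit in exactly H papers with H citations each. Running the loop for a few values of α shows that in this simple model the ratio shifts with the assumed exponent, so a model that reproduces a roughly constant value of one half for every author would need more structure than a single power law.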

If it ain’t open, it ain’t science

Posted in Open Access, Science Politics, The Universe and Stuff on May 16, 2011 by telescoper

Last Friday (13th May) the Royal Society launched a study into “openness in science”, as part of which it is inviting submissions from individuals and organizations. According to the Royal Society website:

“Science has always been about open debate. But incidents such as the UEA email leaks have prompted the Royal Society to look at how open science really is. With the advent of the Internet, the public now expect a greater degree of transparency. The impact of science on people’s lives, and the implications of scientific assessments for society and the economy are now so great that people won’t just believe scientists when they say “trust me, I’m an expert.” It is not just scientists who want to be able to see inside scientific datasets, to see how robust they are and ask difficult questions about their implications. Science has to adapt.”

I think this is a timely and important study which will, at the very least, reveal how much attitudes to this issue differ between scientific disciplines. At one extreme we have fields like astronomy, where the practice of making all data publicly available is increasingly common and where most scientific publications are available free of charge through the arXiv. At the other there are fields where experimental data are generally regarded as the private property of the scientists who collected the measurements or did the experiments.

I have quite a simple view on this, which is that the default should be that data resulting from publicly funded research are in the public domain. I accept that this will not always be possible owing to ethical issues, such as when human subjects are involved, but that should be the default position. I have two reasons for thinking this way. One is that it’s public money that funds us, so we have a moral responsibility to be as open as possible with the public. The other is that the scientific method only works when analyses can be fully scrutinized and, if necessary, replicated by other researchers. In other words, to seek to prevent one’s data becoming freely available is profoundly unscientific.

I’m actually both surprised and depressed at the reluctance of some scientists to make their data available for scrutiny by other scientists, let alone by members of the general public. I can give an example from my own experience, when I hit a brick wall trying to find out more about the statistics behind a study in the field of neuroscience. Other branches of physics are also way behind astronomy and cosmology in opening up their research.

If scientists are reluctant to share their data with other scientists, it’s very difficult to believe they will be happy to put it all in the public domain. But I think they should. And I don’t mean just chucking terabytes of complicated unsorted data onto a website in a form that is impossible in practice to make use of. I mean fully documented, carefully maintained databases containing raw data, analysis tools and data products. An exemplar is the excellent LAMBDA site, a repository for data arising from research into the Cosmic Microwave Background.

I’ve ranted before (and will no doubt do so again) about the extremely negative effect the academic publishing industry has on the dissemination of results. At our latest Board of Studies meeting, the prospect of further cuts to our library budget was raised and the suggestion made that we might have to cancel some of our journal subscriptions. I, and most of my astronomy colleagues, frankly don’t really care if we cancel astronomy journals. All our relevant papers can be found on the arXiv and/or via the NASA/ADS system. My physics colleagues, on the other hand, are still in hock to the old-fashioned and ruinously expensive academic journal racket.

One of the questions the Royal Society study will ask is:

How do we make information more accessible and who will pay to do it?

I’m willing to hazard a guess that if we worked out how much universities and research laboratories are spending on pointless journal subscriptions, we’d find that it’s more than enough to pay for the construction and maintenance of sufficient open access repositories. The current system of publishing could be scrapped and replaced by something radically different and better suited to the era of the internet, but making that change won’t be easy. For example, at present we are forced to publish in “proper journals” for the purposes of research assessments, so academic publishers wield immense power over university researchers. These vested interests will be difficult to overthrow, but I think there’s a growing realization that they are actively preventing science from adjusting properly to the digital age.

Anyway, whether or not you agree with me, I hope you’ll agree that the Royal Society study is an important one, so please take a look and contribute if you can.
