Archive for h-index

The H-index is Redundant…

Posted in Bad Statistics, Science Politics on January 28, 2012 by telescoper

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers each having at least H citations. It can easily be calculated by looking up all papers by a given author in a database such as NASA/ADS, sorting them by (decreasing) number of citations, and working down the list to the point where the number of citations of a paper falls below its position in the list. Normalized quantities – obtained by dividing each paper’s citation count by its number of authors – can be used to form an alternative measure.
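The calculation just described can be sketched in a few lines of Python. This is a minimal illustration with made-up citation counts; in practice the list would come from a database query to something like ADS, which is not shown here.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    # Sort citation counts in decreasing order, then walk down the list
    # until a paper's citation count falls below its (1-based) position.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h

# Example: five papers with these (invented) citation counts have h = 3,
# because three papers have at least 3 citations but only two have at least 4.
print(h_index([10, 8, 5, 2, 1]))  # → 3
```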

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter is, in my view, a much more sensible measure of individual contributions. The basic problem of course is that people don’t get citations, papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as a single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between H index and the square root of total citation numbers has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using H-index and total numbers doesn’t add anything as they are both measuring exactly the same thing.

A very interesting question emerges from this, however, which is why the relationship between total citation numbers and h-index has the form it does: the latter is always roughly half of the square-root of the former. This suggests to me that there might be some sort of universal scaling law onto which the distribution of cites-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property….

Advice for the REF Panels

Posted in Finance, Science Politics on October 30, 2011 by telescoper

I thought I’d post a quick follow-up to last week’s item about the Research Excellence Framework (REF). You will recall that in that post I expressed serious doubts about the ability of the REF panel members to carry out a reliable assessment of the “outputs” being submitted to this exercise, primarily because of the scale of the task in front of them. Each will have to read hundreds of papers, many of them far outside their own area of expertise. In the hope that it’s not too late to influence their approach, I thought I’d offer a few concrete suggestions as to how things might be improved. Most of my comments refer specifically to the Physics panel, but I have a feeling the themes I’ve addressed may apply in other disciplines.

The first area of concern relates to citations, which we are told will be used during the assessment, although we’re not told precisely how this will be done. I’ve spent a few hours over the last few days looking at the accuracy and reliability of various bibliometric databases and have come to the firm conclusion that Google Scholar is by far the best, certainly better than SCOPUS or Web of Knowledge. It’s also completely free. NASA/ADS is also free, and good for astronomy, but probably less complete for the rest of physics. I therefore urge the panel to ditch its commitment to use SCOPUS and adopt Google Scholar instead.

But choosing a sensible database is only part of the solution. Can citations be used sensibly at all for recently published papers? REF submissions must have been published no earlier than 2008 and the deadline is in 2013, so the longest time any paper can have had to garner citations will be five years. I think that’s OK for papers published early in the REF window, but obviously citations for those published in 2012 or 2013 won’t be as numerous.

However, the good thing about Google Scholar (and ADS) is that they include citations from the arXiv as well as from papers already published. Important papers get cited pretty much as soon as they appear on the arXiv, so including these citations will improve the process. That’s another strong argument for using Google Scholar.

The big problem with citation information is that citation rates vary significantly from field to field, so it will be very difficult to use bibliometric data in a formulaic sense. But frankly it’s the only way the panel has to assess papers that lie far from their own expertise. Unless anyone else has a suggestion?

I suspect that what some panel members will do is to look beyond the four publications to guide their assessment. They might, for example, be tempted to look up the H-index of the author if they don’t know the area very well. “I don’t really understand the paper by Professor Poindexter but he has an H-index of 95 so is obviously a good chap and his work is probably therefore world-leading”. That sort of thing.

I think this approach would be very wrong indeed. For a start, it seriously disadvantages early career researchers who haven’t had time to build up a back catalogue of high-impact papers. Secondly, and more fundamentally still, it is contrary to the stated aim of the REF, which is to assess only the research carried out in the assessment period, i.e. 2008 to 2013. The H-index would include papers going back far further than 2008.

But as I pointed out in my previous post, it’s going to be impossible for the panel to perform accurate assessments of all the papers they are given: there will just be far too many and too diverse in content. They will obviously therefore have to do something other than what the rest of the community has been told they will do. It’s a sorry state of affairs that dishonesty is built into the system, but there you go. Given that the panel will be forced to cheat, let me suggest that they at least do so fairly. Better than using the H-index of each individual, use the H-index calculated over the REF period only. That will at least ensure that only research done in the REF period will count towards the REF assessment.
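The suggestion above – computing the h-index over the REF period only – is straightforward once the publication year of each paper is known. A minimal sketch in Python, with entirely hypothetical data:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h

# Hypothetical records: (publication_year, citation_count) per paper.
papers = [(2005, 200), (2007, 150), (2009, 40), (2010, 25), (2012, 5)]

# Career h-index versus the h-index restricted to the REF window (2008-2013),
# so that only research done in the assessment period counts.
full_h = h_index([c for y, c in papers])
ref_h = h_index([c for y, c in papers if 2008 <= y <= 2013])
print(full_h, ref_h)  # → 5 3
```

The restricted index penalises nobody for work done before the window, which is the point: the two older, heavily cited papers simply drop out of the calculation.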

Another bone of contention is the assessment of the level of contribution authors have made to each paper, in other words the question of attribution. In astronomy and particle physics, many important papers have very long author lists and may be submitted to the REF by many different authors in different institutions. We are told that what the panel will do is judge whether a given individual has made a “significant” contribution to the paper. If so, that author will be credited with the score given to the paper. If not, the grade assigned will be the lowest and that author will get no credit at all. Under this scheme one could be an author on a 4* paper but be graded “U”.

This is fair enough, in that it will penalise the “lurkers” who have made a career by attaching their names to papers on which they have made negligible contributions. We know that such people exist. But how will the panel decide what contribution is significant and what isn’t? What is the criterion?

Take the following example. Suppose the Higgs Boson is discovered at the LHC during the REF period. Just about every particle physics group in the UK will have authors on the ensuing paper, but the list is likely to be immensely long and include people who performed many different roles. Who decides where to draw the line on “significance”? I really don’t know the answer to this one, but a possibility might be found in the use of the textual commentary that accompanies the submission of a research output. At present we are told that this should be used to explain what the author’s contribution to the paper was, but as far as I’m aware there is no mechanism to stop individuals hyping up their involvement. What I mean is that I don’t think the panel will check for consistency between commentaries submitted by different people for the same institution.

I’d suggest that consortia should be required to produce a standard form of words for the textual commentary, which will be used by every individual submitting the given paper and which lists all the other individuals in the UK submitting that paper as one of their four outputs. This will require co-authors to come to an agreement about their relative contributions in advance, which will no doubt lead to a lot of argument, but it seems to me the fairest way to do it. If the collaboration does not produce such an agreement then I suggest that paper be graded “U” throughout the exercise. This idea doesn’t answer the question “what does significant mean?”, but will at least put a stop to the worst of the game-playing that plagued the previous Research Assessment Exercise.

Another aspect of this relates to a question I asked several members of the Physics panel for the 2008 Research Assessment Exercise. Suppose Professor A at Oxbridge University and Dr B from The University of Neasden are co-authors on a paper and both choose to submit it as part of the REF return. Is there a mechanism to check that the grade given to the same piece of work is the same for both institutions? I never got a satisfactory answer in advance of the RAE but afterwards it became clear that the answer was “no”. I think that’s indefensible. I’d advise the panel to identify cases where the same paper is submitted by more than one institution and ensure that the grades they give are consistent.

Finally there’s the biggest problem. What on Earth does a grade like “4* (World Leading)” mean in the first place? This is clearly crucial because almost all the QR funding (in England at any rate) will be allocated to this grade. The percentage of outputs placed in this category varied enormously from field to field in the 2008 RAE and there is very strong evidence that the Physics panel judged much more harshly than the others. I don’t know what went on behind closed doors last time but whatever it was, it turned out to be very detrimental to the health of Physics as a discipline and the low fraction of 4* grades certainly did not present a fair reflection of the UK’s international standing in this area.

Ideally the REF panel could look at papers that were awarded 4* grades last time to see how the scoring went. Unfortunately, however, the previous panel shredded all this information, in order, one suspects, to avoid legal challenges. This more than any other individual act has led to deep suspicions amongst the Physics and Astronomy community about how the exercise was run. If I were in a position of influence I would urge the panel not to destroy the evidence. Most of us are mature enough to take disappointments with good grace as long as we trust the system. After all, we’re used to unsuccessful grant applications nowadays.

That’s about twice as much as I was planning to write so I’ll end on that, but if anyone else has concrete suggestions on how to repair the REF please file them through the comments box. They’ll probably be ignored, but you never know. Some members of the panel might take them on board.

Index Rerum

Posted in Biographical, Science Politics on September 29, 2009 by telescoper

Following on from yesterday’s post about the forthcoming Research Excellence Framework that plans to use citations as a measure of research quality, I thought I would have a little rant on the subject of bibliometrics.

Recently one particular measure of scientific productivity has established itself as the norm for assessing job applications, grant proposals and other related tasks. This is called the h-index, named after the physicist Jorge Hirsch, who introduced it in a paper in 2005. It is quite a simple index to define and to calculate (given an appropriately accurate bibliographic database). The definition is that an individual has an h-index of h if that individual has published h papers each with at least h citations. If the author has published N papers in total then the other N−h must have no more than h citations each. This is a bit like the Eddington number. A citation, as if you didn’t know, is basically an occurrence of that paper in the reference list of another paper.

To calculate it is easy. You just go to the appropriate database – such as the NASA ADS system – search for all papers with a given author and request the results to be returned sorted by decreasing citation count. You scan down the list until the number of citations falls below the position in the ordered list.

Incidentally, one of the issues here is whether to count only refereed journal publications or all articles (including books and conference proceedings). The argument in favour of the former is that the latter are often of lower quality. I think that is an illogical argument, because good papers will get cited wherever they are published. Related to this is the fact that some people would like to count “high-impact” journals only, but if you’ve chosen citations as your measure of quality the choice of journal is irrelevant. Indeed a paper that is highly cited despite being in a lesser journal should if anything be given a higher weight than one with the same number of citations published in, e.g., Nature. Of course it’s just a matter of time before the hideously overpriced academic journals run by the publishing mafia go out of business anyway, so before long this question will simply vanish.

The h-index has some advantages over more obvious measures, such as the average number of citations, as it is not skewed by one or two publications with enormous numbers of hits. It also, at least to some extent, represents both quantity and quality in a single number. For whatever reasons in recent times h has undoubtedly become common currency (at least in physics and astronomy) as being a quick and easy measure of a person’s scientific oomph.

Incidentally, it has been claimed that this index can be fitted well by a formula h ~ sqrt(T)/2 where T is the total number of citations. This works in my case. If it works for everyone, doesn’t  it mean that h is actually of no more use than T in assessing research productivity?
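The claimed relation is easy to check for any citation record. A quick sketch in Python, using a made-up citation list; the close agreement here is purely illustrative, not evidence for the rule:

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h

# An invented citation record for a single author, purely illustrative.
citations = [120, 80, 60, 45, 30, 25, 20, 15, 12, 10,
             8, 6, 5, 4, 3, 2, 2, 1, 1, 0]
T = sum(citations)   # total citations
h = h_index(citations)
print(f"h = {h}, sqrt(T)/2 = {math.sqrt(T) / 2:.1f}")  # → h = 10, sqrt(T)/2 = 10.6
```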

Typical values of h vary enormously from field to field – even within each discipline – and vary a lot between observational and theoretical researchers. In extragalactic astronomy, for example, you might expect a good established observer to have an h-index around 40 or more whereas some other branches of astronomy have much lower citation rates. The top dogs in the field of cosmology are all theorists, though. People like Carlos Frenk, George Efstathiou, and Martin Rees all have very high h-indices.  At the extreme end of the scale, string theorist Ed Witten is in the citation stratosphere with an h-index well over a hundred.

I was tempted to put up examples of individuals’ h-numbers but decided instead just to illustrate things with my own. That way the only person to get embarrassed is me. My own index value is modest – to say the least – at a meagre 27 (according to ADS). Does that mean Ed Witten is four times the scientist I am? Of course not. He’s much better than that. So how exactly should one use h as an actual metric, for allocating funds or prioritising job applications, and what are the likely pitfalls? I don’t know the answer to the first one, but I have some suggestions for other metrics that avoid some of its shortcomings.

One of these addresses an obvious deficiency of h. Suppose we have an individual who writes one brilliant paper that gets 100 citations and another who is one author amongst 100 on another paper that has the same impact. In terms of total citations, both papers register the same value, but there’s no question in my mind that the first case deserves more credit. One remedy is to normalise the citations of each paper by the number of authors, essentially sharing citations equally between all those that contributed to the paper. This is quite easy to do on ADS also, and in my case it gives  a value of 19. Trying the same thing on various other astronomers, astrophysicists and cosmologists reveals that the h index of an observer is likely to reduce by a factor of 3-4 when calculated in this way – whereas theorists (who generally work in smaller groups) suffer less. I imagine Ed Witten’s index doesn’t change much when calculated on a normalized basis, although I haven’t calculated it myself.
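The normalisation described above amounts to feeding author-shared citation counts into the same h calculation. A sketch in Python with invented numbers, just to show the mechanics:

```python
def h_index(values):
    """Largest h such that h papers have values of at least h each."""
    ranked = sorted(values, reverse=True)
    h = 0
    for position, v in enumerate(ranked, start=1):
        if v >= position:
            h = position
        else:
            break
    return h

# (citations, number_of_authors) for each paper -- illustrative numbers only.
papers = [(100, 1), (100, 100), (60, 4), (30, 2), (10, 5)]

raw_h = h_index([c for c, n in papers])
# Normalised: share each paper's citations equally among its authors,
# so the 100-author paper contributes only one citation per author.
norm_h = h_index([c / n for c, n in papers])
print(raw_h, norm_h)  # → 5 3
```

Note that the solo-author paper with 100 citations keeps its full weight, while the 100-author paper with the same impact almost vanishes from the normalised index, which is exactly the distinction argued for above.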

Observers  complain that this normalized measure is unfair to them, but I’ve yet to hear a reasoned argument as to why this is so. I don’t see why 100 people should get the same credit for a single piece of work:  it seems  like obvious overcounting to me.

Another possibility – if you want to measure leadership too – is to calculate the h index using only those papers on which the individual concerned is the first author. This is  a bit more of a fiddle to do but mine comes out as 20 when done in this way.  This is considerably higher than most of my professorial colleagues even though my raw h value is smaller. Using first author papers only is also probably a good way of identifying lurkers: people who add themselves to any paper they can get their hands on but never take the lead. Mentioning no names of  course.  I propose using the ratio of  unnormalized to normalized h-indices as an appropriate lurker detector…
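Both variants – the first-author-only index and the proposed lurker detector – can be computed from the same records. Another Python sketch, again with entirely hypothetical data:

```python
def h_index(values):
    """Largest h such that h papers have values of at least h each."""
    ranked = sorted(values, reverse=True)
    h = 0
    for position, v in enumerate(ranked, start=1):
        if v >= position:
            h = position
        else:
            break
    return h

# Hypothetical records: (citations, n_authors, first_author?)
papers = [(100, 1, True), (100, 100, False), (60, 4, True),
          (30, 2, False), (10, 5, True)]

h_all = h_index([c for c, n, f in papers])
h_first = h_index([c for c, n, f in papers if f])      # first-author papers only
h_norm = h_index([c / n for c, n, f in papers])        # author-normalised

# The proposed "lurker detector": unnormalised over normalised h.
# A large ratio suggests credit accrued mostly from big-consortium papers.
print(h_all, h_first, round(h_all / h_norm, 2))  # → 5 3 1.67
```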

Finally in this list of bibliometrica is the so-called g-index. This is defined in a slightly more complicated way than h: given a set of articles ranked in decreasing order of citation numbers, g is defined to be the largest number such that the top g articles altogether received at least g² citations. This is a bit like h but takes extra account of the average citations of the top papers. My own g-index is about 47. Obviously I like this one because my number looks bigger, but I’m pretty confident others go up even more than mine!
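The g-index can be computed by the same sort of scan as h, accumulating citations as you go. A minimal Python sketch:

```python
def g_index(citations):
    """Largest g such that the top g papers together have >= g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for position, cites in enumerate(ranked, start=1):
        total += cites
        if total >= position * position:
            g = position
        else:
            # Once the cumulative total falls behind position**2 it cannot
            # catch up again, because the counts are sorted in decreasing order.
            break
    return g

# Same invented counts as before: h would be 3, but the cumulative total
# (10+8+5+2+1 = 26 >= 25) pushes g all the way to 5.
print(g_index([10, 8, 5, 2, 1]))  # → 5
```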

Of course you can play with these things to your heart’s content, combining ideas from each definition: the normalized g-factor, for example. The message is, though, that although h definitely contains some information, any attempt to condense such complicated information into a single number is never going to be entirely successful.

Comments, particularly with suggestions of alternative metrics, are welcome via the box. Even from lurkers.
