The H-index is Redundant…

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers each having at least H citations. It can easily be calculated by looking up all papers by a given author on a database such as NASA/ADS, sorting them by (decreasing) number of citations, and working down the list to the point where the number of citations of a paper falls below its position in the list. Normalized quantities – obtained by dividing each paper’s citation count by its number of authors – can be used to form an alternative measure.
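As a concrete illustration (a minimal sketch of my own, not taken from the paper), the calculation just described can be written in a few lines of Python; the example citation counts are invented:

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    # Sort citation counts in decreasing order, then walk down the list
    # until a paper's citation count falls below its (1-based) position.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h

def normalized_counts(citations, author_counts):
    """Divide each paper's citations by its number of authors."""
    return [c / n for c, n in zip(citations, author_counts)]

# Five papers with (made-up) citation counts: positions 1-3 in the
# sorted list have at least as many citations as their rank, so H = 3.
print(h_index([25, 8, 5, 3, 3]))  # -> 3
```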

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter is, in my view, a much more sensible measure of individual contributions. The basic problem, of course, is that people don’t get citations, papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as the single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between the H-index and the square root of the total number of citations has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected, I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that quoting both the H-index and total citation numbers adds nothing, as they measure essentially the same thing.

A very interesting question emerges from this, however: why does the relationship between total citation numbers and H-index have the form it does, with the latter always roughly half the square root of the former? This suggests to me that there might be some sort of universal scaling law onto which the distribution of cites-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property…
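For what it’s worth, one toy model along these lines is easy to simulate: suppose (purely for illustration – the rates R, S and career length T below are invented) an author publishes papers at a steady rate of R per year, and each paper then accrues citations at a steady rate of S per year.

```python
def toy_career(R=3, S=3, T=20):
    """Toy citation model: R papers/year for T years, each paper
    gaining S citations/year from publication onwards.
    Returns (H-index, total citations)."""
    # Paper i is published at time i/R; by time T it has S*(T - i/R) citations.
    cites = [round(S * (T - i / R)) for i in range(R * T)]
    # The H-index is the number of papers whose citation count is at
    # least their (1-based) rank in the descending-sorted list.
    ranked = sorted(cites, reverse=True)
    h = sum(1 for pos, c in enumerate(ranked, start=1) if c >= pos)
    return h, sum(cites)

h, total = toy_career()
print(h, total, h / total**0.5)  # ratio is about 0.7 when R = S
```

In this model one can show H = SRT/(R+S) and C = RST²/2, so H/√C = √(2r)/(1+r) with r = R/S, which peaks at 1/√2 ≈ 0.71 when R = S – at least in the right ballpark for the empirical factor of about a half, though it says nothing about why the observed scatter is so small.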

15 Responses to “The H-index is Redundant…”

  1. “…already excessive, reliance on the H-index by grants panels…”

    as a member of a grants panel – which i believe has never considered H-indexes during its reviews… you may want to make that clear here.

    as you say – i know of no one who thinks H is an independent measure of “impact” compared to total citations. for me the best measure is the median rank of an individual’s papers against all high-impact journal articles in that general field from the same period (month, months or years depending upon the age of the article).

    …but i agree that the lower index of the H vs citations is potentially an interesting insight into what the former is measuring.

    • telescoper Says:

      Sure. I don’t believe anyone on the AGP looks at the h-index, but I at least do look at recent citation activity – really just to see if there is any!

      I know other panels, international ones primarily, which use the h-index more extensively.

      P.S….presume you mean “power index” in your last sentence?

  2. […] “An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) …” (more) […]

  3. Alan Heavens Says:

If you publish at a uniform rate of R papers per year, and each paper accrues citations at a uniform rate of S per year, then (ignoring the fact that H must be an integer) I think you expect H^2 = A C, where C = total number of citations and A = 2r/(1+r)^2, with r = R/S. H^2/C has a maximum of 1/2 (at r = 1), so we might expect
    H = a sqrt(C), with a at most 1/sqrt(2), i.e. about 0.7

    • telescoper Says:

Yes, a simple model like that can reproduce the mean relationship, but there’s clearly a large spread in S for different papers in reality so you would expect a considerable scatter; it’s not obvious why the relationship is so tight (at least not to me)…

      • Alan Heavens Says:

        Well, changing r by a factor of 100, from 0.1 to 10, changes the value of a from 0.4 up to 0.7 and back to 0.4, so it’s not very sensitive. On the other hand, perhaps there’s a correlation between how many papers people write per year and how many citations per year each one gets…

      • telescoper Says:

…particularly if there are many self-citations.

  4. Anton Garrett Says:

    I thought H was the negative of the author’s entropy. Any relation…?

  5. “Trying to condense all the available information about an applicant into a single number is clearly a futile task”

    It would seem bizarre if we weren’t so used to it: percentage marks for school and university assessment, GDP for measuring how well the country is doing, etc.

    • telescoper Says:

      It seems bizarre to me, nevertheless. Compressing a student’s academic record into one number always seems to me to be pointless, but we go on doing it…

    • Anton Garrett Says:

      The *existence* of such a measure is implied by the fact that we are prepared to rate candidates in order. Whether such a measure can be made explicit is another matter.

      • Anton Garrett Says:

        Phillip: The intransitive situation to which you refer definitely implies that no such ranking exists.

        In chess, are such things really consistent? Or is it because A is a better player than B who is a better player than C, but A always plays at a lower standard vs C because of non-chess factors such as bribery or threat?
