The H-index is Redundant…

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers each having at least H citations. It can easily be calculated by looking up all papers by a given author on a database such as NASA/ADS, sorting them by decreasing number of citations, and working down the list to the point where the number of citations of a paper falls below its position in the list. Normalized quantities – obtained by dividing each paper’s citation count by its number of authors – can be used to form an alternative measure.
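The procedure just described is easy to sketch in a few lines of Python (a toy illustration with made-up citation counts, not the actual ADS tooling):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # decreasing citation order
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position   # this paper still clears the threshold
        else:
            break          # citations fell below position in the list
    return h

def normalized_citations(papers):
    """Total citations, with each paper's count divided by its author count."""
    return sum(cites / n_authors for cites, n_authors in papers)

print(h_index([25, 18, 12, 7, 4, 1]))  # -> 4
```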

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter is, in my view, a much more sensible measure of individual contributions. The basic problem, of course, is that people don’t get citations; papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as a single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between H-index and the square root of total citation numbers has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected, I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using both the H-index and total citation counts adds nothing, as they measure essentially the same thing.

A very interesting question emerges from this, however: why does the relationship between total citation numbers and H-index have the form it does, with the latter always roughly half the square root of the former? This suggests to me that there might be some sort of universal scaling law onto which the distribution of cites-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property…

18 Responses to “The H-index is Redundant…”

  1. “…already excessive, reliance on the H-index by grants panels…”

    as a member of a grants panel – which i believe has never considered H-indexes during its reviews… you may want to make that clear here.

    as you say – i know of no one who thinks H is an independent measure of “impact” compared to total citations. for me the best measure is the median rank of an individual’s papers against all high-impact journal articles in that general field from the same period (month, months or years depending upon the age of the article).

    …but i agree that the lower index of the H vs citations is potentially an interesting insight into what the former is measuring.

    • telescoper Says:

      Sure. I don’t believe anyone on the AGP looks at the h-index, but I at least do look at recent citation activity – really just to see if there is any!

      I know other panels, international ones primarily, which use the h-index more extensively.

      P.S….presume you mean “power index” in your last sentence?


  3. Alan Heavens Says:

    If you publish at a uniform rate of R papers per year, and each accrues citations at a uniform rate of S per year, then (ignoring the fact that H must be an integer) I think you expect H^2 = A C, where C = total number of citations, and A = 2r/(1+r)^2, with r = R/S. H^2/C has a maximum of 1/2 (at r = 1), so we might expect
    H = a sqrt(C), with a ≤ 1/sqrt(2) ≈ 0.71
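    A quick numerical check of this uniform-rate model (my own sketch; the particular values of R, S and T are arbitrary choices): publish R papers per year for T years, let each paper accrue S citations per year, and compare the resulting h-index with sqrt(A·C):

```python
import math

def model_h_and_c(R=5, S=2, T=20):
    """Toy model: R papers/year for T years, each gaining S citations/year."""
    # A paper published at the start of year t has S*(T - t) citations at time T.
    citations = [S * (T - t) for t in range(T) for _ in range(R)]
    ranked = sorted(citations, reverse=True)
    # h-index: count positions i (1-based) whose paper has >= i citations;
    # valid because 'ranked' is non-increasing, so the condition fails only once.
    h = sum(1 for i, c in enumerate(ranked, 1) if c >= i)
    return h, sum(citations)

R, S, T = 5, 2, 20
h, C = model_h_and_c(R, S, T)
r = R / S
A = 2 * r / (1 + r) ** 2
print(h, math.sqrt(A * C))  # h comes out close to sqrt(A*C)
```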

    • telescoper Says:

      Yes, a simple model like that can reproduce the mean relationship, but there’s clearly a large spread in S between different papers in reality, so you would expect considerable scatter; it’s not obvious (at least not to me) why the relationship is so tight…

      • Alan Heavens Says:

        Well, changing r by a factor of 100, from 0.1 to 10, changes the value of a from 0.4 up to 0.7 and back to 0.4, so it’s not very sensitive. On the other hand, perhaps there’s a correlation between how many papers people write per year and how many citations per year each one gets…

      • telescoper Says:

        …particularly if there are many self-citations.

  4. Anton Garrett Says:

    I thought H was the negative of the author’s entropy. Any relation…?

  5. [Insert general caveats about bibliometry here.]

    Even if it “measures the same thing”, the H index is easier to compute, since only the most popular papers need to be considered.

    If one really does have to compress the assessment of a person’s relevance in the citation game to a single number, the g index is probably better than the h index.
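    For reference, the g-index mentioned here (due to Leo Egghe) is the largest g such that the g most-cited papers together have at least g² citations, so it rewards a few heavily cited papers more than the h-index does. A toy sketch (this simple version caps g at the number of papers):

```python
def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    ranked = sorted(citations, reverse=True)  # decreasing citation order
    total, g = 0, 0
    for i, cites in enumerate(ranked, start=1):
        total += cites              # running citation total of the top i papers
        if total >= i * i:
            g = i
    return g

print(g_index([10, 5, 3, 2, 1]))  # -> 4 (the same list has h-index 3)
```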

  6. “Trying to condense all the available information about an applicant into a single number is clearly a futile task”

    It would seem bizarre if we weren’t so used to it: percentage marks for school and university assessment, GDP for measuring how well the country is doing, etc.

    • telescoper Says:

      It seems bizarre to me, nevertheless. Compressing a student’s academic record into one number always seems to me to be pointless, but we go on doing it…

    • Anton Garrett Says:

      The *existence* of such a measure is implied by the fact that we are prepared to rate candidates in order. Whether such a measure can be made explicit is another matter.

      • Good point. Could such a ranking ever reflect one which sometimes happens in chess: A consistently beats B, B consistently beats C but C consistently beats A?

        Despite the shortcomings of concrete indices, there needs to be some objective criterion or collection of such. Otherwise, the employer can hire whomever he wants saying that his decision was based on the interview. Perhaps OK in more cases if this is a business, but definitely not OK with public money.

        I like the Scandinavian system: First, there are external people who rank the candidates. Second, the information is public, at least to all who applied. While it doesn’t have to be a single number, the external people have to rank the candidates in order and say how they arrived at this order.

      • Anton Garrett Says:

        Phillip: The intransitive situation to which you refer definitely implies that no such ranking exists.

        In chess, are such things really consistent? Or is it because A is a better player than B who is a better player than C, but A always plays at a lower standard vs C because of non-chess factors such as bribery or threat?
