## The H-index is Redundant…

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers, each of which has received at least H citations. It can easily be calculated by looking up all papers by a given author in a database such as NASA/ADS, sorting them by (decreasing) number of citations, and working down the list to the point where a paper’s citation count falls below its position in the list. Normalized quantities – obtained by dividing each paper’s citation count by its number of authors – can be used to form an alternative measure.
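That recipe is simple enough to sketch in a few lines of Python (a minimal illustration only; the function names are mine, not from any standard package):

```python
def h_index(citations):
    """Largest H such that at least H papers have at least H citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank   # this paper still clears the bar at its rank
        else:
            break      # citations now fall below position in the list
    return h

def normalized_citations(cite_author_pairs):
    """Divide each paper's citation count by its number of authors."""
    return [c / n for c, n in cite_author_pairs]

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

The normalized variant can then be fed back into `h_index` to get the alternative measure mentioned above.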

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter is, in my view, a much more sensible measure of individual contributions. The basic problem, of course, is that people don’t get citations, *papers* do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as the single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between H-index and the square root of total citation numbers has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected, I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using both the H-index and total citation numbers adds nothing, as they are both measuring exactly the same thing.

A very interesting question emerges from this, however: why does the relationship between total citation numbers and H-index have the form it does, with the latter always roughly half the square root of the former? This suggests to me that there might be some sort of universal scaling law onto which the distribution of citations-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property…

January 28, 2012 at 2:15 pm

“…already excessive, reliance on the H-index by grants panels…”

as a member of a grants panel – which i believe has never considered H-indexes during its reviews… you may want to make that clear here.

as you say – i know of no one who thinks H is an independent measure of “impact” compared to total citations. for me the best measure is the median rank of an individual’s papers against all high-impact journal articles in that general field from the same period (month, months or years depending upon the age of the article).

…but i agree that the lower index of the H vs citations is potentially an interesting insight into what the former is measuring.

January 28, 2012 at 2:27 pm

Sure. I don’t believe anyone on the AGP looks at the h-index, but I at least do look at recent citation activity – really just to see if there is any!

I know other panels, international ones primarily, which use the h-index more extensively.

P.S….presume you mean “power index” in your last sentence?

January 28, 2012 at 2:20 pm

[…] “An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) …” (more) […]

January 28, 2012 at 2:59 pm

If you publish at a uniform rate of R papers per year, and each accrues citations at a uniform rate of S per year, then (ignoring the fact that H must be an integer) I think you expect H^2 = A C, where C = total number of citations and A = 2r/(1+r)^2, with r = R/S. H^2/C has a maximum of 1/2 (at r = 1), so we might expect

H = a sqrt(C), with a ≲ 0.7
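To put numbers on that, here is a quick sketch of the same uniform-rate toy model in Python (the function name and parameter choices are mine; the setup is exactly the assumption above, run at r = R/S = 1 where A peaks):

```python
import math

def model_h_and_c(R, S, T):
    """Toy model: R papers/year, each gaining S citations/year, over T years."""
    n = int(R * T)
    # paper k was published at time k/R, so by time T it has S*(T - k/R) citations
    cites = sorted((S * (T - k / R) for k in range(1, n + 1)), reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h, sum(cites)

# r = R/S = 1 maximizes A = 2r/(1+r)^2 at 1/2, so a should be near sqrt(1/2)
h, C = model_h_and_c(5, 5, 40)
print(h, C, h / math.sqrt(C))  # H = 100, C ≈ 19900, a ≈ 0.709 ≈ sqrt(1/2)
```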

January 28, 2012 at 3:20 pm

Yes, a simple model like that can reproduce the mean relationship, but there’s clearly a large spread in S between different papers in reality, so you would expect considerable scatter; it’s not obvious why the relationship is so tight (at least not to me)…

January 28, 2012 at 3:58 pm

Well, changing r by a factor of 100, from 0.1 to 10, changes the value of a from 0.4 up to 0.7 and back to 0.4, so it’s not very sensitive. On the other hand, perhaps there’s a correlation between how many papers people write per year and how many citations per year each one gets…
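That insensitivity is easy to check numerically, since the expression above gives a = sqrt(A) = sqrt(2r)/(1+r), which is symmetric under r → 1/r (a throwaway sketch; the helper name is mine):

```python
import math

def a_of_r(r):
    # from H^2 = A*C with A = 2r/(1+r)^2: a = sqrt(A) = sqrt(2r)/(1+r)
    return math.sqrt(2 * r) / (1 + r)

for r in (0.1, 0.5, 1, 2, 10):
    print(f"r = {r:>4}: a = {a_of_r(r):.3f}")
# symmetric under r -> 1/r: 0.407, 0.667, 0.707, 0.667, 0.407
```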

January 28, 2012 at 5:30 pm

…particularly if there are many self-citations.

January 28, 2012 at 6:00 pm

I thought H was the negative of the author’s entropy. Any relation…?

January 30, 2012 at 1:41 am

No, that is the S-index…

January 30, 2012 at 9:12 am

That reminds me that I really should do a blog post about why the H-theorem is bollocks.

January 30, 2012 at 12:26 pm

“Trying to condense all the available information about an applicant into a single number is clearly a futile task”

It would seem bizarre if we weren’t so used to it: percentage marks for school and university assessment, GDP for measuring how well the country is doing, etc.

January 30, 2012 at 12:29 pm

It seems bizarre to me, nevertheless. Compressing a student’s academic record into one number always seems to me to be pointless, but we go on doing it…

February 3, 2012 at 1:41 pm

The *existence* of such a measure is implied by the fact that we are prepared to rate candidates in order. Whether such a measure can be made explicit is another matter.

February 3, 2012 at 7:37 pm

Phillip: The intransitive situation to which you refer definitely implies that no such ranking exists.

In chess, are such things really consistent? Or is it because A is a better player than B who is a better player than C, but A always plays at a lower standard vs C because of non-chess factors such as bribery or threat?

October 4, 2014 at 8:10 pm

See http://www.ams.org/notices/201409/rnoti-p1040.pdf for a mathematical model