Index Rerum

Following on from yesterday’s post about the forthcoming Research Excellence Framework that plans to use citations as a measure of research quality, I thought I would have a little rant on the subject of bibliometrics.

Recently one particular measure of scientific productivity has established itself as the norm for assessing job applications, grant proposals and other related tasks. This is the h-index, named after the physicist Jorge Hirsch, who introduced it in a paper in 2005. It is quite a simple index to define and to calculate (given an appropriately accurate bibliographic database). The definition is that an individual has an h-index of h if that individual has published h papers with at least h citations each; if the author has published N papers in total then the other N-h must each have no more than h citations. This is a bit like the Eddington number. A citation, as if you didn’t know, is basically an occurrence of that paper in the reference list of another paper.

Calculating it is easy. You just go to an appropriate database – such as the NASA ADS system – search for all papers by a given author, and ask for the results sorted by decreasing citation count. You then scan down the list until the number of citations falls below the paper’s position in the ordered list; the last position at which the citation count is at least as large as the rank is the h-index.
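For concreteness, here’s a minimal sketch of that scan in Python. It assumes you have already extracted the citation counts into a list (the numbers below are made up for illustration); this is not ADS code, just the bare algorithm.

```python
def h_index(citations):
    """h = largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank  # this paper's citations still match its rank
        else:
            break     # counts only decrease from here, so stop
    return h

print(h_index([50, 18, 7, 5, 4, 1]))  # 4
```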

Incidentally, one of the issues here is whether to count only refereed journal publications or all articles (including books and conference proceedings). The argument in favour of the former is that the latter are often of lower quality. I think that is an illogical argument, because good papers will get cited wherever they are published. Related to this is the fact that some people would like to count “high-impact” journals only, but if you’ve chosen citations as your measure of quality then the choice of journal is irrelevant. Indeed, a paper that is highly cited despite being in a lesser journal should, if anything, be given a higher weight than one with the same number of citations published in, e.g., Nature. Of course it’s just a matter of time before the hideously overpriced academic journals run by the publishing mafia go out of business anyway, so before long this question will simply vanish.

The h-index has some advantages over more obvious measures, such as the average number of citations, as it is not skewed by one or two publications with enormous numbers of hits. It also, at least to some extent, represents both quantity and quality in a single number. For whatever reason, in recent times h has undoubtedly become common currency (at least in physics and astronomy) as a quick and easy measure of a person’s scientific oomph.

Incidentally, it has been claimed that this index is well fitted by the formula h ~ sqrt(T)/2, where T is the total number of citations. This works in my case. If it works for everyone, doesn’t it mean that h is actually of no more use than T in assessing research productivity?
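As a quick sanity check of that rule of thumb, here is a tiny Python sketch. The total of 2916 citations is purely illustrative (it is just (2×27)², chosen so the arithmetic is clean), not anyone’s real record.

```python
import math

def h_estimate(total_citations):
    """Empirical rule of thumb: h is roughly sqrt(T)/2."""
    return math.sqrt(total_citations) / 2

# An author with T = 2916 total citations would be predicted h ~ 27:
print(round(h_estimate(2916)))  # 27
```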

Typical values of h vary enormously from field to field – even within each discipline – and vary a lot between observational and theoretical researchers. In extragalactic astronomy, for example, you might expect a good established observer to have an h-index around 40 or more whereas some other branches of astronomy have much lower citation rates. The top dogs in the field of cosmology are all theorists, though. People like Carlos Frenk, George Efstathiou, and Martin Rees all have very high h-indices.  At the extreme end of the scale, string theorist Ed Witten is in the citation stratosphere with an h-index well over a hundred.

I was tempted to put up examples of individuals’ h-numbers but decided instead just to illustrate things with my own. That way the only person to get embarrassed is me. My own index value is modest – to say the least – at a meagre 27 (according to ADS). Does that mean Ed Witten is four times the scientist I am? Of course not. He’s much better than that. So how exactly should one use h as an actual metric, for allocating funds or prioritising job applications, and what are the likely pitfalls? I don’t know the answer to the first question, but I have some suggestions for other metrics that avoid some of h’s shortcomings.

One of these addresses an obvious deficiency of h. Suppose one individual writes a brilliant paper, alone, that gets 100 citations, while another is one author amongst 100 on a paper with the same impact. In terms of total citations both papers register the same value, but there’s no question in my mind that the first case deserves more credit. One remedy is to normalise the citations of each paper by the number of authors, essentially sharing citations equally between all those who contributed to the paper. This is quite easy to do on ADS too, and in my case it gives a value of 19. Trying the same thing on various other astronomers, astrophysicists and cosmologists reveals that the h-index of an observer is likely to reduce by a factor of 3-4 when calculated in this way, whereas theorists (who generally work in smaller groups) suffer less. I imagine Ed Witten’s index doesn’t change much when calculated on a normalized basis, although I haven’t calculated it myself.
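The normalization described above is easy to sketch in code: divide each paper’s citations by its author count, then run the usual h scan on the shared values. This is illustrative Python with invented numbers, not ADS’s own implementation.

```python
def normalized_h_index(papers):
    """papers: list of (citations, n_authors) pairs. Citations are
    shared equally among co-authors before the usual h-index scan."""
    shared = sorted((c / n for c, n in papers), reverse=True)
    h = 0
    for rank, c in enumerate(shared, start=1):
        if c >= rank:
            h = rank
    return h

# A 100-author paper with 100 citations contributes only 1 shared
# citation, so it barely moves the normalized index:
print(normalized_h_index([(100, 100), (100, 1)]))  # 1
```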

Observers complain that this normalized measure is unfair to them, but I’ve yet to hear a reasoned argument as to why. I don’t see why 100 people should get the same credit for a single piece of work: it seems like obvious overcounting to me.

Another possibility – if you want to measure leadership too – is to calculate the h-index using only those papers on which the individual concerned is the first author. This is a bit more of a fiddle to do, but mine comes out as 20 when calculated this way. That is considerably higher than the corresponding value for most of my professorial colleagues, even though my raw h value is smaller. Using first-author papers only is also probably a good way of identifying lurkers: people who add themselves to any paper they can get their hands on but never take the lead. Mentioning no names, of course. I propose using the ratio of unnormalized to normalized h-indices as an appropriate lurker detector…

Finally in this list of bibliometrica is the so-called g-index. This is defined in a slightly more complicated way than h: given a set of articles ranked in decreasing order of citation numbers, g is the largest number such that the top g articles together received at least g² citations. This is a bit like h but takes extra account of the citation counts of the top papers. My own g-index is about 47. Obviously I like this one because my number looks bigger, but I’m pretty confident others go up even more than mine!

Of course you can play with these things to your heart’s content, combining ideas from each definition: the normalized g-factor, for example. The message is, though, that although h definitely contains some information, any attempt to condense such complicated information into a single number is never going to be entirely successful.

Comments, particularly with suggestions of alternative metrics, are welcome via the box. Even from lurkers.


26 Responses to “Index Rerum”

  1. Haley Gomez Says:

    These things always depress me as my h-index is low! I’m also clearly not a leader, since using first-author publications only gives an even lower value (although my author lists are not horrendously large: between 2 and 9 authors). There is also an issue about younger scientists here, since the h-index does not take account of age or years since first publication. A significant fraction (approx. 30%) of my publications were published during this year and have not had much time to garner citations yet (they may never be cited, of course, but citations take time to accumulate compared to, say, someone with 20 years of paper writing). I wonder if this could be taken into account? Hirsch’s original h-index article suggests the m-index (or slope) could be used as a measure of quality, where h = mn and n is the number of years since the first paper. In my case the slope would be 1.2 (compared to Stephen Hawking’s 1.6); a slope of 2 suggests an outstanding individual. It’s funny that although >80% of Nobel Prize winners had h-index values >30, 50% of the same group had m < 1 (according to Hirsch).

  2. Haley,

    Yes, the h-index can’t decrease as you get older, so it benefits the older folks. Another way of looking at it is to compute the h-index over a fixed window (say the past five years). Anyone who had a big-impact paper years ago but who hasn’t done much since would be shown up by such a measure.


  3. Anton Garrett Says:


    Has anyone proposed a theory to account for h ~ sqrt(T)/2?

    And – to rabble-rouse a la Woit and Smolin – Ed Witten might be far more than four times the mathematician you are, but is he a physicist?


    • Anton,

      You reminded me I forgot to include a link to the blog that makes this claim. I’ve put it there now, but it doesn’t propose a theory.

      As for Witten, I couldn’t possibly comment.


  4. John Peacock Says:

    I like the “lurker” term. It’s a problem, and one that will get worse. One might try to get round it by looking for first authorship, but of course this disadvantages scientists who unselfishly let their students and postdocs go first. In some collaborations, you can perform an elaborate piece of textual analysis and infer someone’s contribution by where they are in the author list order – which can often consist of three alphabetically ordered sublists. But what will we do when Planck publishes? I gather all Planck papers will be strictly alphabetical.

    The answer, I guess, is that we will be where experimental particle physics has been for decades. There, whether you get a job or not depends on getting a good letter from one of the Czars. But how you get to be a Czar in the first place is unclear, as is whether this status is bestowed by scientific quality or just energy and force of personality.

  5. John,

    An alternative would be to require a set of authors to agree on a weighting to be attached to the author list, e.g. Joe Smith (0.5), Herbert Bloggs (0.45) and Freddy Freeloader (0.05). That would cause some interesting internal ructions in most collaborations.

    In my own case I always put students and postdocs first unless there’s a good (and mutually agreed) reason not to. I think the reason why my first author h isn’t much lower than my ordinary one is that most of my higher impact papers were done when I was more junior and I worked with people who were nice enough to let me go first then.


  6. I find these discussions worrying because citation measures have been shown to be considerably biased by, e.g., the proximity to the top of the daily arXiv listing. Whether a paper is cited or not may also depend (sadly) on regional location of the authors (Europe versus US) or how aggressive some authors are in sending “cite me” emails.

    None of these things has any relation to the quality of the actual science being done. I don’t have any ideas for a better system that would not entail much more work on the part of job or funding reviewers, but it would be nice to at least keep these issues in mind.

  7. I’ve looked at my citation statistics for the first time. My g-index is 34. If yours is 47 and you have a chair, can I have 3/4 of a chair? Or doesn’t it work like that?

  8. Re: Haley Gomez’s point about the gradient of the h-index. I did a summer project between my 3rd and 4th years as an undergrad, which resulted in a publication. Now, 8 years later, my m value including this is 1.25; excluding that one particular paper it becomes 1.8. Is that any good? I don’t know. Which is a better representation of my scientific output? Dunno.

    I like Peter’s suggestion of an h-index over a particular period (say the last 5 years), although it could produce a bias towards papers that gather a lot of citations in the first few years compared to “slow burning” papers. Although I’m not sure what particular subject areas that would skew towards.

    In astronomy it’s pretty difficult to get a handle on exactly how good you are, especially if you are a PhD student or a postdoc, when your future job prospects depend on how good your research is. Things like the h-index give a straw to grasp at, a vague doing-OK/could-do-better estimate. While I’d like one of the 20+ parameter profiles I’m used to from my obsession with Football Manager, that’s not going to happen, so the h-index is about the only feedback I can get.

  9. grandpa boris Says:

    A long, long time ago, when I was attending a major university in the US, a friend in the philosophy department informed me that to achieve high academic standing as philosophy faculty it wasn’t important to be right or brilliant or insightful. The standing was determined by the number of citations in other people’s work. All citations counted, including the ones in vitriolic debunkings and criticisms of the cited work. Therefore, being incredibly wrong and stupid was better than being brilliant and right, because wrongness and stupidity provoked an avalanche of papers that disagreed with the cited work. My friend was convinced that most of his instructors got their tenure through exactly this process, and that it explained everything about the state of the philosophy department at our school.

  10. Bryn, I didn’t realise you had 34 papers, but that’s because searching for Bryn Jones doesn’t find many. That’s one of the other worries with bibliometrics: name changes (marriages) and surnames like Jones are difficult to deal with.

  11. Russell Smith Says:

    So who has the largest lurker statistic? I think it would be hard to beat one I found with a raw h-index of 98, and a first-author h-index of zero (think SDSS).

  12. That one obviously produces an overflow error. Perhaps we should define a lurker quotient (h-h1)/h, where h1 is the first author h index. Your chap is then 100% lurker. My own lurker quotient is about 25%.

    SDSS must be an example of a case in which even though there are a lot of authors, normalising the citation statistic would probably still give a good dollop to each one because the total number is so large.
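    The lurker quotient proposed above is trivial to compute. A one-function Python sketch, using the raw and first-author h values quoted in this thread (98/0 for the SDSS case, 27/20 for mine):

```python
def lurker_quotient(h_all, h_first):
    """Lurker quotient (h - h1)/h: the fraction of an h-index
    built on papers where the person was not first author."""
    return (h_all - h_first) / h_all if h_all else 0.0

print(lurker_quotient(98, 0))   # 1.0 -- a 100% lurker
print(lurker_quotient(27, 20))  # ~0.26, roughly the 25% quoted above
```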

  13. If you type Jones into the ADS you find I have written 9400 papers, the first from the year 1737. Can I have a vice-chancellorship, please?

    Seriously, several pertinent criticisms of citation counts have been made here. It is a system that can have relevance to established researchers, but citation counts significantly underestimate the research value of people who have started publishing within the past several years. Measures of past research productivity are of relevance in judging a person’s future research potential only if that person has had the opportunities to engage in active research that can produce publications. They enormously underestimate the value of people who have been employed mostly in project support.

    A significant career problem for me was that the kind of work I did as a PhD student and in some postdoctoral positions was determined by other people, and very little emphasis was given by them to publishing results, against my wishes and judgement. In some cases this was because my work was concerned with supporting longer-term activities. My publication output in the early part of my career was consequently low, which effectively has prevented me from establishing a career.

    There are lessons here. For a young researcher to have a chance of a career in science, that person must work for a research manager who does give priority to completing projects and publishing. There is no point expecting that working in project support will lead to a long-term career in research: such jobs should be avoided by the ambitious. Working in popular, active new fields may produce publications with lots of citations over the following few years, even if those papers will not always stand the test of time. So working with new instrument or satellite data may get some immediate papers with lots of citations, but working on a difficult project in a long-established area will not bring useful career advantage.

    I am recommending here that people try to play a game, even if selecting the PhD supervisors and project leaders who will give career support is difficult due to the lack of relevant information. A strategic problem with this is that it may hit recruitment to some postdoctoral positions (mostly support jobs) and PhD studentships (in rather traditional areas). But that would be a consequence of the career system and hierarchies that exist in academia.

  14. Bryn

    What you say is true and it should serve as a warning to those thinking about entering a career in research that they should try to select projects where they can start to show leadership at an early stage. It may sound attractive to get a PhD place to work on a big international project but that makes it very difficult to emerge as a research leader in your own right.


  15. Anton Garrett Says:

    Why not use the Duckworth-Lewis method?

    The pressure to find an “objective measure” comes from the need to justify choices of who to hire to third parties, specifically funding bodies. But it is impossible to remove all subjectivity – you can only replace subjectivity in who to hire with subjectivity in the choice of metric (as this discussion shows).


  16. My warning applies equally to small research teams (my own experience was mostly in smaller groups). The challenge for young researchers is to work for research leaders who will offer career support and who recognise that such support needs to be given. The problem lies in identifying those research managers who do encourage young researchers to publish the papers needed to advance a career, who recognise that giving freedom to talented researchers can be productive, and who will support fellowship applications. Choosing an environment, if choice is possible, which will allow a young researcher to publish papers that will be noticed, is critical.

    The issue is how can those managers be spotted?

    • Publication data on all potential PhD supervisors are freely available these days. I encourage all prospective PhD students to check the bibliometric information to see how a supervisor’s students and postdocs have fared in the past…

  17. Anton,

    I think the problem is not just that the choice of metric is subjective; it’s also that nobody really knows what we’re trying to measure anyway. Impact may be measurable through citations, but I don’t think that’s the same thing as quality. What I think of as my “best” papers are certainly not the most cited ones, but even I don’t really know what I mean by “best”.


  18. Anton Garrett Says:

    The deepest knowledge is picked up not from books but from the apprenticeship system, and is tacit and not codifiable in books. That is the sort of knowledge by which you know what your best work is.

  19. I believe there can’t be an objective, quantitative, easy and flawless scientific metric. We can invent a metric and force people to comply (via funding distribution), but this does not necessarily mean that science output will improve.

    Whenever I am given this hard problem, I try to go back to basics, like reading the papers (or a selection of them). It takes longer than computing an h (or g, or m, or…) index, but it is much more satisfying (and one learns a lot). Sadly, this does not fix the “lurker” issue, but a quick chat with the person concerned will go a long way.

    And yes, I also tend to let students and postdocs go first; I think it’s good practice. Grant panel reviews have complained about my lack of first-author papers but, since I have a job, I can afford not to care about that.

    Glad to see that these issues are coming out in the open. Are any funding agency representatives reading this blog?

  20. Dear Licia,

    I don’t know if representatives of any funding agencies read this blog; certainly none have admitted to it!

    Perhaps one should just ask PDRA applicants to select what they think are their 3-4 best papers so the panel can just read them?

    I was being a little facetious about the “lurker” question, although there are examples that spring to mind. Behind this, though, is the important question of how best to give credit to astronomical instrumentation developers. They do a vital job, often brilliantly, but because they make gear rather than write papers they don’t get a fair crack of the whip (in my view). They tend to be added to papers that exploit their instruments, but I have the impression that funding panels tend to underestimate the value of their contributions.


  21. […] Excellence. This is measured solely on the basis of citations – I’ve discussed some of the issues with that before – and counts 20%. They choose to use an unreliable database called SCOPUS, run by the […]

  22. […] of the metrics – is not so much a count of the number of refereed papers, but the number of citations the papers have attracted. Papers begin to attract citations – through the arXiv – long […]

  23. […] I’ve posted before about the difficulties and dangers of using citation statistics as measure of research output as […]

  24. […] are many issues with the use of citation counts, some of which I’ve blogged about before, but I was interested to read another article in the Times Higher, in this weeks issue, commenting […]
