Archive for bibliometrics

Measuring the lack of impact of journal papers

Posted in Open Access on February 4, 2016 by telescoper

I’ve been involved in a depressing discussion on the Astronomers Facebook page, part of which was about the widespread use of Journal Impact Factors by appointments panels, grant agencies, promotion committees, and so on. It is argued (by some) that younger researchers should be discouraged from publishing in, e.g., the Open Journal of Astrophysics, because it doesn’t have an impact factor and they would therefore be jeopardising their research career. In fact it takes two years for a new journal to acquire an impact factor, so if you take this advice seriously nobody should ever publish in any new journal.

For the record, I will state that no promotion committee, grant panel or appointment process I’ve ever been involved in has even mentioned impact factors. However, it appears that some do, despite the fact that they are demonstrably worse than useless at measuring the quality of publications. You can find comprehensive debunking of impact factors and exposure of their flaws all over the internet if you care to look: a good place to start is Stephen Curry’s article here. I’d make an additional point here, which is that the impact factor uses citation information for the journal as a whole as a sort of proxy measure of the research quality of papers published in it. But why on Earth should one do this when citation information for each paper is freely available? Why use a proxy when it’s trivial to measure the real thing?

The basic statistical flaw behind impact factors is that they are based on the arithmetic mean number of citations per paper. Since the distribution of citations in all journals is very skewed, this number is dragged upwards by a few papers with extremely large numbers of citations. In fact, most papers published have many fewer citations than the impact factor of the journal they appear in. It’s all very misleading, especially when used as a marketing tool by cynical academic publishers.
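To make the skewness point concrete, here is a minimal sketch comparing the mean with the median. The citation counts are invented purely for illustration (they are not data from any real journal), and the mean stands in for an impact-factor-style average.

```python
from statistics import mean, median

# Hypothetical citation counts for 20 papers in one journal over the
# census period. Illustrative numbers only, not real data.
citations = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 40, 127]

average = mean(citations)    # what an impact-factor-style mean reports
typical = median(citations)  # what a typical paper actually receives

# Count how many papers fall short of the headline average.
below_average = sum(1 for c in citations if c < average)

print(f"mean   = {average:.1f}")
print(f"median = {typical:.1f}")
print(f"{below_average} of {len(citations)} papers are below the mean")
```

In this made-up sample two heavily cited papers account for most of the total, so the mean sits far above what nearly every paper in the list achieved.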

Thinking about this on the bus on my way into work this morning I decided to suggest a couple of bibliometric indices that should help put impact factors into context. I urge relevant people to calculate these for their favourite journals:

  • The Dead Paper Fraction (DPF). This is defined to be the fraction of papers published in the journal that receive no citations at all in the census period.  For journals with an impact factor of a few, this is probably a majority of the papers published.
  • The Unreliability of Impact Factor Factor (UIFF). This is defined to be the fraction of papers with fewer citations than the Impact Factor. For many journals this is most of their papers, and the larger this fraction is the more unreliable their Impact Factor is.
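Both indices defined above are easy to compute from a list of per-paper citation counts. The following is only a sketch under stated assumptions: the function names are my own, the citation counts are hypothetical, and the impact factor is approximated by the plain mean of the sample rather than by any publisher's official calculation.

```python
def dead_paper_fraction(citations):
    """Fraction of papers with no citations at all in the census period."""
    return sum(1 for c in citations if c == 0) / len(citations)


def unreliability_of_impact_factor_factor(citations, impact_factor):
    """Fraction of papers with fewer citations than the impact factor."""
    return sum(1 for c in citations if c < impact_factor) / len(citations)


# Hypothetical journal: 20 papers, invented counts (not real data).
citations = [0] * 11 + [1, 1, 2, 2, 3, 4, 6, 12, 29]

# Assumption: use the sample mean as a stand-in for the impact factor.
impact_factor = sum(citations) / len(citations)

dpf = dead_paper_fraction(citations)
uiff = unreliability_of_impact_factor_factor(citations, impact_factor)

print(f"impact factor (sample mean) = {impact_factor:.1f}")
print(f"DPF  = {dpf:.2f}")
print(f"UIFF = {uiff:.2f}")
```

The sample was chosen to have an average of a few citations per paper, which is the regime in which the DPF is suggested above to be a majority.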

Another useful measure for individual papers is

  • The Corrected Impact Factor. If a paper with a number N of actual citations is published in a journal with impact factor I then the corrected impact factor is C=N-I. For a deeply uninteresting paper published in a flashily hyped journal this will be large and negative, and should be viewed accordingly by relevant panels.
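As a quick illustration of C=N-I, here is a small sketch. The two papers and their numbers are hypothetical, as is the function name.

```python
def corrected_impact_factor(n_citations, journal_impact_factor):
    """C = N - I: actual citations minus the journal's impact factor."""
    return n_citations - journal_impact_factor


# (description, N = actual citations, I = journal impact factor)
# All values are invented for illustration.
papers = [
    ("dull paper in a flashily hyped journal", 2, 38.0),
    ("citation hit in a modest journal", 150, 4.5),
]

scores = {name: corrected_impact_factor(n, i) for name, n, i in papers}

for name, c in scores.items():
    print(f"{name}: C = {c:+.1f}")
```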

Other suggestions for citation metrics less stupid than the impact factor are welcome through the comments box…



How do physicists and astronomers team up to write research papers?

Posted in Science Politics on October 16, 2013 by telescoper

Busy busy today so just time to reblog this, an interesting article about the irresistible rise of the multi-author paper. What fraction of the “authors” actually play any role at all in writing these papers? Am I the only one that thinks this has very profound implications for the way we interpret bibliometric analyses?

Data mines

The way in which physicists and astronomers team up to write technical papers has changed over the years. Not only is it interesting to look at this behavior for its own sake, but by analyzing the data it may be possible to better understand what role, if any, the number of authors plays in the scientific impact of a paper. Likewise, such an analysis can allow physics and astronomy journals to make decisions about their publishing policies.

I was curious about the trends in the number of authors per refereed astronomy paper, so I set out to write an R script that would read in data from the NASA Astrophysics Data System, an online database of both refereed and non-refereed academic papers in astronomy and physics. The script counts the monthly number of refereed astronomy and physics papers between January 1967 and September 2013, as well as…

The Impact X-Factor

Posted in Bad Statistics, Open Access on August 14, 2012 by telescoper

Just time for a quick (yet still rather tardy) post to direct your attention to an excellent polemical piece by Stephen Curry pointing out the pointlessness of Journal Impact Factors. For those of you in blissful ignorance about the statistical aberration that is the JIF, it’s basically a measure of the average number of citations attracted by a paper published in a given journal. The idea is that if you publish a paper in a journal with a large JIF then it’s in among a number of papers that are highly cited and therefore presumably high quality. Using a form of Proof by Association, your paper must therefore be excellent too, hanging around with tall people being a tried-and-tested way of becoming tall.

I won’t repeat all Stephen Curry’s arguments as to why this is bollocks – read the piece for yourself – but one of the most important is that the distribution of citations per paper is extremely skewed, so the average is dragged upwards by a few papers with huge numbers of citations. As a consequence most papers published in a journal with a large JIF attract many fewer citations than the average. Moreover, modern bibliometric databases make it quite easy to extract citation information for individual papers, which is what is relevant if you’re trying to judge the quality or impact of a particular piece of work, so why bother with the JIF at all?

I will however copy the summary, which is to the point:

So consider all that we know of impact factors and think on this: if you use impact factors you are statistically illiterate.

  • If you include journal impact factors in the list of publications in your cv, you are statistically illiterate.
  • If you are judging grant or promotion applications and find yourself scanning the applicant’s publications, checking off the impact factors, you are statistically illiterate.
  • If you publish a journal that trumpets its impact factor in adverts or emails, you are statistically illiterate. (If you trumpet that impact factor to three decimal places, there is little hope for you.)
  • If you see someone else using impact factors and make no attempt at correction, you connive at statistical illiteracy.

Statistical illiteracy is by no means as rare among scientists as we’d like to think, but at least I can say that I pay no attention whatsoever to Journal Impact Factors. In fact I don’t think many people in astronomy or astrophysics use them at all. I’d be interested to hear from anyone who does.

I’d like to add a little coda to Stephen Curry’s argument. I’d say that if you publish a paper in a journal with a large JIF (e.g. Nature) but the paper turns out to attract very few citations then the paper should be penalised in a bibliometric analysis, rather like the handicap system used in horse racing or golf. If, despite the press hype and other tedious trumpetings associated with the publication of a Nature paper, the work still attracts negligible interest then it must really be a stinker and should be rated as such by grant panels, etc. Likewise if you publish a paper in a less impactful journal which nevertheless becomes a citation hit then it should be given extra kudos because it has gained recognition by quality alone.

Of course citation numbers don’t necessarily mean quality. Many excellent papers are slow burners from a bibliometric point of view. However, if a journal markets itself as being a vehicle for papers that are intended to attract large citation counts and a paper published there flops then I think it should attract a black mark. Hoist it on its own petard, as it were.

So I suggest each paper be awarded an Impact X-Factor, based on the difference between its citation count and the JIF for the journal. For most papers this will of course be negative, which would serve their authors right for mentioning the Impact Factor in the first place.

PS. I chose the name “X-factor” as in the TV show precisely for its negative connotations.

The H-index is Redundant…

Posted in Bad Statistics, Science Politics on January 28, 2012 by telescoper

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers each having at least H citations. It can easily be calculated by looking up all papers by a given author on a database such as NASA/ADS, sorting them by (decreasing) number of citations, and working down the list to the point where the number of citations of a paper falls below the number representing its position in the list. Normalized quantities – obtained by dividing the number of citations each paper receives by the number of authors of that paper – can be used to form an alternative measure.
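The recipe just described (sort by decreasing citations, walk down the list until the citation count drops below the position) can be sketched as follows. The function names and the sample publication record are hypothetical, and the normalized variant shown, which feeds citations-per-author into the same procedure, is just one way of forming the alternative measure.

```python
def h_index(citations):
    """Largest H such that at least H papers have at least H citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites < position:
            break  # citations have fallen below the position in the list
        h = position
    return h


def normalized_citations(papers):
    """Divide each paper's citations by its number of authors."""
    return [cites / n_authors for cites, n_authors in papers]


# Hypothetical record: (citations, number of authors) per paper.
papers = [(100, 100), (30, 2), (12, 1), (8, 4), (5, 1), (3, 3)]

raw_h = h_index([cites for cites, _ in papers])
norm_h = h_index(normalized_citations(papers))

print(f"H-index from raw citations:        {raw_h}")
print(f"H-index from normalized citations: {norm_h}")
```

In this invented record the 100-author paper with 100 citations contributes only one normalized citation, so it no longer props up the index.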

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter is, in my view, a much more sensible measure of individual contributions. The basic problem of course is that people don’t get citations, papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as a single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between H index and the square root of total citation numbers has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using H-index and total numbers doesn’t add anything as they are both measuring exactly the same thing.

A very interesting question emerges from this, however, which is why the relationship between total citation numbers and h-index has the form it does: the latter is always roughly half of the square-root of the former. This suggests to me that there might be some sort of scaling law onto which the distribution of cites-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property….
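One part of this relationship follows from the definition alone: an author with h-index h has at least h papers with at least h citations each, so the total is at least h squared and h can never exceed the square root of the total. The empirical "roughly half" then says that total citations are typically around four times h squared. The sketch below simply computes the ratio for one hypothetical record; it is not a model of citation behaviour, and the numbers are invented.

```python
import math


def h_index(citations):
    """Largest H such that at least H papers have at least H citations."""
    ranked = sorted(citations, reverse=True)
    # With the list sorted in decreasing order, "cites >= position" holds
    # for exactly the first H positions, so counting them gives H.
    return sum(1 for pos, cites in enumerate(ranked, start=1) if cites >= pos)


# Hypothetical author with 100 citations in total (invented counts).
citations = [40, 25, 15, 8, 5, 3, 2, 1, 1]

total = sum(citations)
h = h_index(citations)
ratio = h / math.sqrt(total)

print(f"total citations = {total}, h = {h}, h / sqrt(total) = {ratio:.2f}")
```

Whatever counts are supplied, the ratio cannot exceed 1; how close real records sit to one half is the empirical question raised by the paper.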

Google Citations

Posted in Uncategorized on November 18, 2011 by telescoper

Just time for a quick post this morning to pass on the news that Google Citations is now openly available. I just had a quick look at my own bibliometric data and, as far as I can tell, it’s pretty accurate. As well as total citations, Google Scholar also produces an h-index and something called the i10-index (which is just the number of papers with at least 10 citations). It also gives the corresponding figures for the past 5 years as well as for the entire career of a given researcher.

I’ve blogged (or rather bragged) already about my most popular paper citation-wise, which has 287 citations on Google Scholar. That doesn’t exactly make it a world-beater, but I’m still quite pleased with its impact. What I find particularly interesting about that paper is its longevity. This paper was published in 1991, i.e. 20 years ago, but I recently looked on the ADS system at its citation history and found the following:

Curiously, it’s getting more citations now than it did when it was first published. I’ve got quite a few “slow burners” like this, in fact, and many of the citations listed for me in the last 5 years actually stem from papers written much earlier. Unfortunately, although I think this steady rate of citation is some sort of indicator of something or other, this is exactly the wrong sort of paper for the Research Excellence Framework, as it is only papers that are published within the roughly 5-year REF window that are taken into account. It would be more useful for the REF panels if the “5-year” window listed citations only to those papers actually published within the last five years. I wonder how the panel will try to use this limited information in assessing the true quality of  a paper?

I should also say that although this paper is, by a large margin, the nearest I’ve got to the citation hit parade, I don’t think it’s by any means the best paper I’ve ever written.

Another weakness is that Google Scholar doesn’t give a normalized h-index (i.e. one based on citations shared out amongst the authors of multi-author papers).

Still, you can’t have everything. Now that this extremely useful tool is available (for free) to all scientists and other denizens of the interwebs, I re-iterate my point that the panels involved in assessing research for the Research Excellence Framework should use it rather than the inferior commercial versions, which are much less accurate.


Linking to Data – Effect on Citation Rates in Astronomy (via Meters, Metrics and More)

Posted in Uncategorized on June 30, 2011 by telescoper

I’m not a big fan of bibliometricism …but this is definitely Quite Interesting. I wonder if my linking to it will increase its readership?

In the paper Effect of E-printing on Citation Rates in Astronomy and Physics we asked ourselves the question whether the introduction of the arXiv e-print repository had any influence on citation behavior. We found significant increases in citation rates for papers that appear as e-prints prior to being published in scholarly journals. This is just one example of how publication practices influence article metrics (citation rates, usage, obsolesc …

via Meters, Metrics and More