Archive for citations

Not the Open Journal of Astrophysics Impact Factor – Update

Posted in Open Access, The Universe and Stuff on February 11, 2020 by telescoper

I thought I would give an update with some bibliometric information about the 12 papers published by the Open Journal of Astrophysics in 2019. The NASA/ADS system has been struggling to tally the citations to a couple of our papers, but this issue has now been resolved. According to this source the total number of citations for these papers is 532 (as of today). This number is dominated by one particular paper, which has 443 citations according to NASA/ADS. Excluding this paper gives an average number of citations for the remaining 11 of about 8.1.

I’ll take this opportunity to reiterate some comments about the Journal Impact Factor. When asked about this, my usual response is (a) to repeat the arguments for why the impact factor is daft and (b) to point out that a journal has to have been running continuously for at least two years to have an official impact factor anyway.

For those of you who can’t be bothered to look up the definition of an impact factor: for a given year it is basically the sum of the citations received that year by all papers published in the journal over the previous two-year period, divided by the total number of papers published in that journal over the same period. It’s therefore the average number of citations per paper published in a two-year window; the impact factor for 2019 would be defined using papers published in 2017 and 2018, and so on.
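For the algebraically minded, the same definition can be written out as a formula (the notation here is mine, not an official one):

```latex
% Impact factor for year y (notation mine, not Clarivate's):
% C(y', y) = citations received in year y by papers published in year y',
% N(y')    = number of papers published by the journal in year y'.
\mathrm{IF}(y) = \frac{C(y-1,\,y) + C(y-2,\,y)}{N(y-1) + N(y-2)}
```

Nothing deep is hidden in it: it’s just a mean, with all the fragility that implies.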

The impact factor is prone to the same problem as the simple average I quoted above: citation statistics are generally heavily skewed, so the average can be dragged upwards by a small number of papers with lots of citations (in our case, just one).
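To see just how badly a single outlier can distort the mean, here is a little Python sketch with entirely made-up citation counts (not our actual figures):

```python
import statistics

# Hypothetical citation counts for 12 papers: eleven modestly cited
# plus one runaway hit. Illustrative numbers only.
citations = [2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 443]

mean = statistics.mean(citations)      # dragged upwards by the outlier
median = statistics.median(citations)  # barely notices it

print(f"mean   = {mean:.1f}")    # 43.7
print(f"median = {median:.1f}")  # 7.5
```

The median is a far more robust summary of a skewed distribution, which is one reason why quoting a single mean, as the impact factor does, can be so misleading.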

I stress again that we don’t have an Impact Factor as such for the Open Journal. However, for reference (but obviously not direct comparison), the latest actual impact factors (for 2018, i.e. based on 2016 and 2017 numbers) for some leading astronomy journals are: Monthly Notices of the Royal Astronomical Society 5.23; Astrophysical Journal 5.58; and Astronomy and Astrophysics 6.21.

My main point, though, is that with so much bibliometric information available at the article level there is no reason whatsoever to pay any attention to crudely aggregated statistics at the journal level. Judge the contents, not the packaging.

This post is based on an article at the OJA blog.


Not the Open Journal of Astrophysics Impact Factor – Update

Posted in Open Access, The Universe and Stuff on January 20, 2020 by telescoper

Now that we have started a new year, and a new volume of the Open Journal of Astrophysics, I thought I would give an update with some bibliometric information about the 12 papers we published in 2019.

It is still early days for aggregating citations for 2019 but, using a combination of the NASA/ADS system and Inspire-HEP, I have been able to place a firm lower limit of 408 on the total number of citations so far for those papers, giving an average citation rate of 34 per paper.
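In case anyone wonders how combining two databases yields a “firm lower limit”: each system can miss citations but should not invent them, so for each paper the larger of the two counts is itself a lower bound, and summing those gives a conservative total. Here is a sketch of that idea in Python (with invented numbers, and not necessarily the exact procedure I used):

```python
# Citation counts for the same six papers according to two databases.
# Invented numbers, purely for illustration.
ads     = [330, 12, 9, 0, 7, 5]   # NASA/ADS
inspire = [327, 14, 9, 3, 0, 6]   # Inspire-HEP

# Each database may miss citations but should not invent them, so the
# larger of the two counts is a lower bound for each paper.
lower_bound = sum(max(a, i) for a, i in zip(ads, inspire))
print(lower_bound)  # 369
```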

These numbers are dominated by one particular paper, which has 327 citations according to Inspire. Excluding this paper gives an average number of citations for the remaining 11 of about 7.4.

I’ll take this opportunity to reiterate some comments about the Journal Impact Factor. When asked about this, my usual response is (a) to repeat the arguments for why the impact factor is daft and (b) to point out that a journal has to have been running continuously for at least two years to have an official impact factor anyway.

For those of you who can’t be bothered to look up the definition of an impact factor: for a given year it is basically the sum of the citations received that year by all papers published in the journal over the previous two-year period, divided by the total number of papers published in that journal over the same period. It’s therefore the average number of citations per paper published in a two-year window; the impact factor for 2019 would be defined using papers published in 2017 and 2018, and so on.

The impact factor is prone to the same problem as the simple average I quoted above: citation statistics are generally heavily skewed, so the average can be dragged upwards by a small number of papers with lots of citations (in our case, just one).

I stress again that we don’t have an Impact Factor for the Open Journal. However, for reference (but obviously not direct comparison), the latest actual impact factors (for 2018, i.e. based on 2016 and 2017 numbers) for some leading astronomy journals are: Monthly Notices of the Royal Astronomical Society 5.23; Astrophysical Journal 5.58; and Astronomy and Astrophysics 6.21.

My main point, though, is that with so much bibliometric information available at the article level there is no reason whatsoever to pay any attention to crudely aggregated statistics at the journal level. Judge the contents, not the packaging.


ADS and the Open Journal of Astrophysics

Posted in Open Access on January 19, 2020 by telescoper

Most, if not all, of the authors of papers published in the Open Journal of Astrophysics, along with the majority of astrophysicists in general, use the NASA/SAO Astrophysics Data System (ADS) as an important route into the research literature in their domain, including bibliometric statistics and other information. Indeed, this is the most important source of such data for most working astrophysicists. In light of this we have been taking steps to facilitate better interaction between the Open Journal of Astrophysics and ADS.

First, note that journals indexed by ADS are assigned a short code that makes it easier to retrieve a publication. For reference, the short code for the Open Journal of Astrophysics is OJAp. For example, the 12 papers published by the Open Journal of Astrophysics can be found on ADS here.
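If you prefer to do this sort of lookup programmatically, ADS provides an API that can be queried by bibstem. Here is a minimal sketch in Python; you would need your own API token from your ADS account, and the field names are those of the ADS search API as I understand it:

```python
import requests

ADS_API = "https://api.adsabs.harvard.edu/v1/search/query"
TOKEN = "your-ads-api-token"  # obtainable from your ADS account settings

# Ask for all papers with the OJAp bibstem, with their citation counts.
params = {
    "q": "bibstem:OJAp",
    "fl": "bibcode,title,citation_count",
    "rows": 100,
}
resp = requests.get(ADS_API, params=params,
                    headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

for doc in resp.json()["response"]["docs"]:
    print(doc["bibcode"], doc.get("citation_count", 0), doc["title"][0])
```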

If you click the above link you will find that the most recently published papers have not had their citations assigned yet. When we publish a paper at the Open Journal of Astrophysics we assign it a DOI and deposit this, together with related metadata, in a system called Crossref, which ADS accesses in order to populate the bibliographic fields in its own database. ADS also assigns each paper a unique bibliographic code that it generates itself (based on the metadata it obtains from Crossref). This process can take a little while, however, as both Crossref and ADS update using batch processes, the latter usually running only at weekends. This introduces a significant delay in aggregating the citations acquired via different sources.

To complicate things further, papers submitted to the arXiv as preprints are indexed on ADS as preprints, and only appear as journal articles when they are published. Among other things, citations accrued by the preprint version are then aggregated on the system with those of the published article, but it can take a while before this process is completed, particularly if an author does not update the journal reference on the arXiv.

For a combination of reasons, therefore, the papers we have published have sometimes appeared on ADS out of order. On top of this, of the 12 papers published in 2019, one has been assigned a bibliographic code ending in 13 by ADS, and none is numbered 6! This is not too much of a problem, as the ADS identifiers are unique, but the result is not as tidy as it might be.

To further improve our service to the community, we have decided at the Open Journal of Astrophysics that from now on we will speed up this interaction with ADS by depositing information there directly, at the same time as we lodge it with Crossref. This means that (a) ADS does not have to rely on authors updating the journal reference on the arXiv and (b) we can give ADS information directly that is not lodged with Crossref.

I hope this clarifies the situation.

Not the Open Journal of Astrophysics Impact Factor

Posted in Open Access on October 22, 2019 by telescoper

Yesterday evening, after I’d finished my day job, I did some work on the Open Journal of Astrophysics ahead of a talk I am due to give this afternoon as part of the current Research Week at Maynooth University. The main thing I was doing was checking the citations of the papers we have published so far, to make sure that the Crossref mechanism is working properly and that the papers are appearing correctly on, e.g., the NASA/ADS system. There are one or two minor things that need correcting, but it’s basically doing fine.

In the course of all that, I remembered that when I’ve given talks about the Open Journal project quite a few people have asked me about its Journal Impact Factor. My usual response is (a) to repeat the arguments for why the impact factor is daft and (b) to point out that we have to have been running continuously for at least two years to have an official impact factor, so we don’t really have one.

For those of you who can’t be bothered to look up the definition of an impact factor: for a given year it is basically the sum of the citations received in that year by all papers published in the journal over the previous two-year period, divided by the total number of papers published in that journal over the same period. It’s therefore the average number of citations per paper published in a two-year window; the impact factor for 2019 would be defined using citations to papers published in 2017 and 2018, and so on.

The Open Journal of Astrophysics didn’t publish any papers in 2017 and published only one in 2018, so obviously we can’t define an official impact factor for 2019. However, since I was rummaging around in bibliometric data at the time, I was able to work out the average number of citations per paper for the papers we have published so far in 2019. That number is:

I stress again that this is not the Impact Factor for the Open Journal, but it is a rough indication of the citation impact of our papers. For reference (but obviously not direct comparison), the latest actual impact factors (for 2018, i.e. based on 2016 and 2017 numbers) for some leading astronomy journals are: Monthly Notices of the Royal Astronomical Society 5.23; Astrophysical Journal 5.58; and Astronomy and Astrophysics 6.21.

Citation Analysis of Scientific Categories

Posted in Open Access, Science Politics on May 18, 2018 by telescoper

I stumbled across an interesting paper the other day with the title Citation Analysis of Scientific Categories. The title isn’t really accurate, because not all of the 236 categories covered by the analysis are `scientific': they include many topics in the arts and humanities too. Anyway, the abstract is here:

Databases catalogue the corpus of research literature into scientific categories and report classes of bibliometric data such as the number of citations to articles, the number of authors, journals, funding agencies, institutes, references, etc. The number of articles and citations in a category are gauges of productivity and scientific impact but a quantitative basis to compare researchers between categories is limited. Here, we compile a list of bibliometric indicators for 236 science categories and citation rates of the 500 most cited articles of each category. The number of citations per paper vary by several orders of magnitude and are highest in multidisciplinary sciences, general internal medicine, and biochemistry and lowest in literature, poetry, and dance. A regression model demonstrates that citation rates to the top articles in each category increase with the square root of the number of articles in a category and decrease proportionately with the age of the references: articles in categories that cite recent research are also cited more frequently. The citation rate correlates positively with the number of funding agencies that finance the research. The category h-index correlates with the average number of cites to the top 500 ranked articles of each category (R2 = 0.997). Furthermore, only a few journals publish the top 500 cited articles in each category: four journals publish 60% (σ = ±20%) of these and ten publish 81% (σ = ±15%).
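The headline result of the regression can be written schematically (my notation, not the paper’s; the actual model is a fitted regression rather than a pure proportionality):

```latex
% c   = citation rate of the top-cited articles in a category,
% N   = number of articles in the category,
% tau = characteristic age of the references cited in the category.
% Schematic form of the scaling reported in the abstract:
c \;\propto\; \frac{\sqrt{N}}{\tau}
```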

The paper is open access (I think) and you can find the whole thing here.

I had a discussion over lunch today with a couple of colleagues here in Maynooth about the use of citations. I think we agreed that citation analysis does convey some information about the impact of a person’s research, but that information is rather limited. One of the difficulties is that publication rates and citation activity are very discipline-dependent, so one can’t easily compare individuals in different areas. This paper is interesting because it presents a table showing how various statistical citation measures vary across fields and sub-fields; physics is broken down into a number of distinct areas (e.g. Astronomy & Astrophysics, Particle Physics, Condensed Matter and Nuclear Physics), across which there is considerable variation. How best to use this information is still not clear.


Metrics for `Academic Reputation’

Posted in Bad Statistics, Science Politics on April 9, 2018 by telescoper

This weekend I came across a provocative paper on the arXiv with the title Measuring the academic reputation through citation records via PageRank. Here is the abstract:

The objective assessment of the prestige of an academic institution is a difficult and hotly debated task. In the last few years, different types of University Rankings have been proposed to quantify the excellence of different research institutions in the world. Albeit met with criticism in some cases, the relevance of university rankings is being increasingly acknowledged: indeed, rankings are having a major impact on the design of research policies, both at the institutional and governmental level. Yet, the debate on what rankings are exactly measuring is enduring. Here, we address the issue by measuring a quantitative and reliable proxy of the academic reputation of a given institution and by evaluating its correlation with different university rankings. Specifically, we study citation patterns among universities in five different Web of Science Subject Categories and use the PageRank algorithm on the five resulting citation networks. The rationale behind our work is that scientific citations are driven by the reputation of the reference so that the PageRank algorithm is expected to yield a rank which reflects the reputation of an academic institution in a specific field. Our results allow us to quantify the prestige of a set of institutions in a certain research field based only on hard bibliometric data. Given the volume of the data analysed, our findings are statistically robust and less prone to bias, at odds with ad hoc surveys often employed by ranking bodies in order to attain similar results. Because our findings are found to correlate extremely well with the ARWU Subject rankings, the approach we propose in our paper may open the door to new Academic Ranking methodologies that go beyond current methods by reconciling the qualitative evaluation of Academic Prestige with its quantitative measurements via publication impact.

(The link to the description of the PageRank algorithm was added by me; I also corrected a few spelling mistakes in the abstract.) You can find the full paper here (PDF).
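The PageRank idea itself is easy to experiment with. Here is a toy sketch in Python using the networkx library, on an invented citation network between a few fictitious institutions (nothing to do with the actual data or code used in the paper):

```python
import networkx as nx

# Toy directed citation network: an edge A -> B means that papers from
# institution A cite papers from institution B. Entirely invented.
G = nx.DiGraph()
G.add_edges_from([
    ("Uni A", "Uni B"), ("Uni A", "Uni C"),
    ("Uni B", "Uni C"), ("Uni C", "Uni A"),
    ("Uni D", "Uni C"), ("Uni D", "Uni B"),
])

# PageRank treats a citation as a vote whose weight depends on the
# standing of the citing node, rather than counting all votes equally.
ranks = nx.pagerank(G, alpha=0.85)
for uni, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{uni}: {score:.3f}")
```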

For what it’s worth, I think the paper contains some interesting ideas (e.g. treating citations as a `tree' rather than a simple `list') but the authors make some assumptions that I find deeply questionable (e.g. that being cited in a short reference list is somehow of higher value than being cited in a long one). The danger is that using such information in a metric could create an incentive for further bad behaviour (such as citation cartels).

I have blogged quite a few times about the uses and abuses of citations (see the tag here), and I won’t rehearse those arguments now. I will say, however, that I do agree with the idea of sharing citations among the authors of a paper rather than giving each and every author credit for the total. Many astronomers disagree with this point of view, but surely it is perverse to argue that the 100th author of a paper with 51 citations deserves more credit than the sole author of a paper with 49?
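The arithmetic of the two conventions is easy to spell out in code (a toy illustration of the point, not anybody’s official metric):

```python
def full_credit(citations: int, n_authors: int) -> float:
    """Conventional counting: every author gets the full citation count."""
    return float(citations)

def shared_credit(citations: int, n_authors: int) -> float:
    """Fractional counting: citations divided equally among the authors."""
    return citations / n_authors

# The example from the text: an author on a 100-author paper with 51
# citations versus the sole author of a paper with 49.
print(full_credit(51, 100), full_credit(49, 1))      # 51.0 49.0
print(shared_credit(51, 100), shared_credit(49, 1))  # 0.51 49.0
```

Under conventional counting the 100th author “beats” the sole author; under fractional counting the ordering is, more sensibly, reversed.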

Above all, though, the problem with constructing a metric for `Academic Reputation’ is that the concept is so difficult to define in the first place…

Lognormality Revisited (Again)

Posted in Biographical, Science Politics, The Universe and Stuff on May 10, 2016 by telescoper

Today provided me with a (sadly rare) opportunity to join in our weekly Cosmology Journal Club at the University of Sussex. I don’t often get to go because of meetings and other commitments. Anyway, one of the papers we looked at (by Clerkin et al.) was entitled Testing the Lognormality of the Galaxy Distribution and weak lensing convergence distributions from Dark Energy Survey maps. This provides yet more examples of the unreasonable effectiveness of the lognormal distribution in cosmology. Here’s one of the diagrams, just to illustrate the point:

[Figure: galaxy-count distributions with lognormal fits, from Clerkin et al.]

The points here are from MICE simulations. Not simulations of mice, of course, but simulations of MICE (Marenostrum Institut de Ciencies de l’Espai). Note how well the curves from a simple lognormal model fit calculations that need a supercomputer to perform!

The lognormal model used in the paper is basically the same as the one I developed in 1990 with Bernard Jones in what has turned out to be my most-cited paper. In fact the whole project was conceived, the work done, and the paper written up and submitted in the space of a couple of months, during a lovely visit to the fine city of Copenhagen. I’ve never been very good at grabbing citations – I’m more likely to fall off bandwagons than to jump onto them – but this little paper keeps on getting cited. It hasn’t got that many citations by the standards of some papers, but it has carried on being referred to ever since, which I’m quite proud of; the citations-per-year statistics even seem to have increased recently. The model we proposed turned out to be extremely useful in a range of situations, which I suppose accounts for the citation longevity:

[Figure: citation history of the paper, from NASA/ADS]

Citations die away for most papers, but this one is actually attracting more interest as time goes on! I don’t think this is my best paper, but it’s definitely the one I had most fun working on. I remember we had the idea of doing something with lognormal distributions over coffee one day, and just a few weeks later the paper was finished. In some ways it’s the most simple-minded paper I’ve ever written – and that’s up against some pretty stiff competition – but there you go.

[Figure: abstract of the Coles & Jones lognormal paper]

The lognormal seemed an interesting idea to explore because it applies to non-linear processes in much the same way as the normal distribution applies to linear ones. What I mean is that if you have a quantity $Y$ which is the sum of $n$ independent effects, $Y = X_1 + X_2 + \dots + X_n$, then the distribution of $Y$ tends to be normal by virtue of the Central Limit Theorem, regardless of the distribution of the individual $X_i$. If, however, the process is multiplicative, so that $Y = X_1 \times X_2 \times \dots \times X_n$, then $\log Y = \log X_1 + \log X_2 + \dots + \log X_n$ and the Central Limit Theorem tends to make $\log Y$ normal, which is exactly what it means for $Y$ to be lognormal.
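You can watch this happening numerically. Here’s a quick Python sketch (illustrative only, with an arbitrary choice for the distribution of the factors) that multiplies together many independent positive random quantities and checks that the logarithm of the product looks Gaussian:

```python
import numpy as np

rng = np.random.default_rng(42)

# 100,000 realisations of a product of 50 independent positive factors,
# each uniform on (0.5, 1.5); the choice of distribution is arbitrary.
n_factors, n_samples = 50, 100_000
factors = rng.uniform(0.5, 1.5, size=(n_samples, n_factors))
Y = factors.prod(axis=1)

def skew(x):
    """Sample skewness: third central moment over cubed standard deviation."""
    return ((x - x.mean())**3).mean() / x.std()**3

# The Central Limit Theorem applies to log Y = sum of log X_i, so log Y
# should be nearly Gaussian (skewness ~ 0), while Y itself is strongly
# positively skewed, as a lognormal variable should be.
print("skewness of log Y:", skew(np.log(Y)))  # close to zero
print("skewness of Y:    ", skew(Y))          # distinctly positive
```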

The lognormal is a good distribution for things produced by multiplicative processes, such as hierarchical fragmentation or coagulation processes: the distribution of sizes of the pebbles on Brighton beach  is quite a good example. It also crops up quite often in the theory of turbulence.

I’ll mention one other thing about this distribution, just because it’s fun. The lognormal distribution is an example of a distribution that is not completely determined by knowledge of its moments. Most people assume that if you know all the moments of a distribution then they must specify it uniquely, but it ain’t necessarily so: there are in fact infinitely many other distributions with exactly the same moments as the lognormal.
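The classic counterexample (due to Heyde, if I remember correctly) perturbs the standard lognormal density without changing a single moment:

```latex
% f(x) is the standard lognormal density. For any |a| <= 1 the perturbed
% density f_a is non-negative and has exactly the same moments of every
% order as f, because for all non-negative integers n
%   \int_0^\infty x^n f(x) \sin(2\pi \ln x)\, dx = 0.
f(x) = \frac{1}{x\sqrt{2\pi}}\,\exp\!\left[-\tfrac{1}{2}(\ln x)^2\right],
\qquad
f_a(x) = f(x)\left[1 + a\sin(2\pi \ln x)\right]
```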

If you’re wondering why I mentioned citations, it’s because they’re playing an increasing role in attempts to measure the quality of research done in UK universities. Citations definitely contain some information, but interpreting them isn’t at all straightforward. Different disciplines have hugely different citation rates, for one thing. Should one count self-citations? And how should one apportion the citations of multi-author papers? Suppose a paper with a thousand citations has 25 authors: does each of them get the thousand citations, or should each get 1000/25 = 40? Or, to put it another way, how does a single-author paper with 100 citations compare to a 50-author paper with 101?

Or perhaps a better metric would be the logarithm of the number of citations?