Lognormality Revisited (Again)

Today provided me with a (sadly rare) opportunity to join in our weekly Cosmology Journal Club at the University of Sussex. I don’t often get to go because of meetings and other commitments. Anyway, one of the papers we looked at (by Clerkin et al.) was entitled “Testing the Lognormality of the Galaxy Distribution and weak lensing convergence distributions from Dark Energy Survey maps”. This provides yet more examples of the unreasonable effectiveness of the lognormal distribution in cosmology. Here’s one of the diagrams, just to illustrate the point:

[Figure: Log_galaxy_counts]

The points here are from MICE simulations. Not simulations of mice, of course, but simulations of MICE (Marenostrum Institut de Ciencies de l’Espai). Note how well the curves from a simple lognormal model fit the calculations that need a supercomputer to perform them!

The lognormal model used in the paper is basically the same as the one I developed in 1990 with Bernard Jones in what has turned out to be my most-cited paper. In fact the whole project was conceived, the work done, written up and submitted in the space of a couple of months during a lovely visit to the fine city of Copenhagen. I’ve never been very good at grabbing citations – I’m more likely to fall off bandwagons than jump onto them – but this little paper seems to keep getting them. It hasn’t got that many by the standards of some papers, but it’s carried on being referred to for almost twenty years, which I’m quite proud of; the citations-per-year statistics even seem to have increased recently. The model we proposed turned out to be extremely useful in a range of situations, which I suppose accounts for the citation longevity:

[Figure: nph-ref_history]

Citations die away for most papers, but this one is actually attracting more interest as time goes on! I don’t think this is my best paper, but it’s definitely the one I had most fun working on. I remember we had the idea of doing something with lognormal distributions over coffee one day, and just a few weeks later the paper was finished. In some ways it’s the most simple-minded paper I’ve ever written – and that’s up against some pretty stiff competition – but there you go.

[Figure: Lognormal_abstract]

The lognormal seemed an interesting idea to explore because it applies to non-linear processes in much the same way as the normal distribution does to linear ones. What I mean is that if you have a quantity Y which is the sum of n independent effects, Y=X1+X2+…+Xn, then the distribution of Y tends to be normal by virtue of the Central Limit Theorem, regardless of what the distribution of the Xi is. If, however, the process is multiplicative, so that Y=X1×X2×…×Xn, then log Y = log X1 + log X2 + …+log Xn, and the Central Limit Theorem tends to make log Y normal, which is what the lognormal distribution means.
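If you want to see the multiplicative version of the Central Limit Theorem at work, here is a minimal numerical sketch. The choice of uniform factors on (0.5, 1.5), the number of factors and the function names are all just assumptions made for illustration; nothing here comes from the paper. It multiplies together independent positive random numbers and compares the skewness of Y with that of log Y.

```python
import numpy as np

def simulate_products(n_factors=50, n_samples=20000, seed=0):
    """Draw n_samples values of Y = X1 * X2 * ... * Xn.

    The factors Xi are independent Uniform(0.5, 1.5) variates, an
    arbitrary (assumed) choice of positive, non-lognormal distribution.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.5, 1.5, size=(n_samples, n_factors))
    return x.prod(axis=1)

def skewness(values):
    """Sample skewness; close to zero for a symmetric (e.g. normal) sample."""
    centred = values - values.mean()
    return (centred**3).mean() / (centred**2).mean() ** 1.5

y = simulate_products()
skew_y = skewness(y)               # strongly positive: Y has a long tail
skew_log_y = skewness(np.log(y))   # near zero: log Y is close to normal

print(f"skewness of Y:     {skew_y:.2f}")
print(f"skewness of log Y: {skew_log_y:.2f}")
```

The product itself is heavily skewed, while its logarithm is nearly symmetric, which is what you would expect if Y is approximately lognormal.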

The lognormal is a good distribution for things produced by multiplicative processes, such as hierarchical fragmentation or coagulation processes: the distribution of sizes of the pebbles on Brighton beach is quite a good example. It also crops up quite often in the theory of turbulence.

I’ll mention one other thing about this distribution, just because it’s fun. The lognormal distribution is an example of a distribution that’s not completely determined by knowledge of its moments. Most people assume that if you know all the moments of a distribution then that has to specify the distribution uniquely, but it ain’t necessarily so.
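A standard textbook illustration of this (usually attributed to Heyde, and not taken from our paper) is to multiply the standard lognormal density f(x) by a factor 1 + ε sin(2π ln x) with |ε| ≤ 1. The result is a different density, but every integer moment is unchanged and equal to exp(k²/2). The sketch below checks this numerically for the first few moments by integrating in the variable u = ln x; the grid limits, the value ε = 0.5 and the function name are assumptions chosen purely for illustration.

```python
import numpy as np

def moment(k, eps, u_min=-12.0, u_max=18.0, n=300001):
    """k-th moment of the density f(x) * (1 + eps * sin(2*pi*ln x)).

    f is the standard lognormal density. The integral is evaluated in
    u = ln x, where it becomes an integral of exp(k*u) against a unit
    Gaussian, using a simple sum on a fine uniform grid.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    gauss = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    wiggle = 1.0 + eps * np.sin(2.0 * np.pi * u)
    return np.sum(np.exp(k * u) * gauss * wiggle) * du

for k in range(5):
    plain = moment(k, eps=0.0)        # ordinary lognormal
    perturbed = moment(k, eps=0.5)    # visibly different density
    exact = np.exp(0.5 * k**2)        # analytic lognormal moment
    print(k, plain, perturbed, exact)
```

For each k the two numerical moments agree with each other and with exp(k²/2) to within rounding error, even though the two densities differ by up to 50 per cent at some values of x.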

If you’re wondering why I mentioned citations, it’s because they’re playing an increasing role in attempts to measure the quality of research done in UK universities. Citations definitely contain some information, but interpreting them isn’t at all straightforward. Different disciplines have hugely different citation rates, for one thing. Should one count self-citations? Also, how do you apportion citations to multi-author papers? Suppose a paper with a thousand citations has 25 authors. Does each of them get the thousand citations, or should each get 1000/25? Or, to put it another way, how does a single-author paper with 100 citations compare to a 50-author paper with 101?

Or perhaps a better metric would be the logarithm of the number of citations?

12 Responses to “Lognormality Revisited (Again)”

  1. “Citations die away for most papers, but this one is actually attracting more interest as time goes on!”

    With time, will the distribution approach a log-normal distribution? 🙂

  2. “Suppose a paper with a thousand citations has 25 authors. Does each of them get the thousand citations, or should each get 1000/25?”

    All bibliometry is very coarse, but not adopting the second approach above is the worst error one can make.

    “Or, put it another way, how does a single-author paper with 100 citations compare to a 50 author paper with 101?”

    Obviously, the first one should receive much more weight.

  3. telescoper Says:

    I just realised this paper was written 25 years ago….

    …where did all that time go?

    • This September, it will be 25 years since I attended my first scientific conference, the International Conference on Gravitational Lensing in Hamburg (where I was a student at the time). I had by this time already decided to do my thesis work in Sjur Refsdal’s gravitational-lens group at the Hamburg Observatory, but this conference strengthened my decision.

      The conference bag (a bit larger than A4, like a folder with a velcro edge), was donated by |d|i|g|i|t|a|l|. Those were the days.

      • telescoper Says:

        Actually, my first scientific conference was in Cargese, Corsica in the summer of 1986. Almost 30 years ago…

    • Across the evening sky, all the birds are leaving
      But how can they know it’s time for them to go?
      Before the winter fire, I will still be dreaming
      I have no thought of time

      For who knows where the time goes?
      Who knows where the time goes?

      Sad, deserted shore, your fickle friends are leaving
      Ah, but then you know it’s time for them to go
      But I will still be here, I have no thought of leaving
      I do not count the time

      For who knows where the time goes?
      Who knows where the time goes?

      And I am not alone while my love is near me
      I know it will be so until it’s time to go
      So come the storms of winter and then the birds in spring again
      I have no fear of time

      For who knows how my love grows?
      And who knows where the time goes?

      Sandy Denny

  4. Citation statistics are a very rough measure of quality. I don’t think number of authors has much to do with quality of the research. Perhaps number of _significant_ authors (who really contributed) has some effect, but given the uncertainty in what citations actually measure, it is not the dominant uncertainty. Number of citations does depend on activity in the field: it is difficult to get more citations than there are papers published in the field – some normalization for this might be useful.
    Name recognition is crucial: better known people attract citations by that route. Perhaps we could define a ‘handicap’ depending on seniority?

    A paper with zero citations after a few years is probably not that great. Above that, a better way to judge a paper, at least better than counting authors or using bibliometrics, is by reading it.

  5. “a better way to judge a paper, at least better than counting authors or using bibliometrics, is by reading it”

    I hope that we all agree on that!

    “I don’t think number of authors has much to do with quality of the research.”

    Certainly a typical paper with 1000 authors is not 1000 times better than the typical paper with 1 author. The point is that some people have, say, 1000 citations, with most papers being single-author papers or having a small number of co-authors, with many first-author papers. Others have 20,000 citations, but 19,500 of them are from, say, half a dozen highly-cited papers with hundreds of authors, where they are no. 763. Sadly, there are some who will say that the one candidate with the 20,000 citations must be 20 times better than the one with 1000. Probably, he is not nearly as good.

    Thus, as a zeroth-order correction, divide the number of citations by the number of authors.

    • Phil Uttley Says:

      The main argument for doing this is to reduce the contribution of citations from papers that people hardly did any work on, but normalising by number of authors is quite a crude approach and would be very unfair on people doing the key work but who happen to be in a large collaboration. In fields where the main contributors are higher up the list it would be much better to apply a weighting based on position in the author list, perhaps some kind of logarithmic dependence since I doubt that it is linear at all. Many of those 1000 authors in a big collaboration may deserve some payoff for what may have been years of work on the project, with no other public recognition. I think similar ideas have been suggested before, but it is probably impossible to come up with a one-size-fits-all approach due to the very different sociology of different fields.

      • “The main argument for doing this is to reduce the contribution of citations from papers that people hardly did any work on, but normalising by number of authors is quite a crude approach and would be very unfair on people doing the key work but who happen to be in a large collaboration.”

        As I said, a zeroth-order approach.

        “In fields where the main contributors are higher up the list it would be much better to apply a weighting based on position in the author list, perhaps some kind of logarithmic dependence since I doubt that it is linear at all.”

        Even within a field, conventions vary as to what the order in the list means. In some fields, the most important author is last. Then there are alphabetical lists. But with 3 or 4 authors, the alphabetical order might be coincidence.

        It might make sense for the authors themselves to provide a weighting scheme.

        “Many of those 1000 authors in a big collaboration may deserve some payoff for what may have been years of work on the project, with no other public recognition.”

        Usually, though, if people have been working on something for years and the public result is being on a paper with a thousand authors, they have a permanent job anyway, so don’t really need the recognition.

        “I think similar ideas have been suggested before, but it is probably impossible to come up with a one-size-fits-all approach due to the very different sociology of different fields.”

        Indeed. Note also that there is considerable variance in deciding when one moves from being in the acknowledgements to being a co-author.

  6. Anton Garrett Says:

    The fit looks good… but was it compared with the fit of other distributions having a similar number of adjustable parameters in order to see just how much better?

    • telescoper Says:

      The only other fit shown is a Gaussian, and both it and the lognormal have the same number of parameters.
