Today provided me with a (sadly rare) opportunity to join in our weekly Cosmology Journal Club at the University of Sussex. I don’t often get to go because of meetings and other commitments. Anyway, one of the papers we looked at (by Clerkin et al.) was entitled *Testing the lognormality of the galaxy and weak lensing convergence distributions from Dark Energy Survey maps*. This provides yet more examples of the unreasonable effectiveness of the lognormal distribution in cosmology. Here’s one of the diagrams, just to illustrate the point:

The points here are from MICE simulations. Not simulations of mice, of course, but simulations of MICE (Marenostrum Institut de Ciencies de l’Espai). Note how well the curves from a simple lognormal model fit the calculations that need a supercomputer to perform them!

The lognormal model used in the paper is basically the same as the one I developed in 1990 with Bernard Jones in what has turned out to be my most-cited paper. In fact the whole project was conceived, work done, written up and submitted in the space of a couple of months during a lovely visit to the fine city of Copenhagen. I’ve never been very good at grabbing citations – I’m more likely to fall off bandwagons than jump onto them – but this little paper seems to keep getting citations. It hasn’t got that many by the standards of some papers, but it’s carried on being referred to for almost twenty years, which I’m quite proud of; the citations-per-year statistics even seem to have increased recently. The model we proposed turned out to be extremely useful in a range of situations, which I suppose accounts for the citation longevity:

Citations die away for most papers, but this one is actually attracting more interest as time goes on! I don’t think this is my best paper, but it’s definitely the one I had most fun working on. I remember we had the idea of doing something with lognormal distributions over coffee one day, and just a few weeks later the paper was finished. In some ways it’s the most simple-minded paper I’ve ever written – and that’s up against some pretty stiff competition – but there you go.

The lognormal seemed an interesting idea to explore because it applies to non-linear processes in much the same way as the normal distribution does to linear ones. What I mean is that if you have a quantity $Y$ which is the sum of $n$ independent effects, $Y = X_1 + X_2 + \dots + X_n$, then the distribution of $Y$ tends to be normal by virtue of the *Central Limit Theorem*, more or less regardless of what the distribution of the $X_i$ is (as long as their variances are finite). If, however, the process is multiplicative, so that $Y = X_1 \times X_2 \times \dots \times X_n$ with each $X_i$ positive, then $\log Y = \log X_1 + \log X_2 + \dots + \log X_n$, and the Central Limit Theorem tends to make $\log Y$ normal, which is what the lognormal distribution means.
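
If you want to see the multiplicative version of the Central Limit Theorem at work, here is a minimal sketch in Python. The choice of factors (uniform between 0.5 and 1.5), the number of factors and the sample size are all illustrative assumptions of mine, not anything taken from the paper. It multiplies together a lot of independent positive random numbers and compares the skewness of the product with the skewness of its logarithm.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def simulate_products(n_factors=50, n_samples=20000, seed=1):
    """Draw n_samples products of n_factors independent positive factors.

    The uniform(0.5, 1.5) factor distribution is an arbitrary illustrative choice;
    any positive distribution with finite variance of its logarithm would do.
    """
    rng = random.Random(seed)
    products = []
    for _ in range(n_samples):
        y = 1.0
        for _ in range(n_factors):
            y *= rng.uniform(0.5, 1.5)
        products.append(y)
    return products


products = simulate_products()
log_products = [math.log(y) for y in products]

skew_y = skewness(products)          # strongly positive: Y has a long tail
skew_log_y = skewness(log_products)  # close to zero: log Y is roughly normal

print(f"skewness of Y     = {skew_y:.2f}")
print(f"skewness of log Y = {skew_log_y:.2f}")
```

The product itself comes out heavily skewed, while its logarithm is close to symmetric, which is what you would expect if the logarithm is approximately normal. Skewness is only one crude diagnostic, of course; a histogram of the logarithms would make the same point more vividly.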

The lognormal is a good distribution for things produced by multiplicative processes, such as hierarchical fragmentation or coagulation processes: the distribution of sizes of the pebbles on Brighton beach is quite a good example. It also crops up quite often in the theory of turbulence.

I’ll mention one other thing about this distribution, just because it’s fun. The lognormal distribution is an example of a distribution that’s not completely determined by knowledge of its moments. Most people assume that if you know all the moments of a distribution then that has to specify the distribution uniquely, but it ain’t necessarily so.
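
For the sceptical, here is a small numerical sketch of the standard textbook counterexample, usually credited to Heyde. Take the lognormal density $f(x)$ with zero mean and unit variance in $\log x$, and multiply it by $1 + \epsilon \sin(2\pi \log x)$ with $|\epsilon| \le 1$. The result is a different, perfectly respectable density with exactly the same integer moments, $\exp(k^2/2)$. The value $\epsilon = 0.5$, the integration limits and the number of steps below are arbitrary choices of mine.

```python
import math


def lognormal_pdf(x):
    """Lognormal density with mu = 0 and sigma = 1 in log x."""
    return math.exp(-0.5 * math.log(x) ** 2) / (x * math.sqrt(2.0 * math.pi))


def perturbed_pdf(x, eps=0.5):
    """A different density sharing every integer moment with lognormal_pdf."""
    return lognormal_pdf(x) * (1.0 + eps * math.sin(2.0 * math.pi * math.log(x)))


def moment(k, eps, lo=-12.0, hi=16.0, steps=28000):
    """k-th moment of the (perturbed) density, by the trapezoid rule in u = log x.

    After the substitution x = exp(u), the integrand is
    exp(k*u) * normal_pdf(u) * (1 + eps * sin(2*pi*u)).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = math.exp(k * u - 0.5 * u * u) / math.sqrt(2.0 * math.pi)
        total += weight * gauss * (1.0 + eps * math.sin(2.0 * math.pi * u))
    return total * h


# (lognormal moment, perturbed moment) for k = 0, 1, 2, 3, 4
moments = {k: (moment(k, 0.0), moment(k, 0.5)) for k in range(5)}

for k, (plain, perturbed) in moments.items():
    print(k, plain, perturbed, math.exp(0.5 * k * k))
```

The two columns of moments agree to within the accuracy of the integration, even though the two densities are plainly different: at $x = e^{1/4}$, for instance, the perturbed density is one and a half times the lognormal one. The reason is that shifting $u$ by an integer leaves $\sin(2\pi u)$ unchanged, so the extra term always integrates to zero against a Gaussian centred on an integer.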

If you’re wondering why I mentioned citations, it’s because they’re playing an increasing role in attempts to measure the quality of research done in UK universities. Citations definitely contain some information, but interpreting them isn’t at all straightforward. Different disciplines have hugely different citation rates, for one thing. Should one count self-citations? Also, how do you apportion citations to multi-author papers? Suppose a paper with a thousand citations has 25 authors. Does each of them get the thousand citations, or should each get 1000/25 = 40? Or, to put it another way, how does a single-author paper with 100 citations compare to a 50-author paper with 101?

Or perhaps a better metric would be the logarithm of the number of citations?