## Random Thoughts: Points and Poisson (d’Avril)

I’ve got a thing about randomness. For a start I don’t like the word, because it covers such a multitude of sins. People talk about there being randomness in nature when what they really mean is that they don’t know how to predict outcomes perfectly. That’s not quite the same thing as things being inherently unpredictable; statements about the nature of reality are ontological, whereas I think randomness is only a useful concept in an epistemological sense. It describes our lack of knowledge: just because we don’t know how to predict doesn’t mean that it can’t be predicted.

Nevertheless there are useful mathematical definitions of randomness and it is also (sometimes) useful to make mathematical models that display random behaviour in a well-defined sense, especially in situations where one has to take into account the effects of noise.

I thought it would be fun to illustrate one such model. In a point process, the random element is a “dot” that occurs at some location in time or space. Such processes occur in a wide range of contexts: arrivals of buses at a bus stop, photons in a detector, darts on a dartboard, and so on.

Let us suppose that we think of such a process happening in time, although what follows can straightforwardly be generalised to things happening over an area (such as a dartboard) or within some higher-dimensional region. It is also possible to invest the points with some other attributes; processes like this are sometimes called marked point processes, but I won’t discuss them here.

The “most” random way of constructing a simple point process is to assume that each event happens independently of every other event, and that there is a constant probability per unit time of an event happening. This type of process is called a Poisson process, after the French mathematician Siméon-Denis Poisson, who was born in 1781. He was one of the most creative and original physicists of all time: besides fundamental work on electrostatics and the theory of magnetism for which he is famous, he also built greatly upon Laplace’s work in probability theory. His principal result was to derive a formula giving the number of random events when the probability of each one is very low. The Poisson distribution, as it is now known and which I will come to shortly, is related to this original calculation; it was subsequently shown that this distribution amounts to a limiting form of the binomial distribution. Just to add to the connections between probability theory and astronomy, it is worth mentioning that in 1833 Poisson wrote an important paper on the motion of the Moon.

In a finite interval of duration T the mean (or expected) number of events for a Poisson process will obviously just be the product of the rate per unit time and T itself; call this product λ.

The full distribution is then

P(x) = (λ^x e^(−λ)) / x!

This gives the probability that a finite interval contains exactly *x* events. It can be neatly derived from the binomial distribution by dividing the interval into a very large number of very tiny pieces, each one of which becomes a Bernoulli trial. The probability of success (i.e. of an event occurring) in each trial is extremely small, but the number of trials becomes extremely large in such a way that the mean number of successes is λ. In this limit the binomial distribution takes the form of the above expression. The variance of this distribution is interesting: it is also λ. This means that the typical fluctuations within the interval are of order the square root of λ on a mean level of λ, so the *fractional* variation is of the famous “one over root n” form that is a useful estimate of the expected variation in point processes. Indeed, it’s a useful rule-of-thumb for estimating likely fluctuation levels in a host of statistical situations.
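The limit is easy to see numerically. Here is a minimal sketch (my own illustration, not from the original post) comparing binomial probabilities with the Poisson formula as the number of trials grows, keeping the mean np fixed at λ = 2:

```python
import math

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the mean is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 2.0
for n in (10, 100, 10000):
    p = lam / n  # divide the interval finer and finer, mean n*p held fixed
    diff = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(10))
    print(f"n = {n:6d}: max |binomial - Poisson| = {diff:.2e}")
```

The maximum discrepancy shrinks steadily as n grows, and summing x·P(x) and x²·P(x) over the Poisson probabilities confirms that both the mean and the variance equal λ.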

If football were a Poisson process with a mean number of goals per game of, say, 2, then we would expect most games to have 2 plus or minus 1.4 (the square root of 2) goals, i.e. between about 0.6 and 3.4. That is actually not far from what is observed, and the distribution of goals per game in football matches is indeed quite close to a Poisson distribution.
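The football figures are easy to check by simulation. The sketch below (mine, not from the post) draws Poisson variates with Knuth's classic method, since Python's standard random module has no Poisson sampler, and verifies that the mean and standard deviation come out near 2 and 1.4:

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's method:
    multiply uniform deviates until the product falls below exp(-lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)
goals = [poisson_sample(2.0, rng) for _ in range(100000)]
mean = sum(goals) / len(goals)
sd = (sum((g - mean) ** 2 for g in goals) / len(goals)) ** 0.5
print(f"mean goals per game: {mean:.2f}, standard deviation: {sd:.2f}")
```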

This idea can be straightforwardly extended to higher dimensional processes. If points are scattered over an area with a constant probability per unit area then the mean number in a finite area will also be some number λ and the same formula applies.

As a matter of fact I first learned about the Poisson distribution when I was at school, doing A-level mathematics (which in those days actually included some mathematics). The example used by the teacher to illustrate this particular bit of probability theory was a two-dimensional one from biology. The skin of a fish was divided into little squares of equal area, and the number of parasites found in each square was counted. A histogram of these numbers accurately follows the Poisson form. For years I laboured under the delusion that it was given this name because it was something to do with fish, but then I never was very quick on the uptake.
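The fish-skin experiment is easy to mimic numerically. This sketch (my own illustration, with 400 points and a 10×10 grid chosen for convenience) scatters points uniformly over a unit square, counts them in equal cells, and compares the histogram with the Poisson prediction for mean λ = 4. Strictly speaking the counts are multinomial because the total is fixed, but with this many cells the Poisson form is an excellent approximation:

```python
import math
import random
from collections import Counter

rng = random.Random(7)
# scatter 400 points uniformly over the unit square
pts = [(rng.random(), rng.random()) for _ in range(400)]
# count points in a 10x10 grid of equal cells; mean per cell = 400/100 = 4
counts = Counter((int(x * 10), int(y * 10)) for x, y in pts)
hist = Counter(counts[(i, j)] for i in range(10) for j in range(10))

lam = 4.0
for k in sorted(hist):
    expected = 100 * lam**k * math.exp(-lam) / math.factorial(k)
    print(f"{k:2d} points: observed {hist[k]:3d} cells, Poisson predicts {expected:5.1f}")
```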

This is all very well, but point processes are not always of this Poisson form. Points can be clustered, so that having one point at a given position increases the conditional probability of having others nearby. For example, galaxies like those shown in the nice picture are distributed throughout space in a clustered pattern that is very far from the Poisson form. But it’s very difficult to tell from just looking at the picture. What is needed is a rigorous statistical analysis.

The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern. I’ve spent a large part of my research career trying to figure out efficient ways of quantifying pattern in an objective way and I can tell you it’s not easy, especially when the data are prone to systematic errors and glitches. I can only touch on the subject here, but to see what I am talking about look at the two patterns below:

You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process and the other contains correlations between the points. One therefore has a real pattern to it, and one is a realization of a completely unstructured random process.

I show this example in popular talks and get the audience to vote on which one is the random one. The vast majority usually think that the top is the one that is random and the bottom one is the one with structure to it. It is not hard to see why. The top pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the bottom one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.

In fact, it’s the bottom picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!

The top process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore *anticorrelated*. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the random pattern.
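Both kinds of pattern can be sketched in a few lines of Python. This is my own illustration rather than the algorithm actually used for the figures: the “glow-worm” pattern here is a simple hard-core process built by rejection sampling, with a minimum separation r_min playing the role of the zone of avoidance:

```python
import math
import random

def poisson_points(n, rng):
    """Uniform (Poisson) scatter of n points on the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def hardcore_points(n, r_min, rng, max_tries=100000):
    """Rejection sampling: accept a candidate point only if it keeps
    a distance of at least r_min from every point already placed."""
    pts = []
    tries = 0
    while len(pts) < n and tries < max_tries:
        tries += 1
        cand = (rng.random(), rng.random())
        if all(math.dist(cand, p) >= r_min for p in pts):
            pts.append(cand)
    return pts

def min_pair_dist(pts):
    """Smallest distance between any pair of points."""
    return min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])

rng = random.Random(1)
a = poisson_points(200, rng)           # truly random: clumps and voids appear
b = hardcore_points(200, 0.04, rng)    # anticorrelated: looks smoother
print(f"closest pair, Poisson:   {min_pair_dist(a):.4f}")
print(f"closest pair, hard-core: {min_pair_dist(b):.4f}")
```

The Poisson pattern almost always contains a pair much closer than 0.04, which is exactly the apparent “clustering” our eyes seize upon; the hard-core pattern cannot, by construction.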

Incidentally, I got both pictures from Stephen Jay Gould’s collection of essays *Bully for Brontosaurus* and used them, with appropriate credit and copyright permission, in my own book *From Cosmos to Chaos*. I forgot to say this in earlier versions of this post.

The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.

People are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this. The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose.
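Coin tossing makes this quantitative: in 100 fair tosses the longest run of identical faces is typically around seven, much longer than most people's intuition suggests. A quick sketch (mine, not the author's):

```python
import random

def longest_run(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    best = cur = 1
    for a, b in zip(flips, flips[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

rng = random.Random(0)
# average the longest run over many simulated sequences of 100 fair tosses
runs = [longest_run([rng.random() < 0.5 for _ in range(100)]) for _ in range(10000)]
avg = sum(runs) / len(runs)
print(f"average longest run in 100 fair tosses: {avg:.1f}")
```

Sequences that people invent to “look random” rarely contain runs this long, which is one way statisticians can spot faked coin-toss data.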

I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically scientists often use perceived patterns in order to construct hypotheses. However these hypotheses must be tested objectively and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.

Now, I think I’ll complain to wordpress about the widget that links pages to a “random blog post”.

I’m sure it’s not really random….

April 4, 2009 at 6:09 pm

Nice article Peter. I like to get people thinking by asking questions like:

* is 5 [or whatever] a random number then?

* is 5 a random number when it appears on a sheet generated by a “random number routine” but not when it is the answer to a mathematics problem?

* is 3.14159 a random number?

* is 3.24159 a random number?

The aim is to get people thinking about the *process* that generates the number, and how the knowledge they have (or don’t have) of that process changes things.

Anton

April 4, 2009 at 11:12 pm

Years ago I had to go to a conference in the States. Just before that trip I had been on holiday in Egypt. When I went through the departures procedure at Heathrow the uniformed official just before check-in looked at my passport, saw the Egyptian visa, and then pressed a large red button on his console. He then waved me forward to the next stage of the queue. About 2 minutes later I was whisked off into another room and told that I had been selected for a random search.

Obviously there wasn’t anything remotely random about the decision to search me – it was determined by the big red button – but at least I thought better of saying so. You can’t reason with such people.

April 5, 2009 at 8:39 pm

A nice example of people’s ability to find patterns when they are not there was the large number of complaints Apple received about the iPod’s “shuffle” feature always “playing two songs in a row by the same artist” or “repeating a song before going through the whole playlist”. After exhaustively testing the random number generator at the heart of this, I believe they gave up and modified it to have glowworm-like “zones of avoidance”.

(Oh, and in the text prior to the Poisson distribution, the mean is given as “l” but then becomes “lambda” in the formula, at least on my browser.)

April 5, 2009 at 10:34 pm

Apparently you can set your facebook profile to say you’re looking for “random play”. Perhaps one can sue if the result is insufficiently random.

Anyway, the lambda looks like a lambda in my browser, but I’m not sure what happens with a random one.

April 6, 2009 at 4:50 pm

[...] going to blatantly swipe these two pictures from Peter Coles, but you should read his post for more information. The question is: which of these images [...]

April 6, 2009 at 7:03 pm

By Aether Wave Theory (AWT) whole Universe is basically random stuff.

April 6, 2009 at 11:53 pm

I believe that recently a group of computer scientists generated a system for producing truly random numbers as opposed to current computer algorithms. I’m going to take a guess that this development originated from Italy/Switzerland.

April 7, 2009 at 2:15 am

I think there may be a random “t” in here somewhere?

“The constellations which we all recognize so easily are not physical associations of starts, . . . “

April 7, 2009 at 6:49 am

Peter

Thanks for that. I’ve fixed it now.

Peter

April 7, 2009 at 8:39 am

Thank you for a great article, Peter. It’s not often I read through an entire article about statistics which is bad, I know, but this one was so well written it was impossible to stop reading until the end.

I’ve been meaning to write about patterns that people see in subjects such as astrology and have been putting it off for a bit, but this has given me a little bit of an incentive. Thanks again.

Tom

April 7, 2009 at 3:06 pm

It’s worse than that! Take the random array, choose a square region, displace 50% of the pixels to the left one pixel. That’s fairly harmless. Now put the two pictures side by side and view as a stereogram. As with the Bell Inequality, data and information are not the same things.

What happens if you additionally displace the other 50% to the right then view again?

April 7, 2009 at 3:23 pm

Thank you very much for this great article.

April 7, 2009 at 9:26 pm

Dear Huw (April 6th, 11.53pm)

What do you mean by a ‘truly random’ number please?

Anton

April 12, 2009 at 11:02 pm

[...] and Demons My recent post about randomness and non-randomness spawned a lot of comments over on cosmic variance about the nature of entropy. I thought I’d [...]

April 15, 2009 at 2:17 pm

[...] invitation put me in an artistic frame of mind so, to follow up my post on randomness (and the corresponding parallel version on cosmic variance), I thought I’d develop some [...]

April 19, 2009 at 1:42 am

Beautiful illustration. Thanks!

April 21, 2009 at 1:26 pm

[...] more detailed explanation of these images is at the blog In The Dark, and he uses it to talk about galaxy distributions. However, it also tells us a lot about our [...]

May 30, 2009 at 10:35 pm

I spend a good deal of my time at work generating blue noise dot patterns, and it is definitely hard to get away from Poisson.

It may be making things confusing, but there are physiological reasons why we tend to think bluer noises are more random than the whiter ones, as <a href="http://www.imatest.com/docs/images/LogFC_CSF_eye_test.jpg">charts like this</a> point out…

October 24, 2009 at 2:55 pm

[...] case discussed by Pearson in the limit of very large n. So this gives another example of the useful rule-of-thumb that quantities arising from fluctuations among n entities generally give a result that depends on [...]

November 15, 2009 at 7:52 pm

[...] what is meant by randomness in the first place. I’ve actually commented on this before, in a post that still seems to be collecting readers so I thought I’d develop one or two of the ideas a [...]

May 27, 2010 at 2:13 pm

[...] while ago I posted an item asking what “scattered randomly” is meant to mean. It included this [...]

January 13, 2011 at 9:59 am

The public vote in lectures on which of the two spatial distributions is random got a mention on this morning's In Our Time on BBC Radio Four.

November 25, 2011 at 11:36 am

[...] I got one last week from an artist called Tobias Collier concerning an old post of mine about randomness. Looking at his website I can see why he was interested in that particular topic, and also found so [...]

December 29, 2013 at 4:52 pm

I guessed it right. The top one looks like a hexagonal set of pores. Like a zeolite or such. Also, the lack of dark spots (clusters) was something I noted and then the other one had some.

January 12, 2014 at 11:22 pm

[…] 9 in Stirling et al. demonstrates this effect for ‘striped’ nanoparticles. I have referred to this post from my erstwhile colleague Peter Coles repeatedly throughout the debate at PubPeer. I recommend […]

February 26, 2014 at 5:04 pm

[…] all this reminded me of a very old post of mine about the difficulty of discerning patterns in distributions of points. Take the two (not very […]