A few months have passed since I last won a dictionary as a prize in the Independent Crossword competition. That’s nothing remarkable in itself, but since my average rate of dictionary accumulation has been about one a month over the last few years, it seems a bit of a lull. Have I forgotten how to do crosswords and started sending in wrong solutions? Is the Royal Mail intercepting my post? Has the number of correct entries per week suddenly increased, reducing my odds of winning? Have the competition organizers turned against me?
In fact, statistically speaking, there’s nothing significant in this gap. Even if my grids are all correct, the number of correct grids has remained constant, and the winner is pulled at random from those submitted (i.e. in such a way that all correct entries are equally likely to be drawn), then a relatively long unsuccessful period such as I am experiencing at the moment is not at all improbable. The point is that such runs are far more likely in a truly random process than most people imagine, as indeed are runs of successes. Chance coincidences happen more often than you think.
I try this out in lectures sometimes, by asking a member of the audience to generate a random sequence of noughts and ones in their head. It seems people are so conscious that the number of ones should be roughly equal to the number of noughts that they impose that balance as they go along. Almost universally, the supposedly random sequences people produce only have very short runs of 1s or 0s because, say, a run like ‘00000’ just seems too unlikely. Well, it is unlikely, but that doesn’t mean it won’t happen. In a truly random binary sequence like this (i.e. one in which 1 and 0 both have a probability of 0.5 and each selection is independent of the others), coincidental runs of consecutive 0s and 1s happen with surprising frequency. Try it yourself, with a coin.
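If you don’t have the patience for a coin, the experiment is easy to fake on a computer. The following is only a rough sketch in Python: the choice of 100 tosses, a run length of 5 and the function names are my own illustrative assumptions, not anything canonical. It tosses a simulated fair coin many times over and reports the fraction of sequences that contain at least one run of five or more identical outcomes.

```python
import random


def longest_run(bits):
    """Return the length of the longest run of identical consecutive values."""
    longest = 0
    current = 0
    previous = None
    for b in bits:
        # Extend the current run if the value repeats, otherwise start a new one.
        current = current + 1 if b == previous else 1
        longest = max(longest, current)
        previous = b
    return longest


def fraction_with_run(n_tosses=100, run_length=5, trials=10_000, seed=0):
    """Estimate how often n_tosses fair coin tosses contain a run of at least
    run_length identical outcomes. All default values are illustrative."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tosses = [rng.randint(0, 1) for _ in range(n_tosses)]
        if longest_run(tosses) >= run_length:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(fraction_with_run())
```

With these assumed settings, the large majority of simulated sequences should contain a run like ‘00000’ or ‘11111’ somewhere, which is exactly the kind of thing people leave out when they invent a “random” sequence in their heads.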
Coincidentally, the subject of randomness was suggested to me independently yesterday by an anonymous email correspondent by the name of John Peacock (I have blogged about it before; one particular post on this topic is actually one of this blog’s most popular articles). What triggered this was a piece about music players such as Spotify (whatever that is) which have a “random play” feature. Apparently people don’t accept that it is “really random” because of the number of times the same track comes up. To deal with this “problem”, experts are working on algorithms that don’t actually play things randomly but in a way that accords with what people think randomness means.
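To see why repeats turn up so readily, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that “random play” picks each track independently and uniformly from the library; I have no idea how Spotify actually does it, and the function name is my own. Under that assumption the calculation is the same as the well-known birthday problem.

```python
def prob_repeat(n_tracks, n_plays):
    """Probability that at least one track is heard twice in n_plays,
    assuming each play is an independent, uniform pick from n_tracks
    (an illustrative assumption, not a description of any real player)."""
    p_all_distinct = 1.0
    for i in range(n_plays):
        # The (i+1)-th play must avoid the i distinct tracks already heard.
        p_all_distinct *= max(0.0, (n_tracks - i) / n_tracks)
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Birthday-problem check: 23 picks from 365 gives a bit over one half.
    print(prob_repeat(365, 23))
```

The point of the sketch is that the chance of a repeat grows much faster with the number of plays than intuition suggests, so hearing the same track twice is no evidence that the shuffle is broken.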
I think this fiddling is a very bad idea. People understand probability so poorly anyway that attempting to redefine the word’s meaning is just going to add confusion. You wouldn’t accept a casino that used loaded dice, so why allow cheating in another context? Far better for all concerned for the general public to understand what randomness is and, perhaps more importantly, what it looks like.
I have to confess that I don’t really like the word “randomness”, but I haven’t got time right now for a rant about it. There are, however, useful mathematical definitions of randomness and it is also (sometimes) useful to make mathematical models that display random behaviour in a well-defined sense, especially in situations where one has to take into account the effects of noise.
I thought it would be fun to illustrate one such model. In a point process, the random element is a “dot” that occurs at some location in time or space. Such processes can be defined in one or more dimensions and relate to a wide range of situations: arrivals of buses at a bus stop, photons in a detector, darts on a dartboard, and so on.
The statistical description of clustered point patterns is a fascinating subject, because it makes contact with the way in which our eyes and brain perceive pattern. I’ve spent a large part of my research career trying to figure out efficient ways of quantifying pattern in an objective way and I can tell you it’s not easy, especially when the data are prone to systematic errors and glitches. I can only touch on the subject here, but to see what I am talking about look at the two patterns below:
You will have to take my word for it that one of these is a realization of a two-dimensional Poisson point process and the other contains correlations between the points. One therefore has a real pattern to it, and one is a realization of a completely unstructured random process.
I show this example in popular talks and get the audience to vote on which one is the random one. In fact, I did this just a few weeks ago during a lecture in our module Quarks to Cosmos, which attempts to explain scientific concepts to non-science students. As usual when I do this, I found that the vast majority thought that the top one is random and the bottom one is the one with structure to it. It is not hard to see why. The top pattern is very smooth (what one would naively expect for a constant probability of finding a point at any position in the two-dimensional space), whereas the bottom one seems to offer a profusion of linear, filamentary features and densely concentrated clusters.
In fact, it’s the bottom picture that was generated by a Poisson process using a Monte Carlo random number generator. All the structure that is visually apparent in the second example is imposed by our own sensory apparatus, which has evolved to be so good at discerning patterns that it finds them when they’re not even there!
The top process is also generated by a Monte Carlo technique, but the algorithm is more complicated. In this case the presence of a point at some location suppresses the probability of having other points in the vicinity. Each event has a zone of avoidance around it; the points are therefore anticorrelated. The result of this is that the pattern is much smoother than a truly random process should be. In fact, this simulation has nothing to do with galaxy clustering really. The algorithm used to generate it was meant to mimic the behaviour of glow-worms which tend to eat each other if they get too close. That’s why they spread themselves out in space more uniformly than in the “really” random pattern.
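For anyone who wants to play with this, here is a minimal Python sketch of the two kinds of pattern. I should stress that it is not the algorithm used to make the pictures above: the “zone of avoidance” version is a simple rejection scheme of my own choosing (a candidate point is thrown away if it lands within an assumed distance `min_dist` of an existing point), and the function names, the unit square and the parameter values are all illustrative assumptions. The unstructured pattern is made by scattering a fixed number of points independently and uniformly, which is what a Poisson process looks like once you fix the number of points.

```python
import math
import random


def poisson_pattern(n_points, rng):
    """Scatter n_points independently and uniformly over the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n_points)]


def avoidance_pattern(n_points, min_dist, rng, max_attempts=100_000):
    """Scatter points so that no two lie closer than min_dist.
    A simple rejection scheme, assumed here for illustration only."""
    points = []
    attempts = 0
    while len(points) < n_points and attempts < max_attempts:
        attempts += 1
        candidate = (rng.random(), rng.random())
        # Keep the candidate only if it respects every existing exclusion zone.
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points


def nearest_neighbour_distances(points):
    """Distance from each point to its closest neighbour."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    unstructured = poisson_pattern(200, rng)
    smooth = avoidance_pattern(200, 0.04, rng)
    print(min(nearest_neighbour_distances(unstructured)))
    print(min(nearest_neighbour_distances(smooth)))
```

Comparing the smallest nearest-neighbour distance in each pattern makes the difference plain: the independent points include close pairs and clumps purely by chance, whereas the anticorrelated points never come closer than the chosen exclusion distance.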
I assume that Spotify’s non-random play algorithm will have the effect of producing a one-dimensional version of the top pattern, i.e. one with far too few coincidences to be genuinely random.
Incidentally, I got both pictures from Stephen Jay Gould’s collection of essays Bully for Brontosaurus and used them, with appropriate credit and copyright permission, in my own book From Cosmos to Chaos.
The tendency to find things that are not there is quite well known to astronomers. The constellations which we all recognize so easily are not physical associations of stars, but are just chance alignments on the sky of things at vastly different distances in space. That is not to say that they are random, but the pattern they form is not caused by direct correlations between the stars. Galaxies form real three-dimensional physical associations through their direct gravitational effect on one another.
People are actually pretty hopeless at understanding what “really” random processes look like, probably because the word random is used so often in very imprecise ways and they don’t know what it means in a specific context like this. The point about random processes, even simpler ones like repeated tossing of a coin, is that coincidences happen much more frequently than one might suppose.
I suppose there is an evolutionary reason why our brains like to impose order on things in a general way. More specifically, scientists often use perceived patterns in order to construct hypotheses. However, these hypotheses must be tested objectively, and often the initial impressions turn out to be figments of the imagination, like the canals on Mars.
Perhaps I should complain to WordPress about the widget that links pages to a “random blog post”. I’m sure it’s not really random….