German Tanks, Traffic Wardens, and the End of the World

The other day I was looking through some documents relating to the portfolio of courses and modules offered by the Department of Mathematics here at the University of Sussex when I came across a reference to the German Tank Problem. Not knowing what this was, I did a Google search and found a quite comprehensive Wikipedia page on the subject, which explains the background rather well.

It seems that during the latter stages of World War 2 the Western Allies made sustained efforts to determine the extent of German tank production. They approached this in two major ways, namely conventional intelligence gathering and statistical estimation, with the latter approach often proving the more accurate and reliable, as was the case in the estimation of the production of Panther tanks just prior to D-Day. The Allied command structure had thought the heavy Panzer V (Panther) tanks, with their high-velocity, long-barrelled 75 mm/L70 guns, were uncommon, and would only be encountered in northern France in small numbers. The US Army was confident that the Sherman tank would perform well against the Panzer III and IV tanks that they expected to meet but would struggle against the Panzer V. Shortly before D-Day, rumours began to circulate that large numbers of Panzer V tanks had been deployed in Normandy.

To ascertain if this were true the Allies attempted to estimate the number of Panzer V tanks being produced. To do this they used the serial numbers on captured or destroyed tanks. The principal numbers used were gearbox numbers, as these fell in two unbroken sequences; chassis and engine numbers, and those of various other components, were also used. The question to be asked is how accurately one can infer the total number of tanks based on a sample of a few serial numbers. So accurate did this analysis prove to be that, in the statistical theory of estimation, the general problem of estimating the maximum of a discrete uniform distribution from sampling without replacement is now known as the German tank problem. I’ll leave the details to the Wikipedia discussion, which in my opinion is yet another demonstration of the advantages of a Bayesian approach to this kind of problem.
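To give a flavour of the kind of estimate involved, here is a minimal sketch of the standard textbook (frequentist) estimator for this problem, m + m/k - 1, where m is the largest serial number seen and k is the number of serial numbers in the sample. The estimator is not spelled out in this post, the serial numbers below are purely illustrative, and the function name `estimate_total` is my own choice.

```python
def estimate_total(serials):
    """Estimate the total N from distinct serial numbers drawn from 1..N.

    Uses the standard estimator m + m/k - 1, where m is the sample
    maximum and k the sample size. Assumes sampling without replacement
    from a sequence numbered consecutively from 1.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

# Illustrative (made-up) sample of four serial numbers.
print(estimate_total([19, 40, 42, 60]))

# A sample of one: the single number 347.
print(estimate_total([347]))
```

With a single observation the formula reduces to 2m - 1, i.e. roughly twice the number observed.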

This problem is a more general version of a problem that I first came across about 30 years ago. I think it was devised in the following form by Steve Gull, but can’t be sure of that.

Imagine you are a visitor in an unfamiliar, but very populous, city. For the sake of argument let’s assume that it is in China. You know that this city is patrolled by traffic wardens, each of whom carries a number on their uniform.  These numbers run consecutively from 1 (smallest) to T (largest) but you don’t know what T is, i.e. how many wardens there are in total. You step out of your hotel and discover traffic warden number 347 sticking a ticket on your car. What is your best estimate of T, the total number of wardens in the city? I hope the similarity to the German Tank Problem is obvious, except in this case it is much simplified by involving just one number rather than a sample.

I gave a short lunchtime talk about this many years ago when I was working at Queen Mary College, in the University of London. Every Friday, over beer and sandwiches, a member of staff or research student would give an informal presentation about their research, or something related to it. I decided to give a talk about bizarre applications of probability in cosmology, and this problem was intended to be my warm-up. I was amazed at the answers I got to this simple question. The majority of the audience denied that one could make any inference at all about T based on a single observation like this, other than that it  must be at least 347.

Actually, a single observation like this can lead to a useful inference about T, using Bayes’ theorem. Suppose we have really no idea at all about T before making our observation; we can then adopt a uniform prior probability. Of course there must be an upper limit on T. There can’t be more traffic wardens than there are people, for example. Although China has a large population, the prior probability of there being, say, a billion traffic wardens in a single city must surely be zero. But let us take the prior to be effectively constant. Suppose the actual number of the warden we observe is t. Now we have to assume that we have an equal chance of coming across any one of the T traffic wardens outside our hotel. Each value of t (from 1 to T) is therefore equally likely. I think this is the reason that my astronomers’ lunch audience thought there was no information to be gleaned from an observation of any particular value, i.e. t=347.

Let us simplify this argument further by allowing two alternative “models” for the frequency of Chinese traffic wardens. One has T=1000, and the other (just to be silly) has T=1,000,000. If I find number 347, which of these two alternatives do you think is more likely? Think about the kind of numbers that occupy the range from 1 to T. In the first case, most of the numbers have 3 digits. In the second, most of them have 6. If there were a million traffic wardens in the city, it is quite unlikely you would find a random individual with a number as small as 347. If there were only 1000, then 347 is just a typical number. There are strong grounds for favouring the first model over the second, simply based on the number actually observed. To put it another way, we would be surprised to encounter number 347 if T were actually a million. We would not be surprised if T were 1000.
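The comparison of the two models can be made quantitative in a few lines. This is only a sketch: it assumes, as in the text, that each warden is equally likely to be encountered, and (my assumption) that the two models are given equal prior weight. The function and variable names are illustrative.

```python
from fractions import Fraction

def likelihood(t, T):
    """P(t | T): chance of meeting warden number t if there are T wardens."""
    return Fraction(1, T) if 1 <= t <= T else Fraction(0)

t = 347
small, large = 1000, 1_000_000

# How much better the small model predicts the observation than the large one.
ratio = likelihood(t, small) / likelihood(t, large)

# Posterior probability of the small model, assuming equal prior weights.
posterior_small = likelihood(t, small) / (likelihood(t, small) + likelihood(t, large))

print(ratio)
print(float(posterior_small))
```

The observation favours T=1000 over T=1,000,000 by a factor of a thousand, which is the formal version of "we would be surprised to encounter number 347 if T were actually a million".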

One can extend this argument to the entire range of possible values of T, and ask a more general question: if I observe traffic warden number t what is the probability I assign to each value of T? The answer is found using Bayes’ theorem. The prior, as I assumed above, is uniform. The likelihood is the probability of the observation given the model. If I assume a value of T, the probability P(t|T) of each value of t (up to and including T) is just 1/T (since each of the wardens is equally likely to be encountered). Bayes’ theorem can then be used to construct a posterior probability of P(T|t). Without going through all the nuts and bolts, I hope you can see that this probability will tail off for large T. Our observation of a (relatively) small value for t should lead us to suspect that T is itself (relatively) small. Indeed it’s a reasonable “best guess” that T=2t. This makes intuitive sense because the observed value of t then lies right in the middle of its range of possibilities.
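For anyone who does want the nuts and bolts, here is a rough numerical sketch of the posterior. Two things in it are my assumptions rather than part of the argument above: the upper cutoff `t_max` (needed to make a flat prior normalisable) and the alternative prior proportional to 1/T, included for comparison. With the strictly flat prior the posterior peaks at T=t and its median moves with the cutoff; with the 1/T prior the posterior median comes out close to 2t.

```python
def posterior(t, t_max, prior):
    """Posterior P(T | t) on T = t..t_max for likelihood 1/T and a given prior(T)."""
    values = range(t, t_max + 1)
    weights = [prior(T) / T for T in values]   # prior times likelihood
    norm = sum(weights)
    return {T: w / norm for T, w in zip(values, weights)}

def median(post):
    """Smallest T at which the cumulative posterior reaches one half."""
    total = 0.0
    for T in sorted(post):
        total += post[T]
        if total >= 0.5:
            return T

t = 347
t_max = 10**5                                         # assumed cutoff, not from the text
flat = posterior(t, t_max, lambda T: 1.0)             # uniform prior
log_uniform = posterior(t, t_max, lambda T: 1.0 / T)  # assumed 1/T prior

print(max(flat, key=flat.get))   # most probable T under the flat prior
print(median(flat))              # depends strongly on the cutoff
print(median(log_uniform))       # close to 2t
```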

Before going on, it is worth mentioning one other point about this kind of inference: that it is not at all powerful. Note that the likelihood just varies as 1/T. That of course means that small values are favoured over large ones. But note that this probability is uniform in logarithmic terms. So although T=1000 is more probable than T=1,000,000, the range between 1000 and 10,000 is roughly as likely as the range between 1,000,000 and 10,000,000, assuming there is no prior information. So although it tells us something, it doesn’t actually tell us very much. Just like any probabilistic inference, there’s a chance that it is wrong, perhaps very wrong.
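The "uniform in logarithmic terms" statement is easy to check numerically. This sketch simply adds up the unnormalised weight 1/T over each of the two ranges mentioned; the helper name `decade_weight` is illustrative.

```python
import math

def decade_weight(lo):
    """Unnormalised weight, sum of 1/T, for T from lo up to (but excluding) 10*lo."""
    return math.fsum(1.0 / T for T in range(lo, 10 * lo))

w_thousands = decade_weight(1000)        # T from 1000 to 9,999
w_millions = decade_weight(1_000_000)    # T from 1,000,000 to 9,999,999

print(w_thousands, w_millions, math.log(10))
```

Both sums come out very close to ln(10), so each factor of ten in T carries about the same total weight.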

Which brings me to an extrapolation of this argument to an argument about the end of the World. Now I don’t mind admitting that as I get older I get more and more pessimistic about the prospects for humankind’s survival into the distant future. Unless there are major changes in the way this planet is governed, our Earth may indeed become barren and uninhabitable through war or environmental catastrophe. But I do think the future is in our hands, and disaster is, at least in principle, avoidable. In this respect I have to distance myself from a very strange argument that has been circulating among philosophers and physicists for a number of years. It is called the Doomsday argument, and it even has a sizeable Wikipedia entry, to which I refer you for more details and variations on the basic theme. As far as I am aware, it was first introduced by the mathematical physicist Brandon Carter and subsequently developed and expanded by the philosopher John Leslie (not to be confused with the TV presenter of the same name). It also re-appeared in slightly different guise through a paper in the serious scientific journal Nature by the eminent physicist Richard Gott. Evidently, for some reason, some serious people take it very seriously indeed.

So what can Doomsday possibly have to do with Panzer tanks or traffic wardens? Instead of traffic wardens, we want to estimate N, the number of humans that will ever be born. Following the same logic as in the example above, I assume that I am a “randomly” chosen individual drawn from the sequence of all humans to be born, in past, present and future. For the sake of argument, assume I am number n in this sequence. The logic I explained above should lead me to conclude that the total number N is not much larger than my number, n. For the sake of argument, assume that I am the one-billionth human to be born, i.e. n=1,000,000,000. There should not be many more than a few billion humans ever to be born. At the rate of current population growth, this means that not many more generations of humans remain to be born. Doomsday is nigh.

Richard Gott’s version of this argument is logically similar, but is based on timescales rather than numbers. If whatever thing we are considering begins at some time t_begin and ends at a time t_end, and if we observe it at a “random” time between these two limits, then our best estimate for its future duration is of order how long it has lasted up until now. Gott gives the example of Stonehenge, which was built about 4,000 years ago: we should expect it to last a few thousand years into the future. Actually, Stonehenge is a highly dubious example. It hasn’t really survived 4,000 years. It is a ruin, and nobody knows its original form or function. However, the argument goes that if we come across a building put up about twenty years ago, presumably we should think it will come down again (whether by accident or design) in about twenty years’ time. If I happen to walk past a building just as it is being finished, presumably I should hang around and watch its imminent collapse….

But I’m being facetious.

Following this chain of thought, we would argue that, since humanity has been around a few hundred thousand years, it is expected to last a few hundred thousand years more. Doomsday is not quite as imminent as previously, but in any case humankind is not expected to survive sufficiently long to, say, colonize the Galaxy.
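For what it is worth, the timescale version of the argument can be put into numbers. The sketch below assumes, as the argument itself does, that the moment of observation is uniformly distributed over the lifetime of the thing observed; the 95 per cent level is a conventional choice, and the function name `gott_interval` is mine.

```python
def gott_interval(past, confidence=0.95):
    """Range for the future duration, given how long something has lasted.

    Assumes the fraction of the total lifetime already elapsed is uniform
    on (0, 1) and takes the central `confidence` portion of that range.
    """
    lo_frac = (1 - confidence) / 2    # smallest elapsed fraction allowed
    hi_frac = 1 - lo_frac             # largest elapsed fraction allowed
    # If a fraction r has elapsed, the future duration is past * (1 - r) / r.
    shortest = past * (1 - hi_frac) / hi_frac
    longest = past * (1 - lo_frac) / lo_frac
    return shortest, longest

# Stonehenge, taking the figure of about 4,000 years used above.
print(gott_interval(4000))

# The median case (confidence -> 0): future duration equals past duration.
print(gott_interval(4000, confidence=0.0))
```

At the 95 per cent level the future duration lies between 1/39 and 39 times the past duration, which shows just how weak the resulting statement is.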

You may reject this type of argument on the grounds that you do not accept my logic in the case of the traffic wardens. If so, I think you are wrong. I would say that if you accept all the assumptions entering into the Doomsday argument then it is an equally valid example of inductive inference. The real issue is whether it is reasonable to apply this argument at all in this particular case. There are a number of related examples that should lead one to suspect that something fishy is going on. Usually the problem can be traced back to the glib assumption that something is “random” when it is not clearly stated what that is supposed to mean.

There are around sixty million British people on this planet, of whom I am one. In contrast there are 3 billion Chinese. If I follow the same kind of logic as in the examples I gave above, I should be very perplexed by the fact that I am not Chinese. After all, the odds are 50: 1 against me being British, aren’t they?

Of course, I am not at all surprised by the observation of my non-Chineseness. My upbringing gives me access to a great deal of information about my own ancestry, as well as the geographical and political structure of the planet. This data convinces me that I am not a “random” member of the human race. My self-knowledge is conditioning information and it leads to such a strong prior knowledge about my status that the weak inference I described above is irrelevant. Even if there were a million million Chinese and only a hundred British, I have no grounds to be surprised at my own nationality given what else I know about how I got to be here.

This kind of conditioning information can be applied to history, as well as geography. Each individual is generated by its parents. Its parents were generated by their parents, and so on. The genetic trail of these reproductive events connects us to our primitive ancestors in a continuous chain. A well-informed alien geneticist could look at my DNA and categorize me as an “early human”. I simply could not be born later in the story of humankind, even if it does turn out to continue for millennia. Everything about me – my genes, my physiognomy, my outlook, and even the fact that I am bothering to spend time discussing this so-called paradox – is contingent on my specific place in human history. Future generations will know so much more about the universe and the risks to their survival that they won’t even discuss this simple argument. Perhaps we just happen to be living at the only epoch in human history in which we know enough about the Universe for the Doomsday argument to make some kind of sense, but too little to resolve it.

To see this in a slightly different light, think again about Gott’s timescale argument. The other day I met an old friend from school days. It was a chance encounter, and I hadn’t seen the person for over 25 years. In that time he had married, and when I met him he was accompanied by a baby daughter called Mary. If we were to take Gott’s argument seriously, this was a random encounter with an entity (Mary) that had existed for less than a year. Should I infer that this entity should probably only endure another year or so? I think not. Again, bare numerological inference is rendered completely irrelevant by the conditioning information I have. I know something about babies. When I see one I realise that it is an individual at the start of its life, and I assume that it has a good chance of surviving into adulthood. Human civilization is a baby civilization. Like any youngster, it has dangers facing it. But it is not doomed by the mere fact that it is young.

John Leslie has developed many different variants of the basic Doomsday argument, and I don’t have the time to discuss them all here. There is one particularly bizarre version, however, that I think merits a final word or two because it raises an interesting red herring. It’s called the “Shooting Room”.

Consider the following model for human existence. Souls are called into existence in groups representing each generation. The first generation has ten souls. The next has a hundred, the next after that a thousand, and so on. Each generation is led into a room, at the front of which is a pair of dice. The dice are rolled. If the score is double-six then everyone in the room is shot and it’s the end of humanity. If any other score is shown, everyone survives and is led out of the Shooting Room to be replaced by the next generation, which is ten times larger. The dice are rolled again, with the same rules. You find yourself called into existence and are led into the room along with the rest of your generation. What should you think is going to happen?

Leslie’s argument is the following. Each generation not only has more members than the previous one, but also contains more souls than have ever existed to that point. For example, the third generation has 1000 souls; the previous two had 10 and 100 respectively, i.e. 110 altogether. Roughly 90% of all humanity lives in the last generation. Whenever the last generation happens, there are bound to be more people in that generation than in all generations up to that point. When you are called into existence you should therefore expect to be in the last generation. You should consequently expect that the dice will show double six and the celestial firing squad will take aim. On the other hand, if you think the dice are fair then each throw is independent of the previous one and a throw of double-six should have a probability of just one in thirty-six. On this basis, you should expect to survive. The odds are against the fatal score.
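The two competing numbers in this argument are easy to compute. The sketch below uses the generation sizes given above (10, 100, 1000, ...) and a fair pair of dice; the function name `last_generation_share` is illustrative.

```python
from fractions import Fraction

def last_generation_share(k):
    """Fraction of all souls ever created that belong to generation k,
    if generation k turns out to be the last one."""
    sizes = [10 ** i for i in range(1, k + 1)]   # 10, 100, ..., 10**k
    return Fraction(sizes[-1], sum(sizes))

# Chance that any single roll of two fair dice shows double-six.
p_double_six = Fraction(1, 6) * Fraction(1, 6)

for k in (1, 2, 3, 10):
    print(k, float(last_generation_share(k)))
print(float(p_double_six))
```

Whichever generation is the last, at least 90% of all souls are in it, yet each individual roll is fatal with probability only 1/36. The two figures answer different questions, which is the point made in the next paragraph.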

This apparent paradox seems to suggest that it matters a great deal whether the future is predetermined (your presence in the last generation requires the double-six to fall) or “random” (in which case there is the usual probability of a double-six). Leslie argues that if everything is pre-determined then we’re doomed. If there’s some indeterminism then we might survive. This isn’t really a paradox at all, simply an illustration of the fact that assuming different models gives rise to different probability assignments.

While I am on the subject of the Shooting Room, it is worth drawing a parallel with another classic puzzle of probability theory, the St Petersburg Paradox. This is an old chestnut to do with a purported winning strategy for Roulette. It was first proposed by Nicolas Bernoulli but famously discussed at greatest length by Daniel Bernoulli in the pages of Transactions of the St Petersburg Academy, hence the name.  It works just as well for the case of a simple toss of a coin as for Roulette as in the latter game it involves betting only on red or black rather than on individual numbers.

Imagine you decide to bet such that you win by throwing heads. Your original stake is £1. If you win, the bank pays you at even money (i.e. you get your stake back plus another £1). If you lose, i.e. get tails, your strategy is to play again but bet double. If you win this time you get £4 back but have bet £2+£1=£3 up to that point. If you lose again you bet £4. If you win this time, you get £8 back but have paid in £4+£2+£1=£7 to that point. Clearly, if you carry on the strategy of doubling your previous stake each time you lose, when you do eventually win you will be ahead by £1. It’s a guaranteed winner. Isn’t it?

The answer is yes, as long as you can guarantee that the number of losses you will suffer is finite. But in tosses of a fair coin there is no limit to the number of tails you can throw before getting a head. To get the correct probability of winning you have to allow for all possibilities. So what is your expected stake to win this £1? The answer is the root of the paradox. The probability that you win straight off is ½ (you need to throw a head), and your stake is £1 in this case so the contribution to the expectation is £0.50. The probability that you win on the second go is ¼ (you must lose the first time and win the second so it is ½ times ½) and your stake this time is £2 so this contributes the same £0.50 to the expectation. A moment’s thought tells you that each throw contributes the same amount, £0.50, to the expected stake. We have to add this up over all possibilities, and there are an infinite number of them. The result of summing them all up is therefore infinite. If you don’t believe this just think about how quickly your stake grows after only a few losses: £1, £2, £4, £8, £16, £32, £64, £128, £256, £512, £1024, etc. After only ten losses you are staking over a thousand pounds just to get your pound back. Sure, you can win £1 this way, but you need to expect to stake an infinite amount to guarantee doing so. It is not a very good way to get rich.
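The arithmetic of the doubling strategy can be checked directly. This is a small sketch with illustrative function names, assuming a fair coin and an initial stake of £1; round k means the round on which you finally win, after k-1 losses.

```python
from fractions import Fraction

def stake(k):
    """Stake placed on round k when the stake doubles after every loss."""
    return 2 ** (k - 1)

def total_paid(k):
    """Total staked over rounds 1..k."""
    return sum(stake(j) for j in range(1, k + 1))

def net_win(k):
    """Profit if the first head comes on round k (even-money payout)."""
    return 2 * stake(k) - total_paid(k)

def expected_stake(n):
    """Expected winning stake, counting only wins within the first n rounds."""
    return sum(Fraction(1, 2 ** k) * stake(k) for k in range(1, n + 1))

print([net_win(k) for k in range(1, 6)])         # always ahead by exactly 1
print([expected_stake(n) for n in (1, 2, 10)])   # grows by 1/2 per round allowed
print(stake(11))                                 # stake after ten losses
```

Every extra round you allow for adds another £0.50 to the expected stake, so the sum grows without limit as the number of permitted rounds increases.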

The relationship of all this to the Shooting Room is that it shows it is dangerous to pre-suppose a finite value for a number which in principle could be infinite. If the number of souls that could be called into existence is allowed to be infinite, then any individual has no chance at all of being called into existence in any generation!

Amusing as they are, the thing that makes me most uncomfortable about these Doomsday arguments is that they attempt to determine a probability of an event without any reference to underlying mechanism. For me, a valid argument about Doomsday would have to involve a particular physical cause for the extinction of humanity (e.g. asteroid impact, climate change, nuclear war, etc). Given this physical mechanism one should construct a model within which one can estimate probabilities for the model parameters (such as the rate of occurrence of catastrophic asteroid impacts). Only then can one make a valid inference based on relevant observations and their associated likelihoods. Such calculations may indeed lead to alarming or depressing results. I fear that the greatest risk to our future survival is not from asteroid impact or global warming, where the chances can be estimated with reasonable precision, but self-destructive violence carried out by humans themselves. Science has no way of being able to predict what atrocities people are capable of so we can’t make any reliable estimate of the probability we will self-destruct. But the absence of any specific mechanism in the versions of the Doomsday argument I have discussed robs them of any scientific credibility at all.

There are better grounds for worrying about the future than simple-minded numerology.



49 Responses to “German Tanks, Traffic Wardens, and the End of the World”

  1. “For me, a valid argument about Doomsday would have to involve a particular physical cause for the extinction of humanity”

    Well if the LHC black hole (or worse) doesn’t get us the triffids surely will:

    But a lifeform with a soul (unlike a plain lifeform) can be surprised to find itself incarnated in a naturalistic but fine-tuned and life-supporting cosmos and can infer that Doomsday is probably nothing to worry about anyway:

  2. “Not knowing what this was”

    You hereby own up to not being a regular reader of Cusp’s blog:

  3. “There are around sixty million British people on this planet, of whom I am one. In contrast there are 3 billion Chinese. If I follow the same kind of logic as in the examples I gave above, I should be very perplexed by the fact that I am not Chinese. After all, the odds are 50: 1 against me being British, aren’t they?”

    I think this objection is bogus. If all Brits and all Chinese apply this argument, then only a small fraction conclude that they are atypical, which is the whole point if being atypical is defined as belonging to the minority. It is extremely unlikely to win the lottery, yet someone wins every week. If I say in advance “person X will win the lottery” and indeed person X does, then this is indeed surprising. However, no news reporter would visit someone who has won the lottery to report on a very unlikely event.

  4. “Following this chain of thought, we would argue that, since humanity has been around a few hundred thousand years, it is expected to last a few hundred thousand years more. Doomsday is not quite as imminent as previously, but in any case humankind is not expected to survive sufficiently long to, say, colonize the Galaxy.”

    This is a bit misleading. When Gott visited the Berlin wall, or a Broadway play, then his assumption that he was there at a random time is valid. However, human population has been growing exponentially for a long time, so you should think of yourself being picked from this distribution, and not a uniform one. This crucial difference is the essence of the Doomsday argument. You can’t apply Gott’s argument to argue that the Doomsday argument gives too short a timescale since the underlying assumption of Gott’s argument is a uniform distribution in time. (If you modify this for an exponential distribution then you are back at the Doomsday argument.)

  5. “If I happen to walk past a building just as it is being finished, presumably I should hang around and watch its imminent collapse….”

    Not really. This would be expected to happen once in a while. You have to integrate over all buildings you have walked past in your life. In a few cases, you would expect this to happen.

    This is related to the discussion of whether a certain statistical anomaly is interesting. An isolated event with a probability of being due to chance of only 1/1000 might be worth looking into. If I investigate 1000 such possible events, then it is not. Same thing.

  6. “The allied command structure had thought the heavy Panzer V (Panther) tanks”

    I’m not an expert on military history, and the tanks might well be called “Panther” in English. However, the similarity to “Panzer” is coincidental. “Panzer” means “body armour”, be that of a knight or a tortoise, and by extension an armoured vehicle. The German word “Panther” is the same as the English word “panther”*. To confuse things, there are various German tanks known as “leopards” (which is spelled the same way in German as in English).

    *For some reason, this cat has many names in English: panther, cougar, mountain lion, catamount, puma, just to name a few. (German has just Panther and Puma.)

  7. I’m looking forward to your blog post on the “Sleeping-Beauty problem” (yes, the dash essentially changes the meaning).

  8. Anton Garrett Says:

    “the thing that makes me most uncomfortable about these Doomsday arguments is that they attempt to determine a probability of an event without any reference to underlying mechanism.”

    Yes, that is the point. As you then say:

    “For me, a valid argument about Doomsday would have to involve a particular physical cause for the extinction of humanity (e.g. asteroid impact, climate change, nuclear war, etc). Given this physical mechanism one should construct a model within which one can estimate probabilities for the model parameters (such as the rate of occurrence of catastrophic asteroid impacts). Only then can one make a valid inference based on relevant observations and their associated likelihoods.”

    And also the prior. But yes, this is the basic error in the reasoning that you are (rightly) criticising.

    • telescoper Says:

      Yes, I should have mentioned the prior in that paragraph – the nature of the prior is however implicitly related to the form of the model.

  9. Gott’s argument is essentially that a typical observer, one observing something at a random time (assuming a flat prior), will not observe this at a special time.* One can turn this around and say that the typical state of some object is that which a typical observer (one observing at a typical time) will see. In the case of the Berlin wall or a Broadway play, this doesn’t provide one with any extra information, because these objects didn’t change much during their existence. However, if the state of an object changes with time, in particular if there is a special, relatively short time when it is in an atypical state, then a typical observer would not expect to observe this state. For this reason, one would not expect a typical observer to observe a large value of Omega in a universe which will collapse in the future, even though Omega becomes infinite (but, of course, for a very short time). I claim that neglecting this effect leads almost everyone to misunderstand one aspect of the flatness problem. I’m still waiting to be proved wrong. 😐 (In a nutshell, the time evolution of a pencil falling after having been almost balanced on its tip, or similar behaviour on the part of a tightrope walker, is quantitatively different than the expansion of a Friedmann universe; this is one example where arguing too far from analogy leads to a wrong impression.)

    *Gott’s argument has provoked a huge amount of discussion, but in its basic form is trivially true: Go to Egypt and you will see the pyramids and you will see an apartment house built in the 1970s. Which will be there 1000 years hence? If I want to leave the room only after I can be reasonably sure that the baby will sleep for an hour, I can be more confident if I leave after the baby has been asleep for 5 minutes than if I leave after it has been asleep for 5 seconds. People do this intuitively even if they have never heard of Gott.

    • Anton Garrett Says:

      That’s because they know that the baby’s internal sleep mechanisms are such that if it has been asleep only 5 seconds it is not necessarily fully quiescent whereas if it has been asleep 5 mins then it is. In your other example, stone lasts longer than brick or mud buildings because the mechanisms by which buildings decay are slower for stone. It’s always about prior information about mechanisms and if you are deprived of that then you are in trouble.

      An observation is an observation. Nobody has any right to assume that they are a “typical observer”.

      • In both cases, the logic is that a state which has existed for a longer period of time will probably continue to exist for a longer period of time. There are of course explanations for each individual case, but one can come to this (statistically correct) conclusion without knowing the mechanism in each individual case.

        I think if I offered you 1000:1 odds that something which has been in existence for 1000 times longer than something else would last longer into the future than this something else, then you should take the bet (if you are interested in maximizing your income) even if you know nothing about the mechanisms determining the longevity of the objects.

      • telescoper Says:

        …even if you understand what “typical observer” means!

      • A typical observer is one who considers himself to be a typical observer. 🙂

      • telescoper Says:

        I think that’s a deluded observer…

      • Not necessarily a contradiction.

      • Anton Garrett Says:

        “I think if I offered you 1000:1 odds that something which has been in existence for 1000 times longer than something else would last longer into the future than this something else, then you should take the bet (if you are interested in maximizing your income) even if you know nothing about the mechanisms determining the longevity of the objects.”

        I disagree. I think that you are being influenced by your experience of situations in which you know something about the mechanism, such as the two examples you gave. As the point is to strip out all prior information, the problem you invent regarding time is logically equivalent to another regarding space, namely how far you are from the endpoint of a lengthy ruler. In the absence of any observation you are equally likely to be anywhere.

    • Anton Garrett Says:

      Let me phrase that spatial analogue more clearly. You arrive at a railway station in a country about which you know nothing at all. A local tells you that there are no branches on this line. You ask him how many stops there are to the end of the line in one direction. He tells you – say, 17. Can you infer anything from this about the number of stops to the end of the line in the other direction? In other words, can you infer anything from his reply about the number of stops on the entire railway line (since you can just subtract 17 from that)?

      • With 95% confidence, more than 17 but less than 663 total stops.

      • Anton Garrett Says:

        Please show me the calculation.

      • Following Gott: I don’t know how many stops there are, I know only that I am at the 17th stop. Assuming that all are equally likely, then with 95 per cent confidence I am in the middle 95 per cent of the stops. Thus, there must be less than 39*17=663 otherwise I would be in the first two-and-one-half per cent. There must also be more than 17 (MAX(17,17/39)), trivially. (If I were at, say, stop 780, then the minimum would be 780/39=20, otherwise I would be in the last two-and-one-half per cent.)

      • Anton Garrett Says:

        Suppose the railway (no branches) comprises N stations including the termini; N >2 so that there is at least one intermediate stop. You assign a prior probability Prior(N|I)=f(N) where I is your prior information about railway lines. Suppose the names of all stations including the termini are put in an urn and one is withdrawn, after which you are set down at that station (a “randomly selected” station, meaning that any station is equally likely); this is information J. You ask the stationmaster, “How many stations are there down the line to the left when looking at the track from your office on the platform?” He answers, “n”. (NB n may be zero; the track is such that you cannot tell if you are at a terminus because the buffers are just out of sight beyond the platform). The selection process means that p(n|N,J) is independent of n; as there are N possible answers (0,1,…(N-1)) it follows that p(n|N,J)=1/N for n=0,1…(N-1) and =0 for n>(N-1); in other words p(n|N,J)=(1/N)H(N-n-1) where H(*) is zero if its (integer-valued) argument is negative and is unity if its argument is zero or positive. (It is of course assumed that all variables are non-negative.) Bayes’ theorem tells us that p(N|n,I,J) is proportional to p(n|N,I,J)p(N|I,J) where the constant of proportionality comes from normalisation over N. Now, I is redundant in the conditioning information in p(n|N,I,J), and J is redundant in p(N|I,J), so that p(N|n,I,J) is proportional to p(n|N,J)p(N|I), which upon substitution is proportional to (1/N)f(N)H(N-n-1). Suppose now that there are m stations along the track to the right. We can change variable from N to m, since N=m+n+1; the result is that the probability distribution for m is proportional to 1/(m+n+1) * f(m+n+1)H(m) where the constant of proportionality comes from normalising over m from 0 to infinity. 

        As all variables are understood to be non-negative the H-function need not be written and the posterior for m is equal to (1/K)1/(m+n+1) * f(m+n+1) where K is given by normalising over all non-negative values of m. This is the Bayesian solution and is the basis for analysing all other intuitive analyses, which hopefully I’ll do within 24 hours.
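
        To make the posterior concrete, here is a minimal sketch that evaluates (1/K) f(m+n+1)/(m+n+1) numerically. The function `posterior_m`, the truncation `m_max`, and both example priors are assumptions chosen for illustration; the comment above leaves f(N) unspecified.

```python
def posterior_m(n, prior, m_max):
    """Posterior probabilities for m = 0..m_max stations to the right,
    given n stations to the left: proportional to prior(N)/N, N = m+n+1.
    The sum over m is truncated at m_max (an assumption of this sketch),
    and the prior must be non-zero somewhere in that range."""
    weights = [prior(m + n + 1) / (m + n + 1) for m in range(m_max + 1)]
    K = sum(weights)                      # normalising constant
    return [w / K for w in weights]

# Hypothetical prior 1: any N from 3 to 1000 equally likely.
flat = lambda N: 1.0 if 3 <= N <= 1000 else 0.0
post_flat = posterior_m(16, flat, 1000)   # 16 stations to the left

# Hypothetical prior 2: certainty that N = 997.
certain = lambda N: 1.0 if N == 997 else 0.0
post_certain = posterior_m(995, certain, 1000)

print(post_flat.index(max(post_flat)))    # most probable m under the flat prior
print(post_certain[1])                    # probability that m = 1 under certainty
```

        With the flat prior the probability falls off as 1/(m+17), so the most probable value is m=0. With the certain prior the answer n simply fixes m = N-n-1, whatever intuition might suggest.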

      • Anton Garrett Says:

        Now the continuous case. You are put down on a non-branching railway track at a “random position” defined as being equally likely anywhere. You assign a prior density f(L) for the total length L of the railway based on your knowledge of railway geography; \int dL f(L) =1 integrated from 0 to infinity. You have a railway surveyor handy and you ask him how far to the end of the line looking to the left from the side of the tracks you are standing on. He answers “x” (in the same units of distance as L). What is your posterior density for y, the distance to the right to the buffers, ie y=(L-x) ?

        An analogous analysis gives the answer

        (1/K) 1/(x+y) * f(x+y)

        where K is a function of x, given by

        \int dy 1/(x+y) * f(x+y) from 0 to infinity.

        Now we can start comparing with intuitive analyses. The prior f(*) is bound to contain some characteristic distances (at the least, its mean) and the mode for y can be compared to these characteristic distances for various forms of f.

      • So, you agree? 🙂

      • Anton Garrett Says:

        Busy weekend Phillip, as you can judge from my warning of delayed responses on Friday. Then and in the continuum case I set the scene by giving the Bayesian (ie, correct) answer. Before I analyse your argument, here’s a question: What if, before the ‘experiment’, you were 99.999% certain that there were 997 stations?

      • I don’t really see a conflict. The Gott case is that you are there at a random time and don’t have any further information. Of course the conclusion changes if your prior information changes, which is why it doesn’t work in the case of a human, for example (finite maximum age and a pretty sharp cutoff at that). Obviously, if I am 99.999% certain that there are 997 stations, then if I find myself at number 996 I will reach a different conclusion than if I didn’t have this prior information.

      • Anton Garrett Says:

        OK Phillip, let me sharpen the question. Suppose your prior info is such that you are 100% certain that the railway has 997 stops. This is still a special case of the probabilistic situation, and can therefore be used to test assertions based on intuition rather than Bayes’ theorem. In that case, are you still 95% confident that you are in the middle 95% of all the stops?

      • No, but I never said I was. 100 per cent sure is strong prior information. The whole point is whether one can say anything at all if one has essentially no information.

      • Anton Garrett Says:

        Phillip, Your statement “Assuming that all are likely, then with 95 per cent confidence I am in the middle 95 per cent of the stops” is not contingent on the prior information you have, and I am showing you by means of extreme prior information that it can therefore fail. It is therefore untrustworthy – which, in problems involving inference, is another way of saying wrong.

      • The point is that this statement applies in the absence of any prior information. Gott visiting the Berlin Wall in 1970 or whatever with no further information, and Gott visiting the Berlin Wall knowing when it was built, or with intelligence indicating the rise of Gorbachev, are different situations. If a conclusion holds in the absence of prior information, then it is no objection to say that providing prior information changes the inference. That is true, but not relevant to the original problem.

        I get off of a bus and look at the house number on a street (where houses are consecutively numbered). What is the highest number on the street? I can only guess, of course, and for a 95 per cent confidence answer I will be wrong 5 per cent of the time. However, if I see the number 2 I will reach a different conclusion than if I see the number 200. All Gott’s argument does is quantify this.

      • Anton Garrett Says:


        My station problem is already a simple model problem and I don’t think your new model problem is any simpler. I’m reluctant to analyse it because there can be subtle differences between a thing and an analogy to it. We are in disagreement about the railway problem; let’s try to settle it there.

        Let proposition A = “the middle 95% of stations are in the urn” and B = “my station is in the middle 95%”.

        Then you are reasoning (I think) that

        p(B|A) = 0.95 = p(A|B).

        This is the usual error appearing in intuitive non-Bayesian reversion; not all people who are female are pregnant, but all people who are pregnant are female.

      • Anton Garrett Says:

        Phillip: to make my critique more specific, with 95% probability you are in the first 95% of the track, or the middle 95%, or the end 95%, or – crucially – the subset of track comprising the first 47.5% plus the last 47.5%.

      • Right, but all are equally correct. When making a contour plot, there are in general an infinite number of 95 per cent contours. It is convention to draw the smallest such contour, but in the case of a constant pdf there is not a unique such contour.

      • Anton Garrett Says:

        “100 per cent sure is strong prior information. The whole point is whether one can say anything at all if one has essentially no information.”

        The whole point is to make a general statement that doesn’t fail in special cases.

      • Anton Garrett Says:

        Phillip, I confess that I don’t fully understand your brief argument at 8.01pm on 20th, and as a Bayesian who finds that it fails in special cases I take that to mean it contains a subtle error. Unless you fill it out, the only thing I can do is to give the Bayesian version, as follows:

        Define the proposition

        A = “my stop is in the middle 95% of all stops”

        Then prior(A|J) = 0.95. Define

        B = “my stop is the 18th”

        To calculate posterior(A|B,J) we must use Bayes’ theorem:

        posterior(A|B,J) = (1/K) prior(A|J)likelihood(B|J,A)


        where K = prior(A|J)likelihood(B|J,A) + prior(~A|J)likelihood(B|J,~A)

        The priors are known, the likelihoods can be found, but even then you have to play intuitive tricks to get the probability for the number of stations – when Bayes’ theorem allows you to consider this proposition directly.

      • Maybe the difference is that Gott doesn’t think of being in the middle 95 per cent as a prior. Maybe that is wrong, I don’t know.

        Say there are 100 children in a school, and they are numbered by consecutive integers. Picking one at random, there is a 90 per cent probability that the number is between 6 and 95 (inclusive). I hope we can agree on that.

        Suppose I am given the number n of a child at random and want to estimate the size of the class. At 90 per cent confidence, it is between 19/18*n and 19*n.

        That’s it. Is this a different answer than the Bayesian answer?
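
        One way to examine intervals like these without choosing a prior is to count, for a fixed true size N, how many of the equally likely numbers n would produce an interval that contains N. The sketch below does that by exact enumeration; `coverage` and its arguments are hypothetical names. For comparison it also tries the factors 20/19 and 20, which are what you get by requiring n/N to lie between 0.05 and 0.95.

```python
from fractions import Fraction

def coverage(N, low_factor, high_factor):
    """Fraction of n in 1..N for which low_factor*n <= N <= high_factor*n."""
    hits = sum(1 for n in range(1, N + 1)
               if low_factor * n <= N <= high_factor * n)
    return Fraction(hits, N)

for N in (100, 1000):
    print(N,
          float(coverage(N, Fraction(19, 18), 19)),   # factors quoted above
          float(coverage(N, Fraction(20, 19), 20)))   # factors from 0.05 <= n/N <= 0.95
```

        This only checks how often the interval contains N when N is held fixed. The probability for N given an observed n still depends on the prior, which is the point at issue in this thread.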

      • Anton Garrett Says:

        Never mind Gott, the argument at 8.01pm on 20th was yours; can you fill it out please?

      • This is just a direct translation of Gott’s time argument to railway stops; if I’m at stop 17 then I am pretty sure that there are less than 800; this just quantifies it.

      • Anton Garrett Says:

        I’m not asking you to translate it across model problems; I’m asking you to explain it in more detail than you have yet done above, if you are willing.

  10. “Indeed it’s a reasonable “best guess” that T=2t. This makes intuitive sense because the observed value of t then lies right in the middle of its range of possibilities.”

    A quick Monte Carlo program suggests to me your best guess was wrong:

    Mode result: 1.0 appeared 33300.0 times.
    2.0 appeared 26797 .0 times.

    (Although feel free to point out mistakes in the program, or if I misunderstood your point!)

    from collections import Counter
    from numpy.random import randint

    # Number of repeats
    N = 100000
    VERBOSE = False  # set to True to print the details of every draw
    results = []

    for i in range(N):
        # pick the size of the range
        range_size = randint(2, 10**9)

        # pick a number from the range, 1..range_size inclusive
        # (randint excludes its upper bound)
        random_number = randint(1, range_size + 1)

        # ratio of the actual range to the observed number,
        # rounded to the nearest integer
        result = range_size / random_number
        results.append(int(round(result)))

        if VERBOSE:
            print("Number: ", random_number)
            print("Coles estimate of range: ", 2 * random_number)
            print("Actual range: ", range_size)
            print(" ", result, int(round(result)))
            print(" ")

    # tally the rounded ratios and report the most common one
    counts = Counter(results)
    mode_value, mode_count = counts.most_common(1)[0]
    print(" ")
    print("Mode result: ", mode_value, " appeared ", mode_count, " times.")
    print("2 appeared ", counts[2], " times.")
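
    As a cross-check on the simulated counts, here is a sketch of the exact probabilities in the continuous limit, assuming t is uniform on (0, T) so that P(T/t < r) = 1 - 1/r for r >= 1. The helper `prob_rounded_ratio` is a hypothetical name, and ties at exact half-integers are ignored.

```python
from fractions import Fraction

def prob_rounded_ratio(k):
    """P(round(T/t) == k) for t uniform on (0, T), continuous limit."""
    half = Fraction(1, 2)
    lower = max(Fraction(1), k - half)    # the ratio T/t is never below 1
    upper = k + half
    return 1 / lower - 1 / upper          # P(lower <= T/t < upper)

for k in (1, 2, 3):
    p = prob_rounded_ratio(k)
    print(k, p, round(100000 * float(p)))  # expected count in 100000 draws
```

    The probabilities for k=1 and k=2 are 1/3 and 4/15, close to the 33300 and 26797 counts reported above. Under the same assumption P(T/t < 2) = 1/2, so 2 is the median of the ratio even though 1 is the mode of its rounded value.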

  11. John Peacock Says:

    It surprises me that “doomsday” still gets so much discussion. Ken Olum pointed out the logical flaw quite a few years ago:

    • I’m not disagreeing, John, but the problem with pointing to such a reference is that someone has surely criticized Ken Olum since then.

      In general, when there is a huge literature on a topic, it is difficult to address it in blog comments. 😐

      Someone recently pointed to an article where Mike Turner was the second of two co-authors, which claimed that CDM naturally produces MOND phenomenology. Of course, this article has been criticized in turn. Even if one does not agree with the MOND folks, I think it is fair to say that the Turner paper at best gets an order-of-magnitude estimate right, but is far from explaining all MOND phenomenology.

  12. […] week I posted an item that included a discussion of the Doomsday Argument. A subsequent comment on that post mentioned a […]
