Archive for probability

A Paradox in Probability

Posted in Cute Problems, mathematics with tags , on November 29, 2022 by telescoper

I just came across this paradox in an old book of mathematical recreations and thought it was cute so I’d share it here:

Here are two possible solutions to pick from:

Since we are now in the era of precision cosmology, an uncertainty of a factor of 400 is not acceptable so which answer is correct? Or are they both wrong?

A Question of Distributions and Entropies

Posted in mathematics with tags , , on November 28, 2022 by telescoper

I thought I’d use the medium of this blog to pick the brains of my readers about some general questions I have about probability and entropy as described on the chalkboard above in order to help me with my homework.

Imagine that px(x) and py(y) are one-point probability density functions and pxy(x,y) is a two-point (joint) probability density function defined so that its marginal distributions are px(x) and py(y) and shown on the left-hand side of the board. These functions are all non-negative definite and integrate to unity as shown.

Note that, unless x and y are independent, in which case pxy(x,y) = px(x) py(y), the joint probability cannot be determined from the marginals alone.

On the right we have Sx, Sy and Sxy defined by integrating plogp for the two univariate distributions and the bivariate distributions respectively as shown on the right-hand side of the board. These would be proportional to the Gibbs entropy of the distributions concerned but that isn’t directly relevant.

My question is: what can be said in general terms (i.e. without making any further assumptions about the distributions involved) about the relationship between Sx, Sy and Sxy ?

Answers on a postcard through the comments block please!

A Vaccination Fallacy

Posted in Bad Statistics, Covid-19 with tags , , , , on June 27, 2021 by telescoper

I have been struck by the number of people upset by the latest analysis of SARS-Cov-2 “variants of concern” byPublic Health England. In particular it is in the report that over 40% of those dying from the so-called Delta Variant have had both vaccine jabs. I even saw some comments on social media from people saying that this proves that the vaccines are useless against this variant and as a consequence they weren’t going to bother getting their second jab.

This is dangerous nonsense and I think it stems – as much dangerous nonsense does – from a misunderstanding of basic probability which comes up in a number of situations, including the Prosecutor’s Fallacy. I’ll try to clarify it here with a bit of probability theory. The same logic as the following applies if you specify serious illness or mortality, but I’ll keep it simple by just talking about contracting Covid-19. When I write about probabilities you can think of these as proportions within the population so I’ll use the terms probability and proportion interchangeably in the following.

Denote by P[C|V] the conditional probability that a fully vaccinated person becomes ill from Covid-19. That is considerably smaller than P[C| not V] (by a factor of ten or so given the efficacy of the vaccines). Vaccines do not however deliver perfect immunity so P[C|V]≠0.

Let P[V|C] be the conditional probability of a person with Covid-19 having been fully vaccinated. Or, if you prefer, the proportion of people with Covid-19 who are fully vaccinated..

Now the first thing to point out is that these conditional probability are emphatically not equal. The probability of a female person being pregnant is not the same as the probability of a pregnant person being female.

We can find the relationship between P[C|V] and P[V|C] using the joint probability P[V,C]=P[V,C] of a person having been fully vaccinated and contracting Covid-19. This can be decomposed in two ways: P[V,C]=P[V]P[C|V]=P[C]P[V|C]=P[V,C], where P[V] is the proportion of people fully vaccinated and P[C] is the proportion of people who have contracted Covid-19. This gives P[V|C]=P[V]P[C|V]/P[C].

This result is nothing more than the famous Bayes Theorem.

Now P[C] is difficult to know exactly because of variable testing rates and other selection effects but is presumably quite small. The total number of positive tests since the pandemic began in the UK is about 5M which is less than 10% of the population. The proportion of the population fully vaccinated on the other hand is known to be about 50% in the UK. We can be pretty sure therefore that P[V]»P[C]. This in turn means that P[V|C]»P[C|V].

In words this means that there is nothing to be surprised about in the fact that the proportion of people being infected with Covid-19 is significantly larger than the probability of a vaccinated person catching Covid-19. It is expected that the majority of people catching Covid-19 in the current phase of the pandemic will have been fully vaccinated.

(As a commenter below points out, in the limit when everyone has been vaccinated 100% of the people who catch Covid-19 will have been vaccinated. The point is that the number of people getting ill and dying will be lower than in an unvaccinated population.)

The proportion of those dying of Covid-19 who have been fully vaccinated will also be high, a point also made here.

It’s difficult to be quantitatively accurate here because there are other factors involved in the risk of becoming ill with Covid-19, chiefly age. The reason this poses a problem is that in my countries vaccinations have been given preferentially to those deemed to be at high risk. Younger people are at relatively low risk of serious illness or death from Covid-19 whether or not they are vaccinated compared to older people, but the latter are also more likely to have been vaccinated. To factor this into the calculation above requires an additional piece of conditioning information. We could express this crudely in terms of a binary condition High Risk (H) or Low Risk (L) and construct P(V|L,H) etc but I don’t have the time or information to do this.

So please don’t be taken in by this fallacy. Vaccines do work. Get your second jab (or your first if you haven’t done it yet). It might save your life.

A Virus Testing Probability Puzzle

Posted in Cute Problems, mathematics with tags , on April 13, 2020 by telescoper

Here is a topical puzzle for you.

A test is designed to show whether or not a person is carrying a particular virus.

The test has only two possible outcomes, positive or negative.

If the person is carrying the virus the test has a 95% probability of giving a positive result.

If the person is not carrying the virus the test has a 95% probability of giving a negative result.

A given individual, selected at random, is tested and obtains a positive result. What is the probability that they are carrying the virus?

Update 1: the comments so far have correctly established that the answer is not what you might naively think (ie 95%) and that it depends on the fraction of people in the population actually carrying the virus. Suppose this is f. Now what is the answer?

Update 2: OK so we now have the probability for a fixed value of f. Suppose we know nothing about f in advance. Can we still answer the question?

Answers and/or comments through the comments box please.

The First Bookie

Posted in Football, mathematics, Sport with tags , , , , , , on April 24, 2019 by telescoper

I read an interesting piece in Sunday’s Observer which is mainly about the challenges facing the modern sports betting industry but which also included some interesting historical snippets about the history of gambling.

One thing that I didn’t know before reading this article was that it is generally accepted that the first ever bookmaker was a chap called Harry Ogden who started business in the late 18th century on Newmarket Heath. Organized horse-racing had been going on for over a century by then, and gambling had co-existed with it, not always legally. Before Harry Ogden, however, the types of wager were very different from what we have nowadays. For one thing bets would generally be offered on one particular horse (the Favourite), against the field. There being only two outcomes these were generally even-money bets, and the wagers were made between individuals rather than being administered by a `turf accountant’.

Then up stepped Harry Ogden, who introduced the innovation of laying odds on every horse in a race. He set the odds based on his knowledge of the form of the different horses (i.e. on their results in previous races), using this data to estimate probabilities of success for each one. This kind of `book’, listing odds for all the runners in a race, rapidly became very popular and is still with us today. The way of specifying odds as fractions (e.g. 6/1 against, 7/1 on) derives from this period.

Ogden wasn’t interested in merely facilitating other people’s wagers: he wanted to make a profit out of this process and the system he put in place to achieve this survives to this day. In particular he introduced a version of the overround, which works as follows. I’ll use a simple example from football rather than horse-racing because I was thinking about it the other day while I was looking at the bookies odds on relegation from the Premiership.

Suppose there is a football match, which can result either in a HOME win, an AWAY win or a DRAW. Suppose the bookmaker’s expert analysts – modern bookmakers employ huge teams of these – judge the odds of these three outcomes to be: 1-1 (evens) on a HOME win, 2-1 against the DRAW and 5-1 against the AWAY win. The corresponding probabilities are: 1/2 for the HOME win, 1/3 for the DRAW and 1/6 for the AWAY win. Note that these add up to 100%, as they are meant to be probabilities and these are the only three possible outcomes. These are `true odds’.

Offering these probabilities as odds to punters would not guarantee a return for the bookie, who would instead change the odds so they add up to more than 100%. In the case above the bookie’s odds might be: 4-6 for the HOME win; 6-4 for the DRAW and 4-1 against the AWAY win. The implied probabilities here are 3/5, 2/5 and 1/5 respectively, which adds up to 120%, not 100%. The excess is the overround or `bookmaker’s margin’ – in this case 20%.

This is quite the opposite to the Dutch Book case I discussed here.

Harry Ogden applied his method to horse races with many more possible outcomes, but the principle is the same: work out your best estimate of the true odds then apply your margin to calculate the odds offered to the punter.

One thing this means is that you have to be careful f you want to estimate the probability of an event from a bookie’s odds. If they offer you even money then that does not mean they you have a 50-50 chance!

A Problem of Sons

Posted in Cute Problems with tags , , on January 31, 2019 by telescoper

I’m posting this in the Cute Problems folder, but I’m mainly putting it up here as a sort of experiment. This little puzzle was posted on Twitter by someone I follow and it got a huge number of responses (>25,000). I was fascinated by the replies, and I’m really interested to see whether the distribution of responses from readers of this blog is different.

Anyway, here it is, exactly as posted on Twitter:

Assume there is a 50:50 chance of any child being male or female.

Now assume four generations, all other things being equal.

What are the odds of a son being a son of a son of a son?

Please choose an answer from those below:

 

UPDATE: The answer is below:

 

Continue reading

The Problem with Odd Moments

Posted in Bad Statistics, Cute Problems, mathematics with tags , , on July 9, 2018 by telescoper

Last week, realizing that it had been a while since I posted anything in the cute problems folder, I did a quick post before going to a meeting. Unfortunately, as a couple of people pointed out almost immediately, there was a problem with the question (a typo in the form of a misplaced bracket). I took the post offline until I could correct it and then promptly forgot about it. I remembered it yesterday so have now corrected it. I also added a useful integral as a hint at the end, because I’m a nice person. I suggest you start by evaluating the expectation value (i.e. the first-order moment). Answers to parts (2) and (3) through the comments box please!

Answers to (2) and (3) via the comments box please!

 

SOLUTION: I’ll leave you to draw your own sketch but, as Anton correctly points out, this is a distribution that is asymmetric about its mean but has all odd-order moments equal (including the skewness) equal to zero. it therefore provides a counter-example to common assertions, e.g. that asymmetric distributions must have non-zero skewness. The function shown in the problem was originally given by Stieltjes, but a general discussion can be be found in E. Churchill (1946) Information given by odd moments, Ann. Math. Statist. 17, 244-6. The paper is available online here.

Joseph Bertrand and the Monty Hall Problem

Posted in Bad Statistics, History, mathematics with tags , , , , on October 4, 2017 by telescoper

The death a few days ago of Monty Hall reminded me of something I was going to write about the Monty Hall Problem, as it did with another blogger I follow, namely that (unsrurprisingly) Stigler’s Law of Eponymy applies to this problem.

The earliest version of the problem now called the Monty Hall Problem dates from a book, first published in 1889, called Calcul des probabilités written by Joseph Bertrand. It’s a very interesting book, containing much of specific interest to astronomers as well as general things for other scientists. Ypu can read it all online here, if you can read French.

As it happens, I have a copy of the book and here is the relevant problem. If you click on the image it should be legible.

It’s actually Problem 2 of Chapter 1, suggesting that it’s one of the easier, introductory questions. Interesting that it has endured so long, even if it has evolved slightly!

I won’t attempt a full translation into English, but the problem is worth describing as it is actually more interesting than the Monty Hall Problem (with the three doors). In the Bertrand version there are three apparently identical boxes (coffrets) each of which has two drawers (tiroirs). In each drawer of each box there is a medal. In the first box there are two gold medals. The second box contains two silver medals. The third box contains one gold and one silver.

The boxes are shuffled, and you pick a box `at random’ and open one drawer `randomly chosen’ from the two. What is the probability that the other drawer of the same box contains a medal that differs from the first?

Now the probability that you select a box with two different medals in the first place is just 1/3, as it has to be the third box: the other two contain identical medals.

However, once you open one drawer and find (say) a silver medal then the probability of the other one being different (i.e. gold) changes because the knowledge gained by opening the drawer eliminates (in this case) the possibility that you selected the first box (which has only gold medals in it). The probability of the two medals being different is therefore 1/2.

That’s a very rough translation of the part of Bertrand’s discussion on the first page. I leave it as an exercise for the reader to translate the second part!

I just remembered that this is actually the same as the three-card problem I posted about here.

Fear, Risk, Uncertainty and the European Union

Posted in Politics, Science Politics, The Universe and Stuff with tags , , , , , , , , , on April 11, 2016 by telescoper

I’ve been far too busy with work and other things to contribute as much as I’d like to the ongoing debate about the forthcoming referendum on Britain’s membership of the European Union. Hopefully I’ll get time for a few posts before June 23rd, which is when the United Kingdom goes to the polls.

For the time being, however, I’ll just make a quick comment about one phrase that is being bandied about in this context, namely Project Fear.As far as I am aware this expression first came up in the context of last year’s referendum on Scottish independence, but it’s now being used by the “leave” campaign to describe some of the arguments used by the “remain” campaign. I’ve met this phrase myself rather often on social media such as Twitter, usually in use by a BrExit campaigner accusing me of scaremongering because I think there’s a significant probability that leaving the EU will cause the UK serious economic problems.

Can I prove that this is the case? No, of course not. Nobody will know unless and until we try leaving the EU. But my point is that there’s definitely a risk. It seems to me grossly irresponsible to argue – as some clearly are doing – that there is no risk at all.

This is all very interesting for those of us who work in university science departments because “Risk Assessments” are one of the things we teach our students to do as a matter of routine, especially in advance of experimental projects. In case you weren’t aware, a risk assessment is

…. a systematic examination of a task, job or process that you carry out at work for the purpose of; Identifying the significant hazards that are present (a hazard is something that has the potential to cause someone harm or ill health).

Perhaps we should change the name of our “Project Risk Assessments” to “Project Fear”?

I think this all demonstrates how very bad most people are at thinking rationally about uncertainty, to such an extent that even thinking about potential hazards is verboten. I’ve actually written a book about uncertainty in the physical sciences , partly in an attempt to counter the myth that science deals with absolute certainties. And if physics doesn’t, economics definitely can’t.

In this context it is perhaps worth mentioning the  definitions of “uncertainty” and “risk” suggested by Frank Hyneman Knight in a book on economics called Risk, Uncertainty and Profit which seem to be in standard use in the social sciences.  The distinction made there is that “risk” is “randomness” with “knowable probabilities”, whereas “uncertainty” involves “randomness” with “unknowable probabilities”.

I don’t like these definitions at all. For one thing they both involve a reference to “randomness”, a word which I don’t know how to define anyway; I’d be much happier to use “unpredictability”.In the context of BrExit there is unpredictability because we don’t have any hard information on which to base a prediction. Even more importantly, perhaps, I find the distinction between “knowable” and “unknowable” probabilities very problematic. One always knows something about a probability distribution, even if that something means that the distribution has to be very broad. And in any case these definitions imply that the probabilities concerned are “out there”, rather being statements about a state of knowledge (or lack thereof). Sometimes we know what we know and sometimes we don’t, but there are more than two possibilities. As the great American philosopher and social scientist Donald Rumsfeld (Shurely Shome Mishtake? Ed) put it:

“…as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.”

There may be a proper Bayesian formulation of the distinction between “risk” and “uncertainty” that involves a transition between prior-dominated (uncertain) and posterior-dominated (risky), but basically I don’t see any qualititative difference between the two from such a perspective.

When it comes to the EU referendum is that probabilities of different outcomes are difficult to calculate because of the complexity of economics generally and the dynamics of trade within and beyond the European Union in particular. Moreover, probabilities need to be updated using quantitative evidence and we don’t actually have any of that. But it seems absurd to try to argue that there is neither any risk nor any uncertainty. Frankly, anyone who argues this is just being irrational.

Whether a risk is worth taking depends on the likely profit. Nobody has convinced me that the country as a whole will gain anything concrete if we leave the European Union, so the risk seems pointless. Cui Bono? I think you’ll find the answer to that among the hedge fund managers who are bankrolling the BrExit campaign…

 

 

Life as a Condition of Cosmology

Posted in The Universe and Stuff with tags , , , , , , , on November 7, 2015 by telescoper

Trigger Warnings: Bayesian Probability and the Anthropic Principle!

Once upon a time I was involved in setting up a cosmology conference in Valencia (Spain). The principal advantage of being among the organizers of such a meeting is that you get to invite yourself to give a talk and to choose the topic. On this particular occasion, I deliberately abused my privilege and put myself on the programme to talk about the “Anthropic Principle”. I doubt if there is any subject more likely to polarize a scientific audience than this. About half the participants present in the meeting stayed for my talk. The other half ran screaming from the room. Hence the trigger warnings on this post. Anyway, I noticed a tweet this morning from Jon Butterworth advertising a new blog post of his on the very same subject so I thought I’d while away a rainy November afternoon with a contribution of my own.

In case you weren’t already aware, the Anthropic Principle is the name given to a class of ideas arising from the suggestion that there is some connection between the material properties of the Universe as a whole and the presence of human life within it. The name was coined by Brandon Carter in 1974 as a corrective to the “Copernican Principle” that man does not occupy a special place in the Universe. A naïve application of this latter principle to cosmology might lead us to think that we could have evolved in any of the myriad possible Universes described by the system of Friedmann equations. The Anthropic Principle denies this, because life could not have evolved in all possible versions of the Big Bang model. There are however many different versions of this basic idea that have different logical structures and indeed different degrees of credibility. It is not really surprising to me that there is such a controversy about this particular issue, given that so few physicists and astronomers take time to study the logical structure of the subject, and this is the only way to assess the meaning and explanatory value of propositions like the Anthropic Principle. My former PhD supervisor, John Barrow (who is quoted in John Butterworth’s post) wrote the definite text on this topic together with Frank Tipler to which I refer you for more background. What I want to do here is to unpick this idea from a very specific perspective and show how it can be understood quite straightfowardly in terms of Bayesian reasoning. I’ll begin by outlining this form of inferential logic.

I’ll start with Bayes’ theorem which for three logical propositions (such as statements about the values of parameters in theory) A, B and C can be written in the form

P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)

where

K=P(A|C).

This is (or should be!)  uncontroversial as it is simply a result of the sum and product rules for combining probabilities. Notice, however, that I’ve not restricted it to two propositions A and B as is often done, but carried throughout an extra one (C). This is to emphasize the fact that, to a Bayesian, all probabilities are conditional on something; usually, in the context of data analysis this is a background theory that furnishes the framework within which measurements are interpreted. If you say this makes everything model-dependent, then I’d agree. But every interpretation of data in terms of parameters of a model is dependent on the model. It has to be. If you think it can be otherwise then I think you’re misguided.

In the equation,  P(B|C) is the probability of B being true, given that C is true . The information C need not be definitely known, but perhaps assumed for the sake of argument. The left-hand side of Bayes’ theorem denotes the probability of B given both A and C, and so on. The presence of C has not changed anything, but is just there as a reminder that it all depends on what is being assumed in the background. The equation states  a theorem that can be proved to be mathematically correct so it is – or should be – uncontroversial.

To a Bayesian, the entities A, B and C are logical propositions which can only be either true or false. The entities themselves are not blurred out, but we may have insufficient information to decide which of the two possibilities is correct. In this interpretation, P(A|C) represents the degree of belief that it is consistent to hold in the truth of A given the information C. Probability is therefore a generalization of the “normal” deductive logic expressed by Boolean algebra: the value “0” is associated with a proposition which is false and “1” denotes one that is true. Probability theory extends  this logic to the intermediate case where there is insufficient information to be certain about the status of the proposition.

A common objection to Bayesian probability is that it is somehow arbitrary or ill-defined. “Subjective” is the word that is often bandied about. This is only fair to the extent that different individuals may have access to different information and therefore assign different probabilities. Given different information C and C′ the probabilities P(A|C) and P(A|C′) will be different. On the other hand, the same precise rules for assigning and manipulating probabilities apply as before. Identical results should therefore be obtained whether these are applied by any person, or even a robot, so that part isn’t subjective at all.

In fact I’d go further. I think one of the great strengths of the Bayesian interpretation is precisely that it does depend on what information is assumed. This means that such information has to be stated explicitly. The essential assumptions behind a result can be – and, regrettably, often are – hidden in frequentist analyses. Being a Bayesian forces you to put all your cards on the table.

To a Bayesian, probabilities are always conditional on other assumed truths. There is no such thing as an absolute probability, hence my alteration of the form of Bayes’s theorem to represent this. A probability such as P(A) has no meaning to a Bayesian: there is always conditioning information. For example, if  I blithely assign a probability of 1/6 to each face of a dice, that assignment is actually conditional on me having no information to discriminate between the appearance of the faces, and no knowledge of the rolling trajectory that would allow me to make a prediction of its eventual resting position.

In tbe Bayesian framework, probability theory  becomes not a branch of experimental science but a branch of logic. Like any branch of mathematics it cannot be tested by experiment but only by the requirement that it be internally self-consistent. This brings me to what I think is one of the most important results of twentieth century mathematics, but which is unfortunately almost unknown in the scientific community. In 1946, Richard Cox derived the unique generalization of Boolean algebra under the assumption that such a logic must involve associated a single number with any logical proposition. The result he got is beautiful and anyone with any interest in science should make a point of reading his elegant argument. It turns out that the only way to construct a consistent logic of uncertainty incorporating this principle is by using the standard laws of probability. There is no other way to reason consistently in the face of uncertainty than probability theory. Accordingly, probability theory always applies when there is insufficient knowledge for deductive certainty. Probability is inductive logic.

This is not just a nice mathematical property. This kind of probability lies at the foundations of a consistent methodological framework that not only encapsulates many common-sense notions about how science works, but also puts at least some aspects of scientific reasoning on a rigorous quantitative footing. This is an important weapon that should be used more often in the battle against the creeping irrationalism one finds in society at large.

To see how the Bayesian approach provides a methodology for science, let us consider a simple example. Suppose we have a hypothesis H (some theoretical idea that we think might explain some experiment or observation). We also have access to some data D, and we also adopt some prior information I (which might be the results of other experiments and observations, or other working assumptions). What we want to know is how strongly the data D supports the hypothesis H given my background assumptions I. To keep it easy, we assume that the choice is between whether H is true or H is false. In the latter case, “not-H” or H′ (for short) is true. If our experiment is at all useful we can construct P(D|HI), the probability that the experiment would produce the data set D if both our hypothesis and the conditional information are true.

The probability P(D|HI) is called the likelihood; to construct it we need to have   some knowledge of the statistical errors produced by our measurement. Using Bayes’ theorem we can “invert” this likelihood to give P(H|DI), the probability that our hypothesis is true given the data and our assumptions. The result looks just like we had in the first two equations:

P(H|DI) = K^{-1}P(H|I)P(D|HI) .

Now we can expand the “normalising constant” K because we know that either H or H′ must be true. Thus

K=P(D|I)=P(H|I)P(D|HI)+P(H^{\prime}|I) P(D|H^{\prime}I)

The P(H|DI) on the left-hand side of the first expression is called the posterior probability; the right-hand side involves P(H|I), which is called the prior probability and the likelihood P(D|HI). The principal controversy surrounding Bayesian inductive reasoning involves the prior and how to define it, which is something I’ll comment on in a future post.

The Bayesian recipe for testing a hypothesis assigns a large posterior probability to a hypothesis for which the product of the prior probability and the likelihood is large. It can be generalized to the case where we want to pick the best of a set of competing hypothesis, say H1 …. Hn. Note that this need not be the set of all possible hypotheses, just those that we have thought about. We can only choose from what is available. The hypothesis may be relatively simple, such as that some particular parameter takes the value x, or they may be composite involving many parameters and/or assumptions. For instance, the Big Bang model of our universe is a very complicated hypothesis, or in fact a combination of hypotheses joined together,  involving at least a dozen parameters which can’t be predicted a priori but which have to be estimated from observations.

The required result for multiple hypotheses is pretty straightforward: the sum of the two alternatives involved in K above simply becomes a sum over all possible hypotheses, so that

P(H_i|DI) = K^{-1}P(H_i|I)P(D|H_iI),

and

K=P(D|I)=\sum P(H_j|I)P(D|H_jI)

If the hypothesis concerns the value of a parameter – in cosmology this might be, e.g., the mean density of the Universe expressed by the density parameter Ω0 – then the allowed space of possibilities is continuous. The sum in the denominator should then be replaced by an integral, but conceptually nothing changes. Our “best” hypothesis is the one that has the greatest posterior probability.

From a frequentist stance the procedure is often instead to just maximize the likelihood. According to this approach the best theory is the one that makes the data most probable. This can be the same as the most probable theory, but only if the prior probability is constant, but the probability of a model given the data is generally not the same as the probability of the data given the model. I’m amazed how many practising scientists make this error on a regular basis.

The following figure might serve to illustrate the difference between the frequentist and Bayesian approaches. In the former case, everything is done in “data space” using likelihoods, and in the other we work throughout with probabilities of hypotheses, i.e. we think in hypothesis space. I find it interesting to note that most theorists that I know who work in cosmology are Bayesians and most observers are frequentists!


As I mentioned above, it is the presence of the prior probability in the general formula that is the most controversial aspect of the Bayesian approach. The attitude of frequentists is often that this prior information is completely arbitrary or at least “model-dependent”. Being empirically-minded people, by and large, they prefer to think that measurements can be made and interpreted without reference to theory at all.

Assuming we can assign the prior probabilities in an appropriate way what emerges from the Bayesian framework is a consistent methodology for scientific progress. The scheme starts with the hardest part – theory creation. This requires human intervention, since we have no automatic procedure for dreaming up hypothesis from thin air. Once we have a set of hypotheses, we need data against which theories can be compared using their relative probabilities. The experimental testing of a theory can happen in many stages: the posterior probability obtained after one experiment can be fed in, as prior, into the next. The order of experiments does not matter. This all happens in an endless loop, as models are tested and refined by confrontation with experimental discoveries, and are forced to compete with new theoretical ideas. Often one particular theory emerges as most probable for a while, such as in particle physics where a “standard model” has been in existence for many years. But this does not make it absolutely right; it is just the best bet amongst the alternatives. Likewise, the Big Bang model does not represent the absolute truth, but is just the best available model in the face of the manifold relevant observations we now have concerning the Universe’s origin and evolution. The crucial point about this methodology is that it is inherently inductive: all the reasoning is carried out in “hypothesis space” rather than “observation space”.  The primary form of logic involved is not deduction but induction. Science is all about inverse reasoning.

Now, back to the anthropic principle. The point is that we can observe that life exists in our Universe and this observation must be incorporated as conditioning information whenever we try to make inferences about cosmological models if we are to reason consistently. In other words, the existence of life is a datum that must be incorporated in the conditioning information I mentioned above.

Suppose we have a model of the Universe M that contains various parameters which can be fixed by some form of observation. Let U be the proposition that these parameters take specific values U1, U2, and so on. Anthropic arguments revolve around the existence of life, so let L be the proposition that intelligent life evolves in the Universe. Note that the word “anthropic” implies specifically human life, but many versions of the argument do not necessarily accommodate anything more complicated than a virus.

Using Bayes’ theorem we can write

P(U|L,M)=K^{-1} P(U|M)P(L|U,M)

The dependence of the posterior probability P(U|L,M) on the likelihood P(L|U,M) demonstrates that the values of U for which P(L|U,M) is larger correspond to larger values of P(U|L,M); K is just a normalizing constant for the purpose of this argument. Since life is observed in our Universe the model-parameters which make life more probable must be preferred to those that make it less so. To go any further we need to say something about the likelihood and the prior. Here the complexity and scope of the model makes it virtually impossible to apply in detail the symmetry principles usually exploited to define priors for physical models. On the other hand, it seems reasonable to assume that the prior is broad rather than sharply peaked; if our prior knowledge of which universes are possible were so definite then we wouldn’t really be interested in knowing what observations could tell us. If now the likelihood is sharply peaked in U then this will be projected directly into the posterior distribution.

We have to assign the likelihood using our knowledge of how galaxies, stars and planets form, how planets are distributed in orbits around stars, what conditions are needed for life to evolve, and so on. There are certainly many gaps in this knowledge. Nevertheless if any one of the steps in this chain of knowledge requires very finely-tuned parameter choices then we can marginalize over the remaining steps and still end up with a sharp peak in the remaining likelihood and so also in the posterior probability. For example, there are plausible reasons for thinking that intelligent life has to be carbon-based, and therefore evolve on a planet. It is reasonable to infer, therefore, that P(U|L,M) should prefer some values of U. This means that there is a correlation between the propositions U and L in the sense that knowledge of one should, through Bayesian reasoning, enable us to make inferences about the other.

It is very difficult to make this kind of argument rigorously quantitative, but I can illustrate how the argument works with a simplified example. Let us suppose that the relevant parameters contained in the set U include such quantities as Newton’s gravitational constant G, the charge on the electron e, and the mass of the proton m. These are usually termed fundamental constants. The argument above indicates that there might be a connection between the existence of life and the value that these constants jointly take. Moreover, there is no reason why this kind of argument should not be used to find the values of fundamental constants in advance of their measurement. The ordering of experiment and theory is merely an historical accident; the process is cyclical. An illustration of this type of logic is furnished by the case of a plant whose seeds germinate only after prolonged rain. A newly-germinated (and intelligent) specimen could either observe dampness in the soil directly, or infer it using its own knowledge coupled with the observation of its own germination. This type, used properly, can be predictive and explanatory.

This argument is just one example of a number of its type, and it has clear (but limited) explanatory power. Indeed it represents a fruitful application of Bayesian reasoning. The question is how surprised we should be that the constants of nature are observed to have their particular values? That clearly requires a probability based answer. The smaller the probability of a specific joint set of values (given our prior knowledge) then the more surprised we should be to find them. But this surprise should be bounded in some way: the values have to lie somewhere in the space of possibilities. Our argument has not explained why life exists or even why the parameters take their values but it has elucidated the connection between two propositions. In doing so it has reduced the number of unexplained phenomena from two to one. But it still takes our existence as a starting point rather than trying to explain it from first principles.

Arguments of this type have been called Weak Anthropic Principle by Brandon Carter and I do not believe there is any reason for them to be at all controversial. They are simply Bayesian arguments that treat the existence of life as an observation about the Universe that is treated in Bayes’ theorem in the same way as all other relevant data and whatever other conditioning information we have. If more scientists knew about the inductive nature of their subject, then this type of logic would not have acquired the suspicious status that it currently has.