## Yellow Stars, Red Stars and Bayesian Inference

Posted in Bad Statistics, The Universe and Stuff on May 25, 2017 by telescoper

I came across a paper on the arXiv yesterday with the title ‘Why do we find ourselves around a yellow star instead of a red star?’. Here’s the abstract:

M-dwarf stars are more abundant than G-dwarf stars, so our position as observers on a planet orbiting a G-dwarf raises questions about the suitability of other stellar types for supporting life. If we consider ourselves as typical, in the anthropic sense that our environment is probably a typical one for conscious observers, then we are led to the conclusion that planets orbiting in the habitable zone of G-dwarf stars should be the best place for conscious life to develop. But such a conclusion neglects the possibility that K-dwarfs or M-dwarfs could provide more numerous sites for life to develop, both now and in the future. In this paper we analyze this problem through Bayesian inference to demonstrate that our occurrence around a G-dwarf might be a slight statistical anomaly, but only the sort of chance event that we expect to occur regularly. Even if M-dwarfs provide more numerous habitable planets today and in the future, we still expect mid G- to early K-dwarfs stars to be the most likely place for observers like ourselves. This suggests that observers with similar cognitive capabilities as us are most likely to be found at the present time and place, rather than in the future or around much smaller stars.

Although astrobiology is not really my province, I was intrigued enough to read on, until I came to the following paragraph in which the authors attempt to explain how Bayesian inference works:

We approach this problem through the framework of Bayesian inference. As an example, consider a fair coin that is tossed three times in a row. Suppose that all three tosses turn up Heads. Can we conclude from this experiment that the coin must be weighted? In fact, we can still maintain our hypothesis that the coin is fair because the chances of getting three Heads in a row is 1/8. Many events with a probability of 1/8 occur every day, and so we should not be concerned about an event like this indicating that our initial assumptions are flawed. However, if we were to flip the same coin 70 times in a row with all 70 turning up Heads, we would readily conclude that the experiment is fixed. This is because the probability of flipping 70 Heads in a row is about 10^{-22}, which is an exceedingly unlikely event that has probably never happened in the history of the universe. This informal description of Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence.
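The arithmetic in the quoted paragraph is at least easy to check. Here is a minimal sketch (the function name is my own invention):

```python
from fractions import Fraction

def p_all_heads(n: int) -> Fraction:
    """Probability that a fair coin shows Heads on every one of n tosses."""
    return Fraction(1, 2) ** n

print(p_all_heads(3))          # 1/8, as in the quoted paragraph
print(float(p_all_heads(70)))  # ~8.5e-22, the paper's "about 10^-22"
```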

Obviously I agree with the statement right at the end that ‘Bayesian inference provides a way to assess the probability of a hypothesis in light of new evidence’. That’s certainly what Bayesian inference does, but this ‘informal description’ is really a frequentist rather than a Bayesian argument, in that it only mentions the probability of given outcomes, not the probability of different hypotheses…

Anyway, I was so unconvinced by this ‘description’ that I stopped reading at that point and went and did something else. Since I didn’t finish the paper I won’t comment on its conclusions, although I am more than usually sceptical. You might disagree of course, so read the paper yourself and form your own opinion! For me, it goes in the file marked Bad Statistics!

## Life as a Condition of Cosmology

Posted in The Universe and Stuff on November 7, 2015 by telescoper

Trigger Warnings: Bayesian Probability and the Anthropic Principle!

Once upon a time I was involved in setting up a cosmology conference in Valencia (Spain). The principal advantage of being among the organizers of such a meeting is that you get to invite yourself to give a talk and to choose the topic. On this particular occasion, I deliberately abused my privilege and put myself on the programme to talk about the “Anthropic Principle”. I doubt if there is any subject more likely to polarize a scientific audience than this. About half the participants present in the meeting stayed for my talk. The other half ran screaming from the room. Hence the trigger warnings on this post. Anyway, I noticed a tweet this morning from Jon Butterworth advertising a new blog post of his on the very same subject so I thought I’d while away a rainy November afternoon with a contribution of my own.

In case you weren’t already aware, the Anthropic Principle is the name given to a class of ideas arising from the suggestion that there is some connection between the material properties of the Universe as a whole and the presence of human life within it. The name was coined by Brandon Carter in 1974 as a corrective to the “Copernican Principle” that man does not occupy a special place in the Universe. A naïve application of this latter principle to cosmology might lead us to think that we could have evolved in any of the myriad possible Universes described by the system of Friedmann equations. The Anthropic Principle denies this, because life could not have evolved in all possible versions of the Big Bang model. There are, however, many different versions of this basic idea that have different logical structures and indeed different degrees of credibility. It is not really surprising to me that there is such a controversy about this particular issue, given that so few physicists and astronomers take time to study the logical structure of the subject, and this is the only way to assess the meaning and explanatory value of propositions like the Anthropic Principle. My former PhD supervisor, John Barrow (who is quoted in Jon Butterworth’s post), wrote the definitive text on this topic together with Frank Tipler, to which I refer you for more background. What I want to do here is to unpick this idea from a very specific perspective and show how it can be understood quite straightforwardly in terms of Bayesian reasoning. I’ll begin by outlining this form of inferential logic.

I’ll start with Bayes’ theorem, which for three logical propositions (such as statements about the values of parameters in a theory) A, B and C can be written in the form

$P(B|AC) = K^{-1}P(B|C)P(A|BC) = K^{-1} P(AB|C)$

where

$K=P(A|C).$

This is (or should be!) uncontroversial as it is simply a result of the sum and product rules for combining probabilities. Notice, however, that I’ve not restricted it to two propositions A and B as is often done, but carried throughout an extra one (C). This is to emphasize the fact that, to a Bayesian, all probabilities are conditional on something; usually, in the context of data analysis, this is a background theory that furnishes the framework within which measurements are interpreted. If you say this makes everything model-dependent, then I’d agree. But every interpretation of data in terms of parameters of a model is dependent on the model. It has to be. If you think it can be otherwise then I think you’re misguided.

In the equation, P(B|C) is the probability of B being true, given that C is true. The information C need not be definitely known, but may be assumed for the sake of argument. The left-hand side of Bayes’ theorem denotes the probability of B given both A and C, and so on. The presence of C has not changed anything; it is just there as a reminder that everything depends on what is being assumed in the background. The equation states a theorem that can be proved to be mathematically correct, so it is – or should be – uncontroversial.

To a Bayesian, the entities A, B and C are logical propositions which can only be either true or false. The entities themselves are not blurred out, but we may have insufficient information to decide which of the two possibilities is correct. In this interpretation, P(A|C) represents the degree of belief that it is consistent to hold in the truth of A given the information C. Probability is therefore a generalization of the “normal” deductive logic expressed by Boolean algebra: the value “0” is associated with a proposition which is false and “1” denotes one that is true. Probability theory extends this logic to the intermediate case where there is insufficient information to be certain about the status of the proposition.

A common objection to Bayesian probability is that it is somehow arbitrary or ill-defined. “Subjective” is the word that is often bandied about. This is only fair to the extent that different individuals may have access to different information and therefore assign different probabilities. Given different information C and C′ the probabilities P(A|C) and P(A|C′) will be different. On the other hand, the same precise rules for assigning and manipulating probabilities apply as before. Identical results should therefore be obtained whether these are applied by any person, or even a robot, so that part isn’t subjective at all.

In fact I’d go further. I think one of the great strengths of the Bayesian interpretation is precisely that it does depend on what information is assumed. This means that such information has to be stated explicitly. The essential assumptions behind a result can be – and, regrettably, often are – hidden in frequentist analyses. Being a Bayesian forces you to put all your cards on the table.

To a Bayesian, probabilities are always conditional on other assumed truths. There is no such thing as an absolute probability, hence my alteration of the form of Bayes’ theorem to represent this. A probability such as P(A) has no meaning to a Bayesian: there is always conditioning information. For example, if I blithely assign a probability of 1/6 to each face of a dice, that assignment is actually conditional on me having no information to discriminate between the appearance of the faces, and no knowledge of the rolling trajectory that would allow me to make a prediction of its eventual resting position.

In the Bayesian framework, probability theory becomes not a branch of experimental science but a branch of logic. Like any branch of mathematics it cannot be tested by experiment but only by the requirement that it be internally self-consistent. This brings me to what I think is one of the most important results of twentieth-century mathematics, but one which is unfortunately almost unknown in the scientific community. In 1946, Richard Cox derived the unique generalization of Boolean algebra under the assumption that such a logic must involve associating a single number with any logical proposition. The result he got is beautiful, and anyone with any interest in science should make a point of reading his elegant argument. It turns out that the only way to construct a consistent logic of uncertainty incorporating this principle is by using the standard laws of probability. There is no other way to reason consistently in the face of uncertainty than probability theory. Accordingly, probability theory always applies when there is insufficient knowledge for deductive certainty. Probability is inductive logic.

This is not just a nice mathematical property. This kind of probability lies at the foundations of a consistent methodological framework that not only encapsulates many common-sense notions about how science works, but also puts at least some aspects of scientific reasoning on a rigorous quantitative footing. This is an important weapon that should be used more often in the battle against the creeping irrationalism one finds in society at large.

To see how the Bayesian approach provides a methodology for science, let us consider a simple example. Suppose we have a hypothesis H (some theoretical idea that we think might explain some experiment or observation). We also have access to some data D, and we adopt some prior information I (which might be the results of other experiments and observations, or other working assumptions). What we want to know is how strongly the data D supports the hypothesis H given our background assumptions I. To keep it easy, we assume that the choice is between whether H is true or H is false. In the latter case, “not-H” or H′ (for short) is true. If our experiment is at all useful we can construct P(D|HI), the probability that the experiment would produce the data set D if both our hypothesis and the conditional information are true.

The probability P(D|HI) is called the likelihood; to construct it we need to have some knowledge of the statistical errors produced by our measurement. Using Bayes’ theorem we can “invert” this likelihood to give P(H|DI), the probability that our hypothesis is true given the data and our assumptions. The result looks just like we had in the first two equations:

$P(H|DI) = K^{-1}P(H|I)P(D|HI) .$

Now we can expand the “normalising constant” K because we know that either H or H′ must be true. Thus

$K=P(D|I)=P(H|I)P(D|HI)+P(H^{\prime}|I) P(D|H^{\prime}I)$

The P(H|DI) on the left-hand side of the first expression is called the posterior probability; the right-hand side involves P(H|I), which is called the prior probability, and the likelihood P(D|HI). The principal controversy surrounding Bayesian inductive reasoning involves the prior and how to define it, which is something I’ll comment on in a future post.
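This binary recipe is short enough to write out in full. A sketch, in which all the numbers (including the strong 0.99 prior for fairness) are purely illustrative:

```python
def posterior(prior_h: float, like_h: float, like_not_h: float) -> float:
    """P(H|DI) from Bayes' theorem for a binary choice between H and H'.

    prior_h    : P(H|I)
    like_h     : P(D|HI)
    like_not_h : P(D|H'I)
    """
    evidence = prior_h * like_h + (1 - prior_h) * like_not_h  # K = P(D|I)
    return prior_h * like_h / evidence

# Revisiting the coin example from the first post: H = "the coin is fair",
# H' = "the coin is double-headed", D = three Heads in a row.
print(posterior(prior_h=0.99, like_h=1 / 8, like_not_h=1.0))  # ~0.93
```

With a strong prior for fairness, three Heads in a row barely moves the posterior, which is exactly the intuition the coin example was reaching for.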

The Bayesian recipe for testing a hypothesis assigns a large posterior probability to a hypothesis for which the product of the prior probability and the likelihood is large. It can be generalized to the case where we want to pick the best of a set of competing hypotheses, say H1 … Hn. Note that this need not be the set of all possible hypotheses, just those that we have thought about. We can only choose from what is available. The hypotheses may be relatively simple, such as that some particular parameter takes the value x, or they may be composite, involving many parameters and/or assumptions. For instance, the Big Bang model of our universe is a very complicated hypothesis, or in fact a combination of hypotheses joined together, involving at least a dozen parameters which can’t be predicted a priori but which have to be estimated from observations.

The required result for multiple hypotheses is pretty straightforward: the sum of the two alternatives involved in K above simply becomes a sum over all possible hypotheses, so that

$P(H_i|DI) = K^{-1}P(H_i|I)P(D|H_iI),$

and

$K=P(D|I)=\sum_j P(H_j|I)P(D|H_jI).$

If the hypothesis concerns the value of a parameter – in cosmology this might be, e.g., the mean density of the Universe expressed by the density parameter Ω0 – then the allowed space of possibilities is continuous. The sum in the denominator should then be replaced by an integral, but conceptually nothing changes. Our “best” hypothesis is the one that has the greatest posterior probability.
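Here is the continuous case sketched on a grid, with a toy Gaussian likelihood for a parameter standing in for Ω0; the grid range, the flat prior, and the likelihood's centre and width are all invented for illustration:

```python
import numpy as np

omega = np.linspace(0.0, 2.0, 1001)     # grid of candidate parameter values
d_omega = omega[1] - omega[0]
prior = np.ones_like(omega)             # broad, featureless prior
likelihood = np.exp(-0.5 * ((omega - 1.0) / 0.1) ** 2)  # toy Gaussian likelihood

unnorm = prior * likelihood
posterior = unnorm / (unnorm.sum() * d_omega)  # K becomes an integral, done numerically

best = omega[np.argmax(posterior)]      # the "best" hypothesis: maximum posterior
print(best)  # ~1.0: with a flat prior, the posterior peaks where the likelihood does
```

The sum over discrete hypotheses has simply become a (numerical) integral in the normalizing constant, as the text says; conceptually nothing changes.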

From a frequentist stance the procedure is often instead simply to maximize the likelihood. According to this approach the best theory is the one that makes the data most probable. This can coincide with the most probable theory, but only if the prior probability is constant; in general the probability of a model given the data is not the same as the probability of the data given the model. I’m amazed how many practising scientists make this error on a regular basis.

The following figure might serve to illustrate the difference between the frequentist and Bayesian approaches. In the former case, everything is done in “data space” using likelihoods, and in the other we work throughout with probabilities of hypotheses, i.e. we think in hypothesis space. I find it interesting to note that most theorists that I know who work in cosmology are Bayesians and most observers are frequentists!

As I mentioned above, it is the presence of the prior probability in the general formula that is the most controversial aspect of the Bayesian approach. The attitude of frequentists is often that this prior information is completely arbitrary or at least “model-dependent”. Being empirically-minded people, by and large, they prefer to think that measurements can be made and interpreted without reference to theory at all.

Assuming we can assign the prior probabilities in an appropriate way, what emerges from the Bayesian framework is a consistent methodology for scientific progress. The scheme starts with the hardest part: theory creation. This requires human intervention, since we have no automatic procedure for dreaming up hypotheses from thin air. Once we have a set of hypotheses, we need data against which theories can be compared using their relative probabilities. The experimental testing of a theory can happen in many stages: the posterior probability obtained after one experiment can be fed in, as a prior, into the next. The order of experiments does not matter. This all happens in an endless loop, as models are tested and refined by confrontation with experimental discoveries, and are forced to compete with new theoretical ideas. Often one particular theory emerges as the most probable for a while, such as in particle physics where a “standard model” has been in existence for many years. But this does not make it absolutely right; it is just the best bet amongst the alternatives. Likewise, the Big Bang model does not represent the absolute truth, but is just the best available model in the face of the manifold relevant observations we now have concerning the Universe’s origin and evolution. The crucial point about this methodology is that it is inherently inductive: all the reasoning is carried out in “hypothesis space” rather than “observation space”. The primary form of logic involved is not deduction but induction. Science is all about inverse reasoning.
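The claim that the order of experiments does not matter is easy to demonstrate numerically. A sketch with invented likelihoods:

```python
def update(prior: float, like_h: float, like_alt: float) -> float:
    """One Bayesian update of P(H) given one experiment's two likelihoods."""
    evidence = prior * like_h + (1 - prior) * like_alt
    return prior * like_h / evidence

# Two experiments with different (illustrative) likelihood ratios.
exp1 = (0.8, 0.3)   # (P(D1|H), P(D1|H'))
exp2 = (0.6, 0.9)   # (P(D2|H), P(D2|H'))

p0 = 0.5  # initial prior
a = update(update(p0, *exp1), *exp2)  # experiment 1 first, then 2
b = update(update(p0, *exp2), *exp1)  # experiment 2 first, then 1
print(a, b)  # the same posterior either way: the order does not matter
```

Each posterior becomes the prior for the next update, and because the likelihood factors simply multiply, the final answer is independent of the order in which the data arrive.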

Now, back to the anthropic principle. The point is that we can observe that life exists in our Universe and this observation must be incorporated as conditioning information whenever we try to make inferences about cosmological models if we are to reason consistently. In other words, the existence of life is a datum that must be incorporated in the conditioning information I mentioned above.

Suppose we have a model of the Universe M that contains various parameters which can be fixed by some form of observation. Let U be the proposition that these parameters take specific values U1, U2, and so on. Anthropic arguments revolve around the existence of life, so let L be the proposition that intelligent life evolves in the Universe. Note that the word “anthropic” implies specifically human life, but many versions of the argument do not necessarily accommodate anything more complicated than a virus.

Using Bayes’ theorem we can write

$P(U|L,M)=K^{-1} P(U|M)P(L|U,M)$

The dependence of the posterior probability P(U|L,M) on the likelihood P(L|U,M) demonstrates that the values of U for which P(L|U,M) is larger correspond to larger values of P(U|L,M); K is just a normalizing constant for the purpose of this argument. Since life is observed in our Universe the model-parameters which make life more probable must be preferred to those that make it less so. To go any further we need to say something about the likelihood and the prior. Here the complexity and scope of the model makes it virtually impossible to apply in detail the symmetry principles usually exploited to define priors for physical models. On the other hand, it seems reasonable to assume that the prior is broad rather than sharply peaked; if our prior knowledge of which universes are possible were so definite then we wouldn’t really be interested in knowing what observations could tell us. If now the likelihood is sharply peaked in U then this will be projected directly into the posterior distribution.

We have to assign the likelihood using our knowledge of how galaxies, stars and planets form, how planets are distributed in orbits around stars, what conditions are needed for life to evolve, and so on. There are certainly many gaps in this knowledge. Nevertheless if any one of the steps in this chain of knowledge requires very finely-tuned parameter choices then we can marginalize over the remaining steps and still end up with a sharp peak in the remaining likelihood and so also in the posterior probability. For example, there are plausible reasons for thinking that intelligent life has to be carbon-based, and therefore evolve on a planet. It is reasonable to infer, therefore, that P(U|L,M) should prefer some values of U. This means that there is a correlation between the propositions U and L in the sense that knowledge of one should, through Bayesian reasoning, enable us to make inferences about the other.

It is very difficult to make this kind of argument rigorously quantitative, but I can illustrate how it works with a simplified example. Let us suppose that the relevant parameters contained in the set U include such quantities as Newton’s gravitational constant G, the charge on the electron e, and the mass of the proton m. These are usually termed fundamental constants. The argument above indicates that there might be a connection between the existence of life and the values that these constants jointly take. Moreover, there is no reason why this kind of argument should not be used to find the values of fundamental constants in advance of their measurement. The ordering of experiment and theory is merely an historical accident; the process is cyclical. An illustration of this type of logic is furnished by the case of a plant whose seeds germinate only after prolonged rain. A newly-germinated (and intelligent) specimen could either observe dampness in the soil directly, or infer it using its own knowledge coupled with the observation of its own germination. This type of argument, used properly, can be predictive and explanatory.

This argument is just one example of a number of its type, and it has clear (but limited) explanatory power. Indeed it represents a fruitful application of Bayesian reasoning. The question is: how surprised should we be that the constants of nature are observed to have their particular values? That clearly requires a probability-based answer. The smaller the probability of a specific joint set of values (given our prior knowledge), the more surprised we should be to find them. But this surprise should be bounded in some way: the values have to lie somewhere in the space of possibilities. Our argument has not explained why life exists or even why the parameters take their values, but it has elucidated the connection between two propositions. In doing so it has reduced the number of unexplained phenomena from two to one. But it still takes our existence as a starting point rather than trying to explain it from first principles.

Arguments of this type were called the Weak Anthropic Principle by Brandon Carter, and I do not believe there is any reason for them to be at all controversial. They are simply Bayesian arguments that treat the existence of life as an observation about the Universe, to be included in Bayes’ theorem in the same way as all other relevant data and whatever other conditioning information we have. If more scientists knew about the inductive nature of their subject, then this type of logic would not have acquired the suspicious status that it currently has.

## The Eclipse Coincidence Question

Posted in The Universe and Stuff on March 22, 2015 by telescoper

The day before last week’s (partial) solar eclipse I posted an item in which I mentioned the apparent coincidence that makes total eclipses possible, namely that the Moon and Sun have very similar angular sizes when seen from Earth.

In the interest of balance I thought I would direct you to a paper by Steve Balbus that develops a detailed argument to the contrary along the lines I described briefly in my earlier post. I am not entirely convinced but do read it and make up your own mind:

Here is the abstract:

The nearly equal lunar and solar angular sizes as subtended at the Earth is generally regarded as a coincidence. This is, however, an incidental consequence of the tidal forces from these bodies being comparable. Comparable magnitudes implies strong temporal modulation, as the forcing frequencies are nearly but not precisely equal. We suggest that on the basis of paleogeographic reconstructions, in the Devonian period, when the first tetrapods appeared on land, a large tidal range would accompany these modulated tides. This would have been conducive to the formation of a network of isolated tidal pools, lending support to A.S. Romer’s classic idea that the evaporation of shallow pools was an evolutionary impetus for the development of chiridian limbs in aquatic tetrapodomorphs. Romer saw this as the reason for the existence of limbs, but strong selection pressure for terrestrial navigation would have been present even if the limbs were aquatic in origin. Since even a modest difference in the Moon’s angular size relative to the Sun’s would lead to a qualitatively different tidal modulation, the fact that we live on a planet with a Sun and Moon of close apparent size is not entirely coincidental: it may have an anthropic basis.

I don’t know if it’s a coincidence or not, but I always follow the advice given by my role model, Agatha Christie’s Miss Marple, in Nemesis: “Any coincidence is worth noticing. You can throw it away later if it is only a coincidence.”

## Doomsday is Cancelled…

Posted in Bad Statistics, The Universe and Stuff on November 25, 2014 by telescoper

Last week I posted an item that included a discussion of the Doomsday Argument. A subsequent comment on that post mentioned a paper by Ken Olum, which I finally got around to reading over the weekend, so I thought I’d post a link here for those of you worrying that the world might come to an end before the Christmas holiday.

You can find Olum’s paper on the arXiv here. The abstract reads (my emphasis):

If the human race comes to an end relatively shortly, then we have been born at a fairly typical time in history of humanity. On the other hand, if humanity lasts for much longer and trillions of people eventually exist, then we have been born in the first surprisingly tiny fraction of all people. According to the Doomsday Argument of Carter, Leslie, Gott, and Nielsen, this means that the chance of a disaster which would obliterate humanity is much larger than usually thought. Here I argue that treating possible observers in the same way as those who actually exist avoids this conclusion. Under this treatment, it is more likely to exist at all in a race which is long-lived, as originally discussed by Dieks, and this cancels the Doomsday Argument, so that the chance of a disaster is only what one would ordinarily estimate. Treating possible and actual observers alike also allows sensible anthropic predictions from quantum cosmology, which would otherwise depend on one’s interpretation of quantum mechanics.

I think Olum does identify a logical flaw in the argument, but it’s by no means the only one. I wouldn’t find it at all surprising to be among the first “tiny fraction of all people”, as my genetic characteristics are such that I could not be otherwise. But even if you’re not all that interested in the Doomsday Argument I recommend you read this paper as it says some quite interesting things about the application of probabilistic reasoning elsewhere in cosmology, an area in which quite a lot is written that makes no sense to me whatsoever!

## Insignificance

Posted in The Universe and Stuff on January 4, 2011 by telescoper

I’m told that there was a partial eclipse of the Sun visible from the UK this morning, although it was so cloudy here in Cardiff that I wouldn’t have seen anything even if I had bothered to get up in time to observe it. For more details of the event and pictures from people who managed to see it, see here. There’s also a nice article on the BBC website. The BBC are coordinating three days of programmes alongside a host of other events called Stargazing Live presumably timed to coincide with this morning’s eclipse. It’s taking a chance to do live broadcasts about astronomy given the British weather, but I hope they are successful in generating interest especially among the young.

As a spectacle a partial solar eclipse is pretty exciting – as long as it’s not cloudy – but even a full view of one can’t really be compared with the awesome event that is a total eclipse. I’m lucky enough to have observed one and I can tell you it was truly awe-inspiring.

If you think about it, though, it’s a very strange thing that this is possible at all. In a total eclipse, the Moon passes between the Earth and the Sun in such a way that it exactly covers the Solar disk. In order for this to happen the apparent angular size of the Moon (as seen from Earth) has to be almost exactly the same as that of the Sun (as seen from Earth). This involves a strange coincidence: the Moon is small (about 1740 km in radius) but very close to the Earth in astronomical terms (about 400,000 km away). The Sun, on the other hand, is both enormously large (radius 700,000 km) and enormously distant (approx. 150,000,000 km). The ratio of radius to distance from Earth is almost identical for the two objects at the time of a total eclipse, so the apparent disk of the Moon almost exactly fits over that of the Sun. Why is this so?
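Using the round figures quoted above, the near-coincidence is easy to see in a few lines:

```python
import math

# Figures from the text, all in km.
moon_radius, moon_distance = 1_740, 400_000
sun_radius, sun_distance = 700_000, 150_000_000

# Apparent angular diameter of each disk as seen from Earth.
moon_angle = 2 * math.degrees(math.atan(moon_radius / moon_distance))
sun_angle = 2 * math.degrees(math.atan(sun_radius / sun_distance))

print(f"Moon: {moon_angle:.2f} deg, Sun: {sun_angle:.2f} deg")  # both about half a degree
```

With these rounded numbers the two ratios agree to within about seven per cent, which is why the Moon’s disk can (just, and only at favourable points in its elliptical orbit) cover the Sun’s.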

The simple answer is that it is just a coincidence. There seems no particular physical reason why the geometry of the Earth-Moon-Sun system should have turned out this way. Moreover, the system is not static. The tides raised by the Moon on the Earth lead to frictional heating and a loss of orbital energy. The Moon’s orbit is therefore moving slowly outwards from the Earth. I’m not going to tell you exactly how quickly this happens, as it is one of the questions I set my students in the module Astrophysical Concepts I’ll be starting in a few weeks, but eventually the Earth-Moon distance will be too large for total eclipses of the Sun by the Moon to be possible on Earth, although partial and annular eclipses may still be possible.

It seems therefore that we just happen to be living at the right place at the right time to see total eclipses. Perhaps there are other inhabited moonless planets whose inhabitants will never see one. Future inhabitants of Earth will have to content themselves with watching eclipse clips on Youtube.

Things may be more complicated than this though. I’ve heard it argued that the existence of a moon reasonably close to the Earth may have helped the evolution of terrestrial life. The argument – as far as I understand it – is that life presumably began in the oceans, then amphibious forms evolved in tidal margins of some sort wherein conditions favoured both aquatic and land-dwelling creatures. Only then did life fully emerge from the seas and begin to live on land. If it is the case that the existence of significant tides is necessary for life to complete the transition from oceans to solid ground, then maybe the Moon played a key role in the evolution of dinosaurs, mammals, and even ourselves.

I’m not sure I’m convinced of this argument because, although the Moon is the dominant source of the Earth’s tides, it is not overwhelmingly so. The effect of the Sun is also considerable, only a factor of three smaller than the Moon. So maybe the Sun could have done the job on its own. I don’t know.

That’s not really the point of this post, however. What I wanted to comment on is that astronomers basically don’t question the interpretation of the occurrence of total eclipses as simply a coincidence. Eclipses just are. There are no doubt many other planets where they aren’t. We’re special in that we live somewhere where something apparently unlikely happens. But this isn’t important, because eclipses aren’t really all that significant in cosmic terms, other than that the laws of physics allow them.

On the other hand astronomers (and many other people) do make a big deal of the fact that life exists in the Universe. Given what we know about fundamental physics and biology – which admittedly isn’t very much – this also seems unlikely. Perhaps there are many other worlds without life, so the Earth is special once again. Others argue that the existence of life is so unlikely that special provision must have been made to make it possible.

Before I find myself falling into the black hole marked “Anthropic Principle” let me just say that I don’t see the existence of life (including human life) as being of any greater significance than that of a total eclipse. Both phenomena are (subjectively) interesting to humans, both are contingent on particular circumstances, and both will no doubt cease to occur at some point in the perhaps not-too-distant future. Neither tells us much about the true nature of the Universe.

Let’s face it. We’re just not significant.

## Ergodic Means…

Posted in The Universe and Stuff with tags , , , , , , on October 19, 2009 by telescoper

The topic of this post is something I’ve been wondering about for quite a while. This afternoon I had half an hour spare after a quick lunch so I thought I’d look it up and see what I could find.

The word ergodic is one you will come across very frequently in the literature of statistical physics, and in cosmology it also appears in discussions of the analysis of the large-scale structure of the Universe. I’ve long been puzzled as to where it comes from and what it actually means. Turning to the excellent Oxford English Dictionary Online, I found the answer to the first of these questions. Well, sort of. Under etymology we have

ad. G. ergoden (L. Boltzmann 1887, in Jrnl. f. d. reine und angewandte Math. C. 208), f. Gr.

I say “sort of” because it does attribute the origin of the word to Ludwig Boltzmann, but the Greek roots (ἔργον and ὁδός) appear to suggest it means “workway” or something like that. I don’t think I follow an ergodic path on my way to work so it remains a little mysterious.

The actual definitions of ergodic given by the OED are

Of a trajectory in a confined portion of space: having the property that in the limit all points of the space will be included in the trajectory with equal frequency. Of a stochastic process: having the property that the probability of any state can be estimated from a single sufficiently extensive realization, independently of initial conditions; statistically stationary.

As I had expected, it has two meanings which are related, but which apply in different contexts. The first is to do with paths or orbits, although in physics this is usually taken to mean trajectories in phase space (including both positions and velocities) rather than just three-dimensional position space. However, I don’t think the OED has got it right in saying that the system visits all positions with equal frequency. I think an ergodic path is one that must visit all positions within a given volume of phase space rather than being confined to a lower-dimensional piece of that space. For example, the path of a planet under the inverse-square law of gravity around the Sun is confined to a one-dimensional ellipse. If the force law is modified by external perturbations then the path need not be as regular as this, in extreme cases wandering around in such a way that it never joins back on itself but eventually visits all accessible locations. As far as my understanding goes, however, it doesn’t have to visit them all with equal frequency. The ergodic property of orbits is intimately associated with the presence of chaotic dynamical behaviour.

The other definition relates to stochastic processes, i.e. processes involving some sort of random component. These could either consist of a discrete collection of random variables {X1…Xn} (which may or may not be correlated with each other) or a continuously fluctuating function of some parameter such as time t, i.e. X(t), or of spatial position (or perhaps both).

Stochastic processes are quite complicated measure-valued mathematical entities because they are specified by probability distributions. What the ergodic hypothesis means in the second sense is that measurements extracted from a single realization of such a process bear a definite relationship to analogous quantities defined by the probability distribution.

I always think of a stochastic process being like a kind of algorithm (whose workings we don’t know). Put it on a computer, press “go” and it spits out a sequence of numbers. The ergodic hypothesis means that by examining a sufficiently long run of the output we could learn something about the properties of the algorithm.

An alternative way of thinking about this for those of you of a frequentist disposition is that the probability average is taken over some sort of statistical ensemble of possible realizations produced by the algorithm, and this must match the appropriate long-term average taken over one realization.

This is actually quite a deep concept and it can apply (or not) in various degrees.  A simple example is to do with properties of the mean value. Given a single run of the program over some long time T we can compute the sample average

$\bar{X}_T\equiv \frac{1}{T} \int_0^Tx(t) dt$

while the probability average is defined over the probability distribution, which we can call p(x):

$\langle X \rangle \equiv \int x p(x) dx$

If these two are equal for sufficiently long runs, i.e. as T goes to infinity, then the process is said to be ergodic in the mean. A process could, however, be ergodic in the mean but not ergodic with respect to some other property of the distribution, such as the variance. Strict ergodicity would require that the entire frequency distribution defined from a long run should match the probability distribution to some accuracy.
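To make this concrete, here’s a little numerical sketch (the process and the parameters are illustrative choices of mine, not anything special): an AR(1) process, which is stationary and ergodic, so the time average over one long run and the ensemble average over many independent runs should both land close to the true mean of zero.

```python
import numpy as np

rng = np.random.default_rng(42)
a = 0.9  # AR(1) coefficient; the process is stationary and ergodic for |a| < 1

def ar1(n, rng):
    """One realisation of the AR(1) process X[i+1] = a*X[i] + noise,
    started from its stationary distribution so the process is stationary."""
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - a**2))  # stationary standard deviation
    noise = rng.normal(size=n - 1)
    for i in range(n - 1):
        x[i + 1] = a * x[i] + noise[i]
    return x

# Time average over a single long run...
time_average = ar1(100_000, rng).mean()

# ...versus the ensemble average: many independent runs, one sample from each.
ensemble_average = np.mean([ar1(100, rng)[-1] for _ in range(1_000)])

# Both estimates should be close to the true mean of the process, which is zero.
print(time_average, ensemble_average)
```

The single long run plays the role of “examining the output of the algorithm”, while the ensemble of short runs is the frequentist’s collection of possible realizations; ergodicity in the mean is the statement that the two estimates agree as the run length and ensemble size grow.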

Now we have a problem with the OED again. According to the defining quotation given above, ergodic can be taken to mean statistically stationary. Actually, that’s not true.

In the one-parameter case, “statistically stationary” means that the probability distribution controlling the process is independent of time, i.e. that p(x,t)=p(x,t+Δt) . It’s fairly straightforward to see that the ergodic property requires that a process X(t) be stationary, but the converse is not the case. Not every stationary process is necessarily ergodic. Ned Wright gives an example here. For a higher-dimensional process, such as a spatially-fluctuating random field the analogous property is statistical homogeneity, rather than stationarity, but otherwise everything carries over.
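A minimal sketch of a stationary but non-ergodic process (a toy construction of my own, in the same spirit as Ned Wright’s example): each realisation of X(t) is a constant A drawn once from a Gaussian. The distribution at any fixed time is always the same, so the process is stationary, but a time average over one realisation only ever tells you that realisation’s value of A.

```python
import numpy as np

rng = np.random.default_rng(0)

def constant_process(n, rng):
    """One realisation of X(t) = A, with A ~ N(0, 1) drawn once per realisation.
    At any fixed time X is distributed as N(0, 1) -- statistically stationary --
    but each realisation is frozen at its own random value forever."""
    return np.full(n, rng.normal())

# Time averages over five different realisations: five different answers,
# so no single realisation reveals the ensemble mean.
time_averages = [constant_process(1_000, rng).mean() for _ in range(5)]

# The ensemble average, by contrast, converges to the true mean of zero.
ensemble_average = np.mean([constant_process(1, rng)[0] for _ in range(100_000)])

print(time_averages)
print(ensemble_average)
```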

Ergodic theorems are very tricky to prove in general, but there are well-known results that rigorously establish the ergodic properties of Gaussian processes (which is another reason why theorists like myself like them so much). However, it should be mentioned that even if the ergodic assumption applies its usefulness depends critically on the rate of convergence. In the time-dependent example I gave above, it’s no good if the averaging period required is much longer than the age of the Universe; in that case even an ergodic process doesn’t let you make reliable inferences from your sample. Likewise the ergodic hypothesis doesn’t help you analyse your galaxy redshift survey if the averaging scale needed is larger than the depth of the sample.
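The convergence caveat can be illustrated with the same kind of toy AR(1) process as before (again, the numbers are just mine for illustration): push the coefficient close to 1 and the correlation time becomes long. Runs much shorter than the correlation time give time averages that scatter wildly from realisation to realisation, even though the process is perfectly ergodic; runs spanning many correlation times settle down.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, a, rng):
    """One realisation of a stationary AR(1) process with coefficient a."""
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - a**2))
    noise = rng.normal(size=n - 1)
    for i in range(n - 1):
        x[i + 1] = a * x[i] + noise[i]
    return x

a = 0.999  # correlation time ~ 1/(1 - a) = 1000 steps

# Spread of time averages from runs much SHORTER than the correlation time...
short_spread = np.std([ar1(100, a, rng).mean() for _ in range(30)])

# ...versus runs spanning many correlation times.
long_spread = np.std([ar1(100_000, a, rng).mean() for _ in range(10)])

print(short_spread, long_spread)  # the short-run spread is much larger
```

The moral is the one in the text: ergodicity guarantees convergence in the limit, but says nothing about whether your actual sample is long (or deep) enough.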

Moreover, it seems to me that many physicists resort to ergodicity when there isn’t any compelling mathematical reason to think that it is true. In some versions of the multiverse scenario, it is hypothesized that the fundamental constants of nature describing our low-energy world turn out “randomly” to take on different values in different domains, owing to some sort of spontaneous symmetry breaking perhaps associated with a phase transition generating cosmic inflation. We happen to live in a patch within this structure where the constants are such as to make human life possible. There’s no need to assert that the laws of physics have been designed to make us possible if this is the case, as most of the multiverse doesn’t have the fine tuning that appears to be required to allow our existence.

As an application of the Weak Anthropic Principle, I have no objection to this argument. However, behind this idea lies the assertion that all possible vacuum configurations (and all related physical constants) do arise ergodically. I’ve never seen anything resembling a proof that this is the case. Moreover, there are many examples of physical phase transitions for which the ergodic hypothesis is known not to apply. If there is a rigorous proof that this works out, I’d love to hear about it. In the meantime, I remain sceptical.

## Multiversalism

Posted in The Universe and Stuff with tags , , on June 17, 2009 by telescoper

The word “cosmology” is derived from the Greek κόσμος (“cosmos”) which means, roughly speaking, “the world as considered as an orderly system”. The other side of the coin to “cosmos” is Χάος (“chaos”). In one world-view the Universe comprised two competing aspects: the orderly part that was governed by laws and which could (at least in principle) be predicted, and the “random” part which was disordered and unpredictable. To make progress in scientific cosmology we do need to assume that the Universe obeys laws. We also assume that these laws apply everywhere and for all time or, if they vary, then they vary in accordance with another law.  This is the cosmos that makes cosmology possible.  However, with the rise of quantum theory, and its applications to the theory of subatomic particles and their interactions, the field of cosmology has gradually ceded some of its territory to chaos.

In the early twentieth century, the first mathematical world models were constructed based on Einstein’s general theory of relativity. This is a classical theory, meaning that it describes a system that evolves smoothly with time. It is also entirely deterministic. Given sufficient information to specify the state of the Universe at a particular epoch, it is possible to calculate with certainty what its state will be at some point in the future. In a sense the entire evolutionary history described by these models is not a succession of events laid out in time, but an entity in itself. Every point along the space-time path of a particle is connected to past and future in an unbreakable chain. If ever the word cosmos applied to anything, this is it.

But as the field of relativistic cosmology matured it was realised that these simple classical models could not be regarded as complete, and consequently that the Universe was unlikely to be as predictable as was first thought. The Big Bang model gradually emerged as the favoured cosmological theory during the middle of the last century, between the 1940s and the 1960s. It was not until the 1960s, with the work of Hawking and Penrose, that it was realised that expanding world models based on general relativity inevitably involve a break-down of known physics at their very beginning. The so-called singularity theorems demonstrate that in any plausible version of the Big Bang model, all physical parameters describing the Universe (such as its density, pressure and temperature) become infinite at the instant of the Big Bang. The existence of this “singularity” means that we do not know what laws if any apply at that instant. The Big Bang contains the seeds of its own destruction as a complete theory of the Universe. Although we might be able to explain how the Universe subsequently evolves, we have no idea how to describe the instant of its birth. This is a major embarrassment. Lacking any knowledge of the laws we don’t even have any rational basis to assign probabilities. We are marooned with a theory that lets in water.

The second important development was the rise of quantum theory and its incorporation into the description of the matter and energy contained within the Universe. Quantum mechanics (and its development into quantum field theory) entails elements of unpredictability. Although we do not know how to interpret this feature of the theory, it seems that any cosmological theory based on quantum theory must include things that can’t be predicted with certainty.

As particle physicists built ever more complete descriptions of the microscopic world using quantum field theory, they also realised that the approaches they had been using for other interactions just wouldn’t work for gravity. Mathematically speaking, general relativity and quantum field theory just don’t fit together. It might have been hoped that quantum gravity theory would help us plug the gap at the very beginning of the Universe, but that has not happened yet because there isn’t such a theory. What we can say about the origin of the Universe is correspondingly extremely limited and mostly speculative, but some of these speculations have had a powerful impact on the subject.

One thing that has changed radically since the early twentieth century is the possibility that our Universe may actually be part of a much larger “collection” of Universes. The potential for semantic confusion here is enormous. The Universe is, by definition, everything that exists. Obviously, therefore, there can only be one Universe. The name given to a Universe that consists of bits and pieces like this is the multiverse.

There are various ways a multiverse can be realised. In the “Many Worlds” interpretation of quantum mechanics there is supposed to be a plurality of versions of our Universe, but their ontological status is far from clear (at least to me). Do we really have to accept that each of the many worlds is “out there”, or can we get away with using them as inventions to help our calculations?

On the other hand, some plausible models based on quantum field theory do admit the possibility that our observable Universe is part of a collection of mini-universes, each of which “really” exists. It’s hard to explain precisely what I mean by that, but I hope you get my drift. These mini-universes form a classical ensemble in different domains of a single space-time, which is not what happens in quantum multiverses.

According to the Big Bang model, the Universe (or at least the part of it we know about) began about fourteen billion years ago. We do not know whether the Universe is finite or infinite, but we do know that if it has only existed for a finite time we can only observe a finite part of it. We can’t possibly see light from further away than fourteen billion light years because any light signal travelling further than this distance would have to have set out before the Universe began. Roughly speaking, this defines our “horizon”: the maximum distance we are in principle able to see. But the fact that we can’t observe anything beyond our horizon does not mean that such remote things do not exist at all. Our observable “patch” of the Universe might be a tiny part of a colossal structure that extends much further than we can ever hope to see. And this structure might be not at all homogeneous: distant parts of the Universe might be very different from ours, even if our local piece is well described by the Cosmological Principle.

Some astronomers regard this idea as pure metaphysics, but it is motivated by plausible physical theories. The key idea was provided by the theory of cosmic inflation, which I have blogged about already. In the simplest versions of inflation the Universe expands by an enormous factor, perhaps 10⁶⁰, in a tiny fraction of a second. This may seem ridiculous, but the energy available to drive this expansion is inconceivably large. Given this phenomenal energy reservoir, it is straightforward to show that such a boost is not at all unreasonable. With inflation, our entire observable Universe could thus have grown from a truly microscopic pre-inflationary region. It is sobering to think that every galaxy, star and planet we can see might have grown from a seed that was smaller than an atom. But the point I am trying to make is that the idea of inflation opens up one’s mind to the idea that the Universe as a whole may be a landscape of unimaginably immense proportions within which our little world may be little more than a pebble. If this is the case then we might plausibly imagine that this landscape varies haphazardly from place to place, producing what may amount to an ensemble of mini-universes. I say “may” because there is yet no theory that tells us precisely what determines the properties of each hill and valley or the relative probabilities of the different types of terrain.

Many theorists believe that such an ensemble is required if we are to understand how to deal probabilistically with the fundamentally uncertain aspects of modern cosmology. I don’t think this is the case. It is, at least in principle, perfectly possible to apply probabilistic arguments to unique events like the Big Bang using Bayesian inference. If there is an ensemble, of course, then we can discuss proportions within it, and relate these to probabilities too. Bayesians can use frequencies if they are available but do not require them. It is one of the greatest fallacies in science that probabilities need to be interpreted as frequencies.

At the crux of many related arguments is the question of why the Universe appears to be so well suited to our existence within it. This fine-tuning appears surprising based on what (little) we know about the origin of the Universe and the many other ways it might apparently have turned out. Does this suggest that it was designed to be so or do we just happen to live in a bit of the multiverse nice enough for us to have evolved and survived in?

Views on this issue are often boiled down into a choice between a theistic argument and some form of anthropic selection. A while ago I gave a talk at a meeting in Cambridge called God or Multiverse? that was an attempt to construct a dialogue between theologians and cosmologists. I found it interesting, but it didn’t alter my view that science and religion don’t really overlap very much at all on this, in the sense that if you believe in God it doesn’t mean you have to reject the multiverse, or vice-versa. If God can create a Universe, he could create a multiverse too. As it happens, I’m agnostic about both.

So having, I hope, opened up your mind to the possibility that the Universe may be amenable to a frequentist interpretation, I should confess that I think one can actually get along quite nicely without it. In any case, you will probably have worked out that I don’t really like the multiverse. One reason I don’t like it is that it accepts that some things have no fundamental explanation. We just happen to live in a domain where that’s the way things are. Of course, the Universe may turn out to be like that – there definitely will be some point at which our puny monkey brains can’t learn anything more – but if we accept that then we certainly won’t find out if there is really a better answer, i.e. an explanation that isn’t accompanied by an infinite amount of untestable metaphysical baggage. My other objection is that I think it’s cheating to introduce an infinite thing to provide an explanation of fine tuning. Infinity is bad.