Archive for Theory of Everything

The Curious Case of Weinstein’s Theory

Posted in The Universe and Stuff on May 29, 2013 by telescoper

I’m late onto this topic, but that’s probably no bad thing given how heated it seems to have been. Most of you have probably heard that, last week,  Marcus du Sautoy (who is the Simonyi Professor for the Public Understanding of Science at the University of Oxford), wrote a lengthy piece in the Grauniad about some work by a friend of his, Eric Weinstein. The Guardian piece was headed

Eric Weinstein may have found the answer to physics’ biggest problems
A physicist has formulated a mathematical theory that purports to explain why the universe works the way it does – and it feels like ‘the answer’

I’m not sure whether du Sautoy wrote this heading or whether it was added by staff at the newspaper, but Weinstein is not actually working as a physicist; he has a PhD from Harvard in Mathematical Physics, right enough, but has been working for some time as an economics consultant. Anyway, Weinstein also presented his work in a two-hour lecture at the Mathematics Department at Oxford University. Unfortunately, it appears that few (if any) of Oxford’s physicists received an invitation to attend the lecture which, together with the fact that there isn’t an actual paper (not even a draft, unrefereed one) laying out the details, led to some rather scathing responses from Twitterland and Blogshire. Andrew Pontzen’s New Scientist blog piece is fairly typical. This talk was followed by a retraction of an allegation that physicists were not invited to the talk; it turns out the invitation was sent, but not distributed as widely as it should have been.

Anyway, what are we to make of this spat? Well, I think it would be very unfortunate if this episode led to the perception that physicists feel that only established academics can make breakthroughs in their own field. There are plenty of historical examples of non-physicists having great ideas that have dramatically changed the landscape of physics; Einstein himself wasn’t an academic when he did his remarkable work in 1905. I think we should all give theoretical ideas a fair hearing wherever they come from. And although Marcus du Sautoy is also not a physicist, he no doubt knows enough about physics to know whether Weinstein’s work is flawed at a trivial level. And even if it is wrong (which, arguably, all theories are) then it may well be wrong in a way that’s interesting, possibly precisely because it does come from outside the current mainstream (which, in my opinion, is too obsessed with string theory for its own good).

That aside, I do have a serious issue with the way Marcus du Sautoy used his media connections to plug some work that hasn’t even been submitted to, let alone passed, the gold standard of peer review. I can’t comment on the work because I wasn’t at the talk and there is no paper for me to study and form my own conclusions. The accompanying blog post isn’t enough to make an informed decision either. It may or may not be brilliant. I assure you I have an open mind on that, but I don’t think it’s appropriate for a Professor for the Public Understanding of Science to indulge in such hype. It reminds me of a recent episode involving another famous Oxford mathematician, Roger Penrose. Perhaps he’ll get together with Eric Weinstein and look for evidence supporting the new theory in the cosmic microwave background?

Don’t get me wrong. I don’t at all object to Weinstein being given an opportunity to air his work at a departmental seminar or colloquium. Actually, I wish more departmental talks were of a speculative and challenging nature, rather than just being rehashes of already published work. The problem with talking about work in progress, though, is (as I know from experience) that if you talk too openly about ideas then someone quicker and cleverer than yourself can work out the details faster than you can; while it’s a bit frustrating when that happens, in the long run it’s good for science. Or so I tell myself. Anyway, the problem is not with that: it’s with airing this in the wider media inappropriately early, i.e. before it has received proper scrutiny. This could give the impression to the public that science is just a free-for-all and that anyone’s ideas, however half-baked, are equally valid. That is irresponsible.

Anyway, that’s my take on this strange business. I’d be interested to hear other opinions through the comments box. Please bear in mind, however, that the word “defamation” has been bandied about, so be careful, and note that this piece expresses my opinion. That’s all.

All models are wrong

Posted in The Universe and Stuff on May 17, 2013 by telescoper

I’m back in Cardiff for the day, mainly for the purpose of attending presentations by a group of final-year project students (two of them under my supervision, albeit now remotely).  One of the talks featured a famous quote by the statistician George E.P. Box:

Essentially, all models are wrong, but some are useful.

I agree with this, actually, but only if it’s not interpreted in a way that suggests that there’s no such thing as reality and/or that science is just a game.  We may never achieve a perfect understanding of how the Universe works, but that’s not the same as not knowing anything at all. 

A familiar example that nicely illustrates my point  is the London Underground or Tube map. There is a fascinating website depicting the evolutionary history of this famous piece of graphic design. Early versions simply portrayed the railway lines inset into a normal geographical map which made them rather complicated, as the real layout of the lines is far from regular. A geographically accurate depiction of the modern tube network is shown here which makes the point:


A revolution occurred in 1933 when Harry Beck compiled the first “modern” version of the map. His great idea was to simplify the representation of the network around a single unifying feature. To this end he turned the Central Line (in red) into a straight line travelling left to right across the centre of the page, only changing direction at the extremities. All other lines were also distorted to run basically either North-South or East-West and produce a much more regular pattern, abandoning any attempt to represent the “real” geometry of the system but preserving its topology (i.e. its connectivity).  Here is an early version of his beautiful construction:

Note that although this is a “modern” map in terms of how it represents the layout, it does look rather dated in terms of other design elements such as the border and typefaces used. We tend not to notice how much we surround the essential things with embellishments that date very quickly.

More modern versions of this map that you can get at tube stations and the like rather spoil the idea by introducing a kink in the central line to accommodate the complexity of the interchange between Bank and Monument stations as well as generally buggering about with the predominantly  rectilinear arrangement of the previous design:

I quite often use this map when I’m giving popular talks about physics. I think it illustrates quite nicely some of the philosophical issues related to theoretical representations of nature. I think of theories or models as being like maps, i.e. as attempts to make a useful representation of some aspects of external reality. By useful, I mean the things we can use to make tests. However, there is a persistent tendency for some scientists to confuse the theory and the reality it is supposed to describe, especially a tendency to assert there is a one-to-one relationship between all elements of reality and the corresponding elements in the theoretical picture. This confusion was stated most succinctly by the Polish scientist Alfred Korzybski in his memorable aphorism:

The map is not the territory.

I see this problem written particularly large with those physicists who persistently identify the landscape of string-theoretical possibilities with a multiverse of physically existing domains in which all these are realised. Of course, the Universe might be like that but it’s by no means clear to me that it has to be. I think we just don’t know what we’re doing well enough to know as much as we like to think we do.

A theory is also surrounded by a penumbra of non-testable elements, including those concepts that we use to translate the mathematical language of physics into everyday words. We shouldn’t forget that many equations of physics have survived for a long time, but their interpretation has changed radically over the years.

The inevitable gap that lies between theory and reality does not mean that physics is a useless waste of time; it just means that its scope is limited. The Tube map is not complete or accurate in all respects, but it’s excellent for what it was made for. Physics goes down the tubes when it loses sight of its key requirement, i.e. to be testable, and in order to be testable it has to be simple enough to calculate things to be compared with observations. In many cases that means a simplified model is perfectly adequate.

Another quote by George Box expands upon this point:

Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.

In any case, an attempt to make a grand unified theory of the London Underground system would no doubt produce a monstrous thing so unwieldy that it would be useless in practice. I think there’s a lesson there for string theorists too…

Many modern-day physicists are obsessed with the idea of a “Theory of Everything” (or TOE). Such a theory would entail the unification of all physical theories – all laws of Nature, if you like – into a single principle. An equally accurate description would then be available, in a single formula, of phenomena that are currently described by distinct theories with separate sets of parameters. Instead of textbooks on mechanics, quantum theory, gravity, electromagnetism, and so on, physics students would need just one book. But would such a theory somehow be physical reality, as some physicists assert? I don’t think so. In fact it’s by no means clear to me that it would even be useful.

Bayes’ Razor

Posted in Bad Statistics, The Universe and Stuff on February 19, 2011 by telescoper

It’s been quite a while since I posted a little piece about Bayesian probability. That one and the others that followed it (here and here) proved to be surprisingly popular so I’ve been planning to add a few more posts whenever I could find the time. Today I find myself in the office after spending the morning helping out with a very busy UCAS visit day, and it’s raining, so I thought I’d take the opportunity to write something before going home. I think I’ll do a short introduction to a topic I want to do a more technical treatment of in due course.

A particularly important feature of Bayesian reasoning is that it gives precise motivation to things that we are generally taught as rules of thumb. The most important of these is Ockham’s Razor. This famous principle of intellectual economy is variously presented in Latin as Pluralitas non est ponenda sine necessitate or Entia non sunt multiplicanda praeter necessitatem. Either way, it means basically the same thing: the simplest theory which fits the data should be preferred.

William of Ockham, to whom this dictum is attributed, was an English Scholastic philosopher (probably) born at Ockham in Surrey in 1280. He joined the Franciscan order around 1300 and ended up studying theology in Oxford. He seems to have been an outspoken character, and was in fact summoned to Avignon in 1323 to account for his alleged heresies in front of the Pope, and was subsequently confined to a monastery from 1324 to 1328. He died in 1349.

In the framework of Bayesian inductive inference, it is possible to give precise reasons for adopting Ockham’s razor. To take a simple example, suppose we want to fit a curve to some data. In the presence of noise (or experimental error) which is inevitable, there is bound to be some sort of trade-off between goodness-of-fit and simplicity. If there is a lot of noise then a simple model is better: there is no point in trying to reproduce every bump and wiggle in the data with a new parameter or physical law because such features are likely to be features of the noise rather than the signal. On the other hand if there is very little noise, every feature in the data is real and your theory fails if it can’t explain it.

To go a bit further it is helpful to consider what happens when we generalize one theory by adding to it some extra parameters. Suppose we begin with a very simple theory, just involving one parameter p, but we fear it may not fit the data. We therefore add a couple more parameters, say q and r. These might be the coefficients of a polynomial fit, for example: the first model might be a straight line (with fixed intercept), the second a cubic. We don’t know the appropriate numerical values for the parameters at the outset, so we must infer them by comparison with the available data.

Quantities such as p, q and r are usually called “floating” parameters; there are as many as a dozen of these in the standard Big Bang model, for example.

Obviously, having three degrees of freedom with which to describe the data should enable one to get a closer fit than is possible with just one. The greater flexibility within the general theory can be exploited to match the measurements more closely than the original. In other words, such a model can improve the likelihood, i.e. the probability  of the obtained data  arising (given the noise statistics – presumed known) if the signal is described by whatever model we have in mind.
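The claim that extra parameters can only improve the best fit is easy to check numerically. What follows is a minimal sketch rather than a recipe: the data are simulated (a straight line through the origin plus Gaussian noise of known width), and the helper names `solve_linear` and `fit_through_origin` are invented for this illustration. It fits the one-parameter line and the three-parameter cubic by least squares, which for Gaussian noise is the same as maximizing the likelihood.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_through_origin(xs, ys, degree):
    """Least-squares fit of y = c1*x + ... + c_degree*x**degree (intercept fixed at 0).

    Returns the coefficients and the residual sum of squares.
    """
    powers = range(1, degree + 1)
    normal = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    coeffs = solve_linear(normal, rhs)
    rss = sum(
        (y - sum(c * x ** k for c, k in zip(coeffs, powers))) ** 2
        for x, y in zip(xs, ys)
    )
    return coeffs, rss


# Simulated data (an assumption of this sketch): true slope 1.5, noise sigma 0.3.
random.seed(1)
sigma = 0.3
xs = [0.1 * i for i in range(1, 21)]
ys = [1.5 * x + random.gauss(0.0, sigma) for x in xs]

line_coeffs, rss_line = fit_through_origin(xs, ys, 1)    # one parameter: p
cubic_coeffs, rss_cubic = fit_through_origin(xs, ys, 3)  # three parameters: p, q, r

# For Gaussian noise, log-likelihood = constant - RSS / (2 sigma^2).
delta_log_like = (rss_line - rss_cubic) / (2.0 * sigma ** 2)
print("RSS, straight line:", rss_line)
print("RSS, cubic:        ", rss_cubic)
print("Gain in maximum log-likelihood from q and r:", delta_log_like)
```

Because the straight line is a special case of the cubic (q = r = 0), the cubic's best-fit likelihood can never be lower, even though the data here really were generated by the simpler model.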

But Bayes’ theorem tells us that there is a price to be paid for this flexibility, in that each new parameter has to have a prior probability assigned to it. This probability will generally be smeared out over a range of values where the experimental results (contained in the likelihood) subsequently show that the parameters don’t lie. Even if the extra parameters allow a better fit to the data, this dilution of the prior probability may result in the posterior probability being lower for the generalized theory than the simple one. The more parameters are involved, the bigger the space of prior possibilities for their values, and the harder it is for the improved likelihood to win out. Arbitrarily complicated theories are simply improbable. The best theory is the most probable one, i.e. the one for which the product of likelihood and prior is largest.

To give a more quantitative illustration of this, consider a given model M which has a set of N floating parameters represented as a vector $\underline{\lambda} = (\lambda_1,\ldots,\lambda_N)$; in a sense each choice of parameters represents a different model or, more precisely, a member of the family of models labelled M.

Now assume we have some data D and can consequently form a likelihood function $P(D|\underline{\lambda},M)$. In Bayesian reasoning we have to assign a prior probability $P(\underline{\lambda}|M)$ to the parameters of the model which, if we’re being honest, we should do in advance of making any measurements!

The interesting thing to look at now is not the best-fitting choice of model parameters $\underline{\lambda}$ but the extent to which the data support the model in general. This is encoded in a sort of average of likelihood over the prior probability space:

$$P(D|M) = \int P(D|\underline{\lambda},M) P(\underline{\lambda}|M) d^{N}\underline{\lambda}.$$
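For a single parameter this integral is simple enough to do by brute force. The sketch below is only an illustration under assumptions of my own choosing: a toy model y = p x, five made-up data points, Gaussian noise of known width, and a uniform prior on p centred on zero. The function names `log_likelihood` and `evidence` are hypothetical. It evaluates the integral with the midpoint rule for two prior widths.

```python
import math


def log_likelihood(p, xs, ys, sigma):
    """Gaussian log-likelihood of the data for the model y = p * x."""
    chi2 = sum(((y - p * x) / sigma) ** 2 for x, y in zip(xs, ys))
    norm = len(xs) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * chi2 - norm


def evidence(xs, ys, sigma, prior_width, steps=20000):
    """Midpoint-rule estimate of P(D|M) for a uniform prior on p centred on zero."""
    lo = -0.5 * prior_width
    dp = prior_width / steps
    prior_density = 1.0 / prior_width  # uniform prior, normalized over its range
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * dp
        total += math.exp(log_likelihood(p, xs, ys, sigma)) * prior_density * dp
    return total


# Made-up data lying close to y = x (an assumption of this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
sigma = 0.5

ev_narrow = evidence(xs, ys, sigma, prior_width=4.0)   # p allowed in [-2, 2]
ev_wide = evidence(xs, ys, sigma, prior_width=40.0)    # p allowed in [-20, 20]

print("Evidence, narrow prior:", ev_narrow)
print("Evidence, wide prior:  ", ev_wide)
print("Ratio:", ev_narrow / ev_wide)
```

Both priors comfortably contain the region where the likelihood is appreciable, so the best fit is identical in the two cases. The only difference is how thinly the prior is spread, and the evidence drops in proportion to the prior width.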

This is just the normalizing constant K usually found in statements of Bayes’ theorem which, in this context, takes the form

$$P(\underline{\lambda}|D,M) = K^{-1}P(\underline{\lambda}|M)P(D|\underline{\lambda},M).$$

In statistical mechanics things like K are usually called partition functions, but in this setting K is called the evidence, and it is used to form the so-called Bayes Factor, used in a technique known as Bayesian model selection of which more anon….
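For reference, the Bayes factor is usually written as a ratio of evidences. The labels $M_1$ and $M_2$ for two competing models and the symbol $B_{12}$ are my notation here, not anything defined above:

```latex
B_{12} = \frac{P(D|M_1)}{P(D|M_2)},
\qquad
\frac{P(M_1|D)}{P(M_2|D)} = B_{12}\,\frac{P(M_1)}{P(M_2)}.
```

The second relation is just Bayes’ theorem applied to the models themselves: the Bayes factor is what converts prior odds into posterior odds, so if the two models are given equal prior probability the posterior odds equal $B_{12}$.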

The  usefulness of the Bayesian evidence emerges when we ask the question whether our N  parameters are sufficient to get a reasonable fit to the data. Should we add another one to improve things a bit further? And why not another one after that? When should we stop?

The answer is that although adding an extra degree of freedom can increase the first factor in the integral defining K (the likelihood), it also imposes a penalty in the second factor, the prior, because the more parameters the more smeared out the prior probability must be. If the improvement in fit is marginal and/or the data are noisy, then the second factor wins and the evidence for a model with N+1 parameters is lower than that for the N-parameter version. Ockham’s razor has done its job.
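A rough worked version of this argument, for one parameter only and under assumptions that are mine rather than part of the general result: take a uniform prior of width $\Delta\lambda$, and suppose the likelihood is approximately Gaussian in $\lambda$, peaked at $\hat{\lambda}$ with width $\sigma_\lambda$ and lying well inside the prior range. Then

```latex
P(D|M) = \int P(D|\lambda,M)\,P(\lambda|M)\,d\lambda
\approx P(D|\hat{\lambda},M)\int \exp\!\left[-\frac{(\lambda-\hat{\lambda})^{2}}{2\sigma_\lambda^{2}}\right]\frac{d\lambda}{\Delta\lambda}
= P(D|\hat{\lambda},M)\,\frac{\sqrt{2\pi}\,\sigma_\lambda}{\Delta\lambda}.
```

The first factor is the best-fit likelihood, which can only go up when a parameter is added. The second is the fraction of the prior range that survives contact with the data, and it is smaller than one whenever the data are informative. An extra parameter earns its keep only if the gain in the first factor outweighs the loss in the second.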

This is a satisfying result that is in nice accord with common sense. But I think it goes much further than that. Many modern-day physicists are obsessed with the idea of a “Theory of Everything” (or TOE). Such a theory would entail the unification of all physical theories – all laws of Nature, if you like – into a single principle. An equally accurate description would then be available, in a single formula, of phenomena that are currently described by distinct theories with separate sets of parameters. Instead of textbooks on mechanics, quantum theory, gravity, electromagnetism, and so on, physics students would need just one book.

The physicist Stephen Hawking has described the quest for a TOE as like trying to read the Mind of God. I think that is silly. If a TOE is ever constructed it will be the most economical available description of the Universe. Not the Mind of God. Just the best way we have of saving paper.