## A Mountain of Truth

I spent the last week at a conference in a beautiful setting amidst the hills overlooking the small town of Ascona by Lake Maggiore in the canton of Ticino, the Italian-speaking part of Switzerland. To be more precise we were located in a conference centre called the Centro Stefano Franscini on Monte Verità. The meeting was COSMOSTATS which aimed

… to bring together world-class leading figures in cosmology and particle physics, as well as renowned statisticians, in order to exchange knowledge and experience in dealing with large and complex data sets, and to meet the challenge of upcoming large cosmological surveys.

Although I didn’t know much about the location beforehand, it turns out to have an extremely interesting history, going back about a hundred years. The first people to settle there, around the end of the 19th Century, were anarchists who had sought refuge there during times of political upheaval. The Locarno region had long been a popular place for people with “alternative” lifestyles. Monte Verità (“The Mountain of Truth”) was eventually bought by Henri Oedenkoven, the son of a rich industrialist, and he set up a sort of commune there at which the residents practised vegetarianism, naturism, free love and other forms of behaviour that were intended as a reaction against the scientific and technological progress of the time. From about 1904 onward the centre became a sanatorium where the discipline of psychoanalysis flourished and it later attracted many artists. In 1927, Baron Eduard von der Heydt took the place over. He was an art collector and a great connoisseur of Oriental philosophy, and he established a large collection at Monte Verità, much of which is still there because when the Baron died in 1956 he left Monte Verità to the local Canton.

Given the bizarre collection of anarchists, naturists, theosophists (and even vegetarians) that used to live in Monte Verità, it is by no means out of keeping with the tradition that it should eventually play host to a conference of cosmologists and statisticians.

The conference itself was interesting, and I was lucky enough to get to chair a session with three particularly interesting talks in it. In general, though, these dialogues between statisticians and physicists don’t seem to be as productive as one might have hoped. I’ve been to a few now, and although there’s a lot of enjoyable polemic they don’t work too well at changing anyone’s opinion or providing new insights.

We may now have mountains of new data in cosmology and particle physics but that hasn’t always translated into a corresponding mountain of truth. Intervening between our theories and observations lies the vexed question of how best to analyse the data and what the results actually mean. As always, lurking in the background, was the long-running conflict between adherents of the Bayesian and frequentist interpretations of probability. It appears that cosmologists – at least those represented at this meeting – tend to be Bayesian while particle physicists are almost exclusively frequentist. I’ll refrain from commenting on what this might mean. However, I was perplexed by various comments made during the conference about the issue of *coverage*, which is discussed rather nicely in some detail here. To me the question of whether a Bayesian method has good frequentist coverage properties is completely irrelevant. Bayesian methods ask different questions (actually, ones to which scientists want to know the answer) so it is not surprising that they give different answers. Measuring a Bayesian method according to a frequentist criterion is completely pointless whichever camp you belong to.

The irrelevance of coverage was one thing that the previous residents knew better than some of the conference guests:

I’d like to thank Uros Seljak, Roberto Trotta and Martin Kunz for organizing the meeting in such a picturesque and intriguing place.

August 1, 2009 at 8:26 pm

I had an old-fashioned stats upbringing. If you found yourself using probabilities that couldn’t be verified using repeat experiments then you were on dangerous ground. Now this criticism occasionally affects frequentist statistics too, particularly in cosmology. But am still amazed at how non-experimental Bayesian parameter probability distributions, particularly priors, have taken root in astronomy. The frequentist approach may be limited in its aims but in statistical physics the connection of probabilities to repeat experiments would seem particularly crucial.

I assume someone has tried to apply Bayesian inference to fundamental quantum mechanics – how did that work out?

August 1, 2009 at 9:40 pm

Tom,

I couldn’t agree less. For a start you need an infinite number of experiments if you’re going to have frequentist verification and you never have that. More importantly, Bayesian reasoning is logically consistent, something that frequentist methods aren’t guaranteed to be. I heard several talks at COSMOSTATS from particle physicists saying that the big problem with their (frequentist) methods is that they don’t know how to combine results from different experiments….something that is conceptually trivial from a Bayesian point of view.

There has been work on Bayesian probability in quantum mechanics but it isn’t a completed project by any means. Entanglement, for example, isn’t easy to describe with straightforward extensions of probability theory: it requires a more complicated calculus.

Peter
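To illustrate why combining experiments is conceptually simple in the Bayesian picture, here is a minimal sketch. It assumes independent Gaussian measurements of a single parameter and a flat prior, in which case multiplying the likelihoods reduces to an inverse-variance weighted average. The function name `combine_gaussian` and the numbers are invented for illustration and are not taken from any of the talks.

```python
import math

def combine_gaussian(measurements):
    """Combine independent Gaussian measurements of one parameter.

    Each measurement is a (value, sigma) pair. With a flat prior (an
    assumption of this sketch), the product of the Gaussian likelihoods is
    itself Gaussian, with an inverse-variance weighted mean.
    """
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)

# Two hypothetical experiments measuring the same quantity.
mean, sigma = combine_gaussian([(10.0, 1.0), (12.0, 1.0)])
print(mean, sigma)
```

Real combinations are harder than this, of course, because of correlated systematics and non-Gaussian likelihoods, but the principle of multiplying likelihoods stays the same.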

August 2, 2009 at 11:07 pm

It just seemed that some of the links above argue that if the real meaning of frequentist confidence intervals were known then more people would be Bayesians. Maybe it’s just as vacuous to argue that if the non-experimental nature of some Bayesian probabilities were more appreciated then more astronomers would be frequentists! Using probability distributions to describe repeated physical experiments has a clear and consistent, if limited, meaning. But where are the experiments that define the Bayesian probability that a model is correct? Are you saying that the lack of such an experimental basis is irrelevant?

Slightly tangentially – if the only Met Office response when asked what they meant when they said there was a 66% chance of a good summer was “Well there are 3 of us here and 2 of us think …” it would have less physics experimental credibility than a response which involved an analysis of the repeat year statistics. But defining probability by “hunches” seems to be fundamental to the Bayesian idea that quantitative probabilities can be applied to models as well as data.

Am in Rio for the IAU GA at the moment – forgive me if I am distracted from the “old chestnuts” that I have given another airing here…

August 3, 2009 at 1:50 pm

In the Bayesian interpretation, probabilities represent the degree of belief that it is reasonable to hold given the available data. They are the unique consistent generalisation of Boolean Algebra (in which True = 1 and False =0) to the intermediate case where there is insufficient evidence to be definite. They don’t pretend to be “measurable” in the sense that you seem to imply frequentist probabilities are, but they are logically consistent.

I think physicists are interested in the question of what is the probability of the model M given the data D. Frequentists can’t answer that question and instead pretend it is the same as the probability of the data D given the model M over a large number of imaginary trials. If you like that definition, consider the following example. The probability of a randomly-selected woman being pregnant is 2%, i.e. P(pregnant|woman)=0.02.

What is the probability of a randomly-selected pregnant person being a woman, i.e. P(woman|pregnant)?
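The inversion can be made explicit with Bayes’ theorem. The sketch below is only illustrative: P(pregnant|woman) = 0.02 comes from the example above, while the fraction of women in the population (0.5) and P(pregnant|man) = 0 are assumptions added here, and the helper name `posterior` is made up.

```python
def posterior(prior_a, like_b_given_a, like_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    evidence = like_b_given_a * prior_a + like_b_given_not_a * (1.0 - prior_a)
    return like_b_given_a * prior_a / evidence

p_woman = 0.5            # assumed fraction of women in the population
p_preg_given_woman = 0.02  # from the example in the text
p_preg_given_man = 0.0   # assumed

p_woman_given_preg = posterior(p_woman, p_preg_given_woman, p_preg_given_man)
print(p_woman_given_preg)  # 1.0, very different from 0.02
```

The two conditional probabilities answer different questions, which is why one cannot be substituted for the other.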

August 3, 2009 at 4:34 pm

Peter (and Anton, if you’re lurking), you may have seen this already, but CricInfo has managed to combine two of your main posting topics (cricket and probability) in one really nice plot. There’s currently a link to “Hawk-Eye” at the top of the CricInfo “live commentary” page for the current Test; clicking on this launches a window with lots of plots and graphics, most of which have to do with bowlers’ pitch-maps and the like. But if you keep clicking through you eventually get to a graph which plots the “likelihood” of a win, loss or draw as a function of time for the match. Obviously they meant to label the axis “probability”, but no matter; the key thing is that it’s clearly a Bayesian plot (even if they haven’t realised this), which evolves slowly as partnerships build, then jags the other way when a wicket falls, and also changes when, as was the case on Saturday, a day is washed out.

Aside from it being interesting to cricket-loving scientists, it’s also highly relevant to this sort of discussion about what probability means. I suspect that almost every human being has the correct intuitive sense of this quantity as presented here. Certainly anyone who bets regularly should understand it. All of which makes it doubly intriguing to me that so many people with serious mathematical and/or statistical training have disowned this definition. Even if one needs two different words to distinguish “the long-run frequency of an event given a large number of identical trials” from “the degree to which some pieces of information imply a hypothesis is true”, the Bayesian axioms make the latter such a well-defined quantity that it’s at the very least useful.

August 5, 2009 at 10:11 am

[…] of the Clerihews! As a result of an after-dinner discussion at the meeting I attended last week, I’ve decided to put a revised cosmological clerihew collection back […]

August 5, 2009 at 7:25 pm

Tom: We freely talk about the probability of the amount of mass in the universe – no question of a frequency distribution there, as we live in one universe. So what does ‘probability’ mean? It is a number representing how strongly one proposition implies another. More formally, p(A|B) is a measure of how strongly the binary proposition A is implied to be true upon supposing that B is true, according to the ontological relations known between the referents of A and B. (The Boolean calculus of propositions then induces an algebra for the p’s, which turns out to be the sum and product rules). You can never “verify a probability” (as you put it) by doing an experiment.

Peter: “Entanglement… isn’t easy to describe with straightforward extensions of probability theory: it requires a more complicated calculus.” News to me! Probability is the logic that we humans use to think, so we simply cannot dispense with it – if it gives counter-intuitive results then we must (1) acknowledge that we are missing some crucial conditioning information, and/or (2) see those results as telling us something unexpected about the universe.

Daniel: Brilliant; I’ll look it up on Cricinfo. I’d like to see error bars on predictive Hawkeye for judging lbws, ie visualise a cone widening from the point of contact of the ball with the pad to the stumps. But I think that might be too much for some viewers…

Anton

August 6, 2009 at 2:06 pm

Anton: The picture of a HawkEye error cone is pretty much what I have in my head when I see those graphics. And actually I think most cricket-watchers would get what was going on here. The main reason for that is the familiarity with the idea that it’s harder for a batsman to be given out LBW if he (or she) has come forward and so the “error cone” projected all the way to the stumps is bigger.

August 6, 2009 at 4:02 pm

Daniel: Do you know if predictive HawkEye has been tested by feeding into the computer only trajectory information from where the ball pitches to a couple of feet beyond that, and seeing if Hawkeye consistently replicates the trajectory as far as the plane of the stumps when no batsman is present?

Please could you post the URL to which you referred?

Anton

August 6, 2009 at 4:59 pm

Did I say that?

August 6, 2009 at 5:20 pm

I remember noting it Peter (and the passive), and will apologise on this blog and buy you copious beer if I’m wrong; but it would be a hassle to find where, given the way the blog appears to contributors/readers. Can you as blogger do a global hunt for the word “designed” which I’m pretty sure appeared?

Anton

August 6, 2009 at 5:26 pm

I appear to have replied to the wrong message so these comments are on the wrong thread. I’ll see if I can figure out how to move them.

August 7, 2009 at 10:06 am

Anton: I believe the HawkEye error bar of ~5mm for a typical LBW was arrived at by cutting the data 2m from the stumps and comparing the extrapolation with the real subsequent trajectory. However, I also believe that this was only done using a bowling machine in the nets (with those yellow, dimpled balls). It’s still probably about right, but given they’ve got thousands of real Test examples of balls going through to the ‘keeper, I think they should use them. In particular I’d like them to show the tracks diverging when, say, there’s late swing after the ball passes the batsman. In short they’ve i) not done the best test they easily could and ii) not presented nearly enough of these results.

You also ask about a URL, which I assume is the “result probability as a function of time” plot. If so there’s no direct link; you have to go to the CricInfo match commentary page, click on the HawkEye link at the top and then, in the spawned window, click through ’til you get the relevant graphic. (Or you can “customize” the HawkEye outputs to show just that graphic.)

August 8, 2009 at 11:23 pm

Have now given the talk (average!) and visited Sugar Loaf (brilliant!) and so a few Caipirinhas later I return to the “old chestnuts”.

Peter – the pregnant woman example seems to be somewhat inaccurately aimed at frequentist confidence intervals where a 68% confidence interval means that the confidence interval will contain the true value 68% of the time in repeat experiments. Bayesians want the confidence interval to mean that the probability that the true value lies in the interval is 68% which seems more intuitive. But this latter definition unfortunately involves inventing a probability for, say, the true mean, mu, of a Gaussian. But how do we go about defining that probability? In the standard frequentist method an estimator of the mean, say x_bar, is set up whose statistical properties can be calculated from a hypothesised (but still experimentally based) probability function with true mean mu. So in the example of an industrial process, the known probabilities tell us how often the batch average in repeat experiments will differ from the true value. Nothing more and nothing less but at least it is a well defined process. The aims of the Bayesian route are of course much more exciting and ambitious but involve losing contact with the experimental, objective definition of probability. This is why I think physicists in particular should feel uncomfortable about the subjective Bayesian approach to statistics and probability.

More time might involve recalling that frequentists also have a similarly well defined procedure to test hypotheses about the true parameters of a probability distribution, as well as for defining confidence intervals…

In terms of googling “bayesian quantum mechanics” I came across the line “In addition to the widespread applications of these (bayesian – ts) techniques to quantum control and computation, they may offer evidence in favor of a subjectivist, rather than material interpretation of the quantum state, as a state of knowledge.” Strong stuff – but as long as we know what we’re getting into!

Anton – I agree cosmology has its own set of issues but since I am not too keen on the multiverse, have stuck with the more basic issues for now!

Back to the increasingly frequentist Caipirinhas!

August 9, 2009 at 2:57 am

And Daniel –

“the key thing is that it’s clearly a Bayesian plot (even if they haven’t realised this), which evolves slowly as partnerships build, then jags the other way when a wicket falls, and also changes when, as was the case on Saturday, a day is washed out.”

Well it may be Bayesian – but it depends how they do it. If they look at previous matches that went like this one at each stage and got the frequency of the 3 results then it would be frequentist. If they phone a bookie and he gives his opinion then it might indeed be Bayesian! But just because a probability varies with time doesn’t make it Bayesian, I fear!

The Duckworth-Lewis formula from which they set run targets in some rain-affected cricket matches uses the experimental data from previous matches, I believe. This would seem closer to good, solid, frequentist stats practice!

Tom Shanks endorses the Brazilian brands “Caipirinha” and “Copacabana Beach Wi-Fi Inc.”

August 9, 2009 at 11:35 am

Tom,

I don’t agree with you about the Duckworth-Lewis method. The method uses information from previous matches but also incorporates a model (involving the “resources” the batting side has in terms of overs left and wickets standing). The parameters of this model are fixed using information from previous matches, but that’s not what “frequentist” means. What matters is whether you update the probabilities consistently as new information becomes available. If you do this, you’re Bayesian. If you don’t, you might be frequentist or wrong. Or both.

Bayesian reasoning is only subjective in the sense that it is based on the information one has available. Two people with access to different information can come to different inferences. A bookie probably does have more information than a punter…

I repeat my objection to your statement that frequentist probabilities are “experimental”. They are not. There are no infinite ensembles and if you do exactly the same experiment twice you will get the same answer twice. What varies in real situations are variables of which you have no knowledge and can’t control. However, I suspect that I could say this an infinite number of times and it still wouldn’t change your mind.

Peter

August 9, 2009 at 10:23 pm

Peter,

Well, maybe I was descending too much into cartoons in previous answer to Daniel. But I would still say that the main distinguishing feature of the Bayesian approach is the use of priors rather than eg the updating of probability models in the light of new data. A frequentist presented with the fluctuating state of a cricket match can legitimately update his underlying hypothesised “true” model at any given point, and then proceed to present, say, a predicted mean result with a confidence interval, based on the previous experimental record of other games. It might be slightly messy but it avoids the need to invent a prior in the Bayesian approach which in turn needs the introduction of a more subjective definition of probability.

Am not so definitively against the Bayesian approach as you imply. But it’s just that it’s almost the establishment approach in astronomy at the moment and often it’s presented without any caveat at all. It would actually be interesting to hear your version of what you think the Bayesian caveat should be!

Am also thinking that taking on the pillars of the Bayesian community during my stay here in Rio may be a strategic rather than just a tactical mistake!

August 10, 2009 at 10:49 pm

Tom: Why do you think it is a weakness of Bayesian methods, rather than a strength, that they take prior information into account? Don’t you want your inferences to be based on all relevant information? It remains a challenge to learn how to codify many types of prior information into a prior distribution, of course, but plenty of progress has been made.

Consider the situation in which you know the exact value of a continuous parameter, and you are measuring it (in an inevitably noisy experiment) only because your boss has told you to. Your prior for that parameter is a delta-function peaked at the exact value. Yet any frequentist method applied to your data will give nonzero probability that the parameter takes some other value, which you know is impossible. How can you be happy about that? Bayes’ theorem, in contrast, always causes a delta-function prior to generate a delta-function posterior (and assigns all deviations to experimental error), just as you want.

Anton

August 11, 2009 at 1:59 am

Anton – I like this example! The frequentist then goes and makes the experiment and reports back that his estimate of the parameter value excludes the “exact” value by 30 standard deviations. As physicists, whose estimate of the probability that the “exact” value is statistically consistent with the experimental data do we take more seriously, the Bayesian or the frequentist?

Since the primacy of the experimental result over any pre-conceived notion is to me almost what defines a physicist, we then must take the frequentist probability more seriously. After all who knows by what process the prior probability was defined?

This is the heart of the matter!

August 11, 2009 at 7:08 am

Tom,

Is that why you want a Hubble constant of <30?

Peter

August 11, 2009 at 10:35 am

Tom: If your prior info is that the parameter definitely takes some value known to you, it’s irrelevant how you got that. Information is information. But, for the sake of argument, suppose you know the value of the parameter because you have devised a brilliant new experimental technique for measuring it, with negligible noise – but your boss thinks you are a crackpot and insists you measure it the old way, with apparatus that is ridden with noise.

Immediately before you do what your boss says, your prior for the value of the parameter is essentially a delta-function. If you are a Bayesian, Bayes’ theorem will then give you a delta-function posterior and all the errors in your measurement will be assigned to noise – as they must be if you know the actual value. This coincides perfectly with intuition. But any frequentist method will give you a broader distribution for the parameter value, mostly where you *know* it cannot be. That constitutes a reductio ad absurdum knockdown of the frequentist toolkit.

What you are saying is this: Aha, suppose I was wrong at the start in my certainty over the parameter value. But that is a different problem! To handle that, you make your prior a bit broader than a delta function, and Bayes’ theorem will then tell you how the peak in the likelihood and the peak in the prior combine. But let’s keep it simple – please consider the problem I’ve set you, not a different one.

Anton
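The behaviour described in this exchange (a nearly delta-function prior pins the posterior, while a broader prior lets prior and likelihood combine) can be checked numerically. The following is a rough sketch only: it assumes a Gaussian prior and a Gaussian likelihood evaluated on a grid, and the values 5.0, 6.0 and the widths are made up for illustration. It simply multiplies the prior by the likelihood and renormalises.

```python
import math

def gaussian(x, mu, sigma):
    """Unnormalised Gaussian; normalisation is handled on the grid."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def grid_posterior(prior_mu, prior_sigma, data_value, data_sigma):
    """Posterior mean and standard deviation from prior x likelihood on a grid."""
    grid = [i * 0.001 for i in range(10001)]  # parameter values from 0 to 10
    weights = [gaussian(x, prior_mu, prior_sigma) * gaussian(x, data_value, data_sigma)
               for x in grid]
    norm = sum(weights)
    mean = sum(x * w for x, w in zip(grid, weights)) / norm
    var = sum((x - mean) ** 2 * w for x, w in zip(grid, weights)) / norm
    return mean, math.sqrt(var)

# Hypothetical noisy measurement: value 6.0 with noise sigma 0.5.
sharp_mean, sharp_sd = grid_posterior(5.0, 0.01, 6.0, 0.5)  # near-delta prior at 5.0
broad_mean, broad_sd = grid_posterior(5.0, 0.5, 6.0, 0.5)   # much less certain prior

print(f"sharp prior: {sharp_mean:.3f} +- {sharp_sd:.3f}")
print(f"broad prior: {broad_mean:.3f} +- {broad_sd:.3f}")
```

With the sharp prior the posterior stays essentially at the prior value and the discrepancy is attributed to noise; with the broad prior the posterior settles between the prior peak and the measurement.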

August 11, 2009 at 1:55 pm

Peter – we’re discussing the strategy here and not the tactics! As you know, errors are frequently underestimated whatever the methodology. Although I very much enjoyed the Gruber prize being awarded here last week to the H_0 Key Project team – (honest!) – I couldn’t help but notice their quoted 7% error was not so different from Hubble’s 10% error on his original value of 500 km s⁻¹ Mpc⁻¹. So there may still be hope for my simpler model!

Anton – I understood the question you posed and I thought I answered it!

Blimey – this is beginning to affect my Caipirinha rate! Won’t say in which direction!

August 11, 2009 at 2:31 pm

Tom: Let’s agree to differ on that and go a step further back. You consider my original problem, and then ask: What if the likelihood is peaked 30 sigma away from my delta prior?

Two responses:

1. What if it’s not? Do you take my point in that case?

2. If you are ABSOLUTELY CERTAIN of that prior value then it doesn’t matter how many sigma away the likelihood peaks – the discrepancy is always due to noise and you have simply got yourself a dataset you never remotely expected. That is what certainty *means*, and it is part of the problem I am setting up.

Anton

August 11, 2009 at 3:04 pm

PS To my last piece: Please also consider my earlier questions, ie Why do you think it is a weakness of Bayesian methods, rather than a strength, that they take prior information into account? Don’t you want your inferences to be based on all relevant information?

Hadn’t heard of caipirinhas before, are they like daiquiris?

Anton

August 12, 2009 at 3:55 am

Anton – I know the point that you are making but I don’t regard it as a reductio ad absurdum of frequentism. In the frequentist case the parameter will still be estimated and have an associated error, guided only by the data. But in the Bayesian case a wrong subjective prior will always skew both the parameter estimate and its error. So in your example, whose prior do we choose to use, my delta function or that of my boss who is clearly not so certain? And why? This subjectivity can skew results in a way that a given experiment cannot retrieve and on balance I think that this danger outweighs any other apparent advantage of Bayesianism for physicists.

I still regard my previous reply as also being right to the point.

“Caipirinha (Portuguese pronunciation: [kajpiˈɾĩɲɐ]) is Brazil’s national cocktail, made with cachaça (pronounced [kaˈʃasɐ]), sugar and lime. Cachaça is Brazil’s most common distilled alcoholic beverage. Like rum, it is made from sugarcane. Cachaça is made from sugarcane alcohol, obtained from the fermentation of sugarcane juice which is afterwards distilled.”

But am starting to worry how these Caipirinhas are fitting into my calorie controlled diet!

August 12, 2009 at 9:18 am

Tom,

“whose prior do we choose to use, my delta function or that of my boss who is clearly not so certain?”

You choose yours and he chooses his, which will be broader. You and he are in possession of different (prior) information, so your inferences will be different even though both of you can be reasoning correctly. Just as somebody put down in a desert assigns low probability to rain, but if the same man has seen a weather chart showing a rare desert storm approaching then he assigns it a high probability – reasoning correctly in both cases.

Of course, I agree that a wrong prior will lead to a wrong posterior. Bayes’ theorem says so too. What you seem to be suggesting is that there is no such thing as a right or wrong prior and that Bayesians just choose whatever they like. That is not the case – if you are certain that a parameter takes value 5.3 then you are obviously wrong to assign it a delta-function prior peaked at 6.7, for example.

Bayesians do not know how to translate prior information into a prior distribution in all cases. But we do know it in many cases (symmetry principles are a great help), and research continues.

Any general method that gives a wrong answer in a particular problem is wrong, including frequentist methods in my delta example.

Let’s not get too hung up on the fact that this is about ‘prior’ information. One man’s experimental data is another man’s prior info as science progresses. Don’t you want your inferences to be based on all relevant information?

Anton

PS All that I say comes from a particular school of Bayesianism. There *are* a few people calling themselves Bayesian to whom your criticism applies, though they are mostly philosophers rather than scientists who actually deal with data to get answers they want. Certain frequentists have also stolen the B-word to denote a particular way of constructing a ‘statistic’. When I see a new stats textbook with ‘Bayesian’ in the title I look for RT Cox (not DR Cox!) in the index and references to see if it’s in line with the view I am arguing for here – RT Cox derived the sum and product rules (of which Bayes’ theorem is a corollary) by the means stated in my posting on this thread of August 5th, 7.25pm.

Try rum and tonic water, it’s another drink with a ‘tropical’ taste and a lot easier to make (and not so overly sweet as rum-and-coke).

August 13, 2009 at 1:58 am

“You choose yours and he chooses his, which will be broader. You and he are in possession of different (prior) information, so your inferences will be different even though both of you can be reasoning correctly. ”

Well, I appreciate the directness of statements such as the above. In physics we are supposed to be dealing with objective interpretation of data and I doubt the value of the subjective probabilistic interpretation which you seem so easily to accept. Correct reasoning based on false premises (aka priors) will lead to systematically wrong answers!

My Stats MSc thesis adviser at Imperial College many moons ago was Prof DR Cox. The pros and cons of Bayesian statistics were fairly dealt with, mostly under the previous title of Decision Theory, as I recall. But the value of objectivity in statistics and science was well appreciated by David Cox.

One last try – how useful would the Pope have found subjective Bayesian priors in his arguments with Galileo about heliocentrism? Your view gives the impression that objectivity is an optional extra in a scientific debate – I don’t think that it is!

August 13, 2009 at 10:42 am

Tom: In physics we are certainly dealing with objective interpretation of the data. But which data? When new data come in, this can cause scientists to change their mind. (Bayes’ theorem is how to update your knowledge, and it follows from the sum and product rules which you presumably accept. The word ‘update’ implies a prior state, incidentally.) The first scientists to update themselves are the experimentalists involved. Then they write a paper. Then other scientists read it and change their own minds. Because each scientist is reasoning correctly, consensus is reached – although at any moment of time there will be some scientists with differing conclusions, because not all have yet received the update.

For the avoidance of doubt I wasn’t criticising DR Cox, but pre-empting the usual response when I mention RT Cox (ie, “you’ve got his initials wrong,” since DR is better known than RT).

Re subjectivity and objectivity – your criticisms apply to some philosophers, but not to the school of Bayesians from which my own comments come. Consider that what I am doing is designing a robot to perform reasoning under uncertainty. I therefore need a set of complete and consistent rules; if I can find these rules and program the robot with them then there is no subjectivity. Information is fed into this robot and it spits out a reply, although of course with different data it is liable to give different replies, like the man in my desert example (this thread, August 12th, 0918). Bayes’ theorem is the way to update a prior distribution so as to obtain a posterior distribution in the light of new data; it tells us to multiply the prior by the likelihood and then renormalise, and it follows from the (uncontested) sum and product rules. The likelihood is the sampling distribution, as both Bayesians and frequentists agree. That leaves the prior…

You are in effect saying that there is no such thing as “THE prior”. To the extent that differing people may start with differing relevant prior knowledge, I agree; but to each state of prior knowledge I do contend that there is a unique prior distribution. Strip away each person’s relevant prior knowledge by applying Bayes’ theorem in reverse and you will eventually end up with a unanimity in which everybody is in a state of total ignorance. Codify *that* into a prior, then go forward with Bayes’ theorem, and you are up and running with a fully objective reasoning robot.

How then to find priors? We don’t know in all cases, but at least we are making progress. Complete ignorance of where a bead is, threaded on a circular hoop, obviously gives a uniform prior density around the hoop – by symmetry. (Formally, you are just as ignorant if you translate the bead, so that

P(theta) = P(theta + z)

which is a functional equation whose solution is P = constant = 1/(2pi) by normalisation.) Symmetry arguments are valuable in assigning priors. Conversely, absolutely certain knowledge of where the bead is located corresponds to a delta-function prior at that location/theta.
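Spelled out, the symmetry argument for the bead runs as follows. This is just a restatement in standard notation, with $c$ introduced here as a name for the constant:

```latex
\begin{align}
P(\theta) &= P(\theta + z) \quad \text{for all } z
  &&\Rightarrow\quad P(\theta) = c \ \text{(a constant)}, \\
\int_0^{2\pi} P(\theta)\,\mathrm{d}\theta &= 2\pi c = 1
  &&\Rightarrow\quad P(\theta) = \frac{1}{2\pi}.
\end{align}
```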

Two questions, which you do not appear to have responded to yet:

Don’t you agree that a method which gives a wrong answer on one problem (ie, frequentist methods in the case when you know the answer beforehand) cannot be trusted on other problems?

Don’t you want your scientific inferences to be based on all available relevant information? Partitioning that information into ‘prior’ and ‘data’ is artificial, because one man’s data is the next man’s prior.
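The remark that one man's data is the next man's prior can be checked numerically. The sketch below assumes a hypothetical coin-tossing problem with a flat starting prior on a grid; the function `update` and the counts of heads and tails are invented for illustration. It compares updating in two stages with updating once on the pooled data.

```python
import numpy as np

def update(prior, heads, tails, theta):
    """One Bayes update: prior times binomial likelihood, renormalised."""
    posterior = prior * theta**heads * (1.0 - theta)**tails
    return posterior / posterior.sum()

theta = np.linspace(0.0, 1.0, 101)             # grid of candidate values
flat = np.full(theta.size, 1.0 / theta.size)   # illustrative starting prior

# The first experimenter sees 3 heads and 1 tail.
first = update(flat, 3, 1, theta)

# The second experimenter takes that posterior as a prior
# and then sees 4 heads and 2 tails.
sequential = update(first, 4, 2, theta)

# A third person analyses all 7 heads and 3 tails in one go.
batch = update(flat, 7, 3, theta)

same = np.allclose(sequential, batch)
print("sequential and batch posteriors agree:", same)
```

Under these assumptions the two routes give the same posterior, so where the line between "prior" and "data" is drawn makes no difference to the final answer.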

Anton

August 13, 2009 at 1:59 pm

Tom – Would it help if I said that I am not contending for the meaning of the word ‘probability,’ but am interested in how strongly my information (prior plus data) implies the answer to the question I am interested in? I (following RT Cox) can show that this quantity obeys the sum and product rules, but if you wish not to call it probability then that’s not an underlying issue of contention as far as I’m concerned. The quantity I call “strength of implication” is what one *actually wants* in all problems involving uncertainty.

Anton

August 13, 2009 at 9:58 pm

To your 2 questions – the frequentist method doesn’t give a “wrong answer” in the case you describe – it just gives an estimate and an error which may be bigger than a previous experiment. This may indicate poor experimental design but it doesn’t indicate a breakdown of the method.

To the second question – am keen on using all the information – frequentist methods can do this too in elementary examples such as combining objective confidence intervals from different experiments etc. What I find difficult to accept is the inherent fuzziness of the Bayesian definition of probability which wants to include your “strength of implication” and which needn’t be testable by repeat experiment.

The essence of the issue is that a Bayesian wants a true value of a parameter to be able to have an error associated with it because that seems intuitive and great statements can be made that the true value is w = -1 ± 0.1 or whatever. Now you can do that but only by inventing something called eg “strength of implication” which is all in someone’s mind and which has no direct link to experiment. The probability linked to that ± 0.1 can only be judged and not measured BECAUSE A TRUE VALUE CLEARLY HAS NO ERROR. Instead of w, think of estimating pi by repeatedly dividing measured circumferences of drawn circles by the measured diameter. The Bayesian might obtain pi = 3.1 ± 0.1 but what does this error on a true value mean since pi only has one value? The frequentist position is that only an ESTIMATOR of a true value can have an error associated with it and that error can now be objectively estimated based on repeat experiments.

Calling your probability “strength of implication” would certainly help in distinguishing these two cases. Other than that, we may have to agree to differ on which provides the more sound basis for statistical physics.

August 13, 2009 at 10:57 pm

Tom,

I took pains to set up a problem in which you know with certainty what value a parameter takes (this is part of my problem specification) and in which you are measuring the parameter with noisy apparatus only because somebody who doesn’t trust you has ordered you to. You are now saying that a method which gives an answer *different from the one you know is correct* is OK. What do you understand by the word ‘wrong’?!

A true value has no error, but what if you don’t know that value? What about the 10^18th digit of pi? And what about the number of protons in the universe given various astrophysical measurements? I accept that you don’t like applying your idea of ‘probability’ to such situations, but I can apply mine – strength of implication – and it obeys the sum and product rules and is what I actually want in order to make progress.

Anton

PS How do I measure a probability? Please give me a simple example.

August 14, 2009 at 10:31 pm

I should have said that in the example with pi, I am assuming we don’t know the answer. The frequency histogram from experimental trials is inspected and a model made, including hypotheses about the true values of the parameters that define that distribution. Further repeat experiments can then be used to test the distribution parameters by hypothesis testing or to make estimates and confidence intervals for the value of pi. Throughout there is an assumption that a true value of pi exists but an avoidance of interpreting the scatter in the experiment as indicating some variation in the true value of pi. Now you may correct me but doesn’t the Bayesian start by assuming a prior for pi which could be interpreted as an error on a true value or as having some degree of belief interpretation? The second interpretation is less nonsensical than the first but still may have no objective, repeat-experiment verification.
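The procedure described above (repeat the circle measurement, form an estimate, attach an interval to the estimator) might look something like the following simulation. Everything numerical here is an assumption for the sake of the sketch: the number of circles, the range of diameters, the Gaussian measurement noise of 0.1 cm, and the use of a normal-approximation 95% interval.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n_circles = 200                                      # hypothetical number of repeats
true_diameters = rng.uniform(5.0, 20.0, n_circles)   # cm, hypothetical circles
noise_sd = 0.1                                       # cm, assumed ruler noise

# Each circle's diameter and circumference are measured with independent noise.
measured_d = true_diameters + rng.normal(0.0, noise_sd, n_circles)
measured_c = np.pi * true_diameters + rng.normal(0.0, noise_sd, n_circles)

# One estimate of pi per circle, then the sample mean as the estimator.
ratios = measured_c / measured_d
estimate = ratios.mean()

# Standard error of the estimator and an approximate 95% confidence interval.
std_error = ratios.std(ddof=1) / np.sqrt(n_circles)
interval = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)

print("estimate of pi:", estimate)
print("standard error:", std_error)
print("approximate 95% interval:", interval)
```

In this reading the quoted error describes the scatter of the estimator over repeated experiments, not any variation in pi itself.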

How to measure a probability – measure the relative frequency of occurrence of an event, in a large number of repeat experiments.
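A small simulation of this recipe, assuming a hypothetical event whose long-run frequency is set to 0.3 in the code, shows what the measured relative frequency looks like for several finite numbers of repeats. The sample sizes and the value 0.3 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

p_true = 0.3  # hypothetical long-run frequency, known only to the simulation
sample_sizes = [10, 100, 1_000, 10_000, 100_000]

frequencies = {}
for n in sample_sizes:
    # Each trial is an independent repeat; True means the event occurred.
    outcomes = rng.random(n) < p_true
    frequencies[n] = outcomes.mean()  # relative frequency of occurrence

for n, f in frequencies.items():
    # Binomial standard error of the relative frequency at this sample size.
    scatter = np.sqrt(f * (1.0 - f) / n)
    print(f"n = {n:>7}: relative frequency = {f:.4f} (+/- {scatter:.4f})")
```

The scatter shrinks roughly as one over the square root of the number of repeats, so any finite experiment returns an estimate with a nonzero uncertainty attached.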

Tom

August 14, 2009 at 10:46 pm

Tom.

How big is your “large number”? Don’t you need it to be infinite? If so, it’s not really “measurable”, is it?

Peter

August 15, 2009 at 10:02 am

Tom,

The Bayesian “distribution for pi” (whether prior or posterior) is a representation of how strongly our (prior or posterior) information *implies that the true value is 22/7, or 31416/10000,* or whatever. That is a crucial sentence which I hope will help you see where we Bayesians are coming from. We do not believe that pi can vary, nor is it implicit in our language. Info about pi can come from higher mathematical arguments, or observations of digit frequencies in the first 1000 digits, or wherever. (Actually, higher math tells us that any fraction is impossible, as you will know.)

A relative frequency may be equal in numerical value to a degree-of-implication, but it’s not the same thing conceptually.

Please note that I’ve phrased that last sentence carefully to avoid any head-on clash about the meaning of the p-word. I don’t want to contend about that. All I am saying is that degree-of-implication obeys the sum and product rules and is what you actually want in all problems involving reasoning under uncertainty, whether one-offs like the mass of the universe or in repeated tossings of a coin (though the collection of outcomes is still a one-off in a large enough space, as Peter implies).

Anton

August 15, 2009 at 2:31 pm

Peter

the difference between large and infinite isn’t enough to make me abandon the objective approach.

Anton

I agree with everything you say – my problem is with the potentially subjective fuzziness in defining “degrees-of-implication”.

The IAU GA is now finished – have to say that the scope for debate in this blog has been significantly higher than in some of the big IAU sessions.

But I enjoyed it all!

August 15, 2009 at 3:12 pm

Tom,

The gap between “large” and “infinite” is itself infinite….

Peter

August 15, 2009 at 3:54 pm

Tom: Please see my comments about programming a robot to do inference (this thread, August 13th, 1042) in order to dispel subjectivity of the type to which you (rightly) object. I’d gladly say more if I could see how it would take this debate forward, but right now I’d simply be repeating those comments.

Cheers: Anton
