Bayes in the Dock

A few days ago John Peacock sent me a link to an interesting story about the use of Bayes’ theorem in legal proceedings and I’ve been meaning to post about it but haven’t had the time. I get the distinct feeling that John, who is of the frequentist persuasion, feels a certain amount of delight that the beastly Bayesians have got their comeuppance at last.

The story in question concerns an erroneous argument given during a trial about the significance of a match between a footprint found at a crime scene and footwear belonging to a suspect. The judge took exception to the fact that the figures being used were not known sufficiently accurately to make a reliable assessment, and thus decided that Bayes’ theorem shouldn’t be used in court unless the data involved in its application were “firm”.

If you read the Guardian article you will see that there’s a lot of reaction from the legal establishment and statisticians about this, focussing on the forensic use of probabilistic reasoning. This all reminds me of the tragedy of the Sally Clark case and what a disgrace it is that nothing has been done since then to curb the misrepresentation of statistical arguments in trials. Some of my Bayesian colleagues have expressed dismay at the judge’s opinion, which no doubt pleases Professor Peacock no end.

My reaction to this affair is more muted than you would probably expect. The first thing to say is that this is really not an issue relating to the Bayesian versus frequentist debate at all. It’s about a straightforward application of Bayes’ theorem which, as its name suggests, is a theorem; in fact it’s a direct consequence of the sum and product laws of the calculus of probabilities. No-one, not even the most die-hard frequentist, would argue that Bayes’ theorem is false. What happened in this case is that an “expert” applied Bayes’ theorem to unreliable data and by so doing obtained misleading results. The issue is not Bayes’ theorem per se, but the application of it to inaccurate data. Garbage in, garbage out. There’s no place for garbage in the courtroom, so in my opinion the judge was quite right to throw this particular argument out.
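For anyone who hasn’t seen it written down, the derivation really is a one-liner; here is a minimal sketch, in generic notation rather than anything tied to the case:

```latex
% The product rule can be written both ways round:
%   P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A).
% Equating the two and dividing by P(B) (assumed non-zero) gives
\[
  P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)},
  \qquad
  P(B) \;=\; \sum_i P(B \mid A_i)\,P(A_i),
\]
% where the sum rule supplies the normalising denominator over a set
% of mutually exclusive, exhaustive alternatives A_i.
```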

But while I’m on the subject of using Bayesian logic in the courts, let me add a few wider comments. First, I think that Bayesian reasoning provides a rigorous mathematical foundation for the process of assessing quantitatively the extent to which evidence supports a given theory or interpretation. As such it describes accurately how scientific investigations proceed by updating probabilities in the light of new data. It also describes how a criminal investigation works.

What Bayesian inference is not good at is achieving closure in the form of a definite verdict. There are two sides to this. One is that the maxim “innocent until proven guilty” cannot be incorporated in Bayesian reasoning. If one assigns a zero prior probability of guilt then no amount of evidence will be able to change this into a non-zero posterior probability; the required burden is infinite. On the other hand, there is the problem that the jury must decide guilt in a criminal trial “beyond reasonable doubt”. But how much doubt is reasonable, exactly? And will a jury understand a probabilistic argument anyway?
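To see why the zero prior is fatal, it is enough to write the posterior out explicitly; a minimal worked version, in my own notation:

```latex
% Posterior probability of guilt G given evidence E, with \bar{G}
% denoting innocence:
\[
  P(G \mid E) \;=\;
  \frac{P(E \mid G)\,P(G)}
       {P(E \mid G)\,P(G) + P(E \mid \bar{G})\,P(\bar{G})}.
\]
% If the prior P(G) = 0, the numerator vanishes for any likelihood
% P(E \mid G), so the posterior is zero however damning the evidence.
```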

In pure science we never really need to achieve this kind of closure, collapsing the broad range of probability into a simple “true” or “false”, because science is a process of continual investigation. It’s a reasonable inference, for example, based on supernovae and other observations that the Universe is accelerating. But is it proven that this is so? I’d say “no”, and don’t think my doubts are at all unreasonable…

So what I’d say is that while statistical arguments are extremely important for investigating crimes – narrowing down the field of suspects, assessing the reliability of evidence, establishing lines of inquiry, and so on – I don’t think they should ever play a central role once the case has been brought to court unless there’s much clearer guidance given to juries on how to use them and stricter monitoring of so-called “expert” witnesses.

I’m sure various readers will wish to express diverse opinions on this case so, as usual, please feel free to contribute through the box below!

22 Responses to “Bayes in the Dock”

  1. IIRC, Bayes’ Theorem (or the lack of its application) had a big bearing on the OJ Simpson trial: http://www.math.cornell.edu/~mec/2008-2009/TianyiZheng/Bayes.html (scroll down)

  2. Charles Jenkins Says:

    It’s a reasonable inference, for example, based on Supernovae and other observations that the Universe is accelerating. But is it proven that this is so? I’d say “no”, and don’t think my doubts are at all unreasonable…

    Evidently not doubts that were shared by the Swedish Academy of Sciences!

    • telescoper Says:

      As I said on my post about the Nobel Prize, I wouldn’t have phrased the citation the way they did, but I still think they deserved it!

  3. I wouldn’t say that statistics should never play a central role in court. Like any tool, I think it should be thoroughly explained; DNA evidence, for instance, isn’t the magic bullet that many people believe it is. Statistics is a useful tool for explaining how likely any particular event is; what role it plays in court depends on the evidence that is presented and the manner in which it is presented.

    In the case of the article it’s unclear how large the errors on the expert’s assumptions were. If he knew that there were 400,000 shoes with the same size and type of tread with an error of 50 then I think the judge was being a bit unreasonable. If on the other hand he’d guessed “about 400,000, give or take a few thousand” then the judge is utterly correct. The article doesn’t make it very clear (that I could see) what the limits of the analysis were.

  4. John lives in Scotland. Correct me if I am wrong, but doesn’t Scottish (or is that Scots (that’s the language, right?) or Scotch (that’s a type of pine tree, right?)?) law have a third possible verdict, in addition to “guilty” and “not guilty”, namely “not proven”?

  5. John Peacock Says:

    Peter,

    Please don’t condemn me as a frequentist: I think I’m as Bayesian as you are, since we both agree that the Bayesian approach is the formally correct way of approaching issues of inference and hypothesis testing in the light of data. Such reservations as I have focus on exactly the same issue you emphasise: garbage in = garbage out. Too often, poor Bayes is made to work with a prior probability that was dug up from goodness knows where, so that his machine disgorges a probability whose robustness is clearly questionable – and yet tends to be treated as holy writ, because to be Bayesian must mean you are perfect in all respects. It amused me to see that a judge was more willing than many practising scientists to state that the Bayesian emperor sometimes lacked a good deal of his clothing. I don’t think this sort of critique is to be considered controversial, and it will be taken as a mundane and obvious point by those who know what they are doing in this subject – but I don’t get the impression that everyone worries about it as much as they might.

    As for the frequentist heresy, I would submit that this doesn’t need to be a distinct world-view from Bayes. If you’re using a Bayesian decision process, you’re still entitled to ask how it would perform under repeated trials. That way, you address useful questions of experimental design, such as how often you would expect a given experiment to discriminate two hypotheses using a Bayesian analysis. This is worth knowing, before you spend $1B building it. Charles Jenkins and I mused over these issues in http://arxiv.org/abs/1101.4822

    • telescoper Says:

      I apologize unreservedly for the vile slander I’ve perpetrated in labelling you a frequentist.

      No doubt you will be taking me to court…

    • telescoper Says:

      ps. You can indeed ask how a Bayesian test would perform under repeated trials, but it surprises me that you would be interested in the answer.

      • If your Bayesian method said hypothesis H_0 was significantly more probable than hypothesis H_1, would you not want to know what percentage of times this would occur when H_0 was really true and what percentage of times this would occur when H_1 was really true? If you found that the former was not a number close to 100 and the latter not a number close to 0, surely that would be cause for concern? If you take your priors and likelihoods seriously, then this is a perfectly well defined question.
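        Concretely, here is a minimal sketch of that check in Python, assuming two toy Gaussian hypotheses for a single datum and equal prior odds (my own made-up setup, nothing to do with the court case):

```python
# A toy check of how a Bayesian decision rule performs over repeated
# trials. Hypotheses for one datum x with unit-variance Gaussian noise:
#   H0: x ~ N(0, 1),   H1: x ~ N(1, 1).
# With equal prior odds, "prefer H0" means the Bayes factor B01 > 1.
import numpy as np

rng = np.random.default_rng(42)

def log_bayes_factor_01(x):
    """log B01 = log N(x; 0, 1) - log N(x; 1, 1); the 2*pi terms cancel."""
    return -0.5 * x**2 + 0.5 * (x - 1.0)**2

n = 100_000
x_h0 = rng.normal(0.0, 1.0, n)   # data generated with H0 actually true
x_h1 = rng.normal(1.0, 1.0, n)   # data generated with H1 actually true

print("P(prefer H0 | H0 true) ~", np.mean(log_bayes_factor_01(x_h0) > 0))
print("P(prefer H0 | H1 true) ~", np.mean(log_bayes_factor_01(x_h1) > 0))
# For hypotheses this close, the first comes out near 0.69 and the
# second near 0.31: perfectly coherent Bayesian answers, but worth
# knowing before treating any single verdict as decisive.
```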

      • telescoper Says:

        It is indeed a well-defined question, but I still don’t think the answer is interesting, as the likelihood information to which you seem to be referring would already have been incorporated in the calculation of the posterior probabilities (along, of course, with the prior).

      • John Peacock Says:

        Peter: your reply about “the likelihood” seems to imply you’re thinking about just one dataset. But by “repeated trials” I mean doing the experiment many times, getting different data (and hence different likelihood functions) each time. Now, many Bayesians don’t like the idea of asking “what if I had got different data?” – you got what you got, and the Bayesian preference is to treat the experiment as a unique event and see what you can learn from it. Jaynes was particularly scathing about frequentists rejecting hypotheses on the basis of observations you *didn’t* make.

        But this is a bit rigid: many experiments involve repeating similar measurements many times (survey astronomy), so I see no reason not to think about the probability distribution of a Bayes-derived quantity over an ensemble of datasets.

        Certainly in experiment design you have to take this approach: you don’t have any data yet, and you’re wondering whether to spend money getting some. The Fisher matrix is an example of this approach, where (loosely) you average the Bayesian error expression over an ensemble of fictitious data. But in that case there are other questions, which are not normally asked: e.g. what is the error bar on your Fisher-based error bar?
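        To illustrate what that averaging does, here is a minimal Fisher-forecast sketch for a toy straight-line fit with known Gaussian noise (my own example, not from any real survey); for a linear model the forecast and the ensemble scatter agree, and it is in nonlinear problems that the error bar on the error bar starts to bite:

```python
# Fisher-matrix forecast for the toy linear model y = a + b*x with
# known Gaussian noise, compared against the actual scatter of best
# fits over an ensemble of simulated ("fictitious") datasets.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
sigma = 0.1                                  # known per-point noise
A = np.column_stack([np.ones_like(x), x])    # design matrix for (a, b)

# For a Gaussian likelihood and linear model, F = A^T A / sigma^2;
# the forecast parameter errors are sqrt of the diagonal of F^{-1}.
F = A.T @ A / sigma**2
forecast_err = np.sqrt(np.diag(np.linalg.inv(F)))

# Ensemble of fictitious datasets around a fiducial model a=1, b=2.
a_true, b_true = 1.0, 2.0
fits = np.array([
    np.linalg.lstsq(A, a_true + b_true * x + rng.normal(0, sigma, x.size),
                    rcond=None)[0]
    for _ in range(2000)
])

print("Fisher forecast errors (a, b):", forecast_err)
print("Ensemble scatter of fits:     ", fits.std(axis=0))
```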

      • telescoper Says:

        Instead of the error bar on an error bar, why not go the whole hog and calculate the whole probability distribution?

  6. I’m not sure ‘innocent until proven guilty’ is meant to be taken in quite the way you suggest. Surely it’s a directive as to how one should treat the party in question, not what you should believe.
    Bayesian reasoning does the opposite – it gives you a way (in principle, if you have lots of lovely data perhaps) of deciding what you should believe, but not how you should act – that requires weighing factors beyond mere probabilities.
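    That split can be made formal; as a minimal sketch in standard decision-theory notation (my notation, purely for illustration), inference supplies the posterior, and the choice of action then needs a separate utility function:

```latex
\[
  a^{\ast} \;=\; \arg\max_{a} \sum_{h} U(a, h)\, P(h \mid \text{data}),
\]
% where U(a, h) is the utility of taking action a when hypothesis h is
% true. Convicting the innocent and acquitting the guilty can carry
% very different utilities, which is one way to read the asymmetry
% behind "innocent until proven guilty".
```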

  7. Anton Garrett Says:

    Agreed. I have never been happy with Bruno de Finetti’s derivation of the laws of probability via decision theory, ie from which way you would bet. Logically, you have to work out what you believe before you decide what to do. I go for RT Cox’s derivation of the sum and product rules from the properties of the algebra of the propositions which the probabilities are of. (Forgive the grammar!)

    Which, come to think of it, leaves me wondering what de Finetti’s analysis actually means…

  8. Anton Garrett Says:

    The issue in this sort of legal case is how to combine a numerically precise likelihood, eg from DNA, with hard-to-quantify information such as somebody who resembles the suspect in build being seen in the vicinity of the crime. A few expert witnesses have over-reached themselves by trying to teach juries how to quantify such information (which even mathematicians and scientists don’t know how to do). This approach will, in the eyes of judges, discredit even the good research. The only proper approach is instead to translate the numerical likelihood into qualitative concepts that the jury can understand, eg ‘one person in a city the size of [specify a town near to where the jury lives]’, or ‘one person in a full Wembley stadium’; and then tell the jury to mull that over for themselves in combination with the other evidence.
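    To give a feel for the numbers (toy figures of my own, not from any actual case): a match probability of one in ninety thousand sounds damning on its own, but the expected number of innocent matches scales with the size of the pool of potential suspects,

```latex
\[
  E[\text{matches}] \;\approx\; N p,
  \qquad
  p = \tfrac{1}{90{,}000},\;\; N = 6\times 10^{7}
  \;\Rightarrow\; Np \approx 670,
\]
% so "one person in a full Wembley stadium" matching is very different
% from "the suspect is guilty with probability 1 - 1/90,000"; the jury
% still has to weigh how many people could plausibly have been there.
```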

    Anton

    • telescoper Says:

      I agree with that, but still run up against the phrase “beyond reasonable doubt”. I wonder what guidance is given to the interpretation of that?

    • Anton Garrett Says:

      The point is not to give guidance as to what the phrase means. The ambiguity of language, compared to numbers, has its purpose.

    • telescoper Says:

      This is why I’d be useless on a jury. I doubt everything.

      • Monica Grady Says:

        I recently did jury service. We were given some guidance about ‘beyond reasonable doubt’. The whole experience was uncomfortable. I investigate a problem, and if I don’t have sufficient evidence (data) to reach a conclusion, I acquire more evidence, or redesign the experiment, whatever. I don’t ignore data if it doesn’t fit the hypothesis I’m testing. I’m not used to being told that I have sufficient evidence to reach a conclusion, when it is clear that there are other data available. This happened in both the cases I was on – prosecution and defence had agreed in advance what would be presented. Not an experience I’d be in a hurry to repeat.

  9. […] this is a new report about a new case, it’s actually not an entirely new conclusion. I blogged about a similar case a couple of years ago, in fact. The earlier story concerned an erroneous argument given during a trial about the […]
