There was an interesting article on the BBC website this week that, for once, offers a reasonable discussion of statistics in the mass media. I’m indebted to my friend Anton for pointing it out to me. I’ve filed it along with examples of Bad Statistics because the issue is usually poorly explained, not because I think the article itself is bad. In fact, it’s rather good.

The question is all about cancer screening, specifically for breast cancer, but the lesson could apply to a host of other situations. In the original context, the question goes as follows:

Say that routine screening is 90% accurate. Say you have a positive test. What’s the chance that your positive test is accurate and you really have cancer?

Presumably many of you will think the answer is 90%. Hands up if you think this!

If you don’t think it’s 90% then what do you think it is?

The correct answer is that you have no idea. I haven’t given you enough information.

To see why, imagine that the prevalence of cancer in the population is such that 1% of a randomly selected sample will have it. Out of a thousand people one would expect that, on average, ten would have cancer. If the test is 90% accurate then 9 of these will show positive signs and only one won’t.

However, 990 people out of the original thousand don’t have cancer. If the test is only 90% accurate then 10% of these, i.e. 99 people, will show a false positive.

Thus the total number of positive tests is 9+99=108 and only 9 of the individuals concerned actually have cancer. The probability that a positive test means cancer is therefore 9/108. That’s only a 1-in-12 chance, or about 8%, that you have cancer.
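The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and its parameters are my own labels, and it assumes that "90% accurate" applies equally to people with and without cancer, as in the worked example.

```python
def expected_counts(population, prevalence, accuracy):
    """Expected true and false positives in a screened population.

    Assumes 'accuracy' applies equally to people with and without the
    condition, as in the worked example in the text.
    """
    with_condition = population * prevalence
    without_condition = population - with_condition
    true_positives = with_condition * accuracy
    false_positives = without_condition * (1 - accuracy)
    return true_positives, false_positives


# Numbers from the text: 1000 people, 1% prevalence, 90% accuracy.
true_pos, false_pos = expected_counts(population=1000, prevalence=0.01, accuracy=0.9)
total_pos = true_pos + false_pos

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"chance of cancer given a positive test: {true_pos / total_pos:.3f}")
```

Run as written, this should reproduce the 9 true positives, 99 false positives and roughly 1-in-12 chance worked out by hand.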

But that depends on my assumption about the overall rate in the population. If that number is different, the answer changes. Without this information, the problem is ill-posed.
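To see how strongly the answer depends on the assumed rate, here is a small sketch that repeats the same calculation for several prevalences. The prevalence values are hypothetical, picked only for illustration, and the name `chance_given_positive` is mine; as before, the 90% figure is assumed to apply to both groups.

```python
ACCURACY = 0.9  # same figure as in the text, applied to both groups


def chance_given_positive(prevalence, accuracy=ACCURACY):
    """Share of positive tests that are true positives."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalences, chosen only for illustration.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    chance = chance_given_positive(prevalence)
    print(f"prevalence {prevalence:>6.1%} -> chance of cancer given positive {chance:.1%}")
```

The same test gives very different answers as the prevalence changes, which is why the original question cannot be answered as posed.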

The more general way of looking at this is in terms of conditional probabilities. Taking “90% accurate” to mean that the test gives the right result 90% of the time whether or not you have cancer, what you are given is that P(positive test|cancer) = P(+|C) = 0.9 and P(negative test|no cancer) = 0.9, so that P(negative test|cancer) = 0.1 and P(positive test|no cancer) = P(+|N) = 0.1. What you want to know is P(cancer|positive test) = P(C|+). This can be obtained from Bayes’ Theorem, but only if you know P(cancer) = P(C) = 1 - P(N), since people either have cancer or they don’t.

The answer is given by P(C|+) = P(C)P(+|C) / [P(C)P(+|C) + P(N)P(+|N)], which for the numbers I gave above is 0.01 × 0.9 / [0.01 × 0.9 + 0.99 × 0.1] = 0.009 / [0.009 + 0.099] = 0.009/0.108 = 1/12, the same answer as before.
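For anyone who wants to try other numbers, the formula translates directly into code. This is a minimal sketch: the function name `posterior` and its argument names are my own, chosen to mirror P(C), P(+|C) and P(+|N).

```python
def posterior(p_c, p_pos_given_c, p_pos_given_n):
    """P(C|+) from Bayes' theorem, with P(N) = 1 - P(C)."""
    p_n = 1 - p_c
    numerator = p_c * p_pos_given_c
    evidence = numerator + p_n * p_pos_given_n  # P(+), total chance of a positive test
    return numerator / evidence


# Numbers from the text: P(C) = 0.01, P(+|C) = 0.9, P(+|N) = 0.1.
p_cancer_given_positive = posterior(p_c=0.01, p_pos_given_c=0.9, p_pos_given_n=0.1)
print(f"P(C|+) = {p_cancer_given_positive:.4f}")
```

Passing P(+|C) and P(+|N) separately means the false positive rate need not equal one minus the detection rate, a distinction that the single "90% accurate" figure hides.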

So the moral is that you shouldn’t panic if you get a positive result from a screening test of this type. As long as the condition being tested for is rarer than an error in the test result, a positive result is more likely to be a false alarm than the real thing, and if the condition is much rarer then the chances are high that you’ve got nothing to worry about. But of course, you should take more detailed tests.

The Bayesian way is the easy way!