## Why we should abandon “statistical significance”

So a nice paper by McShane et al. has appeared on the arXiv with the title *Abandon Statistical Significance* and the following abstract:

> In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.

This piece is in part a reaction to a paper by Benjamin et al. in *Nature Human Behaviour* that argues for adopting a standard threshold of p = 0.005 rather than the more usual p = 0.05. This latter paper has generated a lot of interest, but I think it misses the point entirely. The fundamental problem is not what number is chosen for the threshold p-value, but what this statistic does (and does not) mean. It seems to me that the p-value is usually the answer to a question quite different from the one a scientist would want to ask, namely what the data have to say about a given hypothesis. I’ve banged on about Bayesian methods quite enough on this blog, so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis.
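To make the distinction concrete, here is a minimal Python sketch (my own illustration, not taken from McShane et al. or Benjamin et al.) for a coin tossed n times that gives k heads. The p-value is computed under the null hypothesis that the coin is fair. The posterior probability of that same hypothesis assumes, purely for illustration, equal prior odds and an alternative hypothesis in which the coin's bias is uniformly distributed on [0, 1]. The function names are hypothetical.

```python
from math import comb


def two_sided_p_value(n: int, k: int) -> float:
    """Exact two-sided binomial p-value for k heads in n tosses under H0: p = 0.5.

    The fair-coin null is symmetric, so we double the tail probability on the
    more extreme side and cap the result at 1.
    """
    m = max(k, n - k)
    tail = sum(comb(n, j) for j in range(m, n + 1)) / 2**n
    return min(1.0, 2 * tail)


def posterior_null(n: int, k: int, prior_null: float = 0.5) -> float:
    """P(H0 | data) for H0: p = 0.5 versus H1: p ~ Uniform(0, 1).

    Both the uniform alternative and the default equal prior odds are
    illustrative assumptions, not the only reasonable choices.
    """
    like_null = comb(n, k) / 2**n
    # Integrating C(n, k) p^k (1 - p)^(n - k) over p in [0, 1] gives 1 / (n + 1).
    like_alt = 1 / (n + 1)
    weighted_null = prior_null * like_null
    return weighted_null / (weighted_null + (1 - prior_null) * like_alt)


n, k = 100, 61
print(f"p-value       = {two_sided_p_value(n, k):.4f}")
print(f"P(fair | data) = {posterior_null(n, k):.3f}")
```

With 61 heads in 100 tosses the p-value falls below 0.05, yet under these assumptions the posterior probability that the coin is fair is still roughly 0.4. A different alternative or different prior odds would change the second number, which is precisely the prior information the p-value leaves off the table.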

While I generally agree with the arguments given by McShane et al., I don’t think they go far enough. I think p-values are so misleading that, if I had my way, I’d ban them altogether!

### 9 Responses to “Why we should abandon ‘statistical significance’”

1. I’m more hesitant to throw out the classical statistical framework built up by Pearson, Neyman, Fisher, etc.; these guys were certainly careful thinkers and mathematicians. With p-values you get an objective answer to a well-defined question. Unfortunately, the “probability of a hypothesis being right” has a fuzzy definition that can only be quantified in terms of “degree of belief”, an inherently subjective and ill-defined thing. I like seeing p-values, statistical power and confidence intervals: in the best cases, at least, they’re numbers I can wrap my head around.

• The answer you get for a p-value is well-defined, but it is an answer to a question which is almost always irrelevant to the problem at hand.

Bayesian probability needs to include prior information, and is subjective only in the sense that different people have access to different information. There’s nothing “ill-defined” about it, however: the logical meaning of a posterior probability is very clear.

• I think the logic behind going from prior to posterior is consistent. Unfortunately, the quantity that is being updated, the Degree of Belief in the hypothesis, has no physical meaning that I know of.

Frequentist probability I understand: flipping coins, spinning roulette wheels, repeating measurements. Degree of belief, I’m not so sure. What does it mean to have a 95% degree of belief that the Hubble Constant is between 67 and 69 km/s/Mpc? Is there a definition that doesn’t boil down to “it’s my personal attempt to assign a number to a gut instinct”?

• Bayesian probability is the only consistent way of generalising the Boolean logic of 0 and 1 to the intermediate case in which there is insufficient information to assign 0 or 1. Any other system is inconsistent.

I’m puzzled by your assertion that this means that Bayesian probability has ‘no physical meaning’.

It’s worth also pointing out that frequentist probabilities are defined using an infinite number of repeated experiments. That’s not really a physical definition.

• “What does it mean to have a 95% degree of belief that the Hubble Constant is between 67 and 69 km/s/Mpc?”

As Bill Press said (after introducing himself as the front end of the Press-Schechter horse, with Paul Schechter in the audience), someone knows the Hubble constant to 5 significant figures: we just don’t know who that person is.

• “It’s worth also pointing out that frequentist probabilities are defined using an infinite number of repeated experiments. That’s not really a physical definition.”

Infinity occurs in many places in physics and maths, at least as a limiting case. I don’t see this one as being different.

Yes, the Bayesian framework is consistent. Nevertheless, one can’t just avoid discussion of subjectivity of priors.

• It seems a bit odd to me to use infinity in an operational definition, though I’ll grant that one does sometimes use limits. Where infinities occur in physics, they tend to represent places where theory is incomplete, not entities that exist in the physical world. My main objection, however, is the use of the concept of an ensemble in such definitions. I think that’s unphysical.

I mentioned subjectivity of priors in the post. They are subjective only in the sense that different people have different information. At least the Bayesian approach puts the prior information or prior assumptions on the table, whereas frequentist approaches ignore it entirely.
I like priors and do not feel even slightly defensive about them. You nearly always know more than you think you do.
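As a sketch of the claim that priors differ only because information differs, and that shared data pull people together, consider two people who assign different Beta priors to a coin's probability of heads and then update on the same tosses. The beta-binomial conjugate update is a standard textbook choice; the particular priors and the counts (6 heads in 10 tosses, then 600 heads in 1000 tosses) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(a, b) distribution over a coin's probability of heads."""

    a: float
    b: float

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, heads: int, tails: int) -> "BetaPrior":
        # Conjugate update: the posterior is Beta(a + heads, b + tails).
        return BetaPrior(self.a + heads, self.b + tails)


# Two hypothetical people holding different prior information about the coin.
flat = BetaPrior(1, 1)    # uniform prior, mean 0.5
keen = BetaPrior(20, 5)   # prior mean 0.8: strongly expects heads

print(f"priors:       {flat.mean():.3f} vs {keen.mean():.3f}")
for heads, tails in [(6, 4), (600, 400)]:
    flat_post = flat.update(heads, tails).mean()
    keen_post = keen.update(heads, tails).mean()
    print(f"{heads + tails:5d} tosses: {flat_post:.3f} vs {keen_post:.3f}")
```

The gap between the two posterior means shrinks from 0.3 for the priors to about 0.005 after 1000 tosses. This only works if neither prior rules out the true value, which is one reason to state priors openly rather than leave them implicit.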

• I was thinking about things like assuming that a solid body is infinite. This is often a useful approximation when studying small-scale phenomena.

Half of the tosses of a fair coin are heads, in the limit of an infinite number of tosses. Even if no-one tosses a coin an infinite number of times, I think that the concept of a limit is clear.
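The limiting-frequency idea can be illustrated (though of course not proved) by simulation. This minimal sketch uses Python's pseudo-random number generator as a stand-in for a fair coin, which is itself an assumption, and records the fraction of heads at a few checkpoints; the seed and checkpoint values are arbitrary.

```python
import random


def running_frequency(n_tosses: int, checkpoints: list[int], seed: int = 0) -> dict[int, float]:
    """Fraction of heads after each checkpoint number of simulated fair-coin tosses."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    wanted = set(checkpoints)
    heads = 0
    fractions: dict[int, float] = {}
    for i in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat values below 0.5 as "heads"
            heads += 1
        if i in wanted:
            fractions[i] = heads / i
    return fractions


for n, f in running_frequency(100_000, [10, 1_000, 100_000]).items():
    print(f"{n:7d} tosses: fraction of heads = {f:.4f}")
```

The deviation from 0.5 typically shrinks roughly like 1/√n, but every finite run is only an approximation to the limit, which is essentially what is being argued over in this thread.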

• Anton Garrett Says:

Forget about degree of belief with all the psychological baggage. Consider probability as a numerical measure of the extent to which one proposition implies the truth of another, according to the relations you are aware of between the things to which the propositions refer.

And if you object to that definition, it’s no matter, for strength of implication is what you actually want in every problem in which you have uncertainty, and it can be shown to obey two relations which are the sum and product rules, commonly known as “the laws of probability”.

It is true that in many cases we do not know how to quantify the prior information we have, but that, Phillip, does not mean priors are subjective; it should be a motivation to do more research rather than to give up.
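The sum and product rules mentioned above are easy to check numerically. The sketch below uses a hypothetical joint probability table for two propositions A and B, with exact fractions so the identities hold exactly, and shows that Bayes' theorem follows from applying the product rule both ways round. It illustrates the rules only; it says nothing about how the numbers in the table (the priors) ought to be assigned.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(A=a, B=b) for two yes/no propositions.
JOINT = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}


def p_a(a: bool) -> Fraction:
    """Marginal P(A=a), obtained by summing the joint table over B."""
    return sum((p for (x, _), p in JOINT.items() if x == a), Fraction(0))


def p_b(b: bool) -> Fraction:
    """Marginal P(B=b), obtained by summing the joint table over A."""
    return sum((p for (_, y), p in JOINT.items() if y == b), Fraction(0))


def p_a_given_b(a: bool, b: bool) -> Fraction:
    return JOINT[(a, b)] / p_b(b)


def p_b_given_a(b: bool, a: bool) -> Fraction:
    return JOINT[(a, b)] / p_a(a)


# Sum rule: P(A|B) + P(not A|B) = 1.
sum_rule = p_a_given_b(True, True) + p_a_given_b(False, True)
# Product rule: P(A, B) = P(A|B) P(B).
product_rule = p_a_given_b(True, True) * p_b(True)
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
bayes = p_b_given_a(True, True) * p_a(True) / p_b(True)

print(sum_rule, product_rule == JOINT[(True, True)], bayes == p_a_given_b(True, True))
```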