Archive for arXiv:1709.07588

Why we should abandon “statistical significance”

Posted in Bad Statistics on September 27, 2017 by telescoper

So a nice paper by McShane et al. has appeared on the arXiv (arXiv:1709.07588) with the title “Abandon Statistical Significance” and the following abstract:

In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.

This piece is in part a reaction to a paper by Benjamin et al. in Nature Human Behaviour that argues for the adoption of a standard threshold of p=0.005 rather than the more usual p=0.05. This latter paper has generated a lot of interest, but I think it misses the point entirely. The fundamental problem is not what number is chosen for the threshold p-value, but what this statistic does (and does not) mean. It seems to me the p-value is usually an answer to a question quite different from the one a scientist would want to ask, namely what the data have to say about a given hypothesis. I’ve banged on about Bayesian methods quite enough on this blog so I won’t repeat the arguments here, except to say that such approaches focus on the probability of a hypothesis being right given the data, rather than on properties that the data might have given the hypothesis.
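To make the distinction concrete, here is a minimal sketch of a toy example of my own (it is not taken from either paper). Suppose a coin is tossed 100 times and comes up heads 61 times. The p-value answers "how surprising are data at least this extreme if the coin is fair?", while the Bayesian calculation answers "how probable is it that the coin is fair, given these data?". The second answer needs assumptions that the first does not: here I assume, purely for illustration, equal prior odds for "fair" and "biased" and a uniform prior on the heads probability under "biased". The function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses with heads probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def two_sided_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value: total probability, under the null p0,
    of every outcome that is no more likely than the observed one."""
    pk = binom_pmf(k, n, p0)
    tol = pk * 1e-9  # guard against floating-point ties
    return sum(
        binom_pmf(x, n, p0) for x in range(n + 1) if binom_pmf(x, n, p0) <= pk + tol
    )


def posterior_prob_fair(k: int, n: int, prior_fair: float = 0.5) -> float:
    """P(coin is fair | data), assuming (for illustration only) a uniform
    prior on the heads probability under the 'biased' alternative."""
    like_fair = binom_pmf(k, n, 0.5)
    # With a uniform prior on p, the marginal likelihood of any k is 1/(n+1).
    like_biased = 1.0 / (n + 1)
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1.0 - prior_fair) * like_biased)


if __name__ == "__main__":
    n, k = 100, 61
    print(f"two-sided p-value   : {two_sided_p_value(k, n):.4f}")
    print(f"P(fair coin | data) : {posterior_prob_fair(k, n):.4f}")
```

With these particular assumptions the p-value comes out below 0.05, so the result would be declared "significant", yet the posterior probability that the coin is fair is still around 40 per cent. A different prior would give a different number, which is rather the point: the p-value on its own does not tell you how probable the hypothesis is.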

While I generally agree with the arguments given in McShane et al., I don’t think they go far enough. I think p-values are so misleading that, if I had my way, I’d ban them altogether!