The vast majority of experimental work in behavioral and biomedical science involves group comparison—in the simplest case, an experimental group and a control group. Group averages are compared, and the variability within each group is used to estimate the probability that a mean difference at least as large could have occurred by chance. The method typically used, the Null Hypothesis Statistical Test (NHST), was devised by Ronald Fisher in a context to be discussed in a moment.
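As a concrete illustration of the procedure just described, here is a minimal sketch using simulated data and SciPy's independent-samples t-test; the group sizes, means, and effect size are arbitrary assumptions, not values from any study discussed here.

```python
# Minimal sketch of an NHST group comparison: an experimental group
# versus a control group, analysed with an independent-samples t-test.
# All numbers below are arbitrary assumptions for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=30)        # control group scores
experimental = rng.normal(loc=0.5, scale=1.0, size=30)   # assumed shift of 0.5 SD

# The within-group variability is what the test uses to judge whether
# the observed mean difference could plausibly have arisen by chance.
t_stat, p_value = stats.ttest_ind(experimental, control)
print(f"mean difference = {experimental.mean() - control.mean():.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# By the usual NHST convention, p < 0.05 is labeled "statistically significant".
```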
The replication crisis raises the spectre that published claims cannot always be trusted, and it imperils public trust in science. Statisticians do not agree on how best to address the problem without hobbling research practice. One prominent proposal, from Benjamin et al., is to "redefine statistical significance" by tightening the conventional criterion from p < 0.05 to p < 0.005.
Replicability would sometimes be improved by a tougher criterion; but a p-value this small would also eliminate much social science research that uses NHST, and the publication rate in social and biomedical science would plummet. Partly for this reason, more than 80 scientists signed a November 2017 letter (Lakens et al., 2017) to Nature rejecting the suggestion of Benjamin et al., recommending instead "that the label 'statistically significant' should no longer be used" and concluding "that researchers should transparently report and justify all choices they make when designing a study, including the alpha [critical p-value] level."
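To make the trade-off concrete, the following simulation sketch (my illustration with assumed parameters, not an analysis from either letter) repeats a study with a modest true effect (0.4 SD, n = 50 per group) many times and counts how often it clears p < 0.05 versus the stricter p < 0.005.

```python
# Illustrative simulation of how a stricter alpha affects the fraction
# of studies that reach "significance". Effect size, sample size, and
# simulation count are assumed values chosen for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group, true_effect, n_sims = 50, 0.4, 10_000

hits_05 = hits_005 = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    experimental = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(experimental, control)
    hits_05 += p < 0.05
    hits_005 += p < 0.005

print(f"studies significant at alpha = 0.05:  {hits_05 / n_sims:.1%}")
print(f"studies significant at alpha = 0.005: {hits_005 / n_sims:.1%}")
# Under these assumptions roughly half the studies pass at 0.05, but far
# fewer pass at 0.005, which is why a stricter default threshold would
# sharply cut the number of publishable "significant" results.
```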