Does testing for statistical significance encourage or discourage thoughtful data analysis?

In light of Jane’s post yesterday on Sizeless Science, it’s interesting to consider the position outlined in a policy statement in Epidemiology, the official journal of the International Society for Environmental Epidemiology (Rothman, K. J. (1998). Special article: Writing for Epidemiology. Epidemiology, 9(3), 333–337), which argues not only for thoughtful interpretation of findings, but for not reporting statistical significance at all.

When writing for Epidemiology, you can . . . enhance your prospects if you omit tests of statistical significance. . . . [E]very worthwhile journal will accept papers that omit them entirely. In Epidemiology, we do not publish them at all. Not only do we eschew publishing claims of the presence or absence of statistical significance, we discourage the use of this type of thinking in the data analysis . . . . We also would like to see the interpretation of a study based not on statistical significance, or lack of it, for one or more study variables, but rather on careful quantitative consideration of the data in light of competing explanations for the findings.

For example, we prefer a researcher to consider whether the magnitude of an estimated effect could be readily explained by uncontrolled confounding or selection biases, rather than simply to offer the uninspired interpretation that the estimated effect is significant, as if neither chance nor bias could then account for the findings. . . . [Emphasis added]

Many data analysts appear to remain oblivious to the qualitative nature of significance testing. Although calculations based on mountains of valuable quantitative information may go into it, statistical significance is itself only a dichotomous indicator. As it has only two values, ‘significant’ or ‘not significant’, it cannot convey much useful information. Even worse, those two values often signal just the wrong interpretation. These misleading signals occur when a trivial effect is found to be ‘significant’, as often happens in large studies, or when a strong relation is found ‘nonsignificant’, as often happens in small studies.
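Rothman’s last point is easy to verify numerically: significance is driven as much by sample size as by effect size. The sketch below (my own illustration, not from the paper, using invented effect sizes and a simple two-sample z-test with unit-variance groups) shows a trivial effect reaching significance in a huge study while a strong effect fails to in a small one.

```python
# Illustration of the sample-size point: for a standardized effect size d and
# n subjects per group (unit variance assumed), the z statistic is
# z = d * sqrt(n / 2), and the two-sided p-value is 2 * (1 - Phi(z)).
import math

def two_sample_p(d, n_per_group):
    """Two-sided p-value for a two-sample z-test, unit variance assumed."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# A trivial effect (d = 0.02) in a very large study comes out "significant" ...
p_large = two_sample_p(d=0.02, n_per_group=100_000)

# ... while a strong effect (d = 0.8) in a small study comes out "nonsignificant".
p_small = two_sample_p(d=0.8, n_per_group=10)

print(f"trivial effect, n=100,000 per group: p = {p_large:.5f}")
print(f"strong effect,  n=10 per group:      p = {p_small:.3f}")
```

Neither p-value says anything about whether the effect matters in practice, which is exactly the objection.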

It is interesting to consider whether the routine use of statistical significance testing leads to more thoughtful analysis or less. I think there is a real risk of the latter, and I can think of many examples where a finding of statistical significance has been interpreted in exactly the erroneous way described above (in the emphasised passage). (Please feel free to add examples!)

Another useful paper on this issue is Sainani, K. (2010). “Misleading Comparisons: The Fallacy of Comparing Statistical Significance.” Physical Medicine and Rehabilitation, Vol. 2 (June), 559–562, which discusses the need to look carefully at within-group differences as well as between-group differences, and at subgroup significance compared with tests of interaction. She concludes:

‘Readers should have a particularly high index of suspicion for controlled studies that fail to report between-group comparisons, because these likely represent attempts to “spin” null results.’
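The fallacy Sainani describes can be made concrete with invented numbers (mine, not hers): the treatment arm’s within-group change is significant, the control arm’s is not, yet the between-group comparison, the one that actually tests the treatment, is nowhere near significant. A minimal sketch using a normal approximation:

```python
# Within-group significance in one arm but not the other does NOT establish a
# between-group difference. All numbers here are hypothetical; p-values use
# the normal approximation p = 2 * (1 - Phi(|estimate| / se)).
import math

def p_value(estimate, se):
    """Two-sided p-value for estimate/se under a normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))

# Hypothetical pre-to-post mean changes and standard errors in each arm.
p_treat   = p_value(5.0, 2.4)   # treatment arm's within-group change
p_control = p_value(4.0, 2.4)   # control arm's within-group change

# The comparison that matters: the between-group difference in changes.
diff_se = math.sqrt(2.4**2 + 2.4**2)
p_between = p_value(5.0 - 4.0, diff_se)

print(f"treatment within-group:   p = {p_treat:.3f}")    # below 0.05
print(f"control within-group:     p = {p_control:.3f}")  # above 0.05
print(f"between-group difference: p = {p_between:.2f}")  # far from significant
```

Reporting only the first two p-values, and letting the reader infer a treatment effect from the contrast, is precisely the “spin” the quotation warns about.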

1 comment to Does testing for statistical significance encourage or discourage thoughtful data analysis?

  • David Earle

    This and the previous post raise some very useful discussion points. I wish I had a bit more time to respond in detail, but there are a few useful mantras in this area:
    – always look at the underlying data patterns first (from Jerry Winston)
    – God did not decree the 95% confidence interval (from a stats teacher)
    – maths is the practice of calculating a right answer; statistics is the practice of estimating the probability of an event – and the two should not be confused (recent insight of my own)