Wednesday, June 18, 2008

Controversy in Null Hypothesis Significance Testing

Macdonald, R. R. (1997). On statistical testing in psychology. British Journal of Psychology, 88(2), 333-348.
*Criticisms of NHST apply to the Neyman-Pearson approach, but not to the Fisherian approach

Huberty, C. J. (1993). Historical Origins of Statistical Testing Practices. Journal of Experimental Education, 61(4), 317-333.
*Textbooks in psychology confuse the use and interpretation of p-values and alpha-levels

Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339). Hillsdale, NJ: Lawrence Erlbaum Associates.
*Textbooks present an incoherent “hybrid logic,” mixing Neyman-Pearson and Fisher

Kaiser, H. (1960). Directional statistical decisions. Psychological Review, 67, 160-167.
*When using a two-sided test, it makes little sense to stop at a non-directional conclusion; rejecting H0 should lead to a decision about the direction of the effect

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1-20.
*The mistaken beliefs that null results are useless, or that they more likely reflect incompetence, prevent (or at least slow) scientific progress

Rosenthal, R. (1979). The “File Drawer Problem” and tolerance for null results. Psychological Bulletin, 86, 638-641.
*Studies with null results are rarely published, making a field of research look more “significant” than it might actually be
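
Rosenthal's paper quantifies this tolerance with what is now usually called the fail-safe N: the number of unpublished studies averaging a null result (Z = 0) that would have to sit in file drawers to pull the combined one-tailed result above p = .05. A minimal sketch, assuming Stouffer's method of combining Z-scores; the function name and the three Z-scores are hypothetical, chosen only for illustration:

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Fail-safe N in Rosenthal's sense (hypothetical helper name).

    Stouffer's combined Z for k studies is sum(Z) / sqrt(k). Adding X
    file-drawer studies with Z = 0 gives sum(Z) / sqrt(k + X). Setting that
    equal to the one-tailed .05 cutoff (1.645) and solving for X yields
    X = (sum(Z) / 1.645)**2 - k, i.e. (sum Z)^2 / 2.706 - k.
    """
    k = len(z_scores)
    return (sum(z_scores) / z_crit) ** 2 - k


# Hypothetical Z-scores from three published studies.
published = [2.0, 1.5, 2.5]
print(f"fail-safe N = {fail_safe_n(published):.1f}")
```

With these made-up numbers, roughly ten null studies hidden in file drawers would be enough to erase the combined significance; a small fail-safe N signals a fragile body of evidence.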

Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416-428.
*Using statistical tests to make “decisions” is naïve and rejection criteria are arbitrary, calling for the use of confidence intervals and (where possible) Bayesian statistics

Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423-437.
*Statistical results are often misinterpreted, calling for Bayesian methods

Hunter, J. E. (1997). Needed: A Ban on the Significance Test. Psychological Science, 8(1), 3-7.
*NHST breaks down when H0 is false, and most studies deliberately test null hypotheses the researchers know to be false; since a false H0 cannot produce a Type I error, the errors are Type II errors, and Hunter puts NHST's error rate at around 60%
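
The arithmetic behind this kind of estimate is easy to sketch. If H0 is false, a Type I error is impossible, so the error rate is essentially the rate of nonsignificant results, i.e. 1 minus power. The snippet assumes a normal approximation to a two-sided, two-sample test; the effect size (d = 0.5) and group size (20) are hypothetical values picked for illustration, not Hunter's own inputs:

```python
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of a standardized
    mean difference d (normal approximation to the t-test)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the z statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical study: H0 is false (true d = 0.5) with 20 participants per group.
power = power_two_sample(0.5, 20)
print(f"power = {power:.2f}; chance of a nonsignificant result = {1 - power:.2f}")
```

With these made-up inputs the test misses a real effect about two times in three, the same order of magnitude as the figure in the annotation.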

Cohen, J. (1994). The Earth Is Round (p < .05). American Psychologist, 49(12), 997.
*The logic of NHST is flawed and backwards (it gives the probability of the data given H0, not the probability of H0 given the data); we need to better understand our data

Cohen, J. (1990). Things I Have Learned (So Far). American Psychologist, 45(12), 1304-1312.
*Informed judgment from the researcher is indispensable; power analysis can help
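
A power analysis of the kind Cohen recommends can be sketched with the same normal approximation: solve for the per-group sample size that reaches a target power (conventionally .80) at α = .05. The normal approximation is an assumption made here for simplicity; an exact t-based calculation gives slightly larger numbers (64 rather than 63 per group for d = 0.5). The function name is hypothetical:

```python
import math
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample comparison of means,
    from n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = z.inv_cdf(power)          # about 0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Cohen's conventional small, medium, and large effect sizes.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

The steep growth for small effects is exactly why power deserves attention before data are collected.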

Loftus, G. R. (1996). Psychology Will Be a Much Better Science When We Change the Way We Analyze Data. Current Directions in Psychological Science, 5(6), 161-170.
*Exact null hypotheses are almost never true, making “significance” uninformative; power is under-attended, and the dichotomy of effects/non-effects is artificial
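
One way to see why an exact (point) null makes “significance” uninformative: hold a trivially small observed difference fixed and let the sample grow. This sketch assumes a two-sample z approximation and a hypothetical observed standardized difference of d = 0.02; the p-value uses math.erfc so it stays accurate far out in the tail:

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p-value (normal approximation) for an observed standardized
    difference d between two groups of n_per_group each."""
    z_stat = d * math.sqrt(n_per_group / 2)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)), computed without cancellation.
    return math.erfc(abs(z_stat) / math.sqrt(2))


for n in (50, 5_000, 50_000):
    print(f"n = {n:>6} per group: p = {two_sided_p(0.02, n):.4f}")
```

The same negligible difference moves from p ≈ .92 to p ≈ .002 purely because n grew, so “significant” mostly reports sample size once the point null is (as usual) not exactly true.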

Harris, R. J. (1997). Reforming significance testing via three-valued logic. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? Hillsdale, NJ: Erlbaum.
*Three-valued logic can establish directionality and address Type III error
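
In code, three-valued logic replaces the reject/retain dichotomy with three outcomes: conclude the difference is positive, conclude it is negative, or suspend judgment about its direction. A minimal sketch, assuming a z statistic with α split equally between the two tails as in a conventional two-sided test; the function name and inputs are hypothetical:

```python
from statistics import NormalDist


def three_valued_decision(estimate, std_error, alpha=0.05):
    """Return one of three conclusions about mu_A - mu_B (hypothetical helper)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    z_stat = estimate / std_error
    if z_stat > z_crit:
        return "mu_A > mu_B"
    if z_stat < -z_crit:
        return "mu_A < mu_B"
    # Not "mu_A == mu_B": the data simply do not settle the direction yet.
    return "direction undetermined"


print(three_valued_decision(2.5, 1.0))   # clearly positive
print(three_valued_decision(-0.8, 1.0))  # inconclusive
```

In these terms, a Type III error is returning the wrong one of the two directional conclusions.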

Wilkinson, L. (1999). Statistical Methods in Psychology Journals: Guidelines and Explanations. American Psychologist, 54(8), 594-604.
*APA's Task Force on Statistical Inference decided not to recommend a ban on NHST, instead urging researchers to distinguish between statistical and theoretical significance and to use modern statistical graphics

Abelson, R. P. (1997). On the Surprising Longevity of Flogged Horses: Why There Is a Case for the Significance Test. Psychological Science, 8(1), 12-15.
*NHST can be used effectively in combination with other methods; a complete ban would throw away a tool that is useful in a number of situations

Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect Sizes and p Values: What Should Be Reported and What Should Be Replicated? Psychophysiology, 33(2), 175-183.
*The interpretation of p-values in terms of replicability is widely mistaken
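
A common misreading is that p = .05 means a 95% chance that a replication will also be significant. A simple way to see why that is wrong, under an assumption made here for illustration (not necessarily the paper's own model): the true effect equals the originally observed one, and the replication uses the same sample size. The replication's z statistic is then centered on the original one, so a result that landed exactly at p = .05 has only about a 50% chance of reaching significance again. The function name is hypothetical:

```python
from statistics import NormalDist


def replication_probability(p_original, alpha=0.05):
    """Chance an exact replication is significant in the same direction,
    ASSUMING the true effect equals the original observed effect."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_original / 2)  # two-sided p -> |z|
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Under the assumption, the replication z statistic is Normal(z_obs, 1).
    return 1 - z.cdf(z_crit - z_obs)


for p in (0.05, 0.01, 0.005):
    print(p, round(replication_probability(p), 2))
```

Even a fairly small original p, such as .005, leaves roughly a one-in-five chance that the replication falls short of significance under these assumptions.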

Harris, R. J. (1997). Significance Tests Have Their Place. Psychological Science, 8(1), 8-11.
*Three-valued logic can help NHST; using confidence intervals as an alternative runs into the same problems as NHST, while providing less information than a p-value would
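
One reason confidence intervals can run into the same problems is the duality between the two: for a z-based interval, the 95% CI excludes zero exactly when the two-sided p-value is below .05, so reading a CI only for whether it covers zero is the same significance test. A small check, assuming a normal sampling distribution and hypothetical estimates with a standard error of 1:

```python
from statistics import NormalDist


def ci_and_p(estimate, std_error, level=0.95):
    """z-based confidence interval and two-sided p-value for H0: effect = 0."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    p = 2 * z.cdf(-abs(estimate / std_error))
    return ci, p


for est in (1.5, 2.5, -3.0):
    (lo, hi), p = ci_and_p(est, 1.0)
    excludes_zero = lo > 0 or hi < 0
    print(f"estimate {est:+.1f}: CI ({lo:.2f}, {hi:.2f}), p = {p:.3f}, "
          f"excludes 0: {excludes_zero}, p < .05: {p < 0.05}")
```

The last two columns always agree; the interval adds information about magnitude only if the reader looks beyond whether it covers zero.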

Jones, L. V., & Tukey, J. W. (2000). A Sensible Formulation of the Significance Test. Psychological Methods, 5, 411-414.
*Yet another iteration of the virtues of three-valued logic applied to NHST
