Macdonald, R. R. (1997). On statistical testing in psychology. British Journal of Psychology. 88 (2), 333-348.
*Criticisms of NHST apply to the Neyman-Pearson approach, but not to the Fisherian approach
Huberty, C. J. (1993). Historical Origins of Statistical Testing Practices. Journal of Experimental Education. 61 (4), 317-33.
*Textbooks in psychology confuse the use and interpretation of p-values and alpha-levels
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339). Hillsdale, NJ: Lawrence Erlbaum Associates.
*Textbooks present an incoherent “hybrid logic,” mixing Neyman-Pearson and Fisher
Kaiser, H. (1960). Directional statistical decisions. Psychological Review. 67, 160-167.
*When using two-sided tests, it makes no sense to stop at a non-directional conclusion
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1-20.
*The false ideas that null results are useless, or more likely to be due to incompetence, prevent (or at least slow) scientific progress
Rosenthal, R. (1979). The “File Drawer Problem” and tolerance for null results. Psychological Bulletin, 86, 638-641.
*Studies that yield null results are rarely published, making a field of research look more “significant” than it might actually be
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin. 57, 416-28.
*Using statistical tests to make “decisions” is naïve, and rejection criteria are arbitrary; calls for the use of confidence intervals and (if possible) Bayesian statistics
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin. 66 (6), 423-37.
*Statistical results are often misinterpreted; calls for Bayesian methods
Hunter, J. E. (1997). Needed: A Ban on the Significance Test. Psychological Science. 8 (1), 3-7.
*NHST breaks down when H0 is false, and most studies purposely test H0's they know to be false, causing the error rate (Type I and II) of NHST to be around 60%
Cohen, J. (1994). The Earth Is Round (p < .05). American Psychologist. 49 (12), 997.
*The logic of NHST is flawed and backwards; we need to better understand our data
Cohen, J. (1990). Things I Have Learned (So Far). American Psychologist. 45 (12), 1304-12.
*Informed judgment from the researcher is indispensable; power analysis can help
Loftus, G. R. (1996). Psychology Will Be a Much Better Science When We Change the Way We Analyze Data. Current Directions in Psychological Science. 5 (6), 161-170.
*Null hypotheses are rarely true, making “significance” useless; power receives too little attention, and the dichotomy of effects/non-effects is artificial
Harris, R. J. (1997). Reforming significance testing via three-valued logic. In Harlow, L.L., Mulaik, S.A., & Steiger, J.H. (Eds.) What if there were no significance tests? Hillsdale, NJ: Erlbaum.
*Three-valued logic can establish directionality and address Type III error
Wilkinson, L. (1999). Statistical Methods in Psychology Journals: Guidelines and Explanations. American Psychologist. 54 (8), 594-604.
*APA decided not to ban NHST, instead urging researchers to distinguish between statistical and theoretical significance and to use modern statistical graphics
Abelson, R. P. (1997). On the Surprising Longevity of Flogged Horses: Why There Is a Case for the Significance Test. Psychological Science. 8 (1), 12-15.
*NHST can be used effectively in combination with other methods; enforcing a complete ban on it would be throwing away a tool that can be useful in a number of situations
Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect Sizes and p Values: What Should Be Reported and What Should Be Replicated? Psychophysiology. 33 (2), 175-183.
*The interpretation of p-values in terms of replicability is widely mistaken
Harris, R. J. (1997). Significance Tests Have Their Place. Psychological Science. 8 (1), 8-11.
*Three-valued logic can help NHST; using confidence intervals as an alternative runs into the same problems as NHST, while providing less information than a p-value would
Jones, L. V., & Tukey, J. W. (2000). A Sensible Formulation of the Significance Test. Psychological Methods. 5, 411-414.
*Yet another iteration of the virtues of three-valued logic applied to NHST
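
Since three-valued logic comes up in several of the entries above (Kaiser, Harris, Jones & Tukey), here is a minimal sketch of what such a decision rule might look like. It is an illustration under stated assumptions, not code taken from any of the papers: it assumes a z statistic from a two-sided test, and the function name `three_valued_decision` and its labels are hypothetical.

```python
from statistics import NormalDist


def three_valued_decision(z: float, alpha: float = 0.05) -> str:
    """Map a z statistic to one of three conclusions about direction.

    Hypothetical helper for illustration only; assumes a standard
    normal reference distribution and a two-sided test at level alpha.
    """
    # Two-sided critical value: alpha / 2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if z > z_crit:
        return "positive"      # conclude the effect is greater than zero
    if z < -z_crit:
        return "negative"      # conclude the effect is less than zero
    return "undetermined"      # direction cannot be established yet


if __name__ == "__main__":
    for z in (2.5, -2.5, 0.3):
        print(z, three_valued_decision(z))
```

The point of the third outcome is that a non-significant result is read as "direction not yet determined" rather than as acceptance of the null hypothesis.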