Statistical_hypothesis_testing

Statistical hypothesis test

Method of statistical inference

A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.^[1]^[2]

The above image shows a table with some of the most common test statistics and their corresponding tests or models.

Performing a frequentist hypothesis test in practice

The typical steps involved in performing a frequentist hypothesis test in practice are:

Define a hypothesis (claim which is testable using data).
Select a relevant statistical test with associated test statistic T.
Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student's t distribution with known degrees of freedom, or a normal distribution with known mean and variance.
Select a significance level (α), the maximum acceptable false positive rate. Common values are 5% and 1%.
Compute from the observations the observed value t_obs of the test statistic T.
Decide to either reject the null hypothesis in favor of the alternative or not reject it. The Neyman-Pearson decision rule is to reject the null hypothesis H₀ if the observed value t_obs is in the critical region, and not to reject the null hypothesis otherwise.^[31]
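The steps above can be sketched in code. The following is a minimal illustration rather than a canonical procedure: it assumes a one-sided z-test with known population standard deviation, so the null distribution is standard normal. The function name `z_test_greater`, the sample values, and the parameters `mu0` and `sigma` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_greater(sample, mu0, sigma, alpha=0.05):
    """One-sided z-test of H0: mean == mu0 against H1: mean > mu0 (sigma known)."""
    n = len(sample)
    # Observed value t_obs of the test statistic T = (mean - mu0) / (sigma / sqrt(n)).
    t_obs = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Under H0 (and the stated assumptions) T follows a standard normal distribution.
    null_dist = NormalDist(0.0, 1.0)
    # The critical region is everything above the (1 - alpha) quantile.
    critical = null_dist.inv_cdf(1.0 - alpha)
    # Equivalent route: the p-value is the upper-tail probability of t_obs.
    p_value = 1.0 - null_dist.cdf(t_obs)
    return t_obs, critical, p_value, t_obs > critical


# Hypothetical data: H0 says the population mean is 100; sigma is assumed to be 15.
sample = [112, 98, 105, 120, 101, 109, 117, 95, 108, 115]
t_obs, critical, p_value, reject = z_test_greater(sample, mu0=100.0, sigma=15.0)
print(f"t_obs={t_obs:.3f} critical={critical:.3f} p={p_value:.4f} reject H0: {reject}")
```

Comparing `t_obs` with `critical` and comparing `p_value` with `alpha` lead to the same decision, which is the equivalence the introduction refers to.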

Interpretation

When the null hypothesis is true and statistical assumptions are met, the probability that the p-value will be less than or equal to the significance level $\alpha$ is at most $\alpha$. This ensures that the hypothesis test maintains its specified false positive rate (provided that statistical assumptions are met).^[35]

The p-value is the probability that a test statistic at least as extreme as the one obtained would occur under the null hypothesis. At a significance level of 0.05, tests performed on a fair coin would be expected to (incorrectly) reject the null hypothesis (that the coin is fair) in about 1 out of 20 tests on average. The p-value does not provide the probability that either the null hypothesis or its opposite is correct (a common source of confusion).^[36]
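The false positive rate for a fair coin can be computed exactly rather than simulated. The sketch below assumes 100 flips per test and a two-sided p-value defined as twice the smaller tail probability (one common convention among several). The helper names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k, n, p0=0.5):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    lower = sum(binom_pmf(i, n, p0) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n, p0) for i in range(k, n + 1))
    return min(1.0, 2.0 * min(lower, upper))


n, alpha = 100, 0.05  # assumed number of flips per test and significance level
# Head counts for which the test would reject H0: "the coin is fair".
rejected = [k for k in range(n + 1) if two_sided_p_value(k, n) <= alpha]
# Probability, under H0, of landing on one of those head counts.
false_positive_rate = sum(binom_pmf(k, n, 0.5) for k in rejected)
print(f"rejecting head counts: {len(rejected)} of {n + 1}")
print(f"false positive rate under H0: {false_positive_rate:.4f} (alpha = {alpha})")
```

Because the head count is discrete, the realized false positive rate falls somewhat below 5% rather than hitting it exactly, which is consistent with the "at most $\alpha$" statement above.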

If the p-value is less than the chosen significance threshold (equivalently, if the observed test statistic is in the critical region), then we say the null hypothesis is rejected at the chosen level of significance. If the p-value is not less than the chosen significance threshold (equivalently, if the observed test statistic is outside the critical region), then the null hypothesis is not rejected at the chosen level of significance.

In the "lady tasting tea" example (below), Fisher required the lady to properly categorize all of the cups of tea to justify the conclusion that the result was unlikely to result from chance. His test revealed that if the lady was effectively guessing at random (the null hypothesis), there was a 1.4% chance that the observed results (perfectly ordered tea) would occur.
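The 1.4% figure can be reproduced with a short calculation. This sketch assumes the classic design of the experiment (8 cups, 4 of them prepared milk-first, with the lady told to pick out 4), which is not spelled out in this section.

```python
from math import comb

cups, milk_first = 8, 4  # assumed design: 8 cups, 4 with milk poured first
# Under H0 (pure guessing) every way of choosing 4 cups out of 8 is equally likely,
# and exactly one of those choices classifies every cup correctly.
ways = comb(cups, milk_first)
p_all_correct = 1 / ways
print(f"P(all correct | guessing) = 1/{ways} = {p_all_correct:.4f}")
```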

Use and importance

Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis testing, which can justify conclusions even when no scientific theory exists. In the lady tasting tea example, it was "obvious" that no difference existed between (milk poured into tea) and (tea poured into milk). The data contradicted the "obvious".

Real world applications of hypothesis testing include:^[37]

Testing whether more men than women suffer from nightmares
Establishing authorship of documents
Evaluating the effect of the full moon on behavior
Determining the range at which a bat can detect an insect by echo
Deciding whether hospital carpeting results in more infections
Selecting the best means to stop smoking
Checking whether bumper stickers reflect car owner behavior
Testing the claims of handwriting analysts

Statistical hypothesis testing plays an important role in the whole of statistics and in statistical inference. For example, Lehmann (1992) in a review of the fundamental paper by Neyman and Pearson (1933) says: "Nevertheless, despite their shortcomings, the new paradigm formulated in the 1933 paper, and the many developments carried out within its framework continue to play a central role in both the theory and practice of statistics and can be expected to do so in the foreseeable future".

Significance testing has been the favored statistical tool in some experimental social sciences (over 90% of articles in the Journal of Applied Psychology during the early 1990s).^[38] Other fields have favored the estimation of parameters (e.g. effect size). Significance testing is used as a substitute for the traditional comparison of predicted value and experimental result at the core of the scientific method. When theory is only capable of predicting the sign of a relationship, a directional (one-sided) hypothesis test can be configured so that only a statistically significant result supports theory. This form of theory appraisal is the most heavily criticized application of hypothesis testing.

Examples

Courtroom trial

A statistical test procedure is comparable to a criminal trial; a defendant is considered not guilty as long as his or her guilt is not proven. The prosecutor tries to prove the guilt of the defendant. Only when there is enough evidence for the prosecution is the defendant convicted.

At the start of the procedure, there are two hypotheses: $H_{0}$: "the defendant is not guilty", and $H_{1}$: "the defendant is guilty". The first one, $H_{0}$, is called the null hypothesis. The second one, $H_{1}$, is called the alternative hypothesis. It is the alternative hypothesis that one hopes to support.

The hypothesis of innocence is rejected only when an error is very unlikely, because one does not want to convict an innocent defendant. Such an error is called an error of the first kind (i.e., the conviction of an innocent person), and the occurrence of this error is controlled to be rare. As a consequence of this asymmetric behaviour, an error of the second kind (acquitting a person who committed the crime) is more common.

[Table: trial outcomes by hypothesis. Columns: H0 is true (truly not guilty); H1 is true (truly guilty) ...]

A criminal trial can be regarded as either or both of two decision processes: guilty vs not guilty or evidence vs a threshold ("beyond a reasonable doubt"). In one view, the defendant is judged; in the other view the performance of the prosecution (which bears the burden of proof) is judged. A hypothesis test can be regarded as either a judgment of a hypothesis or as a judgment of evidence.

Clairvoyant card game

A person (the subject) is tested for clairvoyance. They are shown the back face of a randomly chosen playing card 25 times and asked which of the four suits it belongs to. The number of hits, or correct answers, is called X.

As we try to find evidence of their clairvoyance, for the time being the null hypothesis is that the person is not clairvoyant.^[56] The alternative is: the person is (more or less) clairvoyant.

If the null hypothesis is valid, the only thing the test subject can do is guess. For every card, the probability (relative frequency) of any single suit appearing is 1/4. If the alternative is valid, the test subject will predict the suit correctly with probability greater than 1/4. We will call the probability of guessing correctly p. The hypotheses, then, are:

null hypothesis: $H_{0}:p={\tfrac {1}{4}}$ (just guessing)

and

alternative hypothesis: $H_{1}:p>{\tfrac {1}{4}}$ (true clairvoyant).

When the test subject correctly predicts all 25 cards, we will consider them clairvoyant and reject the null hypothesis. The same holds for 24 or 23 hits. With only 5 or 6 hits, on the other hand, there is no cause to consider them so. But what about 12 hits, or 17 hits? What is the critical number, c, of hits at which we consider the subject to be clairvoyant, and how do we determine it? With the choice c=25 (i.e. we only accept clairvoyance when all cards are predicted correctly) we are more critical than with c=10. In the first case almost no test subjects will be recognized as clairvoyant; in the second case, a certain number will pass the test. In practice, one decides how critical one will be. That is, one decides how often one accepts an error of the first kind (a false positive, or Type I error). With c = 25 the probability of such an error is:

$P({\text{reject }}H_{0}\mid H_{0}{\text{ is valid}})=P\left(X=25\mid p={\frac {1}{4}}\right)=\left({\frac {1}{4}}\right)^{25}\approx 10^{-15}$,

and hence very small. The probability of a false positive is the probability of randomly guessing correctly all 25 times.

Being less critical, with c = 10, gives:

$P({\text{reject }}H_{0}\mid H_{0}{\text{ is valid}})=P\left(X\geq 10\mid p={\frac {1}{4}}\right)=\sum _{k=10}^{25}P\left(X=k\mid p={\frac {1}{4}}\right)=\sum _{k=10}^{25}{\binom {25}{k}}\left(1-{\frac {1}{4}}\right)^{25-k}\left({\frac {1}{4}}\right)^{k}\approx 0.0713$.

Thus, c = 10 yields a much greater probability of false positive.
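Both false positive probabilities can be checked numerically. The sketch below evaluates the binomial upper tail directly from the sum given above. The function name `tail_prob` is illustrative.

```python
from math import comb


def tail_prob(c, n=25, p=0.25):
    """P(X >= c) for X ~ Binomial(n, p): the Type I error rate for critical value c."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


alpha_c25 = tail_prob(25)  # reject only when all 25 cards are guessed correctly
alpha_c10 = tail_prob(10)  # reject when 10 or more cards are guessed correctly
print(f"c = 25: P(reject H0 | H0) = {alpha_c25:.3e}")
print(f"c = 10: P(reject H0 | H0) = {alpha_c10:.4f}")
```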

Before the test is actually performed, the maximum acceptable probability of a Type I error (α) is determined. Typically, values in the range of 1% to 5% are selected. (If the maximum acceptable error rate is zero, an infinite number of correct guesses is required.) Depending on this Type I error rate, the critical value c is calculated. For example, if we select an error rate of 1%, c is calculated thus:

$P({\text{reject }}H_{0}\mid H_{0}{\text{ is valid}})=P\left(X\geq c\mid p={\frac {1}{4}}\right)\leq 0.01$.

From all the numbers c with this property, we choose the smallest, in order to minimize the probability of a Type II error, a false negative. For the above example, we select $c=13$.
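The search for the critical value can be written as a short loop. This is a sketch under the example's assumptions (25 cards, guessing probability 1/4, α = 0.01). The alternative value p = 0.5 used for the Type II error is hypothetical and serves only to illustrate why the smallest admissible c is preferred.

```python
from math import comb


def tail_prob(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n, p0, alpha):
    """Smallest c such that P(X >= c | p0) <= alpha."""
    for c in range(n + 2):  # c = n + 1 has an empty tail (probability 0), so the loop always returns
        if tail_prob(c, n, p0) <= alpha:
            return c


n, p0, alpha = 25, 0.25, 0.01
c = critical_value(n, p0, alpha)
print(f"critical value c = {c}, Type I error rate = {tail_prob(c, n, p0):.4f}")

# Type II error for a hypothetical alternative p = 0.5: the probability of NOT rejecting.
p_alt = 0.5
beta_smallest = 1.0 - tail_prob(c, n, p_alt)
beta_stricter = 1.0 - tail_prob(c + 2, n, p_alt)
print(f"Type II error at c = {c}: {beta_smallest:.4f}; at c = {c + 2}: {beta_stricter:.4f}")
```

Any larger c also keeps the Type I error rate below 1%, but it fails to reject more often when the subject really does better than chance.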

References

[1]
Lewis, Nancy D.; Lewis, Nigel Da Costa; Lewis, N. D. (2013). 100 Statistical Tests in R: What to Choose, how to Easily Calculate, with Over 300 Illustrations and Examples. Heather Hills Press. ISBN 978-1-4840-5299-0.
[2]
Kanji, Gopal K. (18 July 2006). 100 Statistical Tests. SAGE. ISBN 978-1-4462-2250-8.
[3]
Bellhouse, P. (2001), "John Arbuthnot", in Statisticians of the Centuries by C.C. Heyde and E. Seneta, Springer, pp. 39–42, ISBN 978-0-387-95329-8
[4]
Meehl, P (1990). "Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant It" (PDF). Psychological Inquiry. 1 (2): 108–141. doi:10.1207/s15327965pli0102_1.
[5]
Laplace, P. (1778). "Mémoire sur les probabilités" (PDF). Mémoires de l'Académie Royale des Sciences de Paris. 9: 227–332. Archived from the original (PDF) on April 27, 2015. Retrieved September 5, 2013.
[6]
Pearson, K (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling" (PDF). The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 5 (50): 157–175. doi:10.1080/14786440009463897.
[7]
Pearson, K (1904). "On the Theory of Contingency and Its Relation to Association and Normal Correlation". Drapers' Company Research Memoirs Biometric Series. 1: 1–35.
[8]
Zabell, S (1989). "R. A. Fisher on the History of Inverse Probability". Statistical Science. 4 (3): 247–256. doi:10.1214/ss/1177012488. JSTOR 2245634.
[9]
Raymond Hubbard, M. J. Bayarri, P Values are not Error Probabilities. Archived September 4, 2013, at the Wayback Machine. A working paper that explains the difference between Fisher's evidential p-value and the Neyman–Pearson Type I error rate $\alpha$.
[10]
Fisher, R (1955). "Statistical Methods and Scientific Induction" (PDF). Journal of the Royal Statistical Society, Series B. 17 (1): 69–78.
[11]
Neyman, J; Pearson, E. S. (January 1, 1933). "On the Problem of the most Efficient Tests of Statistical Hypotheses". Philosophical Transactions of the Royal Society A. 231 (694–706): 289–337. Bibcode:1933RSPTA.231..289N. doi:10.1098/rsta.1933.0009.
[12]
Goodman, S N (June 15, 1999). "Toward evidence-based medical statistics. 1: The P Value Fallacy". Ann Intern Med. 130 (12): 995–1004. doi:10.7326/0003-4819-130-12-199906150-00008. PMID 10383371. S2CID 7534212.
[13]
Lehmann, E. L. (December 1993). "The Fisher, Neyman–Pearson Theories of Testing Hypotheses: One Theory or Two?". Journal of the American Statistical Association. 88 (424): 1242–1249. doi:10.1080/01621459.1993.10476404.
[14]
Fisher, R N (1958). "The Nature of Probability" (PDF). Centennial Review. 2: 261–274. We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort.
[15]
Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson". Br. J. Philos. Sci. 57: 69–91. doi:10.1093/bjps/axi152. S2CID 14136146.
[16]
Neyman, Jerzy (1967). "RA Fisher (1890—1962): An Appreciation". Science. 156 (3781): 1456–1460. Bibcode:1967Sci...156.1456N. doi:10.1126/science.156.3781.1456. PMID 17741062. S2CID 44708120.
[17]
Losavich, J. L.; Neyman, J.; Scott, E. L.; Wells, M. A. (1971). "Hypothetical explanations of the negative apparent effects of cloud seeding in the Whitetop Experiment". Proceedings of the National Academy of Sciences of the United States of America. 68 (11): 2643–2646. Bibcode:1971PNAS...68.2643L. doi:10.1073/pnas.68.11.2643. PMC 389491. PMID 16591951.
[18]
Halpin, P F; Stam, HJ (Winter 2006). "Inductive Inference or Inductive Behavior: Fisher and Neyman: Pearson Approaches to Statistical Testing in Psychological Research (1940–1960)". The American Journal of Psychology. 119 (4): 625–653. doi:10.2307/20445367. JSTOR 20445367. PMID 17286092.
[19]
Gigerenzer, Gerd; Zeno Swijtink; Theodore Porter; Lorraine Daston; John Beatty; Lorenz Kruger (1989). "Part 3: The Inference Experts". The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge University Press. pp. 70–122. ISBN 978-0-521-39838-1.
[20]
Mayo, D. G.; Spanos, A. (2006). "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction". The British Journal for the Philosophy of Science. 57 (2): 323–357. CiteSeerX 10.1.1.130.8131. doi:10.1093/bjps/axl003. S2CID 7176653.
[21]
Mathematics > High School: Statistics & Probability > Introduction Archived July 28, 2012, at archive.today Common Core State Standards Initiative (relates to USA students)
[22]
College Board Tests > AP: Subjects > Statistics The College Board (relates to USA students)
[23]
Huff, Darrell (1993). How to lie with statistics. New York: Norton. p. 8. ISBN 978-0-393-31072-6. 'Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, "opinion" polls, the census. But without writers who use the words with honesty and readers who know what they mean, the result can only be semantic nonsense.'
[24]
Snedecor, George W.; Cochran, William G. (1967). Statistical Methods (6 ed.). Ames, Iowa: Iowa State University Press. p. 3. "...the basic ideas in statistics assist us in thinking clearly about the problem, provide some guidance about the conditions that must be satisfied if sound inferences are to be made, and enable us to detect many inferences that have no good logical foundation."
[25]
E. L. Lehmann (1997). "Testing Statistical Hypotheses: The Story of a Book". Statistical Science. 12 (1): 48–52. doi:10.1214/ss/1029963261.
[26]
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Noortgate, Wim Van den; Onghena, Patrick (2007). "Students' Misconceptions of Statistical Inference: A Review of the Empirical Evidence from Research on Statistics Education" (PDF). Educational Research Review. 2 (2): 98–113. doi:10.1016/j.edurev.2007.04.001.
[27]
Moore, David S. (1997). "New Pedagogy and New Content: The Case of Statistics" (PDF). International Statistical Review. 65 (2): 123–165. doi:10.2307/1403333. JSTOR 1403333.
[28]
Hubbard, Raymond; Armstrong, J. Scott (2006). "Why We Don't Really Know What Statistical Significance Means: Implications for Educators". Journal of Marketing Education. 28 (2): 114–120. doi:10.1177/0273475306288399. hdl:2092/413. S2CID 34729227.
[29]
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Noortgate, Wim Van den; Onghena, Patrick (2009). "How Confident Are Students in Their Misconceptions about Hypothesis Tests?". Journal of Statistics Education. 17 (2). doi:10.1080/10691898.2009.11889514.
[30]
Gigerenzer, G. (2004). "The Null Ritual What You Always Wanted to Know About Significant Testing but Were Afraid to Ask" (PDF). The SAGE Handbook of Quantitative Methodology for the Social Sciences. pp. 391–408. doi:10.4135/9781412986311. ISBN 9780761923596.
[31]
"Testing Statistical Hypotheses". Springer Texts in Statistics. 2005. doi:10.1007/0-387-27605-x. ISBN 978-0-387-98864-1. ISSN 1431-875X.
[32]
Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments. Vol. I and II (Second ed.). Wiley. ISBN 978-0-470-38551-7.
[33]
Montgomery, Douglas (2009). Design and analysis of experiments. Hoboken, N.J.: Wiley. ISBN 978-0-470-12866-4.
[34]
R. A. Fisher (1925). Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd, 1925, p. 43.
[35]
Lehmann, E. L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3E ed.). New York: Springer. ISBN 978-0-387-98864-1.
[36]
Nuzzo, Regina (2014). "Scientific method: Statistical errors". Nature. 506 (7487): 150–152. Bibcode:2014Natur.506..150N. doi:10.1038/506150a. PMID 24522584.
[37]
Richard J. Larsen; Donna Fox Stroup (1976). Statistics in the Real World: a book of examples. Macmillan. ISBN 978-0023677205.
[38]
Hubbard, R.; Parsa, A. R.; Luthy, M. R. (1997). "The Spread of Statistical Significance Testing in Psychology: The Case of the Journal of Applied Psychology". Theory and Psychology. 7 (4): 545–554. doi:10.1177/0959354397074006. S2CID 145576828.
[39]
Moore, David (2003). Introduction to the Practice of Statistics. New York: W.H. Freeman and Co. p. 426. ISBN 9780716796572.
[40]
Ranganathan, Priya; Pramesh, C. S; Buyse, Marc (April–June 2016). "Common pitfalls in statistical analysis: The perils of multiple testing". Perspect Clin Res. 7 (2): 106–107. doi:10.4103/2229-3485.179436. PMC 4840791. PMID 27141478.
[41]
Hughes, Ann J.; Grawoig, Dennis E. (1971). Statistics: A Foundation for Analysis. Reading, Mass.: Addison-Wesley. p. 191. ISBN 0-201-03021-7.
[42]
Hall, P. and Wilson, S.R., 1991. Two guidelines for bootstrap hypothesis testing. Biometrics, pp.757-762.
[43]
Tibshirani, R.J. and Efron, B., 1993. An introduction to the bootstrap. Monographs on statistics and applied probability, 57(1).
[44]
Martin, M.A., 2007. Bootstrap hypothesis testing for some common statistical problems: A critical evaluation of size and power properties. Computational Statistics & Data Analysis, 51(12), pp.6321-6342.
[45]
Horowitz, J.L., 2019. Bootstrap methods in econometrics. Annual Review of Economics, 11, pp.193-224.
[46]
John Arbuthnot (1710). "An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes" (PDF). Philosophical Transactions of the Royal Society of London. 27 (325–336): 186–190. doi:10.1098/rstl.1710.0011. S2CID 186209819.
[47]
Brian, Éric; Jaisson, Marie (2007). "Physico-Theology and Mathematics (1710–1794)". The Descent of Human Sex Ratio at Birth. Springer Science & Business Media. pp. 1–25. ISBN 978-1-4020-6036-6.
[48]
Conover, W.J. (1999), "Chapter 3.4: The Sign Test", Practical Nonparametric Statistics (Third ed.), Wiley, pp. 157–176, ISBN 978-0-471-16068-7
[49]
Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman & Hall, ISBN 978-0-412-44980-2
[50]
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press. pp. 225–226. ISBN 978-0-67440341-3.
[51]
Laplace, P. (1778). "Mémoire sur les probabilités (XIX, XX)". Oeuvres complètes de Laplace. Vol. 9. pp. 429–438.
[52]
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, Mass: Belknap Press of Harvard University Press. p. 134. ISBN 978-0-674-40340-6.
[53]
Fisher, Sir Ronald A. (1956) [1935]. "Mathematics of a Lady Tasting Tea". In James Roy Newman (ed.). The World of Mathematics, volume 3 [Design of Experiments]. Courier Dover Publications. ISBN 978-0-486-41151-4. Originally from Fisher's book Design of Experiments.
[54]
Box, Joan Fisher (1978). R.A. Fisher, The Life of a Scientist. New York: Wiley. p. 134. ISBN 978-0-471-09300-8.
[55]
C. S. Peirce (August 1878). "Illustrations of the Logic of Science VI: Deduction, Induction, and Hypothesis". Popular Science Monthly. 13. Retrieved March 30, 2012.
[56]
Jaynes, E. T. (2007). Probability theory : the logic of science (5. print. ed.). Cambridge [u.a.]: Cambridge Univ. Press. ISBN 978-0-521-59271-0.
[57]
Schervish, M (1996) Theory of Statistics, p. 218. Springer ISBN 0-387-94546-6
[58]
Kaye, David H.; Freedman, David A. (2011). "Reference Guide on Statistics". Reference Manual on Scientific Evidence (3rd ed.). Eagan, MN Washington, D.C: West National Academies Press. p. 259. ISBN 978-0-309-21421-6.
[59]
Ash, Robert (1970). Basic probability theory. New York: Wiley. ISBN 978-0471034506. Section 8.2.
[60]
Tukey, John W. (1960). "Conclusions vs decisions". Technometrics. 26 (4): 423–433. doi:10.1080/00401706.1960.10489909. "Until we go through the accounts of testing hypotheses, separating [Neyman–Pearson] decision elements from [Fisher] conclusion elements, the intimate mixture of disparate elements will be a continual source of confusion." ... "There is a place for both "doing one's best" and "saying only what is certain," but it is important to know, in each instance, both which one is being done, and which one ought to be done."
[61]
Stigler, Stephen M. (August 1996). "The History of Statistics in 1933". Statistical Science. 11 (3): 244–252. doi:10.1214/ss/1032280216. JSTOR 2246117.
[62]
Berger, James O. (2003). "Could Fisher, Jeffreys and Neyman Have Agreed on Testing?". Statistical Science. 18 (1): 1–32. doi:10.1214/ss/1056397485.
[63]
Morrison, Denton; Henkel, Ramon, eds. (2006) [1970]. The Significance Test Controversy. Aldine Transaction. ISBN 978-0-202-30879-1.
[64]
Oakes, Michael (1986). Statistical Inference: A Commentary for the Social and Behavioural Sciences. Chichester New York: Wiley. ISBN 978-0471104438.
[65]
Chow, Siu L. (1997). Statistical Significance: Rationale, Validity and Utility. SAGE Publications. ISBN 978-0-7619-5205-3.
[66]
Harlow, Lisa Lavoie; Stanley A. Mulaik; James H. Steiger, eds. (1997). What If There Were No Significance Tests?. Lawrence Erlbaum Associates. ISBN 978-0-8058-2634-0.
[67]
Kline, Rex (2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, D.C.: American Psychological Association. ISBN 9781591471189.
[68]
McCloskey, Deirdre N.; Stephen T. Ziliak (2008). The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. University of Michigan Press. ISBN 978-0-472-05007-9.
[69]
Cornfield, Jerome (1976). "Recent Methodological Contributions to Clinical Trials" (PDF). American Journal of Epidemiology. 104 (4): 408–421. doi:10.1093/oxfordjournals.aje.a112313. PMID 788503.
[70]
Yates, Frank (1951). "The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics". Journal of the American Statistical Association. 46 (253): 19–34. doi:10.1080/01621459.1951.10500764. "The emphasis given to formal tests of significance throughout [R.A. Fisher's] Statistical Methods ... has caused scientific research workers to pay undue attention to the results of the tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects they are investigating." ... "The emphasis on tests of significance and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective."
[71]
Begg, Colin B.; Berlin, Jesse A. (1988). "Publication bias: a problem in interpreting medical data". Journal of the Royal Statistical Society, Series A. 151 (3): 419–463. doi:10.2307/2982993. JSTOR 2982993. S2CID 121054702.
[72]
Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox" (PDF). Philosophy of Science. 34 (2): 103–115. doi:10.1086/288135. S2CID 96422880. Archived from the original (PDF) on December 3, 2013. Thirty years later, Meehl acknowledged statistical significance theory to be mathematically sound while continuing to question the default choice of null hypothesis, blaming instead the "social scientists' poor understanding of the logical relation between theory and fact" in "The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions" (Chapter 14 in Harlow (1997)).
[73]
Bakan, David (1966). "The test of significance in psychological research". Psychological Bulletin. 66 (6): 423–437. doi:10.1037/h0020412. PMID 5974619.
[74]
Gigerenzer, G (November 2004). "Mindless statistics". The Journal of Socio-Economics. 33 (5): 587–606. doi:10.1016/j.socec.2004.09.033.
[75]
Nunnally, Jum (1960). "The place of statistics in psychology". Educational and Psychological Measurement. 20 (4): 641–650. doi:10.1177/001316446002000401. S2CID 144813784.
[76]
Lykken, David T. (1991). "What's wrong with psychology, anyway?". Thinking Clearly About Psychology. 1: 3–39.
[77]
Jacob Cohen (December 1994). "The Earth Is Round (p < .05)". American Psychologist. 49 (12): 997–1003. doi:10.1037/0003-066X.49.12.997. S2CID 380942. This paper led to the review of statistical practices by the APA. Cohen was a member of the Task Force that did the review.
[78]
Nickerson, Raymond S. (2000). "Null Hypothesis Significance Tests: A Review of an Old and Continuing Controversy". Psychological Methods. 5 (2): 241–301. doi:10.1037/1082-989X.5.2.241. PMID 10937333. S2CID 28340967.
[79]
Branch, Mark (2014). "Malignant side effects of null hypothesis significance testing". Theory & Psychology. 24 (2): 256–277. doi:10.1177/0959354314525282. S2CID 40712136.
[80]
Hunter, John E. (January 1997). "Needed: A Ban on the Significance Test". Psychological Science. 8 (1): 3–7. doi:10.1111/j.1467-9280.1997.tb00534.x. S2CID 145422959.
[81]
Wilkinson, Leland (1999). "Statistical Methods in Psychology Journals; Guidelines and Explanations". American Psychologist. 54 (8): 594–604. doi:10.1037/0003-066X.54.8.594. S2CID 428023. "Hypothesis tests. It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval." (p 599). The committee used the cautionary term "forbearance" in describing its decision against a ban of hypothesis testing in psychology reporting. (p 603)
[82]
"ICMJE: Obligation to Publish Negative Studies". Archived from the original on July 16, 2012. Retrieved September 3, 2012. Editors should seriously consider for publication any carefully done study of an important question, relevant to their readers, whether the results for the primary or any additional outcome are statistically significant. Failure to submit or publish findings because of lack of statistical significance is an important cause of publication bias.
[83]
Journal of Articles in Support of the Null Hypothesis website: JASNH homepage. Volume 1 number 1 was published in 2002, and all articles are on psychology-related subjects.
[84]
Howell, David (2002). Statistical Methods for Psychology (5 ed.). Duxbury. p. 94. ISBN 978-0-534-37770-0.
[85]
Williams S, Carson R, Tóth K (October 10, 2023). "Moving beyond P values in The Journal of Physiology: A primer on the value of effect sizes and confidence intervals". J Physiol. 601 (23): 5131–5133. doi:10.1113/JP285575. PMID 37815959. S2CID 263827430.
[86]
Kruschke, J K (July 9, 2012). "Bayesian Estimation Supersedes the T Test" (PDF). Journal of Experimental Psychology: General. 142 (2): 573–603. doi:10.1037/a0029146. PMID 22774788. S2CID 5610231.
[87]
Kruschke, J K (May 8, 2018). "Rejecting or Accepting Parameter Values in Bayesian Estimation" (PDF). Advances in Methods and Practices in Psychological Science. 1 (2): 270–280. doi:10.1177/2515245918771304. S2CID 125788648.
[88]
Armstrong, J. Scott (2007). "Significance tests harm progress in forecasting". International Journal of Forecasting. 23 (2): 321–327. CiteSeerX 10.1.1.343.9516. doi:10.1016/j.ijforecast.2007.03.004. S2CID 1550979.
[89]
Kass, R. E. (1993). Bayes factors and model uncertainty (PDF) (Report). Department of Statistics, University of Washington.
[90]
Rozeboom, William W (1960). "The fallacy of the null-hypothesis significance test" (PDF). Psychological Bulletin. 57 (5): 416–428. CiteSeerX 10.1.1.398.9002. doi:10.1037/h0042040. PMID 13744252. "...the proper application of statistics to scientific inference is irrevocably committed to extensive consideration of inverse [AKA Bayesian] probabilities..." It was acknowledged, with regret, that a priori probability distributions were available "only as a subjective feel, differing from one person to the next" "in the more immediate future, at least".
[91]
Berger, James (2006). "The Case for Objective Bayesian Analysis". Bayesian Analysis. 1 (3): 385–402. doi:10.1214/06-ba115. In listing the competing definitions of "objective" Bayesian analysis, "A major goal of statistics (indeed science) is to find a completely coherent objective Bayesian methodology for learning from data." The author expressed the view that this goal "is not attainable".
[92]
Aldrich, J (2008). "R. A. Fisher on Bayes and Bayes' theorem". Bayesian Analysis. 3 (1): 161–170. doi:10.1214/08-BA306.

This article uses material from the Wikipedia article Statistical_hypothesis_testing, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] [1]
Lewis, Nancy D.; Lewis, Nigel Da Costa; Lewis, N. D. (2013). 100 Statistical Tests in R: What to Choose, how to Easily Calculate, with Over 300 Illustrations and Examples. Heather Hills Press. ISBN 978-1-4840-5299-0.

[2] [2]
Kanji, Gopal K. (18 July 2006). 100 Statistical Tests. SAGE. ISBN 978-1-4462-2250-8.

[Bellhouse2001-3] [3]
Bellhouse, P. (2001), "John Arbuthnot", in Statisticians of the Centuries by C.C. Heyde and E. Seneta, Springer, pp. 39–42, ISBN 978-0-387-95329-8

[4] [4]
Meehl, P (1990). "Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant It" (PDF). Psychological Inquiry. 1 (2): 108–141. doi:10.1207/s15327965pli0102_1.

[Laplace_1778-5] [5]
Laplace, P. (1778). "Mémoire sur les probabilités" (PDF). Mémoires de l'Académie Royale des Sciences de Paris. 9: 227–332. Archived from the original (PDF) on April 27, 2015. Retrieved September 5, 2013.

[Pearson_1900-6] [6]
Pearson, K (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling" (PDF). The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 5 (50): 157–175. doi:10.1080/14786440009463897.

[Pearson_1904-7] [7]
Pearson, K (1904). "On the Theory of Contingency and Its Relation to Association and Normal Correlation". Drapers' Company Research Memoirs Biometric Series. 1: 1–35.

[8] [8]
Zabell, S (1989). "R. A. Fisher on the History of Inverse Probability". Statistical Science. 4 (3): 247–256. doi:10.1214/ss/1177012488. JSTOR 2245634.

[ftp.isds.duke-9] [9]
Raymond Hubbard, M. J. Bayarri, P Values are not Error Probabilities Archived September 4, 2013, at the Wayback Machine. A working paper that explains the difference between Fisher's evidential p-value and the Neyman–Pearson Type I error rate $\alpha$ .

[Fisher_1955_69–78-10] [10]
Fisher, R (1955). "Statistical Methods and Scientific Induction" (PDF). Journal of the Royal Statistical Society, Series B. 17 (1): 69–78.

[Neyman_289–337-11] [11]
Neyman, J; Pearson, E. S. (January 1, 1933). "On the Problem of the most Efficient Tests of Statistical Hypotheses". Philosophical Transactions of the Royal Society A. 231 (694–706): 289–337. Bibcode:1933RSPTA.231..289N. doi:10.1098/rsta.1933.0009.

[12] [12]
Goodman, S N (June 15, 1999). "Toward evidence-based medical statistics. 1: The P Value Fallacy". Ann Intern Med. 130 (12): 995–1004. doi:10.7326/0003-4819-130-12-199906150-00008. PMID 10383371. S2CID 7534212.

[Lehmann93-13] [13]
Lehmann, E. L. (December 1993). "The Fisher, Neyman–Pearson Theories of Testing Hypotheses: One Theory or Two?". Journal of the American Statistical Association. 88 (424): 1242–1249. doi:10.1080/01621459.1993.10476404.

[14] [14]
Fisher, R N (1958). "The Nature of Probability" (PDF). Centennial Review. 2: 261–274. We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort.

[Lenhard-15] [15]
Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson". Br. J. Philos. Sci. 57: 69–91. doi:10.1093/bjps/axi152. S2CID 14136146.

[16] [16]
Neyman, Jerzy (1967). "RA Fisher (1890—1962): An Appreciation". Science. 156 (3781): 1456–1460. Bibcode:1967Sci...156.1456N. doi:10.1126/science.156.3781.1456. PMID 17741062. S2CID 44708120.

[17] [17]
Losavich, J. L.; Neyman, J.; Scott, E. L.; Wells, M. A. (1971). "Hypothetical explanations of the negative apparent effects of cloud seeding in the Whitetop Experiment". Proceedings of the National Academy of Sciences of the United States of America. 68 (11): 2643–2646. Bibcode:1971PNAS...68.2643L. doi:10.1073/pnas.68.11.2643. PMC 389491. PMID 16591951.

[Halpin_625–653-18] [18]
Halpin, P F; Stam, HJ (Winter 2006). "Inductive Inference or Inductive Behavior: Fisher and Neyman: Pearson Approaches to Statistical Testing in Psychological Research (1940–1960)". The American Journal of Psychology. 119 (4): 625–653. doi:10.2307/20445367. JSTOR 20445367. PMID 17286092.

[Gigerenzer-19] [19]
Gigerenzer, Gerd; Zeno Swijtink; Theodore Porter; Lorraine Daston; John Beatty; Lorenz Kruger (1989). "Part 3: The Inference Experts". The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge University Press. pp. 70–122. ISBN 978-0-521-39838-1.

[doi10.1093/bjps/axl003-20] [20]
Mayo, D. G.; Spanos, A. (2006). "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction". The British Journal for the Philosophy of Science. 57 (2): 323–357. CiteSeerX 10.1.1.130.8131. doi:10.1093/bjps/axl003. S2CID 7176653.

[21] [21]
Mathematics > High School: Statistics & Probability > Introduction Archived July 28, 2012, at archive.today Common Core State Standards Initiative (relates to USA students)

[22] [22]
College Board Tests > AP: Subjects > Statistics The College Board (relates to USA students)

[Huff8-23] [23]
Huff, Darrell (1993). How to lie with statistics. New York: Norton. p. 8. ISBN 978-0-393-31072-6.'Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, "opinion" polls, the census. But without writers who use the words with honesty and readers who know what they mean, the result can only be semantic nonsense.'

[S&C-24] [24]
Snedecor, George W.; Cochran, William G. (1967). Statistical Methods (6 ed.). Ames, Iowa: Iowa State University Press. p. 3. "...the basic ideas in statistics assist us in thinking clearly about the problem, provide some guidance about the conditions that must be satisfied if sound inferences are to be made, and enable us to detect many inferences that have no good logical foundation."

[Lehmann97-25] [25]
E. L. Lehmann (1997). "Testing Statistical Hypotheses: The Story of a Book". Statistical Science. 12 (1): 48–52. doi:10.1214/ss/1029963261.

[26] [26]
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Noortgate, Wim Van den; Onghena, Patrick (2007). "Students' Misconceptions of Statistical Inference: A Review of the Empirical Evidence from Research on Statistics Education" (PDF). Educational Research Review. 2 (2): 98–113. doi:10.1016/j.edurev.2007.04.001.

[27] [27]
Moore, David S. (1997). "New Pedagogy and New Content: The Case of Statistics" (PDF). International Statistical Review. 65 (2): 123–165. doi:10.2307/1403333. JSTOR 1403333.

[28] [28]
Hubbard, Raymond; Armstrong, J. Scott (2006). "Why We Don't Really Know What Statistical Significance Means: Implications for Educators". Journal of Marketing Education. 28 (2): 114–120. doi:10.1177/0273475306288399. hdl:2092/413. S2CID 34729227.

[29] [29]
Sotos, Ana Elisa Castro; Vanhoof, Stijn; Noortgate, Wim Van den; Onghena, Patrick (2009). "How Confident Are Students in Their Misconceptions about Hypothesis Tests?". Journal of Statistics Education. 17 (2). doi:10.1080/10691898.2009.11889514.

[Gigerenzer_2004_391–408-30] [30]
Gigerenzer, G. (2004). "The Null Ritual What You Always Wanted to Know About Significant Testing but Were Afraid to Ask" (PDF). The SAGE Handbook of Quantitative Methodology for the Social Sciences. pp. 391–408. doi:10.4135/9781412986311. ISBN 9780761923596.

[31] [31]
"Testing Statistical Hypotheses". Springer Texts in Statistics. 2005. doi:10.1007/0-387-27605-x. ISBN 978-0-387-98864-1. ISSN 1431-875X.

[32] [32]
Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments. Vol. I and II (Second ed.). Wiley. ISBN 978-0-470-38551-7.

[33] [33]
Montgomery, Douglas (2009). Design and analysis of experiments. Hoboken, N.J.: Wiley. ISBN 978-0-470-12866-4.

[Fisher1925-34] [34]
R. A. Fisher (1925).Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd, 1925, p.43.

[LR-35] [35]
Lehmann, E. L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3E ed.). New York: Springer. ISBN 978-0-387-98864-1.

[36] [36]
Nuzzo, Regina (2014). "Scientific method: Statistical errors". Nature. 506 (7487): 150–152. Bibcode:2014Natur.506..150N. doi:10.1038/506150a. PMID 24522584.

[larsen-37] [37]
Richard J. Larsen; Donna Fox Stroup (1976). Statistics in the Real World: a book of examples. Macmillan. ISBN 978-0023677205.

[hubbard-38] [38]
Hubbard, R.; Parsa, A. R.; Luthy, M. R. (1997). "The Spread of Statistical Significance Testing in Psychology: The Case of the Journal of Applied Psychology". Theory and Psychology. 7 (4): 545–554. doi:10.1177/0959354397074006. S2CID 145576828.

[moore-39] [39]
Moore, David (2003). Introduction to the Practice of Statistics. New York: W.H. Freeman and Co. p. 426. ISBN 9780716796572.

[40] [40]
Ranganathan, Priya; Pramesh, C. S; Buyse, Marc (April–June 2016). "Common pitfalls in statistical analysis: The perils of multiple testing". Perspect Clin Res. 7 (2): 106–107. doi:10.4103/2229-3485.179436. PMC 4840791. PMID 27141478.

[41] [41]
Hughes, Ann J.; Grawoig, Dennis E. (1971). Statistics: A Foundation for Analysis. Reading, Mass.: Addison-Wesley. p. 191. ISBN 0-201-03021-7.

[42] [42]
Hall, P. and Wilson, S.R., 1991. Two guidelines for bootstrap hypothesis testing. Biometrics, pp.757-762.

[43] [43]
Tibshirani, R.J. and Efron, B., 1993. An introduction to the bootstrap. Monographs on statistics and applied probability, 57(1).

[44] [44]
Martin, M.A., 2007. Bootstrap hypothesis testing for some common statistical problems: A critical evaluation of size and power properties. Computational Statistics & Data Analysis, 51(12), pp.6321-6342.

[45] [45]
Horowitz, J.L., 2019. Bootstrap methods in econometrics. Annual Review of Economics, 11, pp.193-224. I'm

[46] [46]
John Arbuthnot (1710). "An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes" (PDF). Philosophical Transactions of the Royal Society of London. 27 (325–336): 186–190. doi:10.1098/rstl.1710.0011. S2CID 186209819.

[47] [47]
Brian, Éric; Jaisson, Marie (2007). "Physico-Theology and Mathematics (1710–1794)". The Descent of Human Sex Ratio at Birth. Springer Science & Business Media. pp. 1–25. ISBN 978-1-4020-6036-6.

[Conover1999-48] [48]
Conover, W.J. (1999), "Chapter 3.4: The Sign Test", Practical Nonparametric Statistics (Third ed.), Wiley, pp. 157–176, ISBN 978-0-471-16068-7

[Sprent1989-49] [49]
Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman & Hall, ISBN 978-0-412-44980-2

[50] [50]
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press. pp. 225–226. ISBN 978-0-67440341-3.

[Laplace_1878-51] [51]
Laplace, P. (1778). "Mémoire sur les probabilités (XIX, XX)". Oeuvres complètes de Laplace. Vol. 9. pp. 429–438. {{cite book}}: |journal= ignored (help)

[52] [52]
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, Mass: Belknap Press of Harvard University Press. p. 134. ISBN 978-0-674-40340-6.

[fisher-53] [53]
Fisher, Sir Ronald A. (1956) [1935]. "Mathematics of a Lady Tasting Tea". In James Roy Newman (ed.). The World of Mathematics, volume 3 [Design of Experiments]. Courier Dover Publications. ISBN 978-0-486-41151-4. Originally from Fisher's book Design of Experiments.

[54] [54]
Box, Joan Fisher (1978). R.A. Fisher, The Life of a Scientist. New York: Wiley. p. 134. ISBN 978-0-471-09300-8.

[55] [55]
C. S. Peirce (August 1878). "Illustrations of the Logic of Science VI: Deduction, Induction, and Hypothesis". Popular Science Monthly. 13. Retrieved March 30, 2012.

[56] [56]
Jaynes, E. T. (2007). Probability theory : the logic of science (5. print. ed.). Cambridge [u.a.]: Cambridge Univ. Press. ISBN 978-0-521-59271-0.

[57] [57]
Schervish, M (1996) Theory of Statistics, p. 218. Springer ISBN 0-387-94546-6

[58] [58]
Kaye, David H.; Freedman, David A. (2011). "Reference Guide on Statistics". Reference Manual on Scientific Evidence (3rd ed.). Eagan, MN Washington, D.C: West National Academies Press. p. 259. ISBN 978-0-309-21421-6.

[Ash-59] [59]
Ash, Robert (1970). Basic probability theory. New York: Wiley. ISBN 978-0471034506.Section 8.2

[Tukey60-60] [60]
Tukey, John W. (1960). "Conclusions vs decisions". Technometrics. 26 (4): 423–433. doi:10.1080/00401706.1960.10489909. "Until we go through the accounts of testing hypotheses, separating [Neyman–Pearson] decision elements from [Fisher] conclusion elements, the intimate mixture of disparate elements will be a continual source of confusion." ... "There is a place for both "doing one's best" and "saying only what is certain," but it is important to know, in each instance, both which one is being done, and which one ought to be done."

[61] [61]
Stigler, Stephen M. (August 1996). "The History of Statistics in 1933". Statistical Science. 11 (3): 244–252. doi:10.1214/ss/1032280216. JSTOR 2246117.

[62] [62]
Berger, James O. (2003). "Could Fisher, Jeffreys and Neyman Have Agreed on Testing?". Statistical Science. 18 (1): 1–32. doi:10.1214/ss/1056397485.

[morrison-63] [63]
Morrison, Denton; Henkel, Ramon, eds. (2006) [1970]. The Significance Test Controversy. Aldine Transaction. ISBN 978-0-202-30879-1.

[64] [64]
Oakes, Michael (1986). Statistical Inference: A Commentary for the Social and Behavioural Sciences. Chichester New York: Wiley. ISBN 978-0471104438.

[chow-65] [65]
Chow, Siu L. (1997). Statistical Significance: Rationale, Validity and Utility. SAGE Publications. ISBN 978-0-7619-5205-3.

[harlow-66] [66]
Harlow, Lisa Lavoie; Stanley A. Mulaik; James H. Steiger, eds. (1997). What If There Were No Significance Tests?. Lawrence Erlbaum Associates. ISBN 978-0-8058-2634-0.

[kline-67] [67]
Kline, Rex (2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, D.C.: American Psychological Association. ISBN 9781591471189.

[mccloskey-68] [68]
McCloskey, Deirdre N.; Stephen T. Ziliak (2008). The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. University of Michigan Press. ISBN 978-0-472-05007-9.

[69] [69]
Cornfield, Jerome (1976). "Recent Methodological Contributions to Clinical Trials" (PDF). American Journal of Epidemiology. 104 (4): 408–421. doi:10.1093/oxfordjournals.aje.a112313. PMID 788503.

[70] [70]
Yates, Frank (1951). "The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics". Journal of the American Statistical Association. 46 (253): 19–34. doi:10.1080/01621459.1951.10500764. "The emphasis given to formal tests of significance throughout [R.A. Fisher's] Statistical Methods ... has caused scientific research workers to pay undue attention to the results of the tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects they are investigating." ... "The emphasis on tests of significance and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective."

[71] [71]
Begg, Colin B.; Berlin, Jesse A. (1988). "Publication bias: a problem in interpreting medical data". Journal of the Royal Statistical Society, Series A. 151 (3): 419–463. doi:10.2307/2982993. JSTOR 2982993. S2CID 121054702.

[72] [72]
Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox" (PDF). Philosophy of Science. 34 (2): 103–115. doi:10.1086/288135. S2CID 96422880. Archived from the original (PDF) on December 3, 2013. Thirty years later, Meehl acknowledged statistical significance theory to be mathematically sound while continuing to question the default choice of null hypothesis, blaming instead the "social scientists' poor understanding of the logical relation between theory and fact" in "The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions" (Chapter 14 in Harlow (1997)).

[bakan66-73] [73]
Bakan, David (1966). "The test of significance in psychological research". Psychological Bulletin. 66 (6): 423–437. doi:10.1037/h0020412. PMID 5974619.

[Gigerenzer_587–606-74] [74]
Gigerenzer, G (November 2004). "Mindless statistics". The Journal of Socio-Economics. 33 (5): 587–606. doi:10.1016/j.socec.2004.09.033.

[75] [75]
Nunnally, Jum (1960). "The place of statistics in psychology". Educational and Psychological Measurement. 20 (4): 641–650. doi:10.1177/001316446002000401. S2CID 144813784.

[76] [76]
Lykken, David T. (1991). "What's wrong with psychology, anyway?". Thinking Clearly About Psychology. 1: 3–39.

[cohen94-77] [77]
Jacob Cohen (December 1994). "The Earth Is Round (p < .05)". American Psychologist. 49 (12): 997–1003. doi:10.1037/0003-066X.49.12.997. S2CID 380942. This paper lead to the review of statistical practices by the APA. Cohen was a member of the Task Force that did the review.

[nickerson-78] [78]
Nickerson, Raymond S. (2000). "Null Hypothesis Significance Tests: A Review of an Old and Continuing Controversy". Psychological Methods. 5 (2): 241–301. doi:10.1037/1082-989X.5.2.241. PMID 10937333. S2CID 28340967.

[branch-79] [79]
Branch, Mark (2014). "Malignant side effects of null hypothesis significance testing". Theory & Psychology. 24 (2): 256–277. doi:10.1177/0959354314525282. S2CID 40712136.

[80] [80]
Hunter, John E. (January 1997). "Needed: A Ban on the Significance Test". Psychological Science. 8 (1): 3–7. doi:10.1111/j.1467-9280.1997.tb00534.x. S2CID 145422959.

[wilkinson-81] [81]
Wilkinson, Leland (1999). "Statistical Methods in Psychology Journals; Guidelines and Explanations". American Psychologist. 54 (8): 594–604. doi:10.1037/0003-066X.54.8.594. S2CID 428023. "Hypothesis tests. It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval." (p 599). The committee used the cautionary term "forbearance" in describing its decision against a ban of hypothesis testing in psychology reporting. (p 603)

[82] [82]
"ICMJE: Obligation to Publish Negative Studies". Archived from the original on July 16, 2012. Retrieved September 3, 2012. Editors should seriously consider for publication any carefully done study of an important question, relevant to their readers, whether the results for the primary or any additional outcome are statistically significant. Failure to submit or publish findings because of lack of statistical significance is an important cause of publication bias.

[JASNH-83] [83]
Journal of Articles in Support of the Null Hypothesis website: JASNH homepage. Volume 1 number 1 was published in 2002, and all articles are on psychology-related subjects.

[84] [84]
Howell, David (2002). Statistical Methods for Psychology (5 ed.). Duxbury. p. 94. ISBN 978-0-534-37770-0.

[WilliamsToth2023-85] [85]
Williams S, Carson R, Tóth K (October 10, 2023). "Moving beyond P values in The Journal of Physiology: A primer on the value of effect sizes and confidence intervals". J Physiol. 601 (23): 5131–5133. doi:10.1113/JP285575. PMID 37815959. S2CID 263827430.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[Kruschke_2012-86] [86]
Kruschke, J K (July 9, 2012). "Bayesian Estimation Supersedes the T Test" (PDF). Journal of Experimental Psychology: General. 142 (2): 573–603. doi:10.1037/a0029146. PMID 22774788. S2CID 5610231.

[Kruschke_2018-87] [87]
Kruschke, J K (May 8, 2018). "Rejecting or Accepting Parameter Values in Bayesian Estimation" (PDF). Advances in Methods and Practices in Psychological Science. 1 (2): 270–280. doi:10.1177/2515245918771304. S2CID 125788648.

[Armstrong1-88] [88]
Armstrong, J. Scott (2007). "Significance tests harm progress in forecasting". International Journal of Forecasting. 23 (2): 321–327. CiteSeerX 10.1.1.343.9516. doi:10.1016/j.ijforecast.2007.03.004. S2CID 1550979.

[89] [89]
Kass, R. E. (1993). Bayes factors and model uncertainty (PDF) (Report). Department of Statistics, University of Washington.

[90] [90]
Rozeboom, William W (1960). "The fallacy of the null-hypothesis significance test" (PDF). Psychological Bulletin. 57 (5): 416–428. CiteSeerX 10.1.1.398.9002. doi:10.1037/h0042040. PMID 13744252. "...the proper application of statistics to scientific inference is irrevocably committed to extensive consideration of inverse [AKA Bayesian] probabilities..." It was acknowledged, with regret, that a priori probability distributions were available "only as a subjective feel, differing from one person to the next" "in the more immediate future, at least".

[91] [91]
Berger, James (2006). "The Case for Objective Bayesian Analysis". Bayesian Analysis. 1 (3): 385–402. doi:10.1214/06-ba115. In listing the competing definitions of "objective" Bayesian analysis, "A major goal of statistics (indeed science) is to find a completely coherent objective Bayesian methodology for learning from data." The author expressed the view that this goal "is not attainable".

[92] [92]
Aldrich, J (2008). "R. A. Fisher on Bayes and Bayes' theorem". Bayesian Analysis. 3 (1): 161–170. doi:10.1214/08-BA306.

#	Fisher's null hypothesis testing	Neyman–Pearson decision theory
1	Set up a statistical null hypothesis. The null need not be a nil hypothesis (i.e., zero difference).	Set up two statistical hypotheses, H1 and H2, and decide about α, β, and sample size before the experiment, based on subjective cost-benefit considerations. These define a rejection region for each hypothesis.
2	Report the exact level of significance (e.g. p = 0.051 or p = 0.049). Do not use a conventional 5% level, and do not talk about accepting or rejecting hypotheses. If the result is "not significant", draw no conclusions and make no decisions, but suspend judgement until further data is available.	If the data falls into the rejection region of H1, accept H2; otherwise accept H1. Accepting a hypothesis does not mean that you believe in it, but only that you act as if it were true.
3	Use this procedure only if little is known about the problem at hand, and only to draw provisional conclusions in the context of an attempt to understand the experimental situation.	The usefulness of the procedure is limited, among other things, to situations in which you have a disjunction of hypotheses (e.g. either μ1 = 8 or μ2 = 10 is true) and can make meaningful cost-benefit trade-offs when choosing α and β.
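
The Neyman–Pearson column can be made concrete with a small sketch. It reuses the two simple hypotheses from the table (μ1 = 8 versus μ2 = 10) and, purely as illustrative assumptions that do not come from the source, supposes normally distributed data with a known standard deviation of 2, α = 0.05 and β = 0.10. The function names are hypothetical. Under those assumptions it fixes the sample size and the cutoff for the sample mean before any data are collected, and then applies the accept/accept rule.

```python
import math
from statistics import NormalDist


def neyman_pearson_design(mu1, mu2, sigma, alpha, beta):
    """Sample size and cutoff for a one-sided z-test of H1: mu = mu1 vs H2: mu = mu2 (mu2 > mu1).

    Assumes normal data with known standard deviation sigma (an illustrative assumption).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # quantile that bounds the Type I error rate
    z_beta = z.inv_cdf(1 - beta)    # quantile that bounds the Type II error rate
    # Smallest n for which both error bounds can hold at the same cutoff.
    n = math.ceil(((z_alpha + z_beta) * sigma / (mu2 - mu1)) ** 2)
    # Sample means above the cutoff fall in the rejection region of H1.
    cutoff = mu1 + z_alpha * sigma / math.sqrt(n)
    return n, cutoff


def error_rates(mu1, mu2, sigma, n, cutoff):
    """Type I and Type II error probabilities actually achieved by the design."""
    se = sigma / math.sqrt(n)
    achieved_alpha = 1 - NormalDist(mu1, se).cdf(cutoff)  # P(accept H2 | H1 true)
    achieved_beta = NormalDist(mu2, se).cdf(cutoff)       # P(accept H1 | H2 true)
    return achieved_alpha, achieved_beta


def decide(sample_mean, cutoff):
    """Act as if H2 were true when the mean is in the rejection region of H1."""
    return "accept H2" if sample_mean > cutoff else "accept H1"


n, cutoff = neyman_pearson_design(mu1=8, mu2=10, sigma=2, alpha=0.05, beta=0.10)
achieved_alpha, achieved_beta = error_rates(8, 10, 2, n, cutoff)
print(n, round(cutoff, 3), round(achieved_alpha, 3), round(achieved_beta, 3))
print(decide(9.5, cutoff))
```

Because the sample size is rounded up to a whole number, the achieved Type II error rate ends up somewhat below the requested β, while the Type I error rate stays at α. A Fisherian analysis of the same data would instead report the exact p-value and make no accept/reject decision.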

	H₀ is true (truly not guilty)	H₁ is true (truly guilty)
Do not reject the null hypothesis (acquittal)	Right decision	Wrong decision (Type II error)
Reject the null hypothesis (conviction)	Wrong decision (Type I error)	Right decision
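
The two kinds of wrong decision in the table can be estimated by simulation. The sketch below is only an illustration under assumed settings that are not from the source: a two-sided z-test of H₀: μ = 0 with known σ = 1, samples of size 25, α = 0.05, and a true mean of 0.5 when H₁ holds. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided z-test with known sigma: True if H0: mu = mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=10_000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += rejects_null(sample, mu0, sigma, alpha)
    return rejections / trials


# H0 is true: every rejection is a Type I error (a wrongful conviction).
type_i_rate = rejection_rate(true_mu=0.0)

# H1 is true: every failure to reject is a Type II error (a wrongful acquittal).
type_ii_rate = 1 - rejection_rate(true_mu=0.5)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

Under these assumptions the estimated Type I error rate should land near the chosen α, while the Type II error rate depends on how far the true mean lies from the null value and on the sample size.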
