The p value has evolved into the gold standard for evaluating much of the research evidence in communication sciences and disorders (CSD) and other disciplines. Is reliance on the p value the best approach for evaluating clinical significance, or might subjectivity have a role and value in clinical practice and scientific research? In this article, we review some of the challenges of the p value and suggest the benefits of incorporating Bayesian methods as an alternative.
Relevance of Bayesian Methods to Evidence-Based Practice (EBP) and Education of Graduate Students
A revolution in data analysis is emerging in many health-related fields. Bayesian methods are one new tool for enhancing hypothesis testing. The major advantage of Bayesian estimation methods over traditional statistics (i.e., the p value) is that they allow for incorporating information vital to achieving the goals of EBP. Sackett, Rosenberg, Haynes, Richardson, and Gray (1996) proposed three goals: (a) integration of clinical experience and expertise; (b) an understanding of client/patient and family values and preferences; and (c) the best research evidence available for decision making. Future applications of Bayesian methods may foster improvements in clinician–researcher collaboration to achieve ASHA’s goals for the Clinicians and Researchers Collaborating (CLARC) program. Researchers, academicians, and clinical educators who teach in the classroom can derive significant benefit from gaining familiarity with the appropriate use of these tools. Likewise, Bayesian estimation methods are useful for clinicians seeking evidence upon which to base their practice (Maxwell & Satake, 2006).
Challenges of the p < .05 Standard
The seminal groundwork for testing a null hypothesis (i.e., the hypothesis of no difference between conditions or treatment effects) was developed by British statisticians, most notably Ronald Fisher (1956). Fisher viewed p < .05 as a convenient cut-off value for evaluating the strength of evidence against the null hypothesis, not as a hard and fast rule. He implied that the subjective views of the researcher or other background information are important aspects to consider in decision making.
Reliance on the p < .05 standard for statistical significance testing and data interpretation has become a tradition in many scientific fields—a tradition that often overshadows the importance of research findings that fall short of an arbitrary standard for determining “significance” or lack thereof. Lamenting this fact, Rosnow and Rosenthal (1989) commented, “Surely God loves the .06 as much as the .05” (p. 1276). Nevertheless, studies reporting larger p values than .05 are commonly relegated to a file drawer where they might not be repeated, remain inaccessible for inclusion in systematic reviews of literature and meta-analysis, and thereby impede the growth of knowledge (Cohen, 1994; Gigerenzer, 2004).
Misconceptions of the p Value
Advocates of Bayesian methods point out that a p value does not reflect the probability of a sampling error in future experiments, the reproducibility of the data, the truth or falsity of the null or alternative hypothesis, the size of the treatment effect, or the practical/clinical significance of the results. Rather, the p value provides the probability, under a specified statistical model (typically the null hypothesis), that the results would be equal to or more extreme than the observed value (Wasserstein & Lazar, 2016).
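To make this definition concrete, the following is a minimal sketch in Python, not a recommended analysis. It assumes a hypothetical study in which 15 of 20 clients improve and a null hypothesis under which each client has a 50% chance of improving. The function name and the numbers are illustrative assumptions, not taken from any cited work.

```python
from math import comb


def binomial_tail_p_value(successes: int, trials: int, null_prob: float) -> float:
    """One-sided p value: P(X >= successes) when X ~ Binomial(trials, null_prob)."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 of 20 clients improve; the null model says P(improve) = 0.5.
p_value = binomial_tail_p_value(successes=15, trials=20, null_prob=0.5)
print(f"p = {p_value:.4f}")  # p = 0.0207
```

The result is the probability of observing 15 or more improvements if the null model were true. It is not the probability that the null hypothesis is true, and it says nothing about the size or clinical importance of the effect.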
Perhaps the most serious fallacy of p value significance testing is the belief that it yields information about the credibility of a hypothesis, which is in fact what a researcher wants to know. Yet the p value expresses only the probability of the data given the truth of the null hypothesis. Bayesian statistics, however, yield information about the probability of the hypothesis given the data.
Bayesian statistics mathematically combine (a) preexisting beliefs, expressed as hypotheses, with (b) information obtained from actual sample data from an experiment to update beliefs as to their plausibility (Kline, 2005). Although Bayesian methods incorporate the researcher’s personal beliefs, it can be argued that, by explicitly stating prior assumptions, the hypotheses allow for greater transparency and openness to criticism than do traditional statistical methods (Sprenger, 2015). Although such beliefs initially might be subjective in nature, Bayesian methods in practice ultimately express prior probabilities for hypotheses in objective mathematical terms based on rational criteria. By doing so, the hypotheses formulated by Bayesian methods are more intuitive than those generated by the more mechanistic approaches of null hypothesis significance testing and, ultimately, have greater relevance to EBP. Thus, although Bayesian researchers embrace the use of prior information (such as that found in preexisting records, research articles, and data provided by clients, colleagues, or pilot investigations), this information is not based on guesses, whims, or prejudice “plucked out of the air” (Matthews, 2000).
The Bayes Theorem
The Bayes theorem, as it has come to be known, can be expressed mathematically as follows:
$$P(H \mid D) = \frac{P(H) \cdot P(D \mid H)}{P(D)} \tag{1}$$
In Equation 1, P(H|D) is termed the posterior probability of the hypothesis, given the data. P(H) is the probability of the hypothesis expressed by the researcher before the data are collected (termed the prior probability). P(D|H) is the conditional probability of the sample data, given the hypothesis (often called the likelihood), and P(D) is the probability of the data independent of the truth or falsity of the hypothesis.
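As a purely illustrative calculation (the numbers are hypothetical and not drawn from any study), suppose a clinician's prior probability that a treatment is effective is P(H) = 0.30, the observed data would occur with probability P(D|H) = 0.80 if the treatment is effective, and with probability P(D|¬H) = 0.20 if it is not. Assuming these two hypotheses are the only possibilities, P(D) follows from the law of total probability, and the posterior follows from the Bayes theorem:

$$
\begin{aligned}
P(D) &= P(H)\,P(D \mid H) + P(\neg H)\,P(D \mid \neg H) = (0.30)(0.80) + (0.70)(0.20) = 0.38 \\
P(H \mid D) &= \frac{P(H)\,P(D \mid H)}{P(D)} = \frac{0.24}{0.38} \approx 0.63
\end{aligned}
$$

Under these assumed values, the data raise the probability of the hypothesis from 0.30 to roughly 0.63.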
The posterior probability, thus generated, can serve as a new prior probability in subsequent experiments, thereby increasing accuracy in deriving conclusions.
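The idea of reusing a posterior as the next prior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: only two competing hypotheses (H and not-H), two independent hypothetical studies, and made-up likelihood values. The function name `bayes_update` is ours, not part of any library.

```python
def bayes_update(prior: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """Return P(H|D) for a two-hypothesis problem using the Bayes theorem."""
    # P(D) by the law of total probability over H and not-H.
    p_data = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    return prior * p_data_given_h / p_data


# Hypothetical inputs: a prior of 0.30 and two studies whose data are four
# times more probable under H (0.80) than under not-H (0.20).
studies = [(0.80, 0.20), (0.80, 0.20)]

beliefs = [0.30]  # beliefs[0] is the initial prior
for p_d_h, p_d_not_h in studies:
    # The posterior from the previous study serves as the prior for this one.
    beliefs.append(bayes_update(beliefs[-1], p_d_h, p_d_not_h))

for i, belief in enumerate(beliefs):
    print(f"after {i} studies: P(H) = {belief:.3f}")
```

With these assumed values, the probability of the hypothesis moves from 0.300 to about 0.632 after the first study and to about 0.873 after the second, which shows how evidence accumulates across experiments.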
ASHA requires that undergraduate and graduate students complete at least one introductory course in statistics; PhD students typically complete a multicourse sequence tailored to their knowledge level and needs. Academic and clinical education programs in CSD emphasize the importance of EBP. A broad statistical foundation prepares future clinicians to base their decision making on current and emerging clinical research analysis approaches. From our perspective, new and intuitive methods such as Bayesian estimation can be taught alongside traditional introductory statistics. Infusing the Bayesian approach is an opportunity for CSD faculty to expand their framework of statistical analysis and advocate for student exposure to Bayesian methods as a part of their graduate school education.
References
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Fisher, R. A. (1956). Statistical methods and scientific inference. Edinburgh, Scotland: Oliver & Boyd.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33, 587–606.
Kline, R. B. (2005). Beyond statistical significance testing. Washington, DC: American Psychological Association.
Matthews, R. A. J. (2000). Facts versus factions: The use and abuse of subjectivity in scientific research. In J. Morris (Ed.), Rethinking risk and the precautionary principle (pp. 247–282). Oxford, England: Butterworth-Heinemann.
Maxwell, D. L., & Satake, E. (2006). Research and statistical methods in communication sciences and disorders. Clifton Park, NY: Thomson/Delmar Learning.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276–1284.
Sackett, D. L., Rosenberg, W. M., Haynes, R. B., Richardson, W. S., & Gray, J. A. (1996, January 13). Evidence-based medicine: What it is and what it isn’t. BMJ, 312(7023), 71–72.
Sprenger, J. (2015). The objectivity of subjective Bayesian inference [PhilSci-Archive article]. Pittsburgh, PA: PhilSci-Archive. Retrieved from http://philsci-archive.pitt.edu/11936/.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129–133. doi:10.1080/00031305.2016.1154108