Author: Paul D. Ellis
ISBN-10: 0521142466
ISBN-13: 978-0521142465
APA Style Citation
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press.
Buy This Book
https://www.amazon.com/Essential-Guide-Effect-Sizes-Interpretation/dp/0521142466

Classroom activity: effect_size_activity.pdf
What is an effect? Effects are everywhere: they result from treatments, decisions, accidents, inventions, elections, outbreaks, performances, and so on. Researchers assess effects in two senses: statistical significance indicates how unlikely a result would be if there were no real effect, while practical significance concerns whether the effect is large enough to matter in the real world. The social sciences increasingly require researchers to report effect sizes alongside tests of statistical significance, to reduce bias and to move beyond reliance on p-values alone. This book explores three interconnected activities: interpreting effect sizes, analyzing statistical power, and conducting meta-analyses, which together form the foundation for robust research.
Part 1: Effect Size
Psychologists must address the “so what?” question by emphasizing the practical significance of their studies. A statistically significant result is simply one that is unlikely to have arisen by chance alone; practical significance reflects real-world impact. Researchers must communicate findings not only to peers but also to the public. Effect sizes, which quantify the impact of a treatment or the strength of a relationship between variables, are essential for interpreting study results, yet many researchers fail to report them. Effect sizes fall into two main categories: the d family (differences between groups, such as Cohen’s d) and the r family (measures of association, such as the correlation coefficient). Both are standardized metrics that can be calculated with tools such as SPSS. When reporting effect sizes, researchers should specify the measure used, quantify its precision with a confidence interval, and present the results in clear, jargon-free language.
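To make the two families concrete, here is a minimal Python sketch (not from the book) that computes Cohen's d with an approximate 95% confidence interval and a Pearson correlation with a Fisher-z interval. The function names and example data are invented for illustration, and the confidence intervals use common large-sample approximations rather than any specific procedure the book prescribes.

```python
import numpy as np
from scipy import stats

def cohens_d(group1, group2):
    """Standardized mean difference (d family) with an approximate 95% CI."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (np.mean(group1) - np.mean(group2)) / pooled_sd
    # Large-sample standard error of d (Hedges & Olkin approximation)
    se = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - 1.96 * se, d + 1.96 * se)

def pearson_r_ci(x, y):
    """Correlation (r family) with a 95% CI via the Fisher z transformation."""
    r, _ = stats.pearsonr(x, y)
    z, se = np.arctanh(r), 1 / np.sqrt(len(x) - 3)
    return r, (np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se))

# Invented example data: scores for a treated group and a control group
rng = np.random.default_rng(1)
treated = rng.normal(10.5, 2.0, 40)
control = rng.normal(10.0, 2.0, 40)
print(cohens_d(treated, control))
print(pearson_r_ci(treated, control))
```

Either value could then be reported alongside its confidence interval, which conveys the precision of the estimate as the chapter recommends.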
Even when effect sizes and confidence intervals are reported, they are often left uninterpreted, raising questions like “How big is big?” or “Is the effect meaningful?” Non-arbitrary reference points are essential for assessing practical significance, guided by the three C’s of interpretation: context, contribution to knowledge, and Cohen’s criteria. Small effects can be meaningful in the right context if they trigger larger consequences, alter probabilities of significant outcomes, accumulate into bigger impacts, or lead to technological breakthroughs or new insights. Interpreting contributions to knowledge requires more than comparing study results; researchers must also consider alternative explanations. Jacob Cohen’s 1988 criteria for small, medium, and large effect sizes offer a logical foundation and a starting point for resolving disputes about significance. While Cohen’s “t-shirt size” classifications are easy to understand and widely used, they remain controversial, with critics arguing against rigidly categorizing effects as small, medium, or large.
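Cohen's conventional benchmarks are roughly d = 0.2, 0.5, and 0.8 and r = 0.1, 0.3, and 0.5 for small, medium, and large effects. The toy helper below (my own illustration, not the book's) shows how the “t-shirt size” mapping works; the labels are conventions, not substitutes for interpreting an effect in context.

```python
def cohen_label(value, family="d"):
    """Map an effect size onto Cohen's (1988) small/medium/large benchmarks.
    The cutoffs are rough conventions, not substitutes for context."""
    cutoffs = {"d": (0.2, 0.5, 0.8), "r": (0.1, 0.3, 0.5)}
    small, medium, large = cutoffs[family]
    v = abs(value)
    if v < small:
        return "below small (often called trivial)"
    if v < medium:
        return "small"
    if v < large:
        return "medium"
    return "large"

print(cohen_label(0.35, family="d"))  # small
print(cohen_label(0.45, family="r"))  # medium
```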
Part 2: Power Analysis
In any study, the null hypothesis assumes no effect (effect size = 0), while the alternative hypothesis assumes an effect (effect size ≠ 0). Statistical tests calculate the p-value, the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. A low p-value indicates statistical significance, allowing researchers to reject the null. Errors can occur: a Type I error (false positive) happens when researchers detect an effect that doesn’t exist, while a Type II error (false negative) occurs when they miss a real effect. The Type I error rate (α) and the Type II error rate (β) are inversely related: for a given sample size, reducing one increases the other. Statistical power, the probability of detecting a true effect, is jointly determined by the effect size, the sample size, and the alpha significance criterion (α), with Cohen recommending a power level of at least 0.80. Underpowered studies risk missing meaningful effects, while overpowered studies may waste resources or lend weight to trivial findings. Power analysis, typically done during study planning, helps determine the minimum sample size needed to detect the anticipated effect. Researchers estimate effect sizes using prior studies, meta-analyses, pretests, or theory, favoring conservative estimates to ensure adequate power. Tools such as online calculators simplify these calculations, which are crucial for designing efficient and meaningful research.
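As a sketch of what a prospective power analysis computes, the snippet below uses the standard normal-approximation formula (not any particular calculator mentioned in the book) to estimate the per-group sample size for a two-sided, two-group comparison:

```python
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison,
    using the normal approximation: n = 2 * ((z_{1-a/2} + z_{1-b}) / d) ** 2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Planning for a "medium" effect (d = 0.5) at alpha = .05 and power = .80
print(n_per_group(0.5))  # roughly 63 participants per group
```

A dedicated power calculator will return a slightly larger figure because it uses the t distribution, but the approximation is close enough to show how smaller anticipated effects drive sample size requirements sharply upward.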
Power analyses can be conducted for individual studies or groups of studies with a common theme or journal. In the 1960s, Jacob Cohen analyzed the statistical power of research published in the Journal of Abnormal and Social Psychology and found it lacking—a trend later confirmed across other fields. Published research is often underpowered, and the multiplicity problem arises when multiple statistical tests increase the likelihood of false positives. The family-wise error rate becomes relevant when multiple tests are run on the same data, as even low-powered studies can yield statistically significant results if enough tests are conducted. This can lead to practices like “fishing” for publishable results or HARKing (hypothesizing after results are known). To improve statistical power, researchers can focus on larger effects, increase sample sizes, use more sensitive measures, choose appropriate tests, or relax the alpha significance criterion.
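The multiplicity problem can be quantified directly: for k independent tests each run at the same alpha, the family-wise error rate is 1 − (1 − α)^k. A short illustration (the numbers are just examples, and a Bonferroni correction is only one of several possible adjustments):

```python
def familywise_error_rate(alpha=0.05, k=1):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))  # 0.05, 0.226, 0.642

# A Bonferroni correction keeps the family-wise rate near alpha
# by testing each of the k hypotheses at alpha / k.
print(0.05 / 20)  # 0.0025 per test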
Part 3: Meta-Analysis
Single studies rarely resolve inconsistencies in social science research, especially in the absence of large-scale randomized controlled trials. Progress often comes from combining results from many smaller studies. A qualitative approach, or narrative review, documents the story of a research theme, while the quantitative approach, meta-analysis, focuses on observed effects rather than others’ conclusions. Meta-analysis combines these effects into an average effect size to assess the overall direction and magnitude of real-world impacts. By statistically analyzing statistical analyses, meta-analysis systematically reviews research on a specific effect, weighting individual effect sizes by their precision to calculate a weighted mean effect size. This provides a more accurate estimate of the population effect size than any single study. Though designed to be objective, transparent, and disciplined, meta-analysis can still be undermined by biases, leading to precise but flawed conclusions. Each step in the process must be recorded, justified, and open to scrutiny, with the process generally broken into six key steps. See the classroom activity for details.
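A bare-bones sketch of the core computation, a fixed-effect inverse-variance weighted mean, is shown below. The five study results are invented solely to illustrate the weighting; a real meta-analysis involves the full set of steps noted above and often a random-effects model.

```python
import numpy as np

# Invented effect sizes (Cohen's d) and variances from five hypothetical studies
d = np.array([0.42, 0.15, 0.55, 0.30, 0.21])
var = np.array([0.050, 0.020, 0.080, 0.030, 0.015])

# Fixed-effect model: weight each study by its precision (1 / variance),
# so larger, more precise studies pull the weighted mean toward their estimates.
w = 1 / var
mean_d = np.sum(w * d) / np.sum(w)             # weighted mean effect size
se = np.sqrt(1 / np.sum(w))                    # standard error of the weighted mean
ci = (mean_d - 1.96 * se, mean_d + 1.96 * se)  # 95% confidence interval

print(round(mean_d, 3), [round(x, 3) for x in ci])
```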
Large-scale randomized controlled trials are the gold standard for estimating effect sizes, but because of their cost and time requirements, research often starts with small-scale studies. When large trials follow a meta-analysis, comparisons can reveal inconsistencies, because meta-analyses sometimes produce misleading conclusions. Bias in a meta-analysis can arise from excluding relevant research, including bad studies, using an inappropriate statistical model, or running an underpowered analysis. The first three lead to inflated effect size estimates and more Type I errors, while the fourth produces imprecise estimates and more Type II errors. Excluding relevant research causes availability bias, and reporting bias occurs when only significant results are published. Studies with non-significant findings are often rejected, contributing to the “file drawer problem,” which inflates mean estimates and increases Type I errors. Because p-values reflect sample size as much as effect size, small samples can miss important effects; a non-significant result is therefore inconclusive, indicating either no effect or insufficient power to detect one. Excluding non-English studies introduces further bias. Screening out studies on the basis of quality also risks bias, scientific censorship, and the dismissal of valuable evidence, and it overlooks the fact that differences in quality can be controlled for statistically.
Overall, this book provides information to help students evaluate psychological research. It explains the importance of effect sizes for understanding real-world significance and statistical power for designing studies that produce reliable results.
It includes a detailed discussion of meta-analysis, a method used to find broader patterns and trends in research while showing students how to recognize and avoid potential biases.
Other Related Resources
Author's Website (check out the FAQs)
https://effectsizefaq.com/about/
Psychological Concepts and Figures
Alternative hypothesis
Bias
Confidence intervals
Correlation coefficient
Effect size
Generalize
HARKing
Meta-analysis
Null hypothesis
Qualitative
Quantitative
Replication
Sample size
Standard deviation
Statistical significance
Type I error
Type II error