BRIEF RESEARCH REPORT article

Front. Psychol.

Sec. Quantitative Psychology and Measurement

Volume 16 - 2025 | doi: 10.3389/fpsyg.2025.1592658

Clarifying the reliability paradox: poor measurement reliability attenuates group differences

Provisionally accepted
  • Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada

The final, formatted version of the article will be published soon.

Cognitive sciences are grappling with the reliability paradox: measures that robustly produce within-group effects tend to have low test-retest reliability, rendering them unsuitable for studying individual differences. Despite growing awareness of this paradox, its full extent remains underappreciated. Specifically, most research focuses exclusively on how reliability affects correlational analyses of individual differences, while largely ignoring its effects on the study of group differences. Moreover, some studies explicitly and erroneously suggest that poor reliability does not pose problems for studying group differences, possibly because they conflate within-group and between-group effects. In this brief report, we aim to clarify this misunderstanding. Using both data simulations and mathematical derivations, we show how poor measurement reliability attenuates observed group differences. We consider multiple scenarios, including groups created by thresholding a continuous measure (e.g., patients vs. controls or a median split) and groups defined exogenously (e.g., treatment vs. control groups, or male vs. female). We also show how the observed effect sizes are further affected when the groups differ in measurement reliability and in between-subject variance. We provide a set of equations for calculating attenuation effects across these scenarios. These results have important implications for biomarker research and clinical translation, as well as for any other area of research that relies on group comparisons to inform policy and real-world applications.
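
The attenuation described in the abstract can be illustrated with a minimal simulation. The sketch below is not the authors' code; it assumes a classical test theory model (observed score = true score + independent noise), two exogenously defined groups with equal between-subject variance and equal reliability, and reliability defined within each group as true-score variance divided by observed-score variance. Under these assumptions the observed Cohen's d is expected to shrink to roughly the true d multiplied by the square root of the reliability. The function names `cohens_d` and `simulate_attenuation` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled SD of two equal-sized groups."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd


def simulate_attenuation(d_true=0.8, reliability=0.5, n=200_000, seed=0):
    """Return (d estimated from true scores, d estimated from noisy observed scores)."""
    rng = np.random.default_rng(seed)
    sigma_true = 1.0  # assumed between-subject SD, identical in both groups
    # Choose the error SD so that var_true / (var_true + var_err) equals `reliability`.
    sigma_err = sigma_true * np.sqrt((1 - reliability) / reliability)

    # True scores: the groups differ in mean by d_true standard deviations.
    true_a = rng.normal(0.0, sigma_true, n)
    true_b = rng.normal(d_true * sigma_true, sigma_true, n)

    # Observed scores: true scores plus independent measurement noise.
    obs_a = true_a + rng.normal(0.0, sigma_err, n)
    obs_b = true_b + rng.normal(0.0, sigma_err, n)

    return cohens_d(true_a, true_b), cohens_d(obs_a, obs_b)


if __name__ == "__main__":
    for rel in (0.9, 0.7, 0.5, 0.3):
        d_t, d_o = simulate_attenuation(reliability=rel)
        print(f"reliability={rel:.1f}  d_true={d_t:.3f}  "
              f"d_observed={d_o:.3f}  d_true*sqrt(rel)={d_t * np.sqrt(rel):.3f}")
```

The mean difference between groups is unchanged by the noise; only the pooled standard deviation grows, which is why the standardized effect shrinks. Scenarios in which groups are formed by thresholding a noisy measure, or in which the groups differ in reliability or variance, are not covered by this sketch.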

Keywords: reliability paradox, test-retest reliability, individual differences, group differences, group effects

Received: 12 Mar 2025; Accepted: 05 Sep 2025.

Copyright: © 2025 Karvelis and Diaconescu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Povilas Karvelis, Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada