Mitigating the epidemic of type I error: ecology and evolution can learn from other disciplines

Parker, Timothy H.; Nakagawa, Shinichi

doi:10.3389/fevo.2014.00076

OPINION article

Front. Ecol. Evol., 25 November 2014

Sec. Behavioral and Evolutionary Ecology

Volume 2 - 2014 | https://doi.org/10.3389/fevo.2014.00076

Timothy H. Parker ¹*

Shinichi Nakagawa ²

1. Biology Department, Whitman College, Walla Walla, WA, USA
2. National Centre for Growth and Development, Department of Zoology, University of Otago, Dunedin, New Zealand

In probabilistic disciplines from psychology to cancer biology and behavioral ecology, a disturbing quantity of empirically derived understanding has been challenged and found wanting (Begley and Ellis, 2012; Carpenter, 2012; Parker, 2013). Recently, it was reported that 47 of 53 “landmark” cancer studies from the past decade could not be reproduced (Begley and Ellis, 2012). Ongoing attempts to replicate results in psychology (Carpenter, 2012) have found that substantial portions do not withstand subsequent tests (Reproducibility Project: https://osf.io/ezcuj/). Although some well-publicized cases of data fabrication have plagued that field recently (Vogel, 2011), much of the lack of repeatability is expected to result from less nefarious forms of bias (Ioannidis, 2005). Closer to home, a recent meta-analysis of studies of plumage color in a European songbird has substantially clouded what had been hailed as a model for the understanding of plumage color and sexual selection (Parker, 2013). The crux of the problem is that the published literature, especially in highly probabilistic systems, suffers from inflated type I error (false positive) rates, and careful replication is too rare to reliably separate the robust results from those resulting from error (Ioannidis, 2005; Parker, 2013). Thus, many published results are incorrect, and these results are too rarely discredited. Concerns about problems of empirical error are receiving attention from prestigious journals (e.g., Nature; Nuzzo, 2014), and in the popular press (e.g., Lehrer, 2010; Anonymous, 2013) they have stimulated a discourse that may be eroding public confidence in science.

Strategies to reduce the problems of inflated error and infrequent replication are emerging in psychology, neuroscience, and medicine (Baker, 2012; Carpenter, 2012). High rates of type I error and low rates of replication may appear to result primarily from the decisions of individual researchers. These researchers are, however, responding to institutional incentive structures. For instance, funding bodies support novel projects to the exclusion of replications, and high impact journals also place a premium on novelty (Palmer, 2000; Kelly, 2006). As another example, most journals select articles based on study outcome rather than just soundness of hypothesis, predictions, and methods (Chambers, 2013). Thus, researchers often choose to report the most interesting subsets of results or pursue other forms of biased reporting rather than reporting the entire set of outcomes (John et al., 2012). Institutions also promote bias, and possibly even academic dishonesty, by basing professional evaluation and remuneration on number of publications and the stature of the journals in which they are published (Qiu, 2010; John et al., 2012). Thus, effective strategies will come from changes in the institutions that influence our research practices, such as professional societies (including journals) and funding agencies (Parker, 2013). It is precisely at this institutional level that psychology and medicine are tackling the challenge of reducing bias and increasing replication. Initiatives in these other disciplines are not necessarily templates directly transferable to ecology and evolution. Yet, such examples should serve to stimulate discussion and they clearly demonstrate that redesigning incentive structures is possible.

Reducing incomplete and biased reporting of results may be accomplished by encouraging or requiring registration of studies at their initiation (Schooler, 2011). Since 2000, the US government has provided a registry for clinical trials of medical interventions (ClinicalTrials.gov). Registration prior to initiation is a requirement of many funding agencies and medical journals, and thus has become “standard practice” (Huser and Cimino, 2013). Although results from approximately half of registered trials end up unpublished, about a third of the unpublished studies post some results in the registry (Ross et al., 2009). Further, the registry facilitates a more precise estimate of reporting bias, and provides contact information for researchers with unpublished work. Thus, the bias in available results has dropped along with our ignorance of this bias. These are highly desirable outcomes.

A conceptually similar idea is the “registered report” initiated by the neuroscience journal Cortex in 2013 (Chambers, 2013). To publish in the registered report section of the journal, researchers submit a study plan for peer review and conditional acceptance prior to gathering data (http://www.elsevier.com/journals/cortex/0010-9452/guide-for-authors). This counteracts several forms of publication bias, including editors' preferences for statistically significant or novel outcomes, and the tendency of researchers to selectively report the more interesting facets of their results (Chambers, 2013). This option for publication remains rare, but if widely adopted it could serve as an important tool for reducing bias.

A prestigious journal in psychology has taken an alternate approach to reducing incomplete and biased reporting. Following the suggestions of Simmons et al. (2011) in their prominent paper on inflated false positive rates, Psychological Science, as of 2014, requires authors to confirm that they are reporting on their full data set, including “all independent variables or manipulations” and “all dependent variables or measures,” and “how sample size was determined.” This requirement rests on the assumption that many researchers who might otherwise be willing to report a biased subset of their results would not willingly make false statements, either because of the clear moral implication or because of the risk to one's career (Simmons et al., 2011). Although it is too early to determine success, employing such statements is a promising strategy for reducing reporting bias.
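To illustrate why disclosure of all dependent variables matters, the following minimal sketch computes how the chance of at least one false positive grows when a study with no true effect measures several outcomes and reports whichever reaches significance. It is not taken from Simmons et al. (2011); the function names are hypothetical, and the outcomes are assumed to be statistically independent, which real dependent variables often are not (correlated measures would inflate the error rate less). Under those assumptions, five outcomes tested at a nominal 5% level yield a false positive rate of roughly 23%.

```python
import random


def familywise_error_rate(alpha: float, n_outcomes: int) -> float:
    """Chance that at least one of n independent null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_outcomes


def simulate_selective_reporting(alpha: float, n_outcomes: int,
                                 n_studies: int, seed: int = 0) -> float:
    """Share of null studies that could report at least one 'significant' outcome.

    Assumption: under a true null hypothesis the p-value of a continuous
    test is uniform on [0, 1], so each outcome's p-value is drawn directly
    from that distribution instead of simulating raw data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # The researcher looks at every outcome and keeps the smallest p-value.
        best_p = min(rng.random() for _ in range(n_outcomes))
        if best_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        analytic = familywise_error_rate(0.05, k)
        simulated = simulate_selective_reporting(0.05, k, n_studies=20000)
        print(f"outcomes={k:2d}  analytic={analytic:.3f}  simulated={simulated:.3f}")
```

Requiring authors to list every measure they collected lets readers see the number of outcomes behind a reported result and judge the nominal significance level accordingly.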

Bias in reporting is clearly problematic, but equally problematic is the pervasive lack of sufficient replication to identify robust patterns (Palmer, 2000; Kelly, 2006). The Reproducibility Initiative is a private organization that facilitates and incentivizes replication (Baker, 2012). Researchers can submit an experiment and the Reproducibility Initiative locates an appropriate lab, anonymous to the original researchers, to conduct the replication. The researchers pay for this service, but if the original results are reproduced, their work can carry an “independently validated” badge (http://reproducibilityinitiative.org). The open access journal PLOS ONE has joined the initiative with a pledge to publish replications (Baker, 2012). Further, at least some replication will be funded independently. In 2013, the Reproducibility Initiative received a 1.3 million dollar grant from the Center for Open Science to replicate a series of high profile cancer studies (http://centerforopenscience.org/pr/2013-10-16/). It is not yet clear whether a certificate of independent validation will serve as a sufficient incentive to promote widespread replication, but at least for researchers with substantial financial stakes in getting their research right, the appeal of independent validation is strong (Phillips, 2012).

Some of the most extensive replication efforts are currently underway in psychology, also partly funded by the Center for Open Science. The Reproducibility Project: Psychology (distinct from the Reproducibility Initiative described above) currently involves over 150 researchers volunteering to replicate studies published in 2008 in three well-respected psychology journals (https://osf.io/ezcuj/wiki/home/). The failure to replicate a number of published results justifies this ongoing effort to increase study replication, but more importantly, the open and collaborative approach pursued in these replications serves as a potential model for pursuing replication more widely. In a related project from the Center for Open Science, an entire issue of Social Psychology earlier this year was devoted to reporting replications of important studies (Nosek and Lakens, 2014) dating back as far as the 1930s (Klein et al., 2014). Some of the original studies were supported and some were not, and others appeared more complex than previously realized (Nosek and Lakens, 2014). As proposed more than a decade ago (Palmer, 2000), funding replications and allocating journal space to publishing them appears to increase their frequency. In the case of psychology, the Center for Open Science's strong and multi-faceted institutional support for replication has clearly also been important.

Other proposals that may reduce biased reporting and increase replication abound. For instance, major funding agencies could devote a portion of their budgets to support worthy replications (Palmer, 2000) or could preferentially fund proposals that rest on better-replicated foundations (Parker, 2013). Simply ensuring that authors report sufficient methodological and statistical details (Nakagawa and Cuthill, 2007) is a useful step. To this end, standard guidelines are gaining support and endorsements in (bio-)medical sciences (e.g., ARRIVE—Animal Research: Reporting of In Vivo Experiments; Kilkenny et al., 2010) and even ecology (Hillebrand and Gurevitch, 2013). Such guidelines are inspired by various motives, but their common thread is that they should reduce selective reporting and facilitate replication. Providing publishing outlets that evaluate research based on the quality of the methods and inferences rather than on the appeal of the outcome should also help, but such journals may be of most use when complemented by incentives to publish negative results (http://www.scilogs.com/communication_breakdown/negative-results-plos-one/).

Unfortunately, we lack model strategies for reducing the effects of some important negative institutional incentives. For instance, we know of no movements to counteract the growing trend for universities, research institutes, and funding agencies to evaluate researchers based on number of publications or impact factors of the journals in which they publish (Qiu, 2010). Given that this trend tends not to originate in or to be controlled by decisions at the level of the discipline, it may be more difficult to counteract. Widespread grassroots opposition to these evaluation methods could lead to advocacy by influential people and institutions, and thus ultimately to a reduction in the practice of evaluating researchers in this simplistic manner. Certainly without a public discussion of the perverse incentives imposed by these evaluation methods, they seem unlikely to change.

Where do the fields of evolution and ecology stand? Although the published discussion of the problem of biased reporting and poor replication (Palmer, 2000; Kelly, 2006; Nakagawa and Cuthill, 2007; Forstmeier and Schielzeth, 2011; Parker, 2013) is not new, and our conversations with colleagues suggest aspects of these problems are relatively widely recognized, little has yet been done. One exception is that in 2011, four prominent journals in our fields began requiring authors to deposit their raw data in publicly accessible databases (e.g., Whitlock et al., 2010), and more journals are joining this movement. Unfortunately, data archiving is expected to go only a small way toward addressing biased reporting, not only because thorough re-analyses of such data sets will be time consuming and thus probably rare, but also because authors can still readily publish (and post raw data from) a biased subset of their work (Simmons et al., 2011). Further, data archiving does not create incentives to replicate important findings. Although data archiving itself has a number of issues and has been controversial (Roche et al., 2014), its adoption demonstrates that evolutionary biologists and ecologists and the institutions they constitute can accept and promote substantial changes in the way research is conducted and published. This is a hopeful sign.

Which of the strategies discussed above, if any, are right for evolution and ecology? Unlike psychology and medicine, evolution and ecology consider the entire spectrum of organisms and living systems. Clearly, such breadth of study subject raises distinct challenges. For instance, a field biologist cannot simply arrange for a laboratory-for-hire to replicate her/his experiments. Yet, this difficulty does not mean that we should give up on replicating important studies (Kelly, 2006). Instead, we need to gather our collective experiences and insights and develop plans suitable for our own disciplines and sub-disciplines. We may find that some proposals, such as the development of voluntary hypothesis testing registries (Schooler, 2011), guidelines for improved statistical reporting (Hillebrand and Gurevitch, 2013), or devoting sections of journals to replication (Palmer, 2000) would face relatively few practical obstacles to implementation in evolution and ecology. Indeed, we expect that more ideas well-tailored to our disciplines will emerge from an open and active discussion. If we ignore these issues of biased reporting and a lack of replication and continue as we have, we do so at our peril. Other disciplines have responded to the crisis with bold steps. Let's figure out the ways forward for evolution and ecology.

Statements

Author contributions

Timothy H. Parker conceived of this paper and developed it in consultation with Shinichi Nakagawa. Timothy H. Parker drafted and revised the original manuscript and Shinichi Nakagawa contributed important additional content and provided editorial suggestions.

Acknowledgments

We are grateful for the thoughtful comments of anonymous reviewers and to our colleagues for stimulating conversations on this topic.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Anonymous. (2013, October 19). Trouble at the lab. The Economist.
2. Baker, A. (2012, August 14). Independent labs to verify high-profile papers. Nature News.
3. Begley, C. G., and Ellis, L. M. (2012). Raise standards for preclinical cancer research. Nature 483, 531–533. doi: 10.1038/483531a
4. Carpenter, S. (2012). Psychology's bold initiative. Science 335, 1558–1561. doi: 10.1126/science.335.6076.1558
5. Chambers, C. D. (2013). Registered reports: a new publishing initiative at Cortex. Cortex 49, 609–610. doi: 10.1016/j.cortex.2012.12.016
6. Forstmeier, W., and Schielzeth, H. (2011). Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse. Behav. Ecol. Sociobiol. 65, 47–55. doi: 10.1007/s00265-010-1038-5
7. Hillebrand, H., and Gurevitch, J. (2013). Reporting standards in experimental studies. Ecol. Lett. 16, 1419–1420. doi: 10.1111/ele.12190
8. Huser, V., and Cimino, J. J. (2013). Linking ClinicalTrials.gov and PubMed to track results of interventional human clinical trials. PLoS ONE 8:e68409. doi: 10.1371/journal.pone.0068409
9. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Med. 2:e124. doi: 10.1371/journal.pmed.0020124
10. John, L. K., Loewenstein, G., and Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532. doi: 10.1177/0956797611430953
11. Kelly, C. D. (2006). Replicating empirical research in behavioral ecology: how and why it should be done but rarely ever is. Q. Rev. Biol. 81, 221–236. doi: 10.1086/506236
12. Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M., and Altman, D. G. (2010). Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol. 8:e1000412. doi: 10.1371/journal.pbio.1000412
13. Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B. Jr., Bahník, Š., Bernstein, M. J., et al. (2014). Investigating variation in replicability. Soc. Psychol. 45, 142–152. doi: 10.1027/1864-9335/a000178
14. Lehrer, J. (2010, December 13). The truth wears off. The New Yorker, pp. 52–57.
15. Nakagawa, S., and Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. 82, 591–605. doi: 10.1111/j.1469-185X.2007.00027.x
16. Nosek, B. A., and Lakens, D. (2014). Registered reports. Soc. Psychol. 45, 137–141. doi: 10.1027/1864-9335/a000192
17. Nuzzo, R. (2014). Scientific method: statistical errors. Nature 506, 150–152. doi: 10.1038/506150a
18. Palmer, A. R. (2000). Quasireplication and the contract of error: lessons from sex ratios, heritabilities and fluctuating asymmetry. Annu. Rev. Ecol. Syst. 31, 441–480. doi: 10.1146/annurev.ecolsys.31.1.441
19. Parker, T. H. (2013). What do we really know about the signalling role of plumage colour in blue tits? A case study of impediments to progress in evolutionary biology. Biol. Rev. 88, 511–536. doi: 10.1111/brv.12013
20. Phillips, M. L. (2012, August 21). Initiative tackles scientific study validation. BioTechniques: News.
21. Qiu, J. (2010). Publish or perish in China. Nature 463, 142–143. doi: 10.1038/463142a
22. Roche, D. G., Lanfear, R., Binning, S. A., Haff, T. M., Schwanz, L. E., Cain, K. E., et al. (2014). Troubleshooting public data archiving: suggestions to increase participation. PLoS Biol. 12:e1001779. doi: 10.1371/journal.pbio.1001779
23. Ross, J. S., Mulvey, G. K., Hines, E. M., Nissen, S. E., and Krumholz, H. M. (2009). Trial publication after registration in ClinicalTrials.Gov: a cross-sectional analysis. PLoS Med. 6:e1000144. doi: 10.1371/journal.pmed.1000144
24. Schooler, J. (2011). Unpublished results hide the decline effect. Nature 470, 437. doi: 10.1038/470437a
25. Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011). False positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366. doi: 10.1177/0956797611417632
26. Vogel, G. (2011). Psychologist accused of fraud on “astonishing scale”. Science 334, 579. doi: 10.1126/science.334.6056.579
27. Whitlock, M. C., McPeek, M. A., Rausher, M. D., Rieseberg, L., and Moore, A. J. (2010). Data archiving. Am. Nat. 175, E145–146. doi: 10.1086/650340

Summary

Keywords

bias, p-hacking, registered reports, replication, reproducibility, type I error

Citation

Parker TH and Nakagawa S (2014) Mitigating the epidemic of type I error: ecology and evolution can learn from other disciplines. Front. Ecol. Evol. 2:76. doi: 10.3389/fevo.2014.00076

Received

01 July 2014

Accepted

07 November 2014

Published

25 November 2014

Volume

2 - 2014

Edited by

François Criscuolo, Centre National de la Recherche Scientifique, France

Reviewed by

Cristian Pasquaretta, Centre National de la Recherche Scientifique - Institute Pluridisciplinare Hubert Curien, France

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: parkerth@whitman.edu

This article was submitted to Behavioral and Evolutionary Ecology, a section of the journal Frontiers in Ecology and Evolution.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
