Setting Statistical Thresholds Is Useful to Define Truly Effective Conservation Interventions

Effective interventions are needed to solve conflicts between humans and predators over livestock killing, nuisance behavior, and attacks on pets and humans. Progress in quantification of evidence-based effectiveness and selection of the best interventions raises new questions, such as the existence of thresholds to identify truly effective interventions. Current classification of more and less effective interventions is subjective and statistically unjustified. This study describes a novel method to differentiate true and untrue effectiveness on the basis of false positive risk (FPR). I collected 152 cases of applications of damage-reducing interventions from 102 scientific publications, 26 countries, 22 predator species, and 6 categories of interventions. The analysis showed that the 95% confidence interval of the relative risk of predator-caused damage was 0.10–0.25 for true effectiveness (FPR < 0.05) and 0.35–0.56 for untrue effectiveness (FPR ≥ 0.05). This means that damage was reduced by 75–90% for truly effective interventions and by 44–65% for interventions of untrue effectiveness. Based on this, truly effective interventions are specified as those with a relative risk ≤ 0.25 (damage reduction ≥ 75%), whereas the effectiveness of interventions with a relative risk > 0.25 (damage reduction < 75%) is untrue. This threshold is statistically well-justified, stable, easy to remember, and practical to use in anti-predator interventions. More research is needed to determine whether this threshold holds true for other conservation interventions aiming to reduce negative outcomes (e.g., poaching rates) or increase positive outcomes (e.g., species richness).


INTRODUCTION
Biodiversity loss and degradation of natural ecosystems are globally pressing emergencies to which scientists and practitioners need to find practical, socially acceptable and effective evidence-based solutions (Adams et al., 2019; Burivalova et al., 2019; Sutherland et al., 2020). To achieve this, development and validation of conservation interventions is essential to reduce threats, recover species and landscapes, and secure sustainable co-existence between local societies and wildlife (Sutherland et al., 2019; Treves et al., 2019; Littlewood et al., 2020). This field is highly important, both scientifically and practically, particularly in the context of conflicts between humans and mammalian predators. Many predators, especially large ones, may inflict damage to livestock and crops, exhibit nuisance behavior, and attack pets and humans (Moreira-Arce et al., 2018; Torres et al., 2018; Ugarte et al., 2019). Retaliatory or preventive killing of predators is often the first "solution" to be considered, but it is generally ineffective and illegal as many predators are threatened and officially protected (Lennox et al., 2018; IUCN, 2020). Therefore, all possible efforts should be invested to find interventions which would be nonlethal and effective over a sufficiently long period of time (Khorozyan and Waltert, 2019a,b).
Evaluation of evidence-based effectiveness of antipredator interventions and identification of the most effective interventions have been developing fast in recent years (Miller et al., 2016; Treves et al., 2016; Eklund et al., 2017; van Eeden et al., 2017, 2018; Khorozyan and Waltert, 2019a,b, 2020a,b; Bruns et al., 2020). This progress raises new and important research questions. One of them, which is not yet answered to the best of my knowledge, is: are there statistical thresholds that can determine whether interventions can be classified as being truly effective? This is a more nuanced issue than simply discriminating between effective and ineffective interventions because it splits their effectiveness into statistically "true" and "untrue." True effectiveness can be defined as when effectiveness of an intervention can be statistically proven (i.e., by rejecting a hypothesis that there is no effect). Conversely, untrue effectiveness represents situations where effectiveness is only suggested, but cannot be statistically proven (i.e., claims are not substantiated by hypothesis testing), and can therefore be thought of as a false positive or false effectiveness.
Several thresholds have been applied to the effect size metrics to discriminate effective and ineffective interventions, such as 1 for the relative risk and odds ratio, and 0 for the magnitude of change, Hedges' d, Cohen's d and similar metrics (Nakagawa and Cuthill, 2007; Fritz et al., 2012; Khorozyan, 2020). If interventions strive to reduce outcomes, e.g., damage caused by predators, then effective interventions would have a relative risk and odds ratio of less than 1, and negative estimates of the magnitude of change, Hedges' d and Cohen's d. However, having a relative risk <1 does not yet mean that an intervention is truly effective because, say, the estimates of 0.1 and 0.8 are both "effective," but the first one is obviously more effective (damage reduction by 90%) than the second (20%). Use of some classification schemes is possible, such as the relative risk 0-0.49 (damage reduction by 51-100%) for very effective, 0.50-0.89 (11-50%) for moderately effective and >0.9 (<10%) for ineffective cases (Khorozyan and Waltert, 2019a). Such classifications are subjective and used solely for practical reasons without having a statistical justification. Therefore, a solid statistical background is required to set thresholds for defining truly effective interventions and separating them from interventions of untrue effectiveness.
False positive risk (FPR), or type I error, is a parameter which can be used to disentangle true and untrue effectiveness of conservation interventions. In this context, it means a probability that a result suggesting an intervention is effective (based only on the magnitude of the effect size) could have simply been obtained by chance (Colquhoun, 2014, 2019; Dienes, 2019). For example, the relative risk may show high effectiveness of a given intervention but the Cohen's d may produce an opposite result. This contradiction is possible because the relative risk is calculated from the total outcomes of treatment and control samples, whereas Cohen's d incorporates variation of these outcomes between samples (Nakagawa and Cuthill, 2007; Fritz et al., 2012; Khorozyan, 2020). As a result, variable effects of an intervention indicate its ineffectiveness, which can be measured by FPR from Cohen's d and some other parameters (Colquhoun, 2017). Minimization of FPR is required, usually to 5% or 0.05 (Colquhoun, 2017), to be confident in true effectiveness of interventions. For this reason, it is safe to say that interventions with FPR less than 0.05 are truly effective and the effectiveness with FPR higher than or equal to 0.05 is untrue.
This study provides a novel approach to evaluate statistically true and untrue effectiveness of interventions used to protect rural economies and neighborhood safety from predators by setting thresholds of the relative risk based on false positives.

Literature Search
This study was focused on interventions striving to reduce damage by medium-sized and large mammalian predators to rural economies and neighborhood safety. I compiled the list of source publications from the previous meta-analyses of intervention effectiveness for predators in general (Khorozyan and Waltert, 2019a,b), wolves (Canis lupus) (Bruns et al., 2020), bears (Ursidae spp.) (Khorozyan and Waltert, 2020a), and felids (Felidae spp.) (Khorozyan and Waltert, 2020b). All these studies used similar approaches in the literature search, which included the retrieval of publications from the earlier meta-analyses (Miller et al., 2016; Treves et al., 2016; Eklund et al., 2017; van Eeden et al., 2017) and the search through all the issues of the journals Conservation Evidence and Ursus, newsletters Cat News and Carnivore Damage Prevention News, digital libraries of IUCN/SSC Human-Wildlife Conflict Task Force and Cat Specialist Group, and Web of Science. In Web of Science, the following search words and strings were used: "livestock" and "effectiveness" or "efficacy" and *predat*; "wolf," "Canis lupus," "livestock," "protection," "eff*," and "*predat*"; Latin names of seven recent bear species (except for the giant panda Ailuropoda melanoleuca which does not cause damage) in combination with eff*; and Latin names of 38 recent felid species in combination with *predat* and eff*.
I repeated the search as described above and increased the number of publications by including seven publications from van Eeden et al. (2017) which were not available earlier, searching through relevant publications in a new compilation by Littlewood et al. (2020), and extending the search in Web of Science to the time span of 1970-2020 using the search words and strings shown above. I finished the literature search on October 12, 2020. I excluded publications which considered perceived, not actual, effectiveness of interventions (Marker et al., 2005; Boast et al., 2016), used interventions irrelevant to mitigation of predator-caused damage (Jackson et al., 2012), or were autocorrelated with publications used in this study (Weise et al., 2014).

Data Collection
I compiled a dataset of study cases in which each case represented the effectiveness of a particular intervention on protecting an asset (a livestock species or neighborhood safety) from a predator species in a site. In the context of this study, I defined neighborhood safety as the local people's feelings of safety and security for themselves, other people, pets, and property related to the presence of predators in the area. Some cases included combinations of interventions (Beckmann et al., 2004), assets (Kissui et al., 2019), or predator species (Palmer et al., 2010; McManus et al., 2015), and they were incorporated into the dataset in this way. Interventions belonged to the following six categories: aversion, husbandry, invasive management, lethal control, non-invasive management, and mixed. Aversion included the use of acoustic (e.g., ultrasound or aggressive sounds), chemical (e.g., chemicals or animal feces), physical (e.g., protective collars or rubber bullets), and visual (e.g., fladry or flashlights) deterrents to ward off predators from assets (Ausband et al., 2013; Nuninger et al., 2017; Iliopoulos et al., 2019; Khorozyan et al., 2020). Husbandry comprised electric fences, enclosures, guarding animals, and herding, i.e., techniques used to protect assets from predator attacks (Huygens and Hayashi, 1999; Palmer et al., 2010; Potgieter et al., 2016; Weise et al., 2018). Invasive management was represented by translocations, sterilization, and shock collars which treated predators invasively, with capture, handling, release and post-release monitoring, to reduce damage induced by predators (Bromley and Gese, 2001; Landriault et al., 2009; Rossler et al., 2012). Lethal control included predator shooting, trapping, and poisoning to reduce damage (Hamr et al., 2015; Pacioni et al., 2018). Non-invasive management comprised procedures that excluded contacts with predators, such as the use of predator-proof garbage bins, removal of food remains, capacity building programs, crop management, and supplemental feeding (Hazzah et al., 2014; Johnson et al., 2018). Mixed interventions included the simultaneous application of interventions belonging to several categories (Stone et al., 2017; Jamwal et al., 2019).

Data Analysis
I measured the effectiveness of interventions for each case in the relative risk of damage (RR; Eklund et al., 2017; Khorozyan, 2020):
$$RR = \frac{A / N_t}{B / N_c}$$
where A is the metric of damage (e.g., number of livestock individuals killed by predators) with a given intervention, B is the same metric without the intervention, $N_t$ is the treatment sample size (e.g., number of livestock exposed to the intervention) and $N_c$ is the control sample size (e.g., number of livestock not exposed to the intervention or before the intervention is applied). Therefore, RR is a ratio of the probabilities of damage risk with and without the intervention. When interventions aim at reducing negative outcomes, such as damage caused by predators, interventions with RR < 1 are considered as effective and become most effective at RR = 0 when A = 0. Interventions with RR close to 1 are ineffective and those with RR > 1 are counter-productive as they increase damage instead of decreasing it. For better understanding, RR can be transformed to the percentage of damage reduction as (1−RR) × 100 (Khorozyan and Waltert, 2020a,b). I calculated the FPR for each case in FPR web calculator v. 1.7 from the observed p-values, arithmetic means, standard deviations, and sample sizes of treatment and control samples (Longstaff and Colquhoun, 2020). I calculated observed p-values in GraphPad QuickCalcs web calculator by means of paired or independent t-tests of treatment and control samples depending on original study designs. I took the prior probability of the alternative hypothesis p(H1) for FPR calculations as equal to 0.5, meaning 50:50 odds of an intervention being effective or not before the study is done (Colquhoun, 2014, 2017, 2019). I excluded the cases when FPR could not be calculated because of standard deviations equaling zero in control and treatment samples. I defined the effectiveness of intervention applications to be true at FPR < 0.05 and untrue at FPR ≥ 0.05.
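The relative risk, its conversion to damage reduction, and the FPR-based labeling described above can be sketched as follows. This is a minimal illustration only: the function names and the example counts are hypothetical and are not taken from the dataset, and the FPR value itself is assumed to come from an external calculator as in the study.

```python
def relative_risk(damage_treat, n_treat, damage_ctrl, n_ctrl):
    """RR = (A / N_t) / (B / N_c), with A and B the damage metrics
    with and without the intervention."""
    if n_treat <= 0 or n_ctrl <= 0:
        raise ValueError("sample sizes must be positive")
    if damage_ctrl == 0:
        raise ValueError("RR is undefined when control damage is zero")
    return (damage_treat / n_treat) / (damage_ctrl / n_ctrl)


def damage_reduction_percent(rr):
    """Percentage of damage reduction, (1 - RR) x 100."""
    return (1 - rr) * 100


def effectiveness_label(fpr, alpha=0.05):
    """'true' effectiveness at FPR < alpha, 'untrue' otherwise."""
    return "true" if fpr < alpha else "untrue"


# Hypothetical case: 5 of 200 animals killed with the intervention,
# 40 of 200 killed without it.
rr = relative_risk(5, 200, 40, 200)
print(round(rr, 3), round(damage_reduction_percent(rr), 1))
print(effectiveness_label(0.03), effectiveness_label(0.05))
```

Note that a case with A = 0 yields RR = 0 (100% damage reduction), whereas a case with B = 0 leaves RR undefined, which the sketch reports as an error rather than guessing a value.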
I estimated the threshold of RR by producing the 95% CI of RR for true and untrue effectiveness by bootstrapping at 1000 repetitions in iNZight 3.2.1 (University of Auckland, New Zealand). I compared samples by Mann-Whitney test and conducted statistical analysis in IBM SPSS 26.0, unless otherwise indicated.
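For readers who want to reproduce this step without iNZight or SPSS, the following sketch shows one common way to obtain a bootstrap interval and the Mann-Whitney U statistic using only the Python standard library. It assumes a percentile bootstrap of the median, which may differ from the exact procedure iNZight applies; the function names and RR values are hypothetical, and no p-value is computed.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI of the median (assumed procedure)."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    low = medians[int(tail * n_boot)]
    high = medians[min(n_boot - 1, int((1 - tail) * n_boot))]
    return low, high


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic (smaller of the two U values); ties count 0.5."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# Hypothetical RR values for two groups of cases.
rr_true = [0.05, 0.10, 0.12, 0.16, 0.20, 0.22, 0.30]
rr_untrue = [0.25, 0.35, 0.40, 0.46, 0.55, 0.60, 0.90]
print(bootstrap_median_ci(rr_true))
print(bootstrap_median_ci(rr_untrue))
print(mann_whitney_u(rr_true, rr_untrue))
```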
I studied the stability of the threshold of RR depending on sample sizes, interventions, countries and predator species. I split the sample into 15 random sub-samples of increasing size with a step of 10 (n = 15, 25, 35, ..., 152), calculated the 95% CI for true and untrue effectiveness and compared samples by Mann-Whitney test for each sub-sample as described above. I also repeated these procedures for the samples of intervention categories, countries and predator species with more than 10 cases (n > 10).
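The sub-sampling scheme can be sketched as below. Starting at 15 with a step of 10 and ending with the full sample of 152 gives exactly 15 sub-sample sizes. The sketch assumes that each sub-sample is drawn independently and without replacement from the full dataset, which the text does not specify; the function names are hypothetical.

```python
import random


def subsample_sizes(n_total, start=15, step=10):
    """Sizes start, start + step, ... below n_total, then n_total itself."""
    sizes = list(range(start, n_total, step))
    sizes.append(n_total)
    return sizes


def random_subsamples(cases, start=15, step=10, seed=0):
    """One random sub-sample per size (assumed: independent draws, no replacement)."""
    rng = random.Random(seed)
    return [rng.sample(cases, k) for k in subsample_sizes(len(cases), start, step)]


sizes = subsample_sizes(152)
print(len(sizes), sizes[0], sizes[-2], sizes[-1])

# Placeholder case identifiers standing in for the 152 study cases.
subsamples = random_subsamples(list(range(152)))
print([len(s) for s in subsamples])
```

Each sub-sample would then be split by FPR and passed through the same interval and Mann-Whitney steps as the full dataset.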

RESULTS
The original dataset consisted of 157 cases from which I excluded five cases with undetermined FPR (SD = 0). The dataset used for the analysis included 152 cases from 102 publications, 26 countries, and 22 predator species. It included 38 cases with FPR < 0.05 (true effectiveness) and 114 with FPR ≥ 0.05 (untrue effectiveness).
The 95% CI of RR was 0.10-0.25 (median 0.16) for FPR < 0.05 and 0.35-0.56 (median 0.46) for FPR ≥ 0.05 (Figure 1A). The samples of RR in these two groups differed significantly (Mann-Whitney U = 1193.0, p < 0.001). This means that damage was reduced by 75-90% for truly effective interventions and by 44-65% for interventions of untrue effectiveness from the dataset (Table 1). Based on this, it can be specified that truly effective interventions have RR ≤ 0.25 (damage reduction ≥ 75%) and the effectiveness of interventions with RR > 0.25 (damage reduction < 75%) is untrue (Table 1).
FIGURE 1 | The medians and 95% confidence intervals of the cases of true and untrue effectiveness in relation to the threshold of the relative risk (RR) = 0.25 across random sub-samples (A) and intervention categories, countries and predator species with sample size larger than n = 10 (B). The asterisks show the statistical significance of the difference between the cases of true and untrue effectiveness (p < 0.05) and the sign (+) indicates its marginal significance (Mann-Whitney U = 124.5, p = 0.059). Sample sizes are shown above the graphs.
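The arithmetic linking the RR intervals to the damage-reduction ranges and to the threshold can be checked with a short sketch. The function and constant names are hypothetical; the interval bounds are the ones reported above.

```python
RR_THRESHOLD = 0.25  # upper bound of the 95% CI of RR for FPR < 0.05


def reduction_range(rr_low, rr_high):
    """Convert an RR interval to a damage-reduction interval in whole percent."""
    return round((1 - rr_high) * 100), round((1 - rr_low) * 100)


def is_truly_effective(rr, threshold=RR_THRESHOLD):
    """Apply the proposed rule: RR <= 0.25 means true effectiveness."""
    return rr <= threshold


print(reduction_range(0.10, 0.25))  # (75, 90)
print(reduction_range(0.35, 0.56))  # (44, 65)
print(is_truly_effective(0.16), is_truly_effective(0.46))  # True False
```

The threshold is thus the upper bound of the interval for the FPR < 0.05 group, which does not overlap with the interval for the FPR ≥ 0.05 group.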

DISCUSSION
This study has shown that the 5% FPR could reliably separate true and untrue effectiveness of conservation interventions aimed at reduction of damage by predators. For truly effective interventions against predators, the relative risk was less than or equal to 0.25, meaning that the damage produced by predators was reduced by interventions by 75% or more. For untrue effectiveness, the relative risk was higher than 0.25, i.e., damage was reduced by less than 75% (Table 1). This threshold is easy to remember and use in practice related to anti-predator interventions. This threshold was stable beginning from the sample size of ca. 40-50. It apparently was not affected by sampled interventions, countries and predator species, but variation of some samples caused by their small sizes precluded firm conclusions (Figure 1B). The dataset of this study was dominated by the USA (43.4% of 152 cases), husbandry (40.1%), aversion (30.3%), coyote (23.0%), and gray wolf (19.7%). Arguably, these biases are natural and insurmountable due to the practicality and traditional use of deterrents and husbandry methods such as dogs, herders, or enclosures (Palmer et al., 2010; Potgieter et al., 2016; Weise et al., 2018), widest implementation and high scientific publishability of the United States-based intervention studies (VerCauteren et al., 2012), and persistent large-scale conflicts with coyotes in North America and wolves in North America and Europe which drive their in-depth research (Bromley and Gese, 2001; Rossler et al., 2012; Ausband et al., 2013). Other studies also confirmed these biases toward the mentioned interventions, geographical areas, and species (Khorozyan and Waltert, 2019a, 2020a; Bruns et al., 2020).
I assumed that publication bias, meaning the prevalence of positive results in the literature, did not play a major role in this study because a significant number of publications that I found and used provided also negative results (i.e., ineffective interventions).
It is still unclear whether this threshold can be generalized over other conservation interventions and species, and I invite more research on broader applicability of this approach. Interventions can be reduction-aimed by decreasing negative outcomes (e.g., poaching rates or damage caused by wildlife) or addition-aimed by increasing positive outcomes (e.g., species richness or diversity) (Khorozyan, 2020). It is equally interesting and practical to know whether the threshold from this study holds true for other reduction-aimed interventions and how useful the FPR is in defining thresholds for addition-aimed interventions.
Very little is known about the development and application of thresholds in evidence-based effectiveness of conservation interventions. For example, certain levels of vaccinations and removals were specified for different management approaches vs. no action in an attempt to control brucellosis in the Yellowstone bison (Bison bison) population (Hobbs et al., 2015). A number of ecological and management thresholds have been used to mark the tipping points beyond which interventions are required to prevent irreversible degradation of species and landscapes (Bestelmeyer, 2006; Laufenberg et al., 2018; Snyder and Young, 2020) or they become too expensive (Field et al., 2004). Therefore, these thresholds are related to intervention planning and not to their performance. In health care, the cost-effectiveness threshold has been practiced to set the best possible value for money in regard to the treatment of patients (Lim et al., 2017). However, a transfer of the cost-effectiveness threshold to biodiversity conservation is problematic due to inconsistency of conservation outcome metrics, diversity of protected assets, and a need for standardization (Cook et al., 2017). Therefore, much more effort is required to set and validate thresholds in effectiveness of conservation interventions.
In conclusion, I believe that the results of this study will be useful for practical applications and further research of interventions used to ensure co-existence between rural communities and predators.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
IK conceived and designed the study, collected and analyzed data, produced visual material, and wrote the manuscript.

ACKNOWLEDGMENTS
I thank L. van Eeden for sharing literature and two reviewers for their thoughtful and encouraging comments.