Commentary: Early Risk Detection of Burnout: Development of the Burnout Prevention Questionnaire for Coaches


INTRODUCTION
In a recent volume of this journal, Schaffran et al. (2019) introduced the Burnout Prevention Questionnaire for Coaches (BPQ-C). Although we recognize the worthwhile efforts of Schaffran et al., we believe that there are several issues associated with this instrument. This commentary aims to expand on why we think the BPQ-C should not currently be used by practitioners and researchers to screen for burnout.

LACK OF THEORETICAL FOUNDATIONS
With regard to theory, we have four concerns. First, Schaffran et al.'s definition (and subsequent measurement) of burnout is inadequate. Although several definitions of burnout exist, most researchers agree that burnout is a multidimensional construct and that models share, at a minimum, an exhaustion component (Maslach et al., 1996; Raedeke and Smith, 2001; Shirom, 2005). This was recently highlighted by the World Health Organization (2018). While not everyone agrees on one exact definition, no well-recognized definition includes general stress, social stress, and amotivation (Maslach et al., 1996; Raedeke and Smith, 2001; World Health Organization, 2018).
Second, Schaffran et al. (2019) seem to conflate antecedents, outcomes, and the construct of burnout itself. This point is exemplified by the fact that chronic job stress has historically been defined as an essential burnout antecedent and not a defining characteristic of burnout (Maslach et al., 1996; Shirom, 2005). Therefore, using stress as a symptom of burnout and not as an antecedent is conceptually incorrect. Similarly, the dimension labeled "pre-burnout" includes fatigue. This is confusing because of the evident conceptual overlap with exhaustion, which is known as a defining characteristic of burnout (Grossi et al., 2015).
Third, an underlying assumption of the BPQ-C seems to be that the burnout process consists of sequential events. It is therefore peculiar that the authors relied exclusively on cross-sectional data when they created and tested their instrument (Maslach, 1999, 2003; Hendrix et al., 2000), especially since the complexity of temporality between burnout dimensions has been discussed before, both in sport and more generally (Shirom and Melamed, 2006; Martinent et al., 2016).
Our final conceptual concern is the reliance on statistical considerations (from the analysis of one sample), rather than on statistical and theoretical considerations together, when deciding on the inclusion or exclusion of certain BPQ-C dimensions (Study 2). This exploratory approach has been criticized in the burnout literature previously and heightens the risk of conceptual overlap with other constructs (see Shirom and Melamed, 2006), as well as of misunderstanding temporal relations with emotional exhaustion (Moore, 2000). The problem with this procedure becomes particularly apparent in relation to the resources dimension, which defines sleep as a resource that may buffer potential burnout. By the same logic, however, poor sleep quality could be seen as a part of "pre-burnout," since poor sleep quality is a well-known antecedent of burnout symptoms (Söderström et al., 2012). Thus, although the authors briefly acknowledge the complexity of sleep in relation to burnout in their discussion, they do not provide any guidelines based on sound empirical work and theory to explain how sleep scores in the BPQ-C should be interpreted.

UTILITY AS A SCREENING TOOL
The authors claim that the BPQ-C has the potential to be used as a screening tool in practical settings to detect early signs of burnout, and to guide prevention strategies before a coach develops more serious burnout symptoms. Although this is a commendable goal, one well-known problem is that there is often no clear distinction between normal functioning and psychological illness. Evaluations of symptoms should be related to the context in which they appear as well as to contemporary values and economic forces present in society and politics (Kinderman et al., 2017). Furthermore, Schaffran and colleagues do not provide evidence that the BPQ-C can be used to adequately screen for clinical burnout. To do so, analyses such as receiver operating characteristic (ROC) analysis, which tests the ability of a measure to discriminate between individuals with and without a certain clinical characteristic, are necessary (Kraemer, 1992). Comparisons between a clinical and a healthy sample were not a part of the validation of the BPQ-C. In fact, no descriptive statistics of burnout are reported, which makes it even harder to determine the variation and representativeness of the samples that were used. In addition, even when a clinical sample is compared with a healthy sample, discriminating between healthy and ill individuals is challenging because their scores show substantial overlap (Lundgren-Nilsson et al., 2012). Therefore, using screening tools without clinical interviews creates substantial risks of both false positives and false negatives, a matter that has been neglected in the sport psychology literature and is not considered by Schaffran et al. in their article.
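To make the logic of a ROC-based evaluation concrete, the following is a minimal sketch in Python. It is not an analysis of BPQ-C data: the scores, the group sizes, the cut-off of 4.0, and the function names are all hypothetical and chosen purely for illustration. The sketch computes the area under the ROC curve (AUC) as the probability that a randomly chosen clinical case scores higher than a randomly chosen healthy case, and then the sensitivity and specificity obtained at a single cut-off.

```python
from typing import List, Tuple


def roc_auc(clinical: List[float], healthy: List[float]) -> float:
    """Area under the ROC curve, computed over all clinical/healthy pairs.

    Equals the probability that a random clinical case scores higher than
    a random healthy case; tied pairs count as one half.
    """
    wins = 0.0
    for c in clinical:
        for h in healthy:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(clinical) * len(healthy))


def sensitivity_specificity(
    clinical: List[float], healthy: List[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are flagged as positive."""
    true_positives = sum(1 for c in clinical if c >= cutoff)
    true_negatives = sum(1 for h in healthy if h < cutoff)
    return true_positives / len(clinical), true_negatives / len(healthy)


# Hypothetical questionnaire scores (illustration only, not real data).
clinical_scores = [4.1, 4.8, 5.2, 5.9, 6.3, 3.6]
healthy_scores = [2.0, 2.7, 3.1, 3.9, 4.4, 1.8]

auc = roc_auc(clinical_scores, healthy_scores)
sens, spec = sensitivity_specificity(clinical_scores, healthy_scores, cutoff=4.0)

print(f"AUC = {auc:.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Even in this toy example the two groups overlap (one clinical case falls below the cut-off and one healthy case falls above it), so any single cut-off yields both a false negative and a false positive. This is the kind of evidence that a comparison between clinical and healthy samples would need to provide before an instrument is used for screening.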

QUESTIONNAIRE DEVELOPMENT
Our last point of criticism concerns the development and validation of the BPQ-C. Several guides on how to develop a psychological instrument are available. Although they differ, there is some common ground when it comes to item development, scale development, and scale evaluation (MacKenzie et al., 2011; Boateng et al., 2018). Schaffran et al., however, only communicate a few of these suggested steps. Where a step is reported, such as the confirmatory factor analysis (CFA), the high degree of similarity in subscale content suggests that the factors are redundant. This issue is compounded by the fact that parallel analysis suggested a one-factor model (see Sample 1). A stepwise evaluation would easily have eliminated this problem in the item development phase. Lastly, in the three-factor CFA, several residuals are correlated (see Figure 1 in Schaffran et al., 2019). The reasons for this need to be communicated, since correlating residuals comes with several risks (capitalization on chance being the most prominent), making replication in another sample unlikely (Landis et al., 2009).

DISCUSSION
This commentary provides several arguments as to why, in our opinion, the BPQ-C fails to meet several theoretical, clinical, and psychometric standards. Substantial work is therefore needed before the BPQ-C can be considered a valid measure. In the meantime, we recommend the use of existing validated measures of burnout. One example of such a measure is the Shirom Melamed Burnout Questionnaire (SMBQ; Kushnir and Melamed, 1992), which is based on Hobfoll's Conservation of Resources (COR) theory (Hobfoll, 1989). The SMBQ has been validated in both clinical and healthy samples (Lundgren-Nilsson et al., 2012). Another alternative would be the Maslach Burnout Inventory (MBI; Maslach et al., 1996), which follows the theoretical underpinnings provided by Maslach and colleagues. The MBI has shown reasonable psychometric properties in previous research, and at least some evidence exists that the instrument is able to discriminate between clinical and healthy individuals, although this applies only to the exhaustion subscale (Kleijweg et al., 2013). We note, however, that using self-report measures as the only source for clinical screening may not be successful because of the overlap between clinical and healthy individuals. We therefore recommend that, wherever possible, self-report screening be combined with clinical interviews (Lundgren-Nilsson et al., 2012; Kleijweg et al., 2013) and that self-reported burnout measures with clinical cut-offs be interpreted with caution.

AUTHOR CONTRIBUTIONS
EL is the main author, who initiated the commentary, wrote the first draft, and edited all other contributors' drafts and comments. HG was involved in the design of the paper and commented on and edited the paper throughout the whole process. MG commented on the paper and contributed uniquely to the section on the lack of theoretical foundations. CL commented on the paper and contributed uniquely to the section on the clinical screening tool. AI commented on the paper and contributed uniquely to the section on questionnaire development. DM was involved in the initial scoping of the paper, did the English editing, and contributed uniquely to the section on the lack of theoretical foundations.