An Investigation of Diagnostic Accuracy and Confidence Associated with Diagnostic Checklists as Well as Gender Biases in Relation to Mental Disorders

This study examines the utility of checklists for attaining more accurate diagnoses in diagnostic decision-making for mental disorders. The study also aimed to replicate results from a meta-analysis indicating that there is no association between patients' gender and misdiagnoses. To this end, 475 psychotherapists were asked to judge three case vignettes describing patients with Major Depressive Disorder (MDD), Generalized Anxiety Disorder (GAD), and Borderline Personality Disorder (BPD). Therapists were randomly assigned to experimental conditions in a 2 (diagnostic method: with vs. without diagnostic checklists) × 2 (gender: male vs. female case vignettes) between-subjects design. Multinomial logistic and linear regression analyses were used to examine the association of checklist use and patients' gender with diagnostic decisions. The results showed that when checklists were used, fewer incorrect comorbid diagnoses were made, but clinicians were less likely to diagnose MDD even when the criteria were met. Additionally, checklists improved therapists' confidence in their diagnostic decisions but were not associated with estimations of patients' characteristics. As expected, there were no significant associations between gender and diagnostic decisions.


INTRODUCTION
A comprehensive and systematic diagnostic process based on objective criteria is of particular importance for the treatment of mental disorders (Ramirez Basco et al., 2000; Ehlert, 2007). Nevertheless, several studies have shown high rates of misdiagnosis in daily practice (Kales et al., 2005; Bruchmüller and Meyer, 2009; Wolkenstein et al., 2011). Although clinicians frequently assume that patient factors such as gender cause misdiagnoses, the clinician's own diagnostic approach is, among other factors, a more prominent cause (Zimmerman and Mattia, 1999; Wolkenstein et al., 2011; Cwik and Teismann, 2016). The diagnostic process may involve subjective descriptions of the patient and may both influence and be influenced by therapists' expectations (Langer and Abelson, 1974; Rosenhan and Seligman, 1989; Margraf and Schneider, 2009). Clinicians may also interpret and apply the diagnostic criteria of classification systems loosely and resort instead to other resources, such as their professional experience or personal assumptions (Morey and Ochoa, 1989; Bruchmüller and Meyer, 2009; Meyer and Meyer, 2009; Wolkenstein et al., 2011; Bruchmüller et al., 2012; Garb, 2013; Cwik and Teismann, 2016). As a result, for single diagnoses, clinicians tend over time to make more false-positive diagnoses, assigning a disorder label although not all required diagnostic criteria are fulfilled. Regarding comorbidity, the opposite tends to happen: the most salient disorder is diagnosed and comorbid disorders are missed (Garb, 1998).
Structured diagnostic interviews are recommended in clinical practice to safeguard against misdiagnosis due to clinician bias (Schneider and Döpfner, 2004; Joiner et al., 2005; Silverman and Ollendick, 2005; Ehlert, 2007). Although such interviews do help to reduce bias and misdiagnosis, clinicians rarely use them in daily practice (Suppiger et al., 2008, 2009; Bruchmüller et al., 2011). Therapists gave several reasons for not using structured interviews, including the belief that their clinical judgment is more useful (37%) and the length of the interviews (34%).
Diagnostic checklists provide an alternative to interviews and represent a compromise between clinicians' preference for less structured discussion and the inclusion of specific diagnostic criteria in the diagnostic process, enabling an evaluation of higher reliability and validity than the status quo (Hiller et al., 1993). Diagnostic checklists are beneficial in that they can assess diagnostic criteria without interfering with the clinician's rapport or interview approach. Thus, open clinical judgment (OCJ) is still possible, combined with the assurance that all criteria of a potential disorder are considered systematically and that non-conformity with the criteria of a diagnosis is consciously noted. There is evidence that diagnostic checklists can help to attain reliable diagnoses and to increase diagnostic accuracy (Biederman et al., 1993; Bronisch and Mombour, 1994; Aebi et al., 2010; Mokros et al., 2013; Vaughn and Hoza, 2013).
However, the accuracy of diagnoses is only one essential component of well-considered psychotherapy. Other important aspects are treatment planning, confidence in the given diagnoses, and the estimation of therapy-relevant patient characteristics (Brownstein, 2003; Schmidt et al., 2005; Arvilommi et al., 2007; Bruchmüller and Meyer, 2009; Meyer and Meyer, 2009). Witteman and Kunst (1997) showed that professionals tend to hold on to their first clinical opinion and then search for confirming information. Assuming that different therapists have different underlying confirmation biases, therapeutic recommendations and estimations of patients' characteristics may vary when therapists do not use diagnostic tools that ensure a standardized consideration of all relevant aspects. In line with this, studies have shown that neglecting diagnostically relevant information leads to inappropriate treatment planning (Drake et al., 1993; North et al., 1997). Furthermore, therapists report that considering DSM criteria is not important for treatment planning (Zimmerman et al., 1993) and that their clinical judgment is more useful than diagnostic tools. More recently, Zimmerman (2016) pointed out in a systematic review that accurate diagnosis may have an impact on treatment recommendations.
Based on the assumption that the diagnostic process has an effect on treatment (Ramirez Basco et al., 2000; Ehlert, 2007) and that an effective diagnostic process could improve treatment planning (Haynes and Williams, 2003; Groenier et al., 2008), one could assume that diagnostic checklists, as effective diagnostic tools, will promote accurate diagnosis, adequate treatment recommendations, and adequate estimation of patient characteristics compared to OCJ. Consequently, we hypothesized that the use of diagnostic checklists influences diagnosticians' confidence in given diagnoses and other therapeutic aspects, such as the estimation of patient characteristics and the appraisal of the subsequent psychotherapeutic process.
In addition to diagnostic methods, gender biases and their association with misdiagnoses of mental disorders became a topic of scientific interest in the 1970s (Broverman et al., 1970). Currently and historically, several diagnoses tend to be more prevalent in one gender, with a general tendency toward higher prevalence in women (American Psychological Association [APA], 1975, 1978). Mental disorders with increased prevalence in one gender are of particular interest to psychiatrists and psychotherapists. Major Depressive Disorder (MDD), Generalized Anxiety Disorder (GAD), and Borderline Personality Disorder (BPD) have shown a greater prevalence in women (Angst et al., 2002; Wittchen and Jacobi, 2005; Morschitzky, 2009; Seedat et al., 2009), with some potential for overdiagnosis. Widiger and Spitzer (1991) postulated that gender-biased misdiagnosis or overdiagnosis can occur on two levels: in the application of the diagnostic criteria and/or in the diagnostic criteria themselves. In line with this, some evidence indicates that a diagnostic category that the clinician considers linked to a particular gender is more likely to be assigned when the patient's symptomatic behavior is concordant with traditional gender stereotypes (Sprock, 1996; Crosby and Sprock, 2004; Flanagan and Blashfield, 2005). Clinicians tend to pathologize symptoms when there is a large gap between a patient's symptoms and traditional gender characteristics (Sprock, 1996; Möller-Leimkühler, 2005). For instance, Möller-Leimkühler (2005) reports that emotional expressions in a depressed man or aggressive behavior in a depressed woman are not in line with traditional gender characteristics and are associated with misdiagnosis of MDD. Consistent with these findings, Ford and Widiger (1989) illustrated the effects of gender biases in a study on personality disorders. They showed that even when women fulfilled the diagnostic criteria for Antisocial Personality Disorder, clinicians more frequently diagnosed Histrionic Personality Disorder, although its diagnostic criteria were not fulfilled.
Assuming that therapists are aware of the connection between gender and prevalence, one could expect consistent gender biases in diagnostic decisions. However, evidence of the influence of gender stereotypes on diagnostic decisions is mixed (Gomes and Abramowitz, 1976; Teri, 1982; Heatherington et al., 1986; Hansen and Reekie, 1990).
For example, therapists made significantly more MDD diagnoses when the person described in a case vignette was a woman rather than a man (Wrobel, 1993; Bertakis et al., 2001; Lewis et al., 2006). However, other studies reported no effect of patients' gender on MDD diagnoses (Hansen and Reekie, 1990; López et al., 1993; Case et al., 1999; Kales et al., 2005). More generally, Garb (1997) found that women are no more likely than men to be given a mental health diagnosis. Additionally, our recent meta-analysis (Cwik et al., under review) showed, across 22 studies, that patients' gender is not a causal factor for misdiagnoses of mental disorders.
The present study addressed two main questions. First, we investigated diagnostic accuracy when using diagnostic checklists for mental disorders as compared with OCJ. We were also interested in the extent to which the use of checklists increases diagnosticians' confidence in given diagnoses. We hypothesized that clinicians' diagnostic accuracy and confidence would be higher when using checklists.
Second, this study aimed to investigate the relationship between misdiagnoses and patients' gender. Although our meta-analysis found no diagnostic gender bias across mental disorders in general, one could assume that such a bias is present in specific mental disorders that are more frequent in women (e.g., MDD, GAD, BPD). Even for these disorders, however, we expected that patients' gender would not be a cause of misdiagnosis or of differences in diagnosticians' confidence in given diagnoses.
Finally, we conducted an exploratory analysis of how the use of checklists and patients' gender are associated with the following diagnostic aspects: clinicians' estimation of the severity of diagnoses, patients' motivation for treatment, the expected number of treatment sessions until significant symptom improvement, and clinicians' recommendations for therapeutic interventions.

MATERIALS AND METHODS
The Ethics Committee of the Ruhr-Universität Bochum approved the study. All participants gave their informed consent before they were able to start with the questionnaire.

Participants and Procedure
An invitation to the study was sent to the email addresses of psychotherapists that were available through the search functions of the Chambers of Psychotherapists in all federal states of Germany and of the "Deutsche Psychotherapeutenvereinigung" (German Association of Psychotherapists). The email provided background information about the study and a link to the online survey. The online survey was completed anonymously, and participants could only proceed to the next questionnaire once all questions had been answered. A total of 834 invitees responded, and participants were randomly allocated to conditions. In total, 359 surveys were excluded: 357 (42.81%) because the diagnostic survey was incomplete and two because they contained implausible information. The effective sample therefore comprised n = 475 participants (56.95% of respondents) [checklist (CL): n = 245 vs. OCJ: n = 230; female vignettes: n = 241 vs. male vignettes: n = 234]. Excluding the incomplete surveys ensured that there were no missing data in the remaining 475 surveys. Data were collected between April and August 2011.
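As an arithmetic check of the sample flow described above, the reported figures are mutually consistent when both percentages are taken relative to the 834 respondents (our reading of the denominator, which the text does not state explicitly):

```latex
\begin{aligned}
n_{\text{excluded}} &= 357 + 2 = 359, & n_{\text{analyzed}} &= 834 - 359 = 475,\\
\tfrac{357}{834} &\approx 42.81\,\%, & \tfrac{475}{834} &\approx 56.95\,\%,\\
n_{\text{CL}} + n_{\text{OCJ}} &= 245 + 230 = 475, & n_{\text{female}} + n_{\text{male}} &= 241 + 234 = 475.
\end{aligned}
```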
Participants' mean age was 46.67 years (SD = 11.02), and 326 participants (68.8%) were women. The sample comprised 79.2% clinicians who had completed cognitive behavioral psychotherapy training, 22.5% who had completed training in analytic psychotherapy, and 24.6% who had completed another type of psychotherapeutic training (including gestalt, client-centered, and systemic therapy). Of all psychotherapists, 75.4% had completed a single therapeutic training (e.g., only training in CBT), 22.9% had completed two trainings (e.g., in CBT and gestalt therapy), and 1.7% had completed three or more trainings (e.g., in CBT, gestalt therapy, and psychoanalysis). Overall, 82.9% of the participants worked in their own practice, with a mean of 16 years (SD = 10.13) of professional experience. When asked to rate their clinical experience in using the ICD-10 (World Health Organization [WHO], 1992) and the DSM-IV-TR (American Psychiatric Association [APA], 2000) (1 = "no experience" to 5 = "very experienced"), participants rated their experience with ICD-10 (M = 4.37, SD = 0.71) significantly higher than their experience with DSM-IV-TR (M = 2.53, SD = 1.09) [t(474) = −32.35, p < 0.001].
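The comparison of ICD-10 and DSM-IV-TR experience ratings, with 474 degrees of freedom for 475 participants, is consistent with a paired-samples t-test (df = n − 1); the text does not name the test, so this is our reading. A minimal sketch of that statistic using only the Python standard library follows. The ratings are hypothetical toy values, not study data, and the difference is ordered DSM − ICD only so that the sign matches the reported t:

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic for the differences x_i - y_i; returns (t, df)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical experience ratings (1-5) of five therapists, NOT study data.
dsm = [2, 3, 2, 1, 3]
icd = [4, 5, 4, 4, 5]
t, df = paired_t(dsm, icd)
```

With these toy values every therapist rates DSM-IV-TR experience lower, so t is strongly negative (about −11 with df = 4), mirroring only the direction of the reported result.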

Case Vignettes
Each participant received an online survey with a cover letter and a link to the survey including three case vignettes (available as Supplementary Material). The case vignettes were constructed on the basis of DSM-IV-TR and were extended to include essential criteria for ICD-10, so that the underlying disorders could be unambiguously diagnosed according to both classification systems.
The first case vignette described a middle-aged patient with a severe episode of MDD. The vignette contained all information necessary to clearly diagnose an MDD episode without psychotic symptoms according to DSM-IV-TR and ICD-10, except the criterion for suicidal behavior. The second vignette described a patient fulfilling the criteria for GAD, based on a DSM-III case description. For this case vignette, we shortened the original description and adapted it to German culture and to DSM-IV-TR criteria. To ensure that participating therapists who use ICD-10 diagnostic criteria in their daily routine would be able to diagnose GAD, we added all symptoms required by ICD-10. The last vignette, based on a case description by Zaudig et al. (2000), described a patient fulfilling all general criteria for a personality disorder according to DSM-IV-TR as well as seven of the criteria for BPD.
To validate the diagnostic criteria of the underlying disorders, seven licensed psychotherapists reviewed all vignettes. All psychotherapists diagnosed the correct disorder without an additional (comorbid) diagnosis in each vignette.

Diagnostic Questionnaire
All participants received the same diagnostic questionnaire, in which therapists were asked to diagnose the case vignettes. They could choose up to three options: one option was "no disorder present," and the others were 12 listed diagnoses. In addition, therapists in both groups were asked to rate how reliable they considered the selected diagnosis (0 = "insufficient information" to 100 = "very reliable").
Furthermore, therapists were asked to rate the assumed extent of mental, social and job-related impairment caused by the described symptoms (0 = "mentally healthy" to 100 = "mentally ill"), patients' motivation for a therapeutic treatment (0 = "not at all motivated" to 100 = "highly motivated"), and severity of the described disorder (1 = "no mental disorder" to 8 = "severe mental disorder"), and were asked to estimate how many treatment sessions would be needed until significant improvement could be expected. Finally, therapists were asked which therapeutic orientation for treatment they would advise.

Diagnostic Checklists
Checklists were only presented to participants in the checklist condition and were initially not visible. Participants first had to choose and click on a diagnosis, which determined the checklist they were presented with. For instance, if a participant opted for MDD and clicked on the MDD diagnosis, the corresponding MDD checklist opened. The participant could then verify the given diagnosis or reject it, switch to another diagnosis, and receive the corresponding checklist (e.g., choosing Dysthymia and receiving the checklist for Dysthymia).
The checklists listed particular keywords extracted from the DSM-IV-TR criteria for each diagnosis. For each symptom, participants had to decide whether it was described in the vignette. Finally, at the end of the checklist, participants had to decide whether the criteria were fulfilled.

Statistical Analysis
Although participants were initially randomized, the exclusion of 359 incomplete or implausible surveys led to several significant between-group differences in demographic variables. For the diagnostic procedure condition, significant between-group differences were observed in several demographic variables. Significant group differences were . For all other demographic variables, no significant differences between groups were observed. The variables showing significant differences were used to calculate propensity scores (PS) for each condition: one PS per participant for the gender condition and another for the diagnostic procedure condition. The PS were calculated using logistic regression with the respective condition as the dependent variable and the demographic variables showing significant between-group differences as independent variables (Rosenbaum and Rubin, 1983; D'Agostino, 1998; Bartak et al., 2009; Weinberger et al., 2009; Austin, 2011).
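The propensity-score step can be sketched as an ordinary logistic regression of condition membership on demographic covariates. The study used SPSS; the Python sketch below instead fits the same model by plain batch gradient ascent on the log-likelihood, which approaches the maximum-likelihood estimates but is not SPSS's algorithm. The function names and the single toy covariate are hypothetical, since the actual covariates are not listed in this section:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    """Intercept w[0] plus the weighted sum of the covariates in x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit P(condition = 1 | x) by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(w, xi))  # observed minus predicted
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity_scores(X, condition):
    """Predicted probability of belonging to condition 1 for each participant."""
    w = fit_logistic(X, condition)
    return [sigmoid(linear_predictor(w, xi)) for xi in X]


# Hypothetical toy data: one standardized covariate; 1 = checklist, 0 = OCJ.
X = [[0.0], [1.0], [2.0], [3.0], [0.5], [1.5], [2.5], [3.5]]
condition = [0, 0, 1, 0, 1, 0, 1, 1]
ps = propensity_scores(X, condition)
```

Because the model contains an intercept, the fitted scores average to the observed share of the coded condition at convergence (here 0.5), which is a convenient sanity check on any PS model.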
Next, multinomial logistic regression analyses were performed for each case vignette, with each of the two conditions (diagnostic procedure and patients' gender) as independent variables and diagnostic decisions and treatment recommendations as dependent variables. Within each condition, the corresponding PS for that condition were used for weighting each participant's contribution, to control for significant group differences.
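The section does not state how the PS were turned into weights. One common choice, assumed here purely for illustration, is inverse probability of treatment weighting (IPTW): weight 1/e for participants in the coded condition and 1/(1 − e) otherwise, where e is the propensity score. With a single binary predictor a multinomial logistic model is saturated, so its weighted maximum-likelihood odds ratios equal ratios of weighted cell totals; the sketch computes them directly. Category labels mirror the outcome codes in the table notes (correct, comorbid, false, no); the helper names are hypothetical:

```python
from collections import defaultdict


def iptw_weights(ps, condition):
    """Inverse probability of treatment weights (assumed weighting scheme)."""
    return [1.0 / e if c == 1 else 1.0 / (1.0 - e) for e, c in zip(ps, condition)]


def weighted_odds_ratios(outcomes, condition, weights, reference="correct"):
    """Weighted OR (condition 1 vs. 0) of each outcome category against the reference.

    Equals the maximum-likelihood OR of a weighted multinomial logit with one
    binary predictor; raises ZeroDivisionError if a needed cell is empty.
    """
    cell = defaultdict(float)  # (category, condition) -> summed weight
    for o, c, w in zip(outcomes, condition, weights):
        cell[(o, c)] += w
    return {
        cat: (cell[(cat, 1)] * cell[(reference, 0)])
        / (cell[(cat, 0)] * cell[(reference, 1)])
        for cat in sorted(set(outcomes) - {reference})
    }


# Hypothetical toy data: 1 = OCJ, 0 = checklist; equal PS gives equal weights.
outcomes = ["correct", "comorbid", "comorbid", "correct", "correct", "comorbid"]
condition = [1, 1, 1, 0, 0, 0]
weights = iptw_weights([0.5] * 6, condition)
ors = weighted_odds_ratios(outcomes, condition, weights)
```

With equal propensity scores the weights are constant, so the OR reduces to the unweighted cross-product ratio (here 4.0 for comorbid vs. correct).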
Additionally, multiple linear regression analyses were used to analyze the nature of correctly diagnosed cases within each case vignette after weighting for differences in demographic variables. The respective conditions (diagnostic procedure and patients' gender) were used as independent variables. Therapists' confidence in the diagnosis, patients' motivation for treatment, severity of the diagnosis, and the expected number of treatment sessions were included as dependent variables. Furthermore, model fit and the relative strength of associations between conditions and diagnostic decisions were assessed using Nagelkerke's R² and Cohen's f².
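Both fit indices have standard closed forms. Nagelkerke's R² rescales the Cox-Snell R², 1 − exp(2(LL₀ − LL_M)/n), by its maximum attainable value, 1 − exp(2·LL₀/n), where LL₀ and LL_M are the log-likelihoods of the intercept-only and fitted models and n is the sample size; Cohen's f² is R²/(1 − R²). The section does not specify which R² entered f², so the sketch below (hypothetical function names) simply takes any R² as input:

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from the null and fitted log-likelihoods and the sample size."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def cohens_f2(r2):
    """Cohen's f^2 effect size for a given R^2 (requires r2 < 1)."""
    return r2 / (1.0 - r2)


# Illustrative log-likelihoods only, not values from the study.
r2 = nagelkerke_r2(ll_null=-300.0, ll_model=-280.0, n=475)
f2 = cohens_f2(r2)
```

A model no better than the intercept-only model yields R² = 0, and a perfectly fitting model (LL_M = 0) yields exactly 1, which is the point of Nagelkerke's rescaling.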
All results were Bonferroni-corrected for multiple comparisons. Data analysis was conducted using SPSS version 22.0 for Mac (IBM Corporation, 2013).
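Bonferroni correction divides the significance level by the number of comparisons m (equivalently, multiplies each p-value by m, capped at 1). The thresholds given in the table notes with the results, *p < 0.016 and **p < 0.003, appear to correspond to 0.05/3 and 0.01/3 truncated to three decimals, i.e., m = 3 comparisons (for example, comorbid, false, and no diagnosis, each against the correct diagnosis); this is our inference rather than a statement in the text. A minimal sketch:

```python
def bonferroni_threshold(alpha, m):
    """Per-comparison significance level for m comparisons."""
    return alpha / m


def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Three comparisons per model, as inferred above.
thresholds = {alpha: bonferroni_threshold(alpha, 3) for alpha in (0.05, 0.01)}
```

Comparing raw p-values against alpha/m and comparing adjusted p-values against alpha give identical decisions; the adjusted form is simply easier to report.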

RESULTS

Checklists and Diagnostic Decisions
As can be seen in Table 1, results from the multinomial logistic regression analysis revealed that the usage of OCJ was significantly associated with making more false comorbid diagnoses in patients with MDD (p < 0.005), GAD (p < 0.001), and BPD (p < 0.001), compared to making the correct diagnosis and to therapists using checklists. Furthermore, the usage of OCJ was also significantly associated with making false diagnoses in patients with GAD (p < 0.001) and patients with BPD (p = 0.002), compared to making the correct diagnosis and to therapists using checklists.
Contrary to our hypothesis that clinicians' diagnostic accuracy would be higher when using checklists, checklist use was significantly associated with therapists' decision to refrain from making a diagnosis in the MDD case, compared to making the correct diagnosis and to therapists using OCJ (p < 0.001). This means that there were more false-negative diagnoses when using checklists than when using OCJ.

Patients' Gender and Diagnostic Decisions
To examine the association between patients' gender and diagnostic decisions, all three case vignettes were given to participants either in a male or in a female version. As indicated in Table 1, there was no significant association between patients' gender and giving false comorbid diagnoses, false diagnoses or no diagnoses in patients with MDD, GAD, or BPD.

Checklists and Therapists' Treatment Recommendations
With respect to treatment recommendations, there was no significant association between the diagnostic procedure and recommendations in the MDD and GAD cases. In the BPD case, therapists who used checklists recommended dialectical behavior therapy (DBT) as the preferred therapy significantly more often (p = 0.007) than psychotherapists in the OCJ condition (Table 2).

Patients' Gender and Therapists' Treatment Recommendations
Patients' gender was not significantly associated with treatment recommendations (Table 2).

Confidence with Given Diagnoses and Estimations of Patients' Characteristics
Multiple linear regression analyses were conducted, using both conditions as independent variables and the appraisal of patients' motivation for treatment, severity of given diagnoses, confidence in diagnoses, and psychotherapists' estimation regarding the expected number of treatment sessions until the patient experiences significant improvement as dependent variables. Additionally, corresponding PS were included in the regression models to control for significant differences in demographic variables.

Checklists and Therapists' Confidence and Estimations of Patients' Characteristics
As can be seen in Table 3, there was a significant association between the use of checklists and therapists' confidence in their diagnoses in all three case vignettes. With regard to the estimated severity of the disorders, patients' motivation for treatment, and the expected number of treatment sessions, no significant associations were found.

Table note: OR, odds ratio; 95% CI, 95% confidence interval; MDD, Major Depressive Disorder; GAD, Generalized Anxiety Disorder; BPD, Borderline Personality Disorder; correct, correct diagnosis given; comorbid, correct diagnosis given with a false comorbid diagnosis; false, only false diagnosis/diagnoses given; no, no diagnosis/diagnoses given. Significant results are presented in bold letters and were all Bonferroni corrected: *p < 0.016; **p < 0.003. (a) Adjusted for significant differences in demographic data by using propensity scores.

Patients' Gender and Therapists' Confidence and Estimations of Patients' Characteristics
As Table 3 illustrates, male gender was significantly associated with lower estimated motivation for treatment in the MDD and BPD cases (MDD: β = −0.142, p = 0.003; BPD: β = −0.159, p = 0.001), but not in the GAD case (β = −0.051, p = 0.286). With respect to confidence in diagnostic decisions, the estimated severity of the diagnoses, and the expected number of treatment sessions, no significant associations with patients' gender were observed in any of the three cases.

DISCUSSION
The present study examined the diagnostic accuracy of psychotherapists using checklists in the diagnostic process and investigated diagnostic accuracy in relation to patients' gender. We hypothesized that the use of checklists would result in significantly higher diagnostic accuracy. After controlling for group differences in several therapist demographic variables and therapist training/experience, this hypothesis was mostly supported by the data, with one exception: for the MDD case vignette, therapists using diagnostic checklists made significantly more false-negative diagnoses than those using OCJ.
With regard to the association between checklist use and misdiagnoses, psychotherapists who used OCJ were more likely to diagnose an additional comorbid disorder in all three cases. In the literature on misdiagnosis, there is evidence that therapists base diagnoses more on comparison with prototypes than on whether diagnostic criteria are met (Blashfield and Herkov, 1996; Garb, 1996; Westen and Shedler, 2000; Crosby and Sprock, 2004). This could explain the difference between the diagnostic decisions of therapists who used checklists and those who did not.
Furthermore, therapists appear to rely on various heuristics based on their professional experience when making a diagnosis, and, as prior studies have also shown, the basis of their appraisal of symptoms and diagnoses is often not explicit (Morey and Ochoa, 1989; Blashfield and Herkov, 1996; Crosby and Sprock, 2004). In their theory of heuristics and biases, Tversky and Kahneman (1974) point out that even experts are prone to errors of judgment. Concerning the prevalence and overdiagnosis of mood disorders in this study, psychotherapists seem to be prone to a representativeness bias (Tversky and Kahneman, 1974; Blumenthal-Barby and Krieger, 2015) with respect to their knowledge of prevalence and to the symptoms of mood and hyperarousal reported in the vignettes. This representativeness bias seems to draw diagnosticians' attention to particular symptom complexes, while other relevant symptoms are neglected. It could therefore be surmised that the less frequent the diagnosis, and thus the less familiar therapists are with it, the more they will benefit from using checklists to attain correct diagnoses and avoid false ones.
Contrary to our expectations, psychotherapists were more likely to refrain from making a diagnosis in the MDD case vignette when they used checklists. The results of a study by Garb (2007) showed that the usage of computer-administered interviews and checklists was associated with more false-positive diagnoses. Thus, Garb recommends a combination of such a diagnostic instrument with clinical judgment.
In the study by Garb (2007), more information was collected with computer-administered interviews and checklists than with traditional clinical interviews; more symptoms were thus revealed, so it is not surprising that more diagnoses were made. In contrast, in the present study, checklists were used to help clinicians make diagnoses from vignettes, so this study concerns data integration more than data collection. Furthermore, although psychotherapists were given checklist information, they were not given the diagnostic criteria for MDD. As Garb (1996) illustrated, if clinicians make diagnoses by comparing clients to prototypes, they may not know the diagnostic criteria for MDD and may have continued to compare the MDD case vignette to their prototype for MDD. If the checklist then contained information that psychotherapists do not usually collect, that information may not be part of their prototype, and the vignette may therefore seem more dissimilar to their prototype, resulting in fewer diagnoses of MDD. Thus, giving clinicians checklist information may not stop the representativeness heuristic from shaping their judgments. It is also possible that clinicians were unable to clarify uncertainties about single symptoms and, when in doubt, became more conservative in their judgment and refrained from making a diagnosis. To address this problem, it could be helpful to provide additional lists with exemplary symptom descriptions. Moreover, Margraf and Schneider (2009), referring to the study by Wittchen and Unland (1991), pointed out that checklists do not protect against confirmation bias in the diagnostic process. The reliability and validity of diagnoses based on checklists therefore depend on the clinician's training as well as on the homogeneity of the patients.
Concerning the recommendation of DBT in the BPD case, therapists who used checklists recommended DBT two and a half times as often as therapists who made a diagnosis based on OCJ. Checklists may help therapists to keep an overview of the full extent of the disorder, whereas therapists relying on OCJ may neglect problem areas or misjudge symptoms. Therapists using checklists may thus consider a broad range of problems, especially cognitive and behavioral deficits, while also considering BPD-specific dysfunctional behaviors such as self-injury, and may therefore be better prepared to recommend an adequate therapeutic intervention (American Psychiatric Association [APA], 2005).
Regarding the hypothesis that the use of checklists is significantly associated with higher confidence in diagnostic decisions, the results confirm our hypothesis: therapists who used checklists reported significantly higher confidence in their diagnostic decisions in all three case vignettes than psychotherapists who did not. The results indicate that even a low-threshold diagnostic instrument such as a checklist is associated with increased diagnostic certainty compared to OCJ, which is in line with prior results (Vicente et al., 2007). Considering that psychotherapists using checklists made significantly more false-negative diagnoses in the MDD case, it is questionable whether the higher confidence associated with checklist use is desirable.
We also investigated the association of patients' gender with the accuracy of diagnostic decisions. Based on the results of our meta-analysis, we expected no significant association between patients' gender and diagnostic decisions in any of the three case vignettes. In line with our expectations, we generally found no significant association between patients' gender and psychotherapists' diagnostic decisions or treatment recommendations. However, therapists estimated lower motivation for treatment in male patients with MDD and BPD. Admittedly, the choice of disorders described in the case vignettes could explain this result. Alternatively, diagnosticians may have become more aware of gender-related biases in the 1970s and 1980s and thus tried to counteract them.
However, Widiger and Spitzer (1991) postulated that biases due to patients' gender can occur on two levels: in the application of the diagnostic criteria and in the diagnostic criteria themselves. Accordingly, clinicians are more likely to make gender-linked diagnoses if patients' symptomatic behaviors correspond to gender stereotypes (Sprock, 1996; Crosby and Sprock, 2004; Flanagan and Blashfield, 2005). Likewise, experts tend to pathologize when there is a large gap between patients' symptoms and traditional gender characteristics (Sprock, 1996; Möller-Leimkühler, 2005). A number of stereotypes have been reported, e.g., concerning differences in symptom expression between men and women (Vredenburg et al., 1986). The case vignettes used in our study did not incorporate these aspects, which may limit the generalizability of the results. Future studies should therefore consider such stereotype-related misdiagnoses in their case descriptions.
Finally, it should be noted that diagnostic criteria differ between ICD and DSM and that there is no gold standard for making "correct" diagnoses. Thus, showing that checklists can improve diagnosis by using the very criteria that define the classification of the disorder is not in itself sufficient to improve the diagnostic process or decisions about adequate therapy. As recommended by Garb (2007), such diagnostic instruments should therefore be combined with the therapist's clinical judgment. Accordingly, diagnostic instruments such as checklists or interviews should be seen as helpful diagnostic tools, but they should be clinician-administered.

Limitations
A limitation of this study may be the use of case vignettes: therapists were not able to request additional information during the diagnostic process, which may have constrained their decisions. Future studies should analyze the diagnostic process in detail and ask which additional information therapists would need for their diagnostic decisions. Studies may also examine why therapists decided not to make a diagnosis (e.g., which criterion was not fulfilled). This could clarify why therapists using checklists more often refrained from making a diagnosis, although a mental disorder was evident. Furthermore, the checklists used were created specifically for this study, so the results regarding the effects of checklists may be limited to these specific checklists. Finally, future studies should include a larger selection of possible diagnoses and investigate comorbid diagnoses to obtain more ecologically valid results.

CONCLUSION
In sum, the present study supports research indicating that diagnostic checklists can improve the accuracy of mental disorder diagnoses and could be a useful tool for avoiding false-positive diagnostic decisions. Nevertheless, our results revealed that the use of checklists could also lead to more false-negative diagnoses compared with OCJ. Furthermore, checklist use was associated with higher confidence in diagnostic decisions compared with OCJ and, in the BPD case, with more adequate treatment recommendations.
With regard to the investigation of an association between patients' gender and misdiagnoses, the results of this study revealed no gender bias and no association in relation to most diagnostic decisions.

ETHICAL STANDARDS
The study was approved by the Ethics Committee of the Faculty of Psychology at the Ruhr-Universität Bochum.

AUTHORS' DISCLOSURE
All authors read and approved the final manuscript and agreed to authorship. All authors state their compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). They also agree to the ethical standards of the Faculty of Psychology's Ethical Commission of the Ruhr-Universität Bochum. There has been no prior publication.

FUNDING
JC's position was funded by an Alexander von Humboldt professorship awarded to JM. The study was not funded by any granting agency.