The Psychometric Properties and Clinical Utility of the Korean Version of GAD-7 and GAD-2

Generalized anxiety disorder (GAD) is a common but serious form of anxiety disorder. Despite this, the rate of GAD recognition in primary care remains low in both Western and Eastern countries. The GAD-7 and GAD-2 were developed to efficiently identify people with GAD, and their reliability and validity have been well-documented in Western countries. The GAD-7 and GAD-2 have also been widely utilized to screen for other anxiety disorders; however, their diagnostic utility has not been fully justified with empirical support, especially in East Asian samples. In this study, we examined the diagnostic sensitivity and specificity of these screening tools for identifying individuals with GAD or other anxiety disorders, and recommended screening cutoff scores for GAD and other anxiety disorders for use in Korea. Based on the rigorous standard suggested by the Quality Assessment of Diagnostic Accuracy Studies-2, a total of 1,157 participants randomly recruited from the community completed the GAD-7, GAD-2, and other anxiety and depression measures in a counter-balanced order. All participants were assessed, and their psychiatric diagnoses were confirmed through a structured clinical interview conducted by independent clinicians blinded to the results of the self-report questionnaires. The GAD-7 and GAD-2 both showed excellent reliability and validity. Notably, both the GAD-7 and GAD-2 demonstrated acceptable diagnostic accuracy in detecting GAD, with recommended cutoff scores similar to those reported in Western countries, but unacceptable diagnostic accuracy for other anxiety disorders. We conclude that, given their brevity, the GAD-7 and GAD-2 can be used to identify people with GAD for preventative evaluation and treatment in Korea. Use of the GAD-7 and GAD-2 for screening other anxiety disorders should be approached with caution.


INTRODUCTION
Generalized anxiety disorder (GAD) is one of the most common yet serious forms of anxiety disorder, characterized mainly by pervasive, uncontrollable, and long-lasting worries. According to a global review on the prevalence of anxiety disorders, the lifetime prevalence of GAD was estimated to be 6.2% (95% confidence interval [CI]: 4.0-9.2%) (1) and 2.2% among adolescents (2). GAD often follows a chronic course and deteriorates overall quality of life and subjective well-being (3)(4)(5). Given the chronic nature and adverse functional outcomes of GAD, early diagnosis and timely intervention are essential for individuals with GAD. However, due to frequent comorbidities and the nature of the disease, which is accompanied by various physical symptoms, approximately half of individuals with GAD consulted their primary care physicians rather than mental health professionals when seeking treatment for anxiety symptoms (6). Unfortunately, the rate of GAD recognition in primary care remains between 29.0% and 34.4% in Western countries (6,7) and at 33.3% in non-Western countries (8,9). Given this, a valid and reliable diagnostic tool for GAD in a brief format (i.e., a minimum number of questions) would facilitate early detection and proper timely intervention, not only in primary care institutions but in mental health settings as well.
The generalized anxiety disorder 7-item scale [GAD-7; (10)] was developed with the clear purpose of screening patients with GAD. The scale has also been widely used in both clinical and research settings to monitor the severity of GAD symptoms. It was proven to be a reliable and valid instrument, and its seven items reflect most of the GAD diagnostic domains in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) (11). Further, the GAD-7 has been found to have clinical utility in screening for other anxiety disorders in several studies, although its sensitivity and specificity were lower than for GAD (12,13). GAD is highly comorbid with other anxiety disorders and typically precedes the onset of the comorbidities, which contributed to the conceptualization of GAD as the "basic" anxiety disorder (14,15). In sum, as GAD also shares common features of other anxiety disorders including uncontrollable worry and accompanying somatic symptoms (16), a screening tool for GAD may potentially detect other anxiety disorders as well.
A diagnostic meta-analysis of the GAD-7 reported its sensitivity and specificity for screening GAD as 0.83 (95% CI: 0.71-0.91) and 0.84 (95% CI: 0.70-0.92), respectively, at a cut-point of 8 or greater (17). For identifying anxiety disorders, sensitivity and specificity values ranged from 0.77 to 0.91 and 0.74 to 0.83, respectively, at the same cut-point (17). However, to use the GAD-7 as a screening tool for anxiety disorders, the cutoff score should be studied further (17).
Among the seven items of the GAD-7, items 1 and 2 represent the core anxiety symptoms. These thus comprise the GAD-2, an ultra-brief version of the GAD-7 questionnaire, which can be used in primary care settings with limited time and resources (12). Plummer et al. (17) reported acceptable sensitivity and specificity values for screening GAD at a GAD-2 cut-point of 3 (sensitivity: 0.76 [95% CI: 0.55-0.89]; specificity: 0.81 [95% CI: 0.60-0.92]). However, empirical evidence for the GAD-2 is also insufficient to determine a cutoff score for identifying any other anxiety disorders, because the reported sensitivity and specificity values varied (17).
Therefore, the purposes of this study were (1) to examine the psychometric properties and diagnostic sensitivity and specificity of these screening tools for identifying individuals with any anxiety disorder, and (2) to determine cutoff scores for identifying both GAD and other anxiety disorders.

METHOD
Participants
The present study was carried out as part of a large nationally funded research project entitled the Development and Validation of the Korean Depression and Anxiety Scales, conducted from September 2015 to August 2018. Ethical approval was granted by the Korea University Institutional Review Board. A total of 1,228 individuals were recruited for this study through two routes: online recruiting advertisements and introduction to potential research participants by hospital staff. All individuals voluntarily opted to participate in the study. The only inclusion criterion was being 19 years of age or older. Exclusion criteria were not specified, to minimize sampling bias. For rigorous evaluation of the accuracy of the screening tools, the methodology of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) (18) was applied. The QUADAS-2 evaluates the quality of diagnostic accuracy studies in four domains: patient selection, index test, reference standard, and flow and timing. To avoid bias in participant selection, the sample in this study was randomly recruited and minimal exclusion criteria were specified. However, 71 individuals (5.78%) were excluded either because they did not complete the questionnaire or because they could not answer questions properly as a result of their psychiatric or medical symptoms. A final total of 1,157 participants were included in the present analysis.

Measures
Mini-International Neuropsychiatric Interview-Plus Version 5.0.0 (MINI)
The MINI is a structured clinical interview used to diagnose psychiatric disorders according to the DSM-IV and the International Classification of Diseases, 10th Edition (ICD-10). The Korean version of the MINI (19), which showed overall good agreement between MINI-based and expert diagnoses, was used in this study. The MINI was utilized as a reference standard (i.e., criterion). The one-on-one, in-person clinical diagnostic interview took ∼30-50 min per participant. The MINI was administered by licensed clinical psychologists, psychiatrists, and supervised clinical psychology senior students. The inter-rater reliability of the MINI was 0.92. Final psychiatric diagnoses were confirmed by licensed clinical psychologists and a psychiatrist.

GAD-7 and GAD-2
The GAD-7 is a simple, 7-item self-administered instrument designed to screen for GAD and used to assess the intensity of symptoms. Subjects are asked to rate how often they have been disturbed by each symptom over the past 2 weeks using a 4-point Likert scale. The Korean version of the GAD-7 (20), which is available on the Patient Health Questionnaire website (http://www.phqscreeners.com), was used in the present study. In previous research (21), the items of the Korean version of the GAD-7 were translated and then back-translated by an independent bilingual speaker. The original and back-translated versions were compared by another native English speaker, who concluded that both were identical. The Korean version of the GAD-7 showed excellent internal consistency (α = 0.93).
The first three items of the GAD-7 relate to two core criteria of GAD (A and B) defined in the DSM-IV (10,11). Therefore, use of a short-form version consisting of only the first two items was proposed, resulting in the GAD-2 scale. The GAD-2 is reported to be a reliable and valid tool for screening GAD, both when administered alone and when extracted from previous GAD-7 results (22). The two items showed the highest correlation with the GAD-7 total score (Pearson's r = 0.94, p < 0.01).
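As an illustration of how the two scales relate, the following minimal sketch scores a single set of GAD-7 responses and derives the GAD-2 total from the first two items. It assumes the conventional 0-3 coding of the 4-point response scale, and it uses the GAD screening cutoffs discussed later in this article (8 or greater for the GAD-7, 3 or greater for the GAD-2). The function name `score_gad` and the returned dictionary keys are hypothetical and are not part of any published scoring software.

```python
from typing import Dict, Sequence

# Cutoffs for GAD screening as reported in this article's Discussion.
GAD7_CUTOFF = 8
GAD2_CUTOFF = 3


def score_gad(responses: Sequence[int]) -> Dict[str, object]:
    """Score one respondent's GAD-7 answers (assumed coding: 0-3 per item).

    The GAD-2 total is taken from the first two GAD-7 items.
    """
    if len(responses) != 7:
        raise ValueError("expected exactly 7 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")

    gad7_total = sum(responses)       # possible range 0-21
    gad2_total = sum(responses[:2])   # possible range 0-6
    return {
        "gad7_total": gad7_total,
        "gad2_total": gad2_total,
        "gad7_positive": gad7_total >= GAD7_CUTOFF,
        "gad2_positive": gad2_total >= GAD2_CUTOFF,
    }


example = score_gad([2, 2, 1, 1, 1, 1, 0])
print(example)
```

A positive screen on either scale is only a flag for further assessment; in this study the diagnosis itself was established with the MINI.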

Anxiety Measures
Beck Anxiety Inventory (BAI)
The BAI (23) scale is widely used to assess the severity of anxiety and track treatment progress. This 21-item self-report inventory covers the affective, cognitive, and physical domains of anxiety. The measure asks respondents to indicate the extent to which they have suffered from each symptom over the past week using a 4-point Likert scale. The Korean version of the BAI (24) was used in this study, and showed excellent internal consistency (α = 0.96).

Penn State Worry Questionnaire (PSWQ)
The PSWQ (25) is a 16-item self-administered instrument designed to measure the frequency and intensity of pathological worry. Each item is assessed on a 5-point Likert scale. In this study, the Korean version of the scale (26) was used, and showed very good internal consistency (α = 0.85).

Depression Measures
Beck Depression Inventory-II (BDI-II)
The BDI-II (27) is a well-accepted self-report inventory consisting of 21 items that assess the affective, cognitive, motivational, and physiological severity of depressive symptoms. Subjects rate each item using a 4-point Likert scale. The Korean version of the BDI-II (28) was used in this study, and showed excellent internal consistency (α = 0.95).

Center for Epidemiologic Studies Depression Scale (CES-D)
The CES-D (29) is a 20-item self-report measure developed to easily identify depression in the general population. Subjects are asked to indicate how often they have experienced emotional and physical symptoms and interpersonal difficulties over the previous 7 days. Each item is rated on a 4-point Likert scale. In the present study, the Korean version of the CES-D (30) was used, and showed very good internal consistency (α = 0.85).
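The internal consistency values reported for the measures above are Cronbach's alpha coefficients. As a rough sketch of how such a coefficient can be computed from item-level data, the example below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), using only the Python standard library. The function name `cronbach_alpha` and the toy data are illustrative assumptions; the analyses in this study were run in SPSS, not with this code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    rows: one sequence of item scores per respondent, all of equal length.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("all respondents must answer the same number of items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total score across respondents.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): four respondents, three items scored 0-3.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy), 3))
```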

Research Design
When individuals indicated their intention to participate verbally or by responding to an e-mail, research assistants scheduled their dates for participation. Participants were invited to a university research lab or to one of two general hospitals and received a detailed explanation of the current study. After signed written informed consent was obtained, each participant was asked to complete a self-report assessment battery consisting of a demographic information questionnaire, the GAD-7, and other anxiety or depression measures. In most cases, the questionnaires were collected immediately, but for some participants who needed additional time, the remaining items were completed at home and returned within a week at the latest. Licensed clinical psychologists, psychiatrists, and trained and supervised clinical psychology graduate research assistants administered face-to-face diagnostic interviews using the MINI (31) before or after participants completed the self-report assessment battery. All procedures, including the questionnaires and interview, took approximately 45-75 min. Participants were compensated for their participation, as specified in the approved Institutional Review Board protocol. Following the recommendation of the QUADAS-2, to avoid bias in sampling and evaluation, all participants were treated the same way regardless of whether they were patients or non-patients. The self-report assessment battery and the MINI were administered, scored, and interpreted separately by independent evaluators who were blind to the results of the other assessment (i.e., the assessment battery or the psychiatric diagnoses from the MINI).

Analysis
The internal consistency of responses on the GAD-7 was examined using Cronbach's alpha and item-total correlations. Validity evidence was collected not from a single source but from several, following the recommendations of the Standards for Educational and Psychological Testing provided by AERA, APA, and NCME (32). Convergent validity was assessed by calculating correlations of the GAD-7 and GAD-2 with other anxiety scales, namely the BAI and PSWQ. Discriminant validity was assessed by examining correlations of the GAD-7 and GAD-2 with depression measures, namely the BDI-II and CES-D. Discriminant validity was also assessed using independent t-tests, in which the mean GAD-7 and GAD-2 scores of participants with GAD were compared to those of individuals without GAD. To avoid multiple comparison problems, we applied a Bonferroni correction, setting the significance threshold at p = 0.0125 for these independent t-tests. The examination of diagnostic criterion validity included receiver operating characteristic (ROC) analyses and investigation of diagnostic sensitivity and specificity, positive and negative predictive values (PPV and NPV), and positive and negative likelihood ratios (PLR and NLR) at various cutoff scores for the diagnosis of GAD or any anxiety disorder based on the MINI. The optimal cutoff points for the GAD-7 and GAD-2 were determined where both diagnostic sensitivity and specificity were maximized. Data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 24.
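The diagnostic accuracy indices named above can all be derived from the 2 × 2 table of screening result against reference diagnosis. The sketch below shows one way to compute them at a given cutoff and to pick a cutoff from a list of candidates. It is illustrative only: the function names and the toy data are hypothetical, and the selection rule used here (maximizing sensitivity + specificity − 1, i.e., Youden's index) is an assumption about how "both sensitivity and specificity were maximized" might be operationalized, not a description of the SPSS procedure actually used.

```python
from typing import Dict, Iterable, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(scores: Sequence[int],
                       has_disorder: Sequence[bool],
                       cutoff: int) -> Dict[str, float]:
    """Accuracy indices when 'score >= cutoff' counts as a positive screen."""
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_disorder):
        positive = score >= cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "plr": _ratio(sensitivity, false_positive_rate),
        "nlr": _ratio(1 - sensitivity, specificity),
    }


def best_cutoff(scores: Sequence[int],
                has_disorder: Sequence[bool],
                candidates: Iterable[int]) -> int:
    """Candidate cutoff with the largest Youden's index (ties: first listed)."""
    def youden(cutoff: int) -> float:
        m = diagnostic_metrics(scores, has_disorder, cutoff)
        return m["sensitivity"] + m["specificity"] - 1
    return max(candidates, key=youden)


# Toy data (hypothetical): ten respondents, three with the disorder.
toy_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_status = [False] * 6 + [True, False, True, True]
chosen = best_cutoff(toy_scores, toy_status, range(1, 10))
print(chosen, diagnostic_metrics(toy_scores, toy_status, chosen))
```

Sensitivity, specificity, and the likelihood ratios do not depend on how common the disorder is in the sample, whereas PPV and NPV do, which is why the likelihood ratios are reported alongside the predictive values.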

Demographics
The total sample (N = 1,157) had a mean age of 37.31 (SD = 14.76, range 19-85), and 772 (66.7%) of the subjects were women. The mean years of education was 14.63 (SD = 2.98). All participants were South Korean.

DISCUSSION
This study was conducted to determine whether the GAD-7 and GAD-2 were able to detect GAD specifically and any anxiety disorder including GAD. The results suggested that the Korean versions of the GAD-7 and GAD-2 are reliable and valid measures for detecting GAD. However, use of the GAD-7 and GAD-2 to screen for any anxiety disorder should be approached with caution.
The GAD-7 and GAD-2 showed excellent internal consistency and good convergent validity with other anxiety measures. The total GAD-7 score was strongly correlated with the scores of the BAI and PSWQ. The total GAD-2 score, which was not statistically different from that of the GAD-7, was also significantly correlated with both BAI and PSWQ scores. These results indicate that the GAD-7 and GAD-2 have good convergent validity with anxiety measures.
Both the GAD-7 and the GAD-2 were correlated with the depression scales. Specifically, the correlations between the GAD-7 and the depression measures were stronger than its correlation with the PSWQ. The correlation of the GAD-2 with the CES-D was also higher than its correlation with the PSWQ. High correlations between the GAD-7/2 and measures of depressive symptoms were not hypothesized, but they are noteworthy because some previous studies reported similar correlational patterns (10,34,35). In addition, Watson (36) argued that GAD is more similar to depressive disorders than to the other anxiety disorders. More importantly, it has been reported that Asians with GAD and depressive disorders have more physical symptoms than cognitive symptoms (i.e., pathological worries) (8). Despite the high correlations between the GAD-7/2 and measures of depressive symptoms, participants with GAD had higher mean GAD-7/2 scores than those with other anxiety disorders or depressive disorders, providing evidence for the discriminant validity of the GAD-7/2 and their clinical utility as screening tools for GAD. Given the overlap with depressive symptoms, after obtaining GAD-7 or GAD-2 results, clinicians should also gather additional information about depressive symptoms for differential diagnosis or treatment planning.
The Korean versions of the GAD-7 and GAD-2 detected GAD with excellent accuracy. ROC analysis showed high accuracy for both the GAD-7 and GAD-2 in detecting probable cases of GAD. These area under the curve (AUC) values are relatively high compared with previous research (17). The optimal cutoff score for GAD, at which the balance of sensitivity and specificity was maximized, was 8 or greater for the GAD-7 and 3 or greater for the GAD-2. These cutoff points were consistent with the scores suggested by a previous meta-analysis (17). Additionally, both the GAD-7 and GAD-2 showed high NPV, indicating a false negative rate of about 2% among those screening negative for GAD with the GAD-7 and GAD-2. These characteristics indicate that the GAD-7 and GAD-2 are useful screening tools for GAD in various settings. However, it should be noted that, as in previous studies, PPV was quite low for detecting GAD using the GAD-7 or GAD-2 (10,37,38). The low PPV indicates that the GAD-7 and GAD-2 could yield many false positives. At a cutoff score of 8 or greater for the GAD-7, 69% of participants who screened positive did not actually have GAD, and at a cutoff score of 3 or greater for the GAD-2, 66% of those who screened positive did not actually have GAD. This issue is partially due to the low prevalence of GAD (7.7% in this study), because PPV decreases as prevalence decreases (33). We thus calculated PLR and NLR, which are not affected by prevalence. PLR for the GAD-7 was 5.25, meaning that GAD-7 scores of 8 or greater are obtained approximately five times more often in subjects with GAD than in subjects without GAD. PLR for the GAD-2 was 6.11, meaning that a GAD-2 score of 3 or greater is obtained approximately six times more often in subjects with GAD than in subjects without GAD. These results indicate that both the GAD-7 and GAD-2 could provide "clinically useful information" in identifying GAD (33).
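The link between prevalence, PLR, and PPV described above can be checked with a short calculation: converting the prevalence to pre-test odds, multiplying by the PLR, and converting back to a probability gives the expected PPV. The sketch below does this with the figures reported in this paragraph (prevalence 7.7%, PLR 5.25 for the GAD-7 and 6.11 for the GAD-2). The function name `post_test_probability` is a hypothetical helper, and small differences from the reported percentages are to be expected from rounding.

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Probability of disorder after a test result, via the odds form of Bayes' rule."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


PREVALENCE = 0.077  # GAD prevalence reported in this study

ppv_gad7 = post_test_probability(PREVALENCE, 5.25)  # PLR at GAD-7 >= 8
ppv_gad2 = post_test_probability(PREVALENCE, 6.11)  # PLR at GAD-2 >= 3

print(f"GAD-7 expected PPV: {ppv_gad7:.2f}")  # prints 0.30
print(f"GAD-2 expected PPV: {ppv_gad2:.2f}")  # prints 0.34
```

The resulting values of roughly 0.30 and 0.34 correspond to about 70% and 66% of positive screens not having GAD, in line with the 69% and 66% reported above. Rerunning the function with a higher prevalence (for example, a clinical sample) shows how the same PLR yields a substantially higher PPV.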
We also investigated whether the GAD-7 and GAD-2 could be used to detect any anxiety disorder. ROC analysis of the GAD-7 and GAD-2 indicated moderate accuracy; the cutoff score was 5 or greater for the GAD-7 and 2 or greater for the GAD-2. The GAD-7 cutoff score was considerably lower than that in the previous meta-analysis (8 or greater) (17). In the case of the GAD-2, sensitivity and specificity varied across previous studies, and thus GAD-2 cutoff scores could not be drawn from the previous meta-analysis (17). Although NPV was high for both the GAD-7 and the GAD-2, PPV was quite low. Using the GAD-7 and GAD-2 cutoff scores, about 60% of subjects who screened positive did not actually have an anxiety disorder. Moreover, the low PLR and high NLR were more problematic when detecting anxiety disorders using the GAD-7 or GAD-2. A PLR of <3.00 and an NLR of more than 0.33 rarely alter clinical decisions (33), and thus the GAD-7 and GAD-2 provide little additional information for detecting any anxiety disorder. Thus, to prevent misdiagnosis and unnecessary costly intervention when screening for any anxiety disorder, it is recommended that the GAD-7 or GAD-2 be used in combination with additional clinical interviews or other screening tools specifically designed to diagnose anxiety disorders (17).
The limitations of the current study are as follows. First, participants were not recruited by stratified random sampling. Although subjects were recruited randomly, with minimal exclusion criteria, from online advertisements and introduction by hospital staff, age and gender quotas were not applied. Many subjects in this study were women (66.7%), were in their 20s (42%), and were highly educated (an average of 14.63 years of education). Therefore, future studies should be conducted with samples that are balanced in gender and age distribution. Second, it was unclear why the results of this study (low PLR and high NLR of the GAD-7 and GAD-2 in detecting anxiety disorders) differed from those of a previous study (12). These discrepancies might be due to cultural factors. All of our subjects were Asian (i.e., South Korean), whereas about 97% of subjects in the previous study reported White, Hispanic, or Black ethnic backgrounds. According to a previous study, patients with anxiety disorders in Asia tend to report emotional distress in the form of somatic symptoms (8). It is speculated that since the GAD-7 and GAD-2 items do not reflect or measure various somatic symptoms, the GAD-7/2 might have been poorer at identifying anxiety disorders in our study sample than in previous studies. Cultural differences to consider when administering and interpreting GAD-7/2 scores have been reported in a previous study, in which Parkerson et al. (39) indicated that individuals who identified themselves as Black/African American scored significantly lower on some items (e.g., feeling nervous, irritable, restless, etc.) of the GAD-7 than other ethnic groups (i.e., White and Hispanic). Thus, these discrepancies, which are not yet fully understood, should be a subject of future study.
Despite these limitations, the current study provides evidence on the psychometric properties and clinical utility of both the GAD-7 and the GAD-2 as reliable and valid screening tools for people with GAD. Because the GAD-2 is an ultra-brief measure, it can be a useful tool in various clinical settings (e.g., primary care) with limits on time and resources. It is expected that both measures could be widely used to detect GAD in many clinical settings, and thus support optimal and timely intervention in the community.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of Korea University Institutional Review Board [1040548-KU-IRB-15-92-A-1(R-A-1)(R-A-2)(R-A-2)] with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Korea University Institutional Review Board [1040548-KU-IRB-15-92-A-1(R-A-1)(R-A-2)(R-A-2)].