Comparing the Self- and External Assessment Versions of the HCL-33 as Screening Instruments for Bipolar Disorder in Older Depressed Patients

Objectives: The misdiagnosis of bipolar disorder (BD) as major depressive disorder (MDD) is common in depressed older adults. The self-rated HCL-33 and its external assessment version (HCL-33-EA) have been developed to screen for hypomanic symptoms. This study compared the screening ability of these two instruments to discriminate BD from MDD. Methods: A total of 215 patients (107 with BD and 108 with MDD) and their carers were recruited. Patients and their carers completed the HCL-33 and HCL-33-EA, respectively. The consistency of the total score and the positive response to each item between the two scales was calculated with the intraclass correlation coefficient (ICC) and Cohen's kappa coefficient separately. Receiver operating characteristics (ROC) curves were drawn for both instruments. The optimal cut-off points were determined according to the maximum Youden's Index. The areas under the ROC curve (AUC) of the HCL-33 and HCL-33-EA were calculated separately and compared. The sensitivity and specificity at the optimal cut-off values were also calculated separately for the HCL-33 and HCL-33-EA. Results: The intraclass correlation coefficient (ICC) between the total scores of the HCL-33 and HCL-33-EA was 0.823 (95% CI = 0.774–0.862). The positive response rate on all items showed high agreement between the two instruments. ROC curve analysis demonstrated that the total scores of both HCL-33 and HCL-33-EA differentiated well between MDD and BD, while there was no significant difference in the AUCs between the two scales (Z = 0.422, P = 0.673). The optimal cutoff values for the HCL-33 and HCL-33-EA were 14 and 12, respectively. With the optimal cutoff value, the sensitivities of the HCL-33 and HCL-33-EA were 88.8% and 93.5%, and their specificities were 82.4% and 79.6%. Conclusion: Both the HCL-33 and HCL-33-EA had good screening ability for discriminating BD from MDD in depressed older adults.


INTRODUCTION
With the improvement of healthcare services in the past decades, many patients with bipolar disorder (BD) live on into older adulthood. The diagnosis of BD is associated with increased health service use and premature mortality in older adults (1). The prevalence of BD in this population varied greatly between different studies, ranging from 0.1% in the community to 8-10% in psychiatric hospitals (2).
Patients with BD are frequently misdiagnosed in clinical practice, in a range of 48% (3) to 69% of cases (4). The misdiagnosis of BD is also common in older adults, although the rate seems to decrease with age (5). Older BD patients, particularly those with BD-type II (BD-II) and BD-not otherwise specified (BD-NOS), were most often misdiagnosed as having major depressive disorder (MDD) (6). The misdiagnosis of BD as MDD could be partly attributable to the unawareness and underreporting of hypomanic symptoms, since patients with BD tend to seek medical help during their depressive but rarely during their hypomanic episodes, when they often enjoy the elevated mood (4). In addition, the course of BD often starts with a depressive episode and may even be followed by predominantly depressive episodes for a considerable period of time (7)(8)(9). The time gap between the first depressive episode and the subsequent first manic/hypomanic episode is longer in older than in younger patients: for example, this gap was 17 years in BD patients aged 60 years and above, while the corresponding figure was only 3.5 years in those aged 40 and below (10). The late appearance of a manic/hypomanic episode in older BD patients increases the likelihood of their BD being misdiagnosed as MDD. Late recognition of BD results in delayed, inadequate, and inappropriate treatment (11,12).
Regular screening for hypomania facilitates the timely diagnosis of BD. In the past decades, both clinician-administered and self-rated screening instruments have been developed to screen for hypomania. Structured clinical interviews, such as the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (SCID-5) (13) and the Composite International Diagnostic Interview (CIDI) (14), represent the most reliable and valid approach for diagnosing BD, but they are time-consuming and need to be administered by skilled clinicians (15,16). Several self-report measures have therefore been developed to screen for BD, including the 33-item Hypomania Checklist (HCL-33) (17). The HCL-33, a modified version of the 32-item Hypomania Checklist (HCL-32) (18), a widely used self-report instrument to screen for hypomania, has been validated in depressed Chinese adults (17). Recently, a parallel external-assessment version of the HCL-33 (HCL-33-EA) has been constructed for patients' carers, family members and friends (19). Carers are familiar with patients' mood swings and daily lives. Moreover, cognitive, hearing, and visual problems, which are common in older patients, may hinder the use of self-report scales.
The clinical features of BD in older patients are different from those in their younger counterparts (2), making it important to validate the HCL-33 and the HCL-33-EA in an older sample. Our study examined the screening ability of the HCL-33 and HCL-33-EA to differentiate BD from MDD and evaluated the consistency of the screening ability of the two instruments.

Participants
The study was conducted in the geriatric psychiatry department of Beijing Anding Hospital, a major tertiary psychiatric hospital in China, between July 2017 and November 2019. Patients attending the geriatric psychiatry department were consecutively invited to participate in the study if they were (1) aged 60 years and above; (2) experiencing a depressive episode; (3) diagnosed with MDD or BD according to the 10th Revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10); and (4) accompanied by at least one caregiver. The depressive episode and diagnoses of BD and MDD were initially established by the patients' treating psychiatrists and confirmed by a research psychiatrist using the Chinese version of the Mini International Neuropsychiatric Interview (MINI), Version 5.0, based on the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) (20,21). The exclusion criteria were comorbid psychiatric disorders and severe medical or neurological conditions. Carers of each patient were also invited to participate in the survey without any exclusion criteria. The study protocol was approved by the Clinical Research Ethics Committee of Beijing Anding Hospital. All participants provided written informed consent.

Instruments
The demographic characteristics of patients and carers and patients' clinical features were collected. The HCL-33, Chinese version, is a self-administered and validated questionnaire (17). The HCL-33-EA is based on the original version of the HCL-33 and was administered to patients' carers (family members and close friends) to assess hypomanic symptoms. Both scales consist of 33 items with dichotomous responses of "yes/no," comprehensively covering various aspects of hypomania. The total score on each scale is obtained by adding up all positive responses and ranges from 0 to 33, with a higher total score representing more severe hypomanic symptoms. In this study, all patients were asked to complete the HCL-33 and their carers the HCL-33-EA.
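The scoring rule described above can be sketched in a few lines. This is only an illustration, not the authors' procedure: the function name `hcl33_total_score` and the list-of-booleans input format are assumptions made for the example.

```python
def hcl33_total_score(responses):
    """Sum the positive ("yes") responses on a 33-item checklist.

    `responses` is a hypothetical input format: a list of 33 booleans,
    True for "yes" and False for "no".
    """
    if len(responses) != 33:
        raise ValueError("expected exactly 33 item responses")
    return sum(1 for answer in responses if answer)


# Example: "yes" on the first 14 items, "no" on the remaining 19.
example = [True] * 14 + [False] * 19
print(hcl33_total_score(example))  # 14
```

The same function would apply to either version of the scale, since both use 33 dichotomous items and the same 0 to 33 range.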

Statistical Analysis
All the analyses were performed using the Statistical Package for Social Sciences (SPSS), Version 20.0 and the MedCalc software. The normality of continuous variables was examined with the P-P plot. Paired-sample t tests and Wilcoxon signed-rank tests were used to compare the total scores of the HCL-33 and HCL-33-EA, as appropriate. The frequency of positive responses for items of the two scales was compared between BD and MDD patients using chi-square tests. The intraclass correlation coefficient (ICC) was employed to assess the consistency between the total scores of the two scales, while Cohen's kappa coefficient was used to assess the consistency between the positive responses to each item of the two scales. Kappa values of 0-0.20 were considered slight agreement, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement (22). A previous study (23) found a two-factor structure for the HCL-33, comprising "active/elated" (items 2-6, 8, 10-15, 17-19 and 21-27) and "substance use/indulging" (items 28, 29 and 30) factors. Principal components analysis (PCA) was used to examine the factor structure of the HCL-33-EA. As recommended previously (23), items were allocated to a specific factor when their loading value was > 0.4.
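To make the item-level agreement measure concrete, the following minimal sketch computes Cohen's kappa for one dichotomous item rated by two raters and maps it to the agreement bands quoted above. It is not the SPSS procedure used in the study; the function names and the toy ratings are hypothetical.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of binary (0/1) ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of pairs with identical ratings.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement derived from each rater's marginal "yes" rate.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_label(kappa):
    """Map kappa to the bands quoted in the text (22)."""
    if kappa < 0:
        return "below the quoted bands"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy ratings for a single item (1 = "yes", 0 = "no"); not study data.
patient = [1, 1, 1, 1, 0, 0, 0, 0]
carer = [1, 1, 1, 0, 0, 0, 0, 0]
k = cohen_kappa(patient, carer)
print(k, agreement_label(k))  # 0.75 substantial
```

In this toy case the raters agree on 7 of 8 patients (0.875) while chance agreement is 0.5, giving kappa = 0.375 / 0.5 = 0.75.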
The sensitivity and specificity at each possible cutoff value of the HCL-33 and HCL-33-EA for discriminating BD from MDD were calculated using receiver operating characteristic (ROC) curve analysis with the MINI diagnosis as the gold standard. The discriminating ability was examined with the area under the ROC curve (AUC), where an AUC > 0.6 indicated acceptable discrimination (24). The optimal cutoff value was the one that maximized Youden's Index (sensitivity + specificity - 1) across all possible cutoff values (25). The pairwise comparison of the ROC curves of the HCL-33 and HCL-33-EA was conducted using the DeLong method (26). The consistency between the HCL-33 and the HCL-33-EA was tested using Cohen's kappa, with <0.40 signaling poor agreement, 0.40-0.75 fair to good agreement, and >0.75 excellent agreement (27). The significance level was set at P < 0.05 (two-sided) in all analyses.
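The cutoff selection can be illustrated with a short sketch that scans every possible cutoff and keeps the one with the largest Youden's index, taken in its usual form of sensitivity + specificity - 1. Two points are assumptions rather than details given in the text: a total score at or above the cutoff is treated as screen-positive, and the scores below are invented toy values, not study data. The function names are hypothetical.

```python
def sensitivity_specificity(scores_bd, scores_mdd, cutoff):
    """Sensitivity and specificity, assuming score >= cutoff is screen-positive."""
    true_pos = sum(s >= cutoff for s in scores_bd)
    true_neg = sum(s < cutoff for s in scores_mdd)
    return true_pos / len(scores_bd), true_neg / len(scores_mdd)


def best_cutoff_by_youden(scores_bd, scores_mdd, max_score=33):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1."""
    best = None
    for cutoff in range(0, max_score + 2):
        sens, spec = sensitivity_specificity(scores_bd, scores_mdd, cutoff)
        j = sens + spec - 1
        # Strict comparison keeps the lowest cutoff when several tie.
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Invented toy total scores, for illustration only.
bd_scores = [12, 15, 18, 20, 22, 25, 14, 19]
mdd_scores = [3, 5, 8, 9, 11, 16, 6, 7]
print(best_cutoff_by_youden(bd_scores, mdd_scores))  # (12, 0.875, 1.0, 0.875)
```

Because subtracting 1 does not change which cutoff is largest, maximizing the plain sum of sensitivity and specificity selects the same cutoff.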

Demographic Characteristics of the Total Sample
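In total, 232 patients were screened and invited to participate in our study; 17 (7.3%) refused or failed to complete the interview. The remaining 215 patients (107 with BD and 108 with MDD) and their carers were included in the analysis (Table 1).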

Positive Responses to the Individual Items of the HCL-33 and the HCL-33-EA
Positive responses to the HCL-33 items in patients with BD were significantly more frequent than in patients with MDD except for items 7, 16, and 28-32 (Figure 1A). The same was true for the HCL-33-EA except for items 6, 7, 16, 29, 30, and 32 (Figure 1B).

Factor Analysis of the HCL-33-EA
To explore the factor structure of HCL-33-EA, the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and the Bartlett's test of sphericity were performed (28,29), giving a KMO of 0.869 and χ² of 2,452.828 (P < 0.01), which indicates that the study sample was adequate and suitable for PCA. Nine factors with an eigenvalue greater than 1 emerged and cumulatively explained 60.4% of the total variance (Supplementary Figure 1). The first three factors had eigenvalues of 8.7, 2.1, and 1.9, respectively, and explained 38.2% of the total variance. Factor I consisted of 20 items (items 2-5, 8-15, 17-19, 21-24, and 27) and could be characterized as "active/elated", Factor II consisted of 2 items (items 25 and 26) and could be characterized as "irritable", and Factor III consisted of 6 items (items 7, 28-30, 32, and 33) and could be characterized as "substance use/indulging" (Table 3). Few items loaded on other factors with eigenvalues > 1, making them difficult to characterize. A three-factor structure was ultimately established.
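The link between the eigenvalues and the share of variance explained can be checked with a small calculation. The sketch below assumes the PCA was run on the item correlation matrix, so that the total variance equals the number of items (33); the text does not state this explicitly, and the function name is hypothetical.

```python
N_ITEMS = 33  # assumption: PCA on the correlation matrix, total variance = 33


def variance_explained(eigenvalues, n_items=N_ITEMS):
    """Share of total variance carried by the given eigenvalues."""
    return sum(eigenvalues) / n_items


reported = [8.7, 2.1, 1.9]  # eigenvalues as reported, rounded to one decimal
share = variance_explained(reported)
# Rounding to one decimal leaves up to 0.05 of slack per eigenvalue.
low = variance_explained([e - 0.05 for e in reported])
high = variance_explained([e + 0.05 for e in reported])
print(f"{share:.1%} (plausible range {low:.1%} to {high:.1%})")
# 38.5% (plausible range 38.0% to 38.9%)
```

Under that assumption, the reported 38.2% falls inside the range implied by the rounded eigenvalues.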

ROC Curves Analyses for the HCL-33 and HCL-33-EA
ROC curve analysis demonstrated that the HCL-33 total score could differentiate well between MDD and BD, with the AUC of 0.91 (95% CI = 0.87-0.95). The optimal cut-off point was 14, with a Youden index of 0.71, and the corresponding sensitivity and specificity figures were 88.8% and 82.4%, respectively (Figure 2A). ROC curve analysis also demonstrated that the HCL-33-EA total score could differentiate well between MDD and BD, with the AUC of 0.90 (95% CI = 0.86-0.94). The optimal cutoff value was 12, with a Youden index of 0.73, and the corresponding sensitivity and specificity figures were 93.5% and 79.6%, respectively (Figure 2B). There was no significant difference between the AUC of the HCL-33 and the HCL-33-EA (Z = 0.422, P = 0.673).
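As a consistency check, the reported Youden indices can be recomputed from the sensitivity and specificity values above, using the usual definition of the index as sensitivity + specificity - 1. The symbol J is introduced here only for convenience.

```latex
\begin{align*}
J_{\text{HCL-33}}    &= 0.888 + 0.824 - 1 = 0.712 \approx 0.71,\\
J_{\text{HCL-33-EA}} &= 0.935 + 0.796 - 1 = 0.731 \approx 0.73.
\end{align*}
```

Both values match the reported indices after rounding to two decimals.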

Kappa Coefficients of the HCL-33 and HCL-33-EA
Using the optimal cutoffs of 14 for the HCL-33 and 12 for the HCL-33-EA in the sample, the consistency of the HCL-33 and the HCL-33-EA was fairly good (kappa coefficient = 0.737, P < 0.001).

DISCUSSION
To the best of our knowledge, this is the first study to compare the screening consistency between the self-rated and external assessment versions of the HCL-33 in depressed older adults. The ICC between the HCL-33 and HCL-33-EA total scores was 0.823, which is higher than the correlation reported in depressed younger adults (Spearman's r = 0.46) (19). The consistency of the total scores on the two instruments was highest in patients cared for by their spouses (ICC = 0.846, 95% CI = 0.766-0.900), followed by those cared for by offspring (ICC = 0.815, 95% CI = 0.747-0.866), and others (ICC = 0.672, 95% CI = 0.217-0.887), probably because spouses were more familiar with the patients' mood swings than other carers.
The positive responses to all the 33 items showed sufficient agreement between the two HCL scales, with most of the items achieving moderate agreement (κ > 0.4). This is slightly different from the findings of a study conducted in adult patients (30), which found insufficient agreement between the HCL-33 and HCL-33-EA in 6 of the 33 items. The present findings indicate high consistency between the items of the HCL-33 and HCL-33-EA in older depressed Chinese patients. In addition, the three-factor structure of the HCL-33-EA differed from the two-factor structure as reported previously for the HCL-33 (23). More specifically, although the same 19 items (items 2-5, 8, 10-15, 17-19, 21-24, and 27) loaded on Factor I of both scales, items 25 and 26 loaded on Factor I of the HCL-33 but on Factor II of the HCL-33-EA. Moreover, three items (28-30) that loaded on Factor II of the HCL-33 loaded on Factor III of the HCL-33-EA together with three further items (7, 32 and 33). Inconsistencies were also found between previous studies on the HCL-32 and HCL-33, including two-factor (17, 18, 31-33), three-factor (34-37) and four-factor structures (38,39). The discrepancy between studies could be partly due to different study characteristics (e.g., age, gender and severity of illness) and types of rater (e.g., patients for the HCL-33 vs. patients' carers for the HCL-33-EA).
The ROC curve analysis revealed that both the HCL-33 and HCL-33-EA total scores differentiated well between MDD and BD in older adults. The optimal cutoff value for the HCL-33 was 14 in this study, which is similar to the cutoff value of 15 found in Chinese adult patients (17). The sensitivity and specificity at the optimal cutoff value in our study were higher than those reported in adult patients (sensitivity: 88.8 vs. 62%; specificity 82.4 vs. 74%) (17). The optimal cutoff value for the HCL-33-EA total score was 12, with a higher sensitivity (93.5%) and a lower specificity (79.6%) than for the HCL-33, suggesting that carers could be more sensitive in recognizing hypomanic symptoms than the patients themselves. Since this was the first study examining the screening efficacy of the HCL-33-EA, direct comparisons with previous studies could not be made.
This study did not find any significant difference between the AUC of the two instruments (Z = 0.422, P = 0.673), which suggests that the HCL-33 and HCL-33-EA have similar ability to discriminate BD from MDD in depressed older patients. In addition, the kappa coefficients between the two instruments showed that the consistency was fairly good. A similar finding was reported in a Chinese adult sample, in which the total scores of HCL-33 and HCL-33-EA were significantly and positively correlated (19). The two instruments could therefore be interchangeable in clinical practice to discriminate BD from MDD in older Chinese patients.
The study has several limitations that need to be acknowledged. First, the sample size was relatively small, which may have decreased the statistical power of the findings. Second, due to the single study site, the sample could not represent depressed older adults from other regions in China. Third, patients with comorbid psychiatric disorders, severe medical or neurological conditions were excluded from the study, which further limits the generalizability of the findings.
In conclusion, both the HCL-33 and the HCL-33-EA showed satisfactory psychometric properties in discriminating BD from MDD in depressed older adults, and the discriminative ability of the two scales was comparable.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because the Clinical Research Ethics Committee of Beijing Anding Hospital that approved the study prohibits the authors from making publicly available the research dataset of clinical studies. Readers and all interested researchers may contact Y-TX (Email address: xyutly@gmail.com) for details. Y-TX could apply to the Clinical Research Ethics Committee of Beijing Anding Hospital for the release of the data. Requests to access the datasets should be directed to xyutly@gmail.com.

ETHICS STATEMENT
All participants provided written informed consent. The study protocol was reviewed and approved by the Clinical Research Ethics Committee of Beijing Anding Hospital.