Psychometric Properties of the Suicide Stroop Task in a Chinese College Population

Objective: This study aimed to test the psychometric properties of the suicide stroop task in a Chinese college population. Methods: College students (n = 121) who were in their 1st–4th year, fluent in Chinese, and without color blindness were recruited from a university in Guangzhou, China, from September to December 2019. Participants completed the suicide stroop task at baseline and at 1-month follow-up. Results: The suicide stroop task showed excellent internal reliability (Cronbach’s α ranged from 0.940 to 0.953). However, the task did not reveal suicide-related attentional biases among current suicide ideators and was not significantly associated with the severity of suicidal ideation, depression, hopelessness, or anhedonia (all p values > 0.05), indicating a lack of concurrent validity. Additionally, the interference scores from the two time points could not yield intraclass correlation coefficients (ICCs) because of a negative average covariance among the data, which indicated poor test–retest consistency. Conclusion: The results of this study did not support the use of the suicide stroop task for identifying suicidal risk among Chinese college students. It is crucial to assess the psychometric properties of behavioral measures as rigorously as those of self-report measures before large-scale application in clinical and community settings.


INTRODUCTION
Suicide is a major public health issue in young people and is the second leading cause of death worldwide among people aged 15 to 29 years (Turecki and Brent, 2016). Suicide has also received increasing attention in subgroups of young people, including college students. A meta-analysis showed that the pooled prevalence estimates of lifetime suicidal ideation, plans, and attempts among college students were 22.3%, 6.1%, and 3.2%, respectively, and that estimates were higher in samples from Asia (Mortier et al., 2018). Effectively identifying people at risk for suicidal behaviors is important for preventing fatal attempts, but the prediction of suicide remains a critical challenge (Franklin et al., 2017).
Currently, screening for suicide risk commonly relies on self-report. However, self-report assessments are limited by individuals' willingness to report suicidal thoughts (e.g., some may conceal them to avoid hospitalization) and by their ability to do so (i.e., they may be unaware of their suicidal thoughts or suicidal risk). Moreover, a systematic review found insufficient evidence to support the use of the Beck Hopelessness Scale and the Beck Suicide Intent Scale, two commonly used self-report suicide risk scales, in predicting suicide in high-risk samples (Chan et al., 2016). Thus, self-report alone seems insufficient for identifying suicide risk, and there are growing calls for more objective tools for determining suicide risk.
According to the cognitive model of suicidal behavior, suicide-specific attentional bias leads to a fixation on suicide as the sole escape solution, which, combined with a state of hopelessness, ultimately results in a suicide attempt (Wenzel and Beck, 2008). Previous research found that suicide-specific attentional bias is associated with previous suicidal attempts in clinical samples (Williams and Broadbent, 1986; Becker et al., 1999; Cha et al., 2010). Specifically, Cha et al. (2010) suggested that suicide-specific attentional bias can serve as a potential behavioral marker to predict future suicide attempts. Because these results were very promising, researchers in different countries tried to extend the measure used by Cha et al. (2010), the suicide stroop task, to different samples, including college students, patients with mood disorders, and community-based samples reporting past-month suicidal ideation (Chung and Jeglic, 2016; Richard-Devantoy et al., 2016; Cha et al., 2017). However, mixed findings were reported. Additionally, a systematic review of the seven existing studies found that the suicide stroop task had excellent internal reliability but poor accuracy in distinguishing suicide attempters from non-attempters (Wilson et al., 2019).
The validity of the suicide stroop task has not been tested in the Chinese context. In the current study, we made a Chinese-language adaptation of the suicide stroop task and tested its internal reliability, concurrent validity, and test-retest reliability in Chinese college students. This study aimed to provide further evidence on whether the suicide stroop task could be used in a community-based sample in which the majority would not report suicidal ideation and would never have made a serious suicidal attempt. Based on previous research, we hypothesized that (1) those who reported current suicidal ideation (current SI) would have slower reaction times to suicide-related words than those without current suicidal ideation (non-ideators) and (2) performance on the suicide stroop task would be significantly associated with suicidal ideation severity, depression, hopelessness, and anhedonia.

Participants and Procedures
College students who were in their 1st–4th year, fluent in Chinese, and without color blindness were recruited from a university in Guangzhou, China, from September to December 2019. Participants were recruited online (e.g., via WeChat groups). Interested participants were invited to a computer laboratory. All participants were asked to provide written informed consent and then to complete the baseline survey and the suicide stroop task.
One month later, participants were invited to complete the retest survey and the suicide stroop task in the same laboratory.
This study was approved by the institutional review board of the Affiliated Brain Hospital, Guangzhou Medical University. Written informed consent was obtained from all participants.

The Suicide Stroop Task
The suicide stroop task is a computer-based behavioral task that measures response latencies, that is, how quickly participants identify the color of different words presented on a computer screen. The test materials and test conditions replicated the methodology of Cha et al. (2010). In this study, stimuli were presented and response latencies were recorded using E-prime 2.0 software.
After reading the instructions, participants completed eight practice trials, followed by 48 critical trials. Each trial started with a blank white screen for 4 s, followed by a centered "+" in red for 1 s, another blank screen for 1 s, and then a word in either blue or red; the word remained on the screen until either the blue or the red key was pressed. During the critical trials, neutral [house (fangwu), paper (baizhi), and car (qiche)], positive [happy (kaixin), success (chenggong), and pleasure (kuaile)], negative [alone (gudu), rejected (jujue), and stupid (yuchun)], and suicide-related [funeral (zangli), dead (siwang), and suicide (zisha)] words were presented in Chinese characters. After discussion with psychologists, "museum" and "engine," which were used as neutral words by Cha et al. (2010), were replaced by "house" and "car" (in Chinese characters) to suit the Chinese context. Each of the 12 words was presented four times in random order across the 48 critical trials. The interference score for each category was calculated by subtracting the mean response time (RT) for neutral words from the mean RT for positive, negative, or suicide-related words.
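The interference score described above can be illustrated with a minimal Python sketch. The function name, the category labels, and the example RTs are hypothetical and do not come from the study's E-prime output; the sketch only assumes a list of (category, RT in ms) pairs for one participant's correct trials.

```python
from statistics import mean


def interference_scores(trials):
    """Return interference scores (ms) for one participant.

    `trials` is a list of (category, rt_ms) pairs, where category is one of
    "neutral", "positive", "negative", or "suicide" (labels are assumptions).
    """
    rts_by_category = {}
    for category, rt_ms in trials:
        rts_by_category.setdefault(category, []).append(rt_ms)

    # Baseline: mean RT for neutral words.
    neutral_mean = mean(rts_by_category["neutral"])

    # Interference = mean RT for the category minus mean RT for neutral words.
    return {
        category: mean(rts) - neutral_mean
        for category, rts in rts_by_category.items()
        if category != "neutral"
    }


# Hypothetical example data for a single participant.
example_trials = [
    ("neutral", 500), ("neutral", 520),
    ("suicide", 530), ("suicide", 550),
    ("positive", 490), ("negative", 510),
]
example_scores = interference_scores(example_trials)
```

A positive score means slower color naming for that category than for neutral words, which is the pattern interpreted as attentional bias.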

Socio-Demographics
Socio-demographic information, including age, gender, residence, only-child status, and relationship status, was collected.

History of Suicidal Attempts
In this study, we used the introductory interview section of the Pathway to Suicidal Action Interview (PSAI) to collect data on previous suicidal behaviors. With the approval and assistance of the first author of the PSAI (A. J. Millner; Millner et al., 2017), a panel of three bilingual public health researchers, who were also trained in psychiatry and suicide prevention, translated the original English version of the PSAI into simplified Chinese. For an action to be considered a suicidal attempt, an individual must have engaged in a potentially deadly behavior with some intention to die (Millner et al., 2017).

Current Suicidal Ideation
The Beck Scale for Suicidal Ideation (BSSI) (Beck et al., 1979) was used to assess the severity of suicidal ideation in the past week. Each item is rated on a scale from 0 to 2, with higher scores reflecting more severe suicidal ideation. Participants who scored one or greater on either item 4 or item 5 were considered to have current suicidal ideation.
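The classification rule above can be sketched in a few lines of Python. The function name and the example responses are hypothetical; the sketch assumes that item scores are stored in questionnaire order, so that item 4 and item 5 sit at list positions 3 and 4.

```python
def has_current_ideation(item_scores):
    """Apply the screening rule: item 4 or item 5 scored >= 1.

    `item_scores` holds the 0-2 ratings in questionnaire order (an assumption),
    so item 4 is index 3 and item 5 is index 4.
    """
    return item_scores[3] >= 1 or item_scores[4] >= 1


# Hypothetical responses to the first five items only.
ideator = has_current_ideation([0, 1, 0, 0, 2])
non_ideator = has_current_ideation([1, 1, 2, 0, 0])
```

Note that under this rule the classification depends only on items 4 and 5, whereas severity reflects the item ratings more broadly.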

Depression
The degree of depression was assessed by the Patient Health Questionnaire Depression Scale (PHQ-9) (Bian et al., 2009). It consists of nine items related to the diagnostic criteria of major depressive disorder based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). The total score ranges from 0 to 27, with higher scores indicating higher levels of depression.

Hopelessness
Hopelessness was measured by the 4-item Beck's Hopelessness Scale (BHS-4) (Yip and Cheung, 2006; Ma et al., 2020). It consists of four items relevant to success, dark future, breaks, and faith. Item responses range from 1 (strongly agree) to 5 (strongly disagree). The total score ranges from 4 to 20, and a higher score represents a higher level of hopelessness.

Anhedonia
Anhedonia was measured by the Snaith-Hamilton Pleasure Scale (SHAPS) (Snaith et al., 1995). It is a validated and reliable scale that was developed to assess the ability to experience pleasure in normally pleasurable activities in the past few days. It consists of 14 items, and each item is rated on a 4-point Likert format, ranging from 1 (strongly agree) to 4 (strongly disagree) (Hu et al., 2017). The total score ranges from 14 to 56, with higher scores indicating lower ability to experience pleasure.

Statistical Analysis
For the suicide stroop task, only trials with correct responses were included in the analysis. Across all participants, the rate of correct responses was 97.7%, and the correct response rates for suicide-related (97.5%), negatively valenced (97.0%), positively valenced (97.9%), and neutral words (98.3%) did not differ significantly from one another (χ² = 5.301, p = 0.151). Additionally, we excluded trials with response latencies more than 2 SD above or below each participant's mean response latency. Internal reliability was considered acceptable at Cronbach's alpha ≥ 0.70. To assess concurrent validity, we first performed independent-sample t-tests on group differences in mean RTs and interference scores (suicide/negative/positive word RT minus neutral word RT) for each word valence, and then conducted Group × Valence repeated-measures ANOVAs. The group comparison was current ideators vs. non-ideators. For the within-subjects factor, valence had four levels in the mean RT analyses (i.e., suicide-related, negative, positive, and neutral) and three levels in the interference score analyses (i.e., suicide-related, negative, and positive). Additionally, Pearson correlation analysis was used to evaluate the correlations between suicide stroop task performance (mean RTs and interference scores) and the severity of current suicidal ideation, depression, hopelessness, and anhedonia. Test-retest reliability was assessed with paired-sample t-tests and intraclass correlation coefficients (ICCs). All analyses were performed in SPSS version 23.0 (SPSS Inc., Chicago, IL, United States). The level of significance was set at 0.05.
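Two of the steps above, trial trimming and Cronbach's alpha, can be illustrated with a minimal Python sketch. The analyses in this study were run in SPSS, so this is not the actual procedure. The function names and example data are hypothetical, the sketch assumes the sample SD for trimming (the text does not state which SD was used), and it treats each column of the score matrix as one "item" for Cronbach's alpha (the text does not state how items were defined).

```python
from statistics import mean, stdev, variance


def trim_trials(rts, n_sd=2.0):
    """Keep RTs within n_sd sample SDs of this participant's mean RT."""
    centre = mean(rts)
    limit = n_sd * stdev(rts)  # sample SD is an assumption
    return [rt for rt in rts if abs(rt - centre) <= limit]


def cronbach_alpha(rows):
    """Cronbach's alpha for a participants x items matrix (list of rows)."""
    k = len(rows[0])
    # Variance of each item (column) across participants.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of each participant's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical RTs (ms) with one extreme trial.
example_rts = [500, 510, 490, 505, 495, 500, 505, 495, 2000]
kept_rts = trim_trials(example_rts)

# Hypothetical item scores that rise and fall together across participants.
example_rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
example_alpha = cronbach_alpha(example_rows)
```

In the example, the 2000 ms trial lies more than 2 SD from the participant's mean and is dropped, and the perfectly parallel items give an alpha of 1, the upper bound against which the ≥ 0.70 criterion is judged.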

Demographic Characteristics
As presented in Table 1, a total of 121 college students participated in this study. Among them, 62.0% were female, and the mean age was 19.0 years (SD = 4.1). Previous suicidal attempts were reported by 3.3% of participants, and current suicidal ideation by 28.9%. One month after baseline, 103 students (85%) completed the retest. Except for previous suicidal attempts, no significant differences in baseline sociodemographic or psychosocial characteristics were found between participants lost to follow-up and those who completed the retest (Table 1).

Internal Reliability
The mean RTs for each valence word demonstrated excellent internal reliability (Cronbach's α ranged from 0.940 to 0.953).

Concurrent Validity
Across the sample, mean RTs differed significantly among suicide-related words (M = 513.03 ms, SD = 142.39), negatively valenced words (M = 513.33 ms, SD = 151.64), positively valenced words (M = 499.55 ms, SD = 144.47), and neutral words (M = 507.55 ms, SD = 151.70), F = 5.139, p = 0.025. Least significant difference (LSD) tests were conducted for multiple comparisons. They indicated that the mean RTs for suicide-related words and negatively valenced words were significantly longer than the mean RT for positively valenced words (mean differences, ds = −13.486 and −13.782, respectively; ps < 0.05).
As shown in Table 2, independent-sample t-tests revealed no significant differences between current ideators and non-ideators in mean RTs or interference scores for any word valence (t = 0.410 to 1.012, p = 0.314 to 0.683). Group × Valence interactions (repeated-measures analysis) were also not significant for mean RTs or interference scores in the two-group comparison (current SI vs. non-ideators, F = 0.795, p = 0.374).
As shown in Table 3, Pearson correlation analysis showed that the interference score for each word valence was not significantly associated with suicidal ideation severity, depression, hopelessness, or anhedonia (rs = −0.094 to 0.085, ps > 0.05).

Test-Retest Reliability
As shown in Table 4, the paired-sample t-test showed no significant differences in mean RTs or interference scores of each valence word between baseline and retest. However, the interference scores from the two time points could not yield valid ICC values: the ICC value was negative because of a negative average covariance among the data collected at baseline and retest, which violated reliability model assumptions.
Frontiers in Psychology | www.frontiersin.org
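Why a negative average covariance produces a negative ICC can be illustrated with a minimal Python sketch. The ICC model used in SPSS is not stated in the text, so the sketch assumes a two-way, consistency, single-measure ICC computed from ANOVA mean squares; the function name and the example scores are hypothetical. With two sessions, this coefficient reduces to 2·cov(x, y) / (var(x) + var(y)), so its sign follows the sign of the covariance between baseline and retest.

```python
from statistics import mean


def icc_consistency(scores):
    """Two-way, consistency, single-measure ICC (assumed model).

    `scores` is a list of rows, one per participant; columns are sessions.
    """
    n = len(scores)
    k = len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares into participants, sessions, error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical interference scores (ms): [baseline, retest] per participant.
# Participants who score high at baseline score low at retest.
reversed_scores = [[10, -8], [-5, 12], [20, -15], [0, 5]]
icc_reversed = icc_consistency(reversed_scores)

# Retest equals baseline plus a constant shift: perfectly consistent ranking.
shifted_scores = [[10, 12], [-5, -3], [20, 22], [0, 2]]
icc_shifted = icc_consistency(shifted_scores)
```

In the first example the rank order of participants reverses between sessions, so the covariance and therefore the ICC are negative, which mirrors the situation reported for the interference scores. In the second example the ICC is 1 despite the constant shift, because a consistency ICC ignores session mean differences.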

DISCUSSION
The goal of the current study was to test the psychometric properties of the suicide stroop task. Consistent with previous research (Wilson et al., 2019), the mean RTs for all valence words demonstrated excellent internal reliability. However, the task lacked concurrent validity: it did not reveal suicide-related attentional biases among current suicide ideators, and task performance was not significantly associated with the severity of suicidal ideation, depression, hopelessness, or anhedonia. Additionally, the interference scores of all stimuli showed poor test-retest consistency, whereas the self-report measures (i.e., BSSI, PHQ-9, BHS-4, and SHAPS) showed moderate-to-good test-retest consistency. Thus, the results of this study did not support the use of the suicide stroop task for identifying suicidal risk among Chinese college students.
Several factors might explain these results. First, general reaction time is associated with age-related differences in cognitive ability, and our sample was much younger than those in studies with positive results (Williams and Broadbent, 1986; Becker et al., 1999; Cha et al., 2010). Second, the suicide stroop task might be more sensitive in depressed individuals with recent suicidal attempts (Chung and Jeglic, 2016), whereas in this study, the majority of participants were not depressed and had not made a recent suicidal attempt. Third, because the suicide stroop task measures manual reaction times (i.e., key presses), other paradigms, such as those based on vocal responses or eye movements, might perform better.
This study is limited by a small convenience sample. Among the 121 participants, 35 had current suicidal ideation, and four reported previous suicidal attempts. However, in the community, most people will not report suicidal ideation, and the majority will never have made a serious suicide attempt. This is why more sensitive and accurate measures are needed for screening suicidal risk.
Over the past 50 years, there has been a surge of research designed to identify the risk factors for suicidal behaviors, and many different theories of suicide have been proposed (Wenzel and Beck, 2008; Franklin et al., 2017). Identifying suicide risk and predicting suicide nevertheless remain critical challenges. We believe it is valuable to explore more objective measures or behavioral markers related to suicidal behaviors. However, it is crucial to assess the psychometric properties of behavioral measures as rigorously as those of self-report measures before large-scale application in clinical and community settings.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Affiliated Brain Hospital of Guangzhou Medical University. The patients/participants provided their written informed consent to participate in this study.