Prediction of adolescent suicidal ideation after the COVID-19 pandemic: A nationwide survey of a representative sample of Korea

Objective This study developed a model to predict groups vulnerable to suicidal ideation after the declaration of the COVID-19 pandemic based on nomogram techniques targeting 54,948 adolescents who participated in a national survey in South Korea. Methods This study developed a model to predict suicidal ideation by using logistic regression analysis. The model aimed to understand the relationship between predictors associated with the suicidal ideation of South Korean adolescents by using the top seven variables with the highest feature importance confirmed in XGBoost (extreme gradient boosting). The regression model was developed using a nomogram so that medical workers could easily interpret the probability of suicidal ideation and identify groups vulnerable to suicidal ideation. Results This epidemiological study predicted that eighth graders who experienced depression in the past 12 months, had a lot of subjective stress, frequently felt lonely in the last 12 months, experienced much-worsened household economic status during the COVID-19 pandemic, and had poor academic performance were vulnerable to suicidal ideation. The results of 10-fold cross-validation revealed that the area under the curve (AUC) of the adolescent suicidal ideation prediction nomogram was 0.86, general accuracy was 0.89, precision was 0.87, recall was 0.89, and the F1-score was 0.88. Conclusion It is required to recognize the seriousness of adolescent suicide and mental health after the onset of the COVID-19 pandemic and prepare a customized support system that considers the characteristics of persons at risk of suicide at the school or community level.


Introduction
Since the WHO declared the COVID-19 pandemic in March 2020, the world has remained in a COVID-19 crisis. The COVID-19 pandemic has affected all age groups. In particular, young people have experienced indirect effects, such as the unemployment of their parents and separation from society or peers due to school lockdowns, as well as direct effects such as infection. In 2020, as COVID-19 spread in South Korea, the government extended the winter break of elementary, middle, and high schools nationwide three times, and schools started on April 6 instead of March 2. Moreover, as the social distancing level was raised to the lockdown level, intensive COVID-19 response measures were implemented. For example, face-to-face classes were banned and replaced with online classes (1).
The UN (2020) expressed concern that the COVID-19 pandemic was rapidly deepening a crisis in children's health. Moreover, Choi (2) named the youth group suffering from the COVID-19 pandemic the "COVID generation" and recommended paying more attention to their mental health because the COVID generation experienced far fewer interactions, such as interpersonal relationships, than the generations before the COVID-19 pandemic. There is considerable concern about the effect of the COVID-19 pandemic on suicidality. COVID-19 may increase the risk of developing suicidal behaviors by affecting numerous well-established suicide risk factors (3)(4)(5). Nevertheless, there are still not enough epidemiological studies using big data to identify groups vulnerable to mental health deterioration among adolescents after the COVID-19 pandemic.
Suicide refers to the act of taking one's own life with the intention of causing death (6). Adolescence is a stormy period, a transition period from childhood to adulthood, and adolescents may resolve the confusion about their identity and the uncertainty about their future by using extreme methods such as suicide without achieving psychological stability and balance (7). Many researchers (8,9) have emphasized the importance of early detection of adolescent suicide risk, pointing out that adolescents with dangerous levels of suicidal ideation may not receive help early enough because those around them regard suicidal ideation in adolescence as a universal psychological state that can appear during development. It was reported that the suicide rate of South Korea was the highest among OECD member countries as of 2020 (6). In particular, the suicide rate of teenagers rose by 9.4% compared to 2019 (10), before the COVID-19 pandemic. This suggests that youth suicide became a serious problem in South Korea after the outbreak of COVID-19.
Most previous studies (11)(12)(13) that identified predictors of suicidal ideation in adolescents mainly used logistic regression models. The logistic regression model is a stochastic model widely used to predict the likelihood of an event by linearly combining independent variables when the dependent variable is binomial. Although logistic regression analysis has the advantage of being able to identify the influence of individual variables on a dependent variable, it is limited in capturing the interaction between the explanatory variables used in the predictive model because it assumes that the effect of an explanatory variable does not depend on the level of other explanatory variables (14). To overcome these limitations of regression analysis, many recent studies (15, 16) have used boosting-based machine learning models such as XGBoost (extreme gradient boosting).
Since the COVID-19 pandemic has not yet come to an end even after suffering from it for 2 years, it is necessary to conduct more studies based on scientific evidence to improve the mental health of the youth and prevent suicide. This study developed a model to predict groups vulnerable to suicidal ideation after the declaration of the COVID-19 pandemic based on nomogram techniques targeting 54,948 adolescents who participated in a national survey in South Korea.

Data source
This was a secondary analysis of data from the 2020 Korea Youth Risk Behavior Survey (17). The Korea Youth Risk Behavior Survey is an anonymous self-report online survey of students from 7th grade to 12th grade that aims to understand the health behaviors of South Korean adolescents. It was jointly conducted by the Ministry of Education, the Ministry of Health and Welfare, and the Korea Disease Control and Prevention Agency in South Korea. Subjects of the 2020 Korea Youth Risk Behavior Survey were sampled in the steps of population stratification, sample allocation, and sampling. In the sample allocation step, the sample size was set to 400 middle schools and 400 high schools. Then, five middle schools and five high schools were first allocated to each of 17 cities and provinces. The survey covered 57,925 students who were selected as samples from 1 August 2020 to 30 November 2020. The participation rate was 94.9% (54,948 students). Students who were absent for more than 3 months, students with disabilities (e.g., intellectual disability), and students with dyslexia at the time of the investigation were excluded. Because the online questionnaire did not move on to the next item while an item remained unanswered, there were no missing values. All data were collected in a way that did not reveal personally identifiable information. This study analyzed the data of the 54,948 students between 7th grade and 12th grade who participated in the 2020 Korea Youth Risk Behavior Survey and responded to the suicidal ideation item.
Frontiers in Pediatrics 02 frontiersin.org

Measurement of variables
The presence of suicidal ideation, the outcome variable, was determined when a subject responded "yes" to the item, "Have you ever seriously considered committing suicide in the past 12 months?" The input variables included grade (1 = 7th grade, 2 = 8th grade, 3 = 9th grade, 4 = 10th grade, 5 = 11th grade, or 6 = 12th grade), gender (1 = male or 2 = female), subjective economic status (1 = high, 2 = medium, or 3 = low), whether the economic status has changed during the COVID-19 pandemic (1 = strongly agree, 2 = agree, 3 = disagree, or 4 = strongly disagree), living with a family member (1 = yes or 2 = no), area of residence (1 = urban area or 2 = rural area), school type (1 = middle school, 2 = vocational high school, or 3 = general high school), academic performance (1 = high, 2 = medium-high, 3 = medium, 4 = medium-low, or 5 = low), drinking at least one glass or shot of beer, soju, or whiskey within the last 30 days (1 = no or 2 = yes), smoking at least one cigarette within the last 30 days (1 = no or 2 = yes), drug experience (e.g., hallucinogens and drugs such as methamphetamine) (1 = no or 2 = yes), conflict in relationship with friends or colleagues due to smartphone overdependence (1 = strongly disagree, 2 = disagree, 3 = agree, or 4 = strongly agree), days of conducting moderate- or higher-intensity exercise regularly (none, 1-2 times a week, or 3 or more times a week), subjective sleep satisfaction (1 = sufficient, 2 = moderate, or 3 = insufficient), subjective health recognition (1 = good, 2 = moderate, or 3 = bad), subjective stress recognition (1 = high, 2 = moderate, or 3 = none), subjective body type recognition (1 = underweight, 2 = moderate, or 3 = obesity), weight control efforts in the past 30 days (1 = no effort or 2 = effort), sexual relation (1 = no or 2 = yes), depression experience in the past 12 months (1 = no or 2 = yes), experience of loneliness in the past 12 months (1 = rarely, 2 = moderate, or 3 = frequent), and receiving treatment due to violence from an acquaintance (e.g., adult, senior, or friend) (1 = no or 2 = yes).
Depression was defined as answering "yes" to "Have you ever felt sad or hopeless enough to stop your daily activities for 2 weeks in the past 12 months?", which is a criterion for determining major depressive disorder. Smartphone overuse was defined as the experience of severe conflict with a friend, colleague, or in a social relation due to smartphone overuse in the past 30 days. Regular moderate- or higher-intensity exercise was defined as "the experience of conducting exercise (regardless of exercise type) at the intensity that increases heart rate than usual or makes you out of breath for 60 minutes or longer in total per day in the past seven days." Subjective sleep satisfaction was defined as the case in which the amount of sleep was sufficient to overcome fatigue in the last 7 days.

Variable selection
A nomogram generally identifies the predictive path of disease by using 5-7 variables because when a larger number of explanatory variables is entered into the nomogram, the number of cases for calculating the predictive probability for a disease increases as well (18). Therefore, when developing a nomogram, it is important to select the explanatory variables to be used in it. This study used XGBoost to select variables, and the top seven variables with high feature importance were selected as variables to be used in the nomogram. XGBoost is a boosting technique that has the advantages of fast speed and scalability (19). XGBoost is a decision tree-based algorithm that uses a boosting technique that lowers the error by coupling multiple classification and regression trees. XGBoost generates an optimized model in a way that controls the complexity of the tree to minimize training loss and prevent overfitting. The objective function of XGBoost (19) is presented in the following equation:
$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right)$$
where $l$ is the training loss between the observed outcome $y_i$ and the prediction $\hat{y}_i$, K stands for the number of trees, and $\Omega(f_k)$ refers to all situations that may affect the complexity of tree $f_k$. Starting from a tree with a depth of 0, if a candidate split yields a lot of new information (Gain), the tree continues to grow (greedy learning of the tree) (20). The gain function of XGBoost is presented in the following equation:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right] - \gamma$$
where $G_L$, $G_R$ and $H_L$, $H_R$ are the sums of the first- and second-order gradients of the loss in the left and right child nodes, $\lambda$ is the regularization parameter, and $\gamma$ is the complexity penalty for adding a leaf. Although XGBoost has been mainly used as a predictive model, it can also be used as an interpretable model for variable selection. This is because checking the feature importance reveals the accuracy contribution score (gain) of each variable and how often the variable appears across all trees of the fitted XGBoost model (20). It also shows the split used at each node and the gain due to that split, which helps to understand the direction of the variable.
This study set the hyperparameters of XGBoost as follows: number of trees = 100, learning rate = 0.3, regularization lambda = 1, and maximum depth of each tree = 6.
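As an illustration of how a single candidate split is scored during tree growth, the following minimal sketch computes the standard XGBoost split gain from summed gradient statistics. The function names and the toy gradient values are hypothetical and are not taken from the survey data; lambda = 1 matches the hyperparameter reported above, whereas gamma = 0 is an assumption because the leaf penalty is not stated in the text.

```python
def leaf_score(g_sum, h_sum, lam):
    """Structure score G^2 / (H + lambda) of a single leaf."""
    return g_sum ** 2 / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting one node into a left and a right child.

    g_* are sums of first-order gradients, h_* are sums of second-order
    gradients. gamma=0.0 is an assumed default, not a value from the paper.
    """
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    return 0.5 * (children - parent) - gamma


# Toy gradient statistics (hypothetical numbers for illustration only).
gain = split_gain(g_left=-4.0, h_left=3.0, g_right=5.0, h_right=4.0, lam=1.0)
print(gain)
```

A split is worth keeping only when this value is positive; a split whose children carry the same gradient statistics as the parent per unit of curvature adds nothing and is penalized by gamma.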

Development and validation of the nomogram
This study developed a model to predict suicidal ideation by using logistic regression analysis. The model aimed to understand the relationship between predictors associated with the suicidal ideation of South Korean adolescents by using the top seven variables with the highest feature importance confirmed in XGBoost. The regression model was analyzed using multiple logistic regression adjusted for confounding factors. It presented an adjusted odds ratio (AOR) and 95% confidence interval (CI) to understand the independent relationship between predictive factors and adolescent suicidal ideation. The regression model was developed using a nomogram so that medical workers could easily interpret the probability of suicidal ideation and identify groups vulnerable to suicidal ideation. The nomogram based on logistic regression is a two-dimensional diagram presenting the relationship between multiple risk factors to simply and efficiently calculate the predictive probability of disease (21). A logistic regression nomogram is generally composed of a point line, a risk factor line, a probability line, and a total point line (22). The point line is placed at the top of the nomogram to derive a score corresponding to the class of each risk factor (23, 24). Moreover, the number of risk factor lines is equal to the number of risk factors for adolescent suicidal ideation. This study set the number of risk factor lines to seven for efficient interpretation of the nomogram. The total point line refers to the sum of the scores of individual risk factors. The probability line is the final risk probability value calculated based on the total point line and is placed at the bottom of the nomogram.
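The relationship between the point line, the total point line, and the probability line can be sketched as follows. This is only an illustration of a common way to build a logistic regression nomogram (scaling the factor with the largest coefficient range to 100 points); the coefficients, the intercept, and all names below are hypothetical and are not the values fitted in this study.

```python
import math

# Hypothetical log-odds contributions per risk-factor level (illustrative only).
COEFS = {
    "depression": {"no": 0.0, "yes": 1.6},
    "stress": {"none": 0.0, "moderate": 0.6, "high": 1.5},
    "loneliness": {"rarely": 0.0, "moderate": 0.7, "frequent": 1.5},
}
INTERCEPT = -4.0  # hypothetical baseline log-odds


def points_per_unit(coefs):
    """Points per unit of log-odds: the widest factor spans 0 to 100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return 100.0 / widest


def point_table(coefs):
    """Point line: points for each level of each risk factor."""
    scale = points_per_unit(coefs)
    return {
        factor: {lvl: (beta - min(levels.values())) * scale
                 for lvl, beta in levels.items()}
        for factor, levels in coefs.items()
    }


def probability(total_points, coefs, intercept):
    """Probability line: map total points back to a predicted probability."""
    baseline = intercept + sum(min(lv.values()) for lv in coefs.values())
    linear_predictor = baseline + total_points / points_per_unit(coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


table = point_table(COEFS)
profile = {"depression": "yes", "stress": "high", "loneliness": "frequent"}
total = sum(table[factor][level] for factor, level in profile.items())
risk = probability(total, COEFS, INTERCEPT)
print(total, round(risk, 3))
```

Reading the printed nomogram by hand follows the same path: look up the points for each risk factor, add them, and read the probability under the total.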
The predictive performance of the final nomogram was evaluated using 10-fold cross-validation. The area under the curve (AUC), general accuracy, F1 score, and calibration plot were used as indicators for evaluating predictive performance. All analyses were conducted using Python version 3.10.4.
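To make the 10-fold procedure concrete, the sketch below partitions sample indices into ten folds so that every observation is used for testing exactly once. It is a plain (non-stratified) split written with the standard library only; whether the study stratified folds by outcome is not stated, so that choice, the seed, and the function name are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# 54,948 is the number of analyzed adolescents reported in the text.
splits = list(k_fold_indices(n_samples=54948, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In each round the model would be fitted on the training indices and scored on the held-out fold, and the ten scores averaged.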

Results
General characteristics of subjects by suicidal ideation experience after the COVID-19 pandemic
Table 1 shows the differences (Chi-square test results) in the general characteristics between adolescents who experienced suicidal ideation during the COVID-19 pandemic and those who did not experience suicidal ideation during the COVID-19 pandemic. Among 54,948 South Korean adolescents, 5,979 adolescents (10.9%) experienced suicidal ideation during the COVID-19 pandemic. The results of Chi-square test showed that adolescents who experienced suicidal ideation and those who did not experience suicidal ideation were significantly different in grade, gender, whether the economic status has changed during the COVID-19 pandemic, household economic status, living with a family member, school type, area of residence, academic performance, drinking within the last

Predictive factors for suicidal ideation in South Korean adolescents
This study calculated the feature importance of factors associated with the suicidal ideation of South Korean adolescents by using XGBoost (Figure 1). The results showed that the top seven variables with high-feature importance were depression experience in the past 12 months, subjective stress recognition, experience of loneliness in the past 12 months, academic performance, grade, household economic status, and changes in economic status due to COVID-19. Table 2 shows the results of logistic regression analysis for predicting the suicidal ideation of South Korean adolescents using the top seven variables with high-feature importance in XGBoost. The analysis results of adjusted model for predicting the suicidal ideation of South Korean adolescents showed that independent influencing factors were 7th grade (AOR = 1.15, 95% CI = 1.03-1.28), 12th grade (AOR = 0.89, 95% CI = 0.80-0.99), adolescents with very large economic changes due to COVID-19 (AOR = 1.25, 95% CI = 1.10-1.41), adolescents with poor household economic status (AOR = 1.26, 95% CI = 1.14-1.38), adolescents with moderate academic performance (AOR = 0.86, 95% CI = 0.77-0.96), adolescents who frequently experienced subjective stress (moderate: AOR = 1.88, high: AOR = 4.81), adolescents who experienced depression in the last 12 months (AOR = 4.85, 95% CI = 4.53-5.19), and adolescents who frequently experienced loneliness in the past 12 months (moderate: AOR = 1.93, frequently: AOR = 4.58) (p < 0.05).
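Because a logistic regression is additive on the log-odds scale, the reported AORs can be converted to coefficients with a natural logarithm and combined by addition. The sketch below does this for three of the AORs reported above; it assumes a main-effects model without interaction terms, and the dictionary keys are illustrative names. The model intercept is not reported, so only a relative odds multiplier, not an absolute probability, can be derived.

```python
import math

# AORs as reported in the adjusted model (Table 2 of the text).
AOR = {
    "depression_yes": 4.85,
    "stress_high": 4.81,
    "loneliness_frequent": 4.58,
}

# Log-odds coefficients: beta = ln(AOR).
log_odds = {name: math.log(value) for name, value in AOR.items()}

# Under the main-effects assumption, coefficients add, so odds ratios multiply.
combined_odds_ratio = math.exp(sum(log_odds.values()))
print(round(combined_odds_ratio, 1))
```

The result is the odds of suicidal ideation for an adolescent with all three characteristics relative to one with none of them, holding the other predictors fixed.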

Development and validation of a nomogram for high-risk groups for suicidal ideation in Korean adolescents
The nomogram for predicting the suicidal ideation of South Korean adolescents is presented in Figure 2. According to this nomogram, the predicted probability of suicidal ideation was 72% for eighth graders who responded that they experienced depression in the past 12 months, had a lot of subjective stress, frequently felt lonely in the last 12 months, had a household economic status that worsened a lot during the COVID-19 pandemic, and had very poor academic performance.
The predictive performance of the developed nomogram for predicting adolescent suicidal ideation was tested using AUC, general accuracy, F1, recall, precision, and calibration plot (Figure 3). The prediction probability and observation probability of the adolescent group that experienced suicidal ideation and those of the adolescent group that did not experience suicidal ideation were compared using calibration plot and Chi-square test (Figure 3). The results showed that prediction probability and observation probability were not significantly different at the 0.05 significance level. The results of 10-fold cross-validation revealed that the AUC of the adolescent suicidal ideation prediction nomogram was 0.86, the general accuracy was 0.89, the precision was 0.87, the recall was 0.89, and the F1-score was 0.88.
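For readers less familiar with these indicators, the sketch below shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts, and checks that the reported F1-score is consistent with the reported precision and recall. The confusion-matrix counts are hypothetical; only the values 0.87 and 0.89 come from the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not from the study).
toy = classification_metrics(tp=80, fp=20, fn=10, tn=890)

# F1 is the harmonic mean of precision and recall; reported values from the text.
reported_f1 = 2 * 0.87 * 0.89 / (0.87 + 0.89)
print(round(reported_f1, 2))
```

The harmonic mean of 0.87 and 0.89 rounds to 0.88, which agrees with the F1-score reported above.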

Discussion
This study evaluated the factors associated with the suicidal ideation of adolescents using epidemiological data representing South Korean adolescents. The results showed that middle-school students had a higher risk of suicidal ideation than high-school students. Previous studies (25, 26) on South Korean adolescents reported that middle-school students had a 1.3-fold higher risk of suicide attempts than high-school students, which agreed with the results of this study. Brière et al. (26) showed that eighth and ninth graders had the most suicidal ideation and suicide attempts. Glenn et al. (25) also revealed that suicidal ideation increased abruptly among 12- to 14-year-old adolescents. Suicidal ideation refers to continuous interest in, thoughts of, and illusions about ending one's own life (27). As adolescents, middle-school students could more easily become stuck in a psychologically maladjusted state (28) than high-school students when they experienced stress or negative life events. When this psychological maladjustment persists, it is highly likely to develop into mental health problems (29).
Dubé et al. (4) conducted a meta-analysis using 54 studies (308,596 subjects), which examined suicide behaviors during the COVID-19 pandemic, to find that event rates (e.g., 10.81% for suicidal ideation and 4.68% for suicide attempts) increased from studies conducted prior to the pandemic. Nevertheless, it is noteworthy that most of the studies targeted adults. Since only a few epidemiological studies analyzed the suicidal ideation of adolescents after the COVID-19 pandemic, additional epidemiological studies are needed to understand the difference between adolescent suicidal ideation before and after the pandemic.
Suicide has been a direct cause of death in South Korea and the top cause of death among South Korean adolescents for 10 consecutive years (30). Social efforts are required to maintain adolescent mental health because it has been reported that adolescents who experienced suicidal ideation have a high risk of suicide even if they do not choose to commit suicide during adolescence and may experience severe depression due to social maladjustment (26). Therefore, it is highly required to detect middle-school students highly vulnerable to suicide as soon as possible and intervene with them continuously to reduce the suicide rate of adolescents in the future based on the results of this study. It is also needed to develop a suicide prevention program tailored to the sociodemographic characteristics of middle-school students.
In this study, adolescents who experienced loneliness in the last 12 months had a higher risk of suicidal ideation. This result seems to be related to emotional anxiety due to the absence of a person to seek help from. Choi et al. (31) examined the suicidal ideation of South Korean middle-school students and reported that only 32.3% of South Korean adolescents had consulted with others or asked for help when they had difficulties. The results implied that two out of three middle-school students tried to solve problems on their own without the help of people around them when encountering difficulties. It is believed that middle-school students frequently feel lonely and give up asking for help from people around them. As a result, they feel helpless repeatedly, which ultimately leads to suicidal ideation. The results of this study confirmed that the level of stress perceived by adolescents was significantly related to suicidal ideation. These results were similar to the results of previous studies (32,33) showing that stress was a major risk factor for suicide in adolescence. Stress that is perceived as uncontrollable is highly likely to make people lose the meaning of their lives (32). Moreover, persistent stress is highly likely to intensify suicidal ideation (33). Therefore, to detect adolescents with a high risk of suicidal ideation, medical personnel first need to understand the stress level perceived by the subject.
Another finding of this study was that change in household economic status due to the COVID-19 pandemic was identified as a major risk factor for adolescent depression. As the lockdown caused by the COVID-19 pandemic continued, South Korean workers experienced income reduction and instability due to business regulations (34,35). Moreover, many business owners had to close their businesses in extreme cases, in addition to income decrease (34,35). The decrease in household income due to the extended COVID-19 pandemic threatened the survival of the family (35). This economic difficulty could become a bigger psychological problem for economically vulnerable groups such as older adults and adolescents than adults (36). For example, as the lockdown due to the COVID-19 pandemic continued, students could experience isolation due to school closures and the absence of psychological support providers (37). As they experienced a crisis in the household economy at the same time, their psychological and emotional problems could be further exacerbated. The results of this study showed that psychological problems such as the suicidal ideation of adolescents were significantly related to rapid changes in household economic status, such as unemployment or a decrease in income of workers due to the COVID-19 pandemic. These results implied that the government should respond to the unemployment and reduced income of workers due to the extended COVID-19 pandemic more sensitively. They also suggested that there would be a need to pay attention especially to the mental health of the children of households with sharply declining incomes continuously.
This study developed a logistic nomogram and identified multiple risk factors for adolescent suicidal ideation during the COVID-19 pandemic. According to this nomogram, the predicted probability of suicidal ideation was as high as 72% for eighth graders who responded that they experienced depression in the past 12 months, had a lot of subjective stress, frequently felt lonely in the last 12 months, had a household economic status that worsened a lot during the COVID-19 pandemic, and had very poor academic performance. In South Korea, the teenage suicide rate in 2020 increased by 9.4% from that in 2019 (10). As the prolonged COVID-19 pandemic has not ended as of May 2022, it may increase even further in the future. Therefore, it is necessary to screen the group at high risk of suicidal ideation, which has all these multiple risk factors, for depression at the school or community level and to conduct community-centered monitoring continuously to prevent depression. It is also required to conduct additional studies on multiple risk factors for suicidal ideation among adolescents after the COVID-19 pandemic.

FIGURE 2
A nomogram predicting the South Korean adolescent group vulnerable to suicidal ideation: (1) depression experience in the past 12 months (1 = no or 2 = yes), (2) subjective stress recognition (1 = high, 2 = moderate, or 3 = none), (3) experience of loneliness in the last 12 months (1 = rarely, 2 = moderate, or 3 = frequent), (4) subjective economic status (1 = high, 2 = medium, or 3 = low), (5) grade (1 = 7th grade, 2 = 8th grade, 3 = 9th grade, 4 = 10th grade, 5 = 11th grade, or 6 = 12th grade), (6) academic performance (1 = high, 2 = medium-high, 3 = medium, 4 = medium-low, or 5 = low), and (7) changes in economic status during the COVID-19 pandemic (1 = strongly agree, 2 = agree, 3 = disagree, or 4 = strongly disagree).

FIGURE 3
A calibration plot to identify the performance of the nomogram to predict the South Korean adolescent group vulnerable to suicidal ideation.
The strengths of this study were that it identified the adolescent group vulnerable to depressive disorder based on multiple risk factors and presented the basis for selecting the adolescent group vulnerable to suicide based on the results. This study had several limitations. First, this study did not investigate parental abuse or negligence. Second, the variables used in this epidemiological study were measurements based on self-report questionnaires. Future studies need to identify risk factors for adolescent suicidal ideation by integrating qualitative research methods such as in-depth interviews in addition to self-report questionnaires. Third, since the results of this study were based on a cross-sectional study, the results cannot be interpreted as a causal relationship. It is necessary to conduct longitudinal studies on adolescents vulnerable to suicidal ideation identified in this study.

Conclusion
This epidemiological study predicted that eighth graders who experienced depression in the past 12 months, had a lot of subjective stress, frequently felt lonely in the last 12 months, experienced much-worsened household economic status during the COVID-19 pandemic, and had poor academic performance were vulnerable to suicidal ideation (a high suicide risk group). Therefore, it is necessary to continuously intervene (e.g., early detection of adolescents vulnerable to suicidal ideation and mental health management) with adolescents to prevent adolescent suicide. It is also required to recognize the seriousness of adolescent suicide and mental health after the onset of the COVID-19 pandemic and prepare a customized support system that considers the characteristics of persons at risk of suicide at the school or community level.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.kdca.go.kr/yhs/.

Ethics statement
This study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Korea Disease Control and Prevention Agency (protocol code 117075 and date: 01-07-2021). Written informed consent to participate in this study was provided by the participants or their legal guardian/next of kin.

Author contributions
HB was involved in study data interpretation, designed the manuscript, performed the statistical analysis, and assisted with writing the manuscript.