
ORIGINAL RESEARCH article

Front. Psychol., 02 January 2026

Sec. Educational Psychology

Volume 16 - 2025 | https://doi.org/10.3389/fpsyg.2025.1684529

Students’ stress prediction and explainable analysis based on improved decision trees


Cheng Liu* and Shuang Yu
  • Department of Digital Business, Jiangsu Vocational Institute of Commerce, Nanjing, Jiangsu, China

Introduction: Nowadays students are burdened with pressures from various aspects such as academics, social life, and career planning. It is of great significance to accurately predict their stress levels and analyze the key influencing factors.

Methods: A stress prediction model for students was constructed based on an enhanced decision tree (DT) algorithm. First, nine machine learning algorithms, including logistic regression (LR) and DT, were compared to select the best base model. Then, the Harris hawks optimization (HHO) algorithm was introduced to optimize the DT model and improve its prediction performance. Finally, the Shapley Additive Explanations (SHAP) model was applied to interpret the prediction results and analyze the contribution of various features to stress levels.

Results: The DT algorithm showed outstanding performance among the nine compared models, achieving a prediction accuracy of 0.909. After optimization by the HHO algorithm, the HHO-DT model further improved the accuracy to 0.927 and had the fewest misclassified samples. SHAP analysis revealed that blood pressure, social support, and depression were the key features affecting students’ stress level prediction.

Discussion: The research results provide a scientific and effective basis for intervention measures taken by mental health educators, parents, and students themselves, which is helpful to relieve students’ stress and promote their physical and mental health.

1 Introduction

In today’s society, adolescents and young adult students (AYAS) are burdened by pressures from various aspects, including academics, social life, and career planning. Academically, with the increasingly fierce social competition and the ever-changing educational environment, AYAS face unprecedented psychological and life pressures (Harahap et al., 2022). In addition to academic stress, they are also troubled by issues related to social and interpersonal relationships. These pressures may arise from peer relationships, teacher-student interactions, and perceptions of self-identity, among other factors (Guo R., 2024). Research has shown that a lack of social support can further exacerbate students’ psychological burdens (Jia and Chu, 2024). Moreover, when navigating interpersonal relationships, AYAS encounter numerous challenges, including emotional problems and social expectations (Chen, 2013). Prolonged exposure to such pressures makes AYAS highly susceptible to mental health issues, with symptoms of depression being particularly prevalent. A study on AYAS found that daily hassles are more likely to trigger depressive emotions than major life events (Shin, 2024).

Recent studies have shown that combining metaheuristic optimization algorithms with traditional machine learning models can significantly improve classification accuracy and reduce computational costs. For example, the combination of greylag goose optimization (GGO) and a multi-layer perceptron achieved an accuracy of 98.4% in lung cancer classification (Elkenawy et al., 2024), while the gradient boosting model optimized by binary particle swarm optimization (BPSO) reached a precision of 95.5% in the prediction of COVID-19 transmission (Alkhammash et al., 2023). The combination of deep learning models such as convolutional neural networks (CNN) and long short-term memory networks (LSTM) has also made breakthroughs in agricultural disease detection. For instance, the CNN-LSTM model achieved a remarkably high accuracy of 97.1% in the identification of potato late blight, demonstrating its advantages in time-series data and image analysis (Alzakari et al., 2024). However, existing methods still face challenges such as high computational complexity, the risk of overfitting, and feature redundancy. There is an urgent need for more efficient optimization strategies. The crucial role of optimization algorithms in feature selection and model parameter tuning provides the inspiration for this study.

In this context, accurately predicting the stress levels of AYAS and analyzing the key influencing factors have become pressing issues that warrant urgent attention. This research aims to construct a predictive model for AYAS stress using an enhanced DT algorithm. The objective is not only to achieve precise identification of AYAS’ stress states but also to provide an interpretable analysis that elucidates the mechanisms through which various factors influence stress levels. This research offers a scientific and effective foundation for interventions by mental health educators, parents, and the students themselves, thereby contributing to the alleviation of stress and the promotion of physical and mental health among AYAS. The innovations of this paper are as follows:

• Nine machine learning algorithms were compared, and the DT model demonstrated the best performance in predicting student stress, with an accuracy of 0.8955 and an F1 score of 0.8952. Paired t-tests confirmed its superiority over models such as Random Forest (RF) and XGBoost (XGB).

• The HHO-DT model is introduced, achieving an accuracy of 0.927 with a runtime of only 26.9 s, compared with 117.2 s for grid search.

• Feature impact quantification via SHAP model revealed blood pressure, social support, and depression as core factors in stress prediction, enhancing model transparency and practical value while providing a scientific basis for intervention measures.

2 Literature review

2.1 Research of student stress

In today’s learning environment, students encounter various forms of stress, with academic stress being one of the most significant influencing factors. The increasing difficulty of courses, the growing volume of homework, the frequent administration of exams, and the escalating competition for further education all impose a considerable burden on students (Al-Rouq et al., 2022). A study (Chust-Hernández et al., 2022) has highlighted that for nursing students, school-related factors significantly impact their academic stress. Tasks such as classroom presentations and insufficient time to complete assignments are particularly important sources of stress.

Moreover, family-related factors should not be overlooked. Parents’ excessive expectations, the quality of family relationships, and the family’s economic situation can all influence students’ stress levels to varying degrees (Nguyen et al., 2018). For instance, literature (Hossain et al., 2023) clearly indicates that economic factors, such as tuition arrears and borrowing, are positively correlated with the economic stress experienced by college students. In the social realm, the challenges students face in managing relationships with classmates and teachers, as well as the difficulties in integrating into social groups, can also contribute to their stress levels (Bartlett et al., 2016). Additionally, individual factors play a crucial role in how stress is perceived and managed. A study (Maykrantz and Houghton, 2020) has demonstrated that elements such as personality, psychological resilience, and self-perception can influence students’ perceptions of and responses to stress. This research further indicates that self-leadership and coping skills can help regulate students’ stress levels, underscoring the significant impact of individual psychological factors on stress management.

Long-term exposure to stress can negatively impact students’ physical and mental health. A study of German students (Kiupel et al., 2023) found an association between stress and skin itching, suggesting that stress may adversely affect physical well-being. From an academic perspective, moderate stress can be transformed into motivation for learning; however, excessive stress can hinder the learning process. Taking Thai medical students as an example (Saipanish, 2003), the study highlights that the stress they experience is closely linked to academic challenges, implying the detrimental effects of excessive stress on academic performance. Additionally, other literature (Ragab et al., 2021) underscores the adverse effects of academic stress on medical students.

To effectively alleviate students’ stress, intervention measures must be implemented at multiple levels. At the school level, optimizing the curriculum, actively developing mental health education courses, and fostering a positive campus atmosphere can significantly help reduce students’ stress (Nguyen-Thi, 2024). One study (Antoniadou et al., 2024) suggests that schools should enhance their support systems and refine academic procedures to better address students’ stress-related issues. The family environment also plays a crucial role. Parents who adjust their expectations, adopt democratic educational methods, and provide ample emotional support can positively influence their children’s ability to cope with stress. Although relevant literature does not extensively detail family intervention measures, the significant impact of family factors on students’ stress indirectly underscores the importance of such measures. For students themselves, acquiring effective coping skills and cultivating a diverse range of hobbies can enhance psychological resilience, thereby mitigating stress. For example, a study (Mahoney and Bussard, 2024) has shown that intervention measures such as relaxation training can reduce the stress levels of nursing students, which fully reflects the importance of students’ self-regulation in the process of coping with stress.

2.2 Research of machine learning in predicting student stress

Research on the application of machine learning to predict student stress is gaining increasing attention. Numerous studies focus on enhancing prediction accuracy through various algorithms and data sources, thereby providing robust support for interventions in students’ mental health.

Regarding the construction of prediction models and the comparison of algorithms, many studies have achieved notable results. El Morr et al. (2024) developed eight machine learning models to assess the depression, anxiety, and stress symptoms of Lebanese university students during the COVID-19 pandemic. The study found that the random forest (RF), naive Bayes (NB), and AdaBoost models excelled in predicting these symptoms. Ghosh et al. (2024) built multiple algorithms to predict changes in student stress following yoga practice, with the RF algorithm demonstrating the lowest root mean square percent error (RMSPE) and a relatively high R-squared value. Schaab et al. (2024) conducted a systematic review of machine learning models for detecting depression, anxiety, and stress in undergraduate students, revealing that the accuracy of most models exceeded 70%. However, the review also noted issues such as over-reliance on internal validation and low-quality evidence. These studies indicate that different algorithms have distinct advantages and disadvantages in different scenarios, offering a valuable reference for algorithm selection in subsequent research. Notably, Arya et al. (2024) used the same dataset as the present study but relied on the GridSearchCV method for hyperparameter optimization. Its performance is constrained by the design of the search space and step size, resulting in low efficiency. Furthermore, its analysis is limited to correlation coefficients and confusion matrices and lacks a quantitative interpretation of the contribution of features to predictions. In contrast, the present study introduces a metaheuristic optimization framework and adopts the HHO algorithm to optimize the hyperparameters of the DT model, enabling efficient search of the parameter space. In addition, the present study integrates the SHAP model and quantifies the influence of each feature on the prediction results by calculating its average SHAP value, which provides an actionable scientific basis for campus mental health educators and parents to formulate targeted intervention measures.

Data sources and feature selection are crucial for predicting stress. Shvetcov et al. (2024) utilized passive sensing data from smartphones, including GPS and step-counting data, to predict stress levels among university students and established an effective methodological process. Liao et al. (2024) selected factors such as sleep and exercise for feature extraction based on intelligent sensing data and psychological stress scales, constructing a psychological stress assessment model for university students. Zhang et al. (2024) integrated multi-source data (psychological, lifestyle, and exercise data) to predict the risk of sleep disorders in university students, identifying key features such as stress scores and the severity of depression. This demonstrates that the integration of multi-source data can provide more comprehensive information for stress prediction and enhance prediction accuracy.

Research on predicting stress in specific student groups has also progressed. Rahman and Kohli (2024) focused on international students and developed a model using online surveys and existing datasets to predict depression, achieving an accuracy of 80% with the RF model. Kong et al. (2025) combined multiple machine learning methods to predict the severity of mental health issues among Chinese university freshmen, finding that factors related to interpersonal relationships had the strongest predictive power. Calderon et al. (2024) employed machine learning and Bayesian networks to analyze a sample of American university students, revealing the connection between insomnia and mental disorders and thereby providing a basis for relevant interventions. These studies have yielded targeted strategies for stress management in student groups with diverse backgrounds.

Although machine learning has yielded promising results in predicting student stress, it continues to encounter several challenges. The quality of the data is inconsistent, with issues such as missing values and noise that adversely affect model performance. Furthermore, the generalization ability of these models requires enhancement, and their stability across diverse student groups and scenarios remains inadequate. Additionally, ethical concerns, including data privacy protection and algorithmic fairness, must also be addressed.

2.3 Research of intelligent optimization

Research on the application of intelligent optimization algorithms is advancing steadily across many fields. Among these algorithms, Harris hawks optimization (HHO), bald eagle search (BES), the grey wolf optimizer (GWO), the sparrow search algorithm (SSA), and particle swarm optimization (PSO) have developed rapidly. With their distinct search mechanisms, these algorithms provide effective solutions to complex optimization problems.

Numerous scholars have conducted extensive research and innovative practices on these algorithms, yielding fruitful results. In Tang et al. (2024), researchers proposed an improved harris hawks optimization algorithm (IHHO) integrated with a back propagation (BP) neural network, applying it to high-precision landslide displacement prediction. Practical experiments demonstrate that the IHHO-BP model possesses significant advantages in addressing complex nonlinear problems. This model greatly enhances prediction accuracy and provides crucial technical support for landslide disaster warning. Zaky et al. (2021) applied the BES algorithm to the modeling and simulation of perovskite solar cells (PSCs). The study revealed that the BES algorithm exhibits high accuracy and efficiency in determining the device parameters of PSCs, particularly when addressing complex nonlinear issues, thereby opening new avenues for research and development in perovskite solar cells.

In the field of healthcare, Tarek et al. (2025) applied the snake optimization algorithm (SO) to the detection of cardiovascular diseases, and achieved a high precision of 99.9%, demonstrating the role of optimization algorithms in reducing dimensionality and enhancing the generalization ability of models. The improved Al-Biruni Earth Radius (MBER) algorithm adjusted the exploration and exploitation strategies dynamically. It achieved an accuracy of 96.12% in the classification of eye movements from electroencephalogram (EEG) signals (Elshewey et al., 2024), providing new ideas for biomedical signal processing.

In the field of mechanical engineering optimization, the GWO model exhibits exceptional performance. Yildiz et al. (2020) applied the GWO to various mechanical engineering optimization problems and compared its efficacy against multiple meta-heuristic algorithms. The results substantiate the GWO’s superior performance, highlighting its significant application value in the optimization design of mechanical engineering. The SSA also demonstrates robust capabilities. Xue and Shen (2020) introduced the SSA and conducted comprehensive tests of its performance using 19 benchmark functions. The results indicate that, in most instances, the SSA outperforms classic algorithms such as the GWO and PSO. To further enhance the SSA’s performance, Guo W. (2024) introduced a learning mechanism, resulting in the learning-based SSA (LSSA), which effectively addresses the issue of the traditional SSA’s tendency to converge to local optima. Lastly, Ye et al. (2022) proposed a feature selection method based on PSO. By integrating a differential evolution strategy, this method greatly improves the global search capability of the algorithm, allowing the PSO to more accurately identify key features in the feature selection task, thus enhancing the performance of related models.

Overall, intelligent optimization algorithms hold significant potential for addressing complex optimization problems. Algorithms such as HHO, BES, GWO, SSA, and PSO each possess unique advantages, making them suitable for various application scenarios. Ongoing in-depth research and enhancements of these algorithms contribute to a better understanding of their performance advantages and facilitate the expansion of their application domains.

3 Materials and methods

3.1 Data description

The dataset used in this study was sourced from Kaggle and covers a broad student population. It was collected primarily from high school and college students in Dharan, Nepal. Participants were aged between 15 and 24 years, representing the adolescent and young adult population typically studied in stress research. Data were collected from June to October 2022 using stratified sampling to ensure representation across different grades and academic disciplines. The dataset comprises 1,100 samples and includes 20 variables related to student stress, spanning five dimensions: psychological, physiological, social, environmental, and academic. Table 1 presents the mean (MEAN) and standard deviation (STD) of each variable, together with an explanation of each field. The mean stress level is 1.51, which falls in the upper-middle portion of the range [0, 2], indicating that students generally experience a relatively high level of stress. The mean anxiety level is 13.95, also in the upper-middle portion of its range [0, 21], suggesting that students overall exhibit a certain degree of anxiety. Additionally, the mean value for mental health history is 0.69, indicating that approximately 69% of students have a history of mental health issues, a notably high proportion. Analyzing this dataset allows us to examine the impact of psychological, physiological, environmental, academic, and social factors on students’ stress and thereby address the stress issue in a more targeted manner.


Table 1. Statistics and explanations of stress indicators.

The core psychological indicators in the dataset were measured using internationally recognized standardized scales with clear operational definitions and scoring ranges: anxiety_level was assessed by the Generalized Anxiety Disorder-7 Scale (GAD-7, scoring range 0–21); self_esteem was measured by the Rosenberg Self-Esteem Scale (scoring range 0–30); and depression was evaluated by the Patient Health Questionnaire-9 (PHQ-9, scoring range 0–27). The Cronbach’s α coefficient for the mental health dimension is 0.649, which falls within the acceptable range. The other, non-psychological-scale indicators detailed in Table 1 follow a unified scoring framework: scores of 0–1 represent “low level,” 2–3 represent “moderate level,” and 4–5 represent “high level.” Physiological indicators such as blood_pressure are graded with reference to conventional clinical assessment standards, so that the indicator values have practical significance and discriminability in characterizing students’ physical status and living environment.

Preliminary statistical checks confirmed that all samples had complete data across the 21 dimensions, with all values falling within reasonable ranges. Therefore, the dataset can be directly used for modeling analysis without additional preprocessing. To investigate the factors influencing stress level, Pearson correlation coefficients between stress level and the other 20 variables were calculated. As shown in Figure 1, except for blood_pressure, the absolute values of the correlation coefficients for all other variables exceed 0.5, indicating moderate to strong associations with stress level and justifying their inclusion in further analysis.

FIGURE 1
Horizontal bar chart showing Pearson correlation coefficients between various factors and stress levels. Positive correlations are in red, indicating the strongest with bullying (0.751) and future_career_concerns (0.743). Negative correlations in blue include self_esteem (-0.756) and sleep_quality (-0.749). Factors like depression (0.734) and academic performance (-0.721) are also significant.

Figure 1. Correlation between stress level and other variables.
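As a minimal sketch of the correlation screening described above, the Pearson coefficient between stress level and one other variable could be computed as follows. The function name `pearson_r` and the toy values are illustrative assumptions and are not taken from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, NOT taken from the study's dataset.
stress_level = [0, 1, 2, 2, 1, 0]
sleep_quality = [5, 3, 1, 0, 2, 4]
r = pearson_r(stress_level, sleep_quality)
print(f"r = {r:.3f}")  # strongly negative for this toy example
```

Applying such a function to each of the 20 variables in turn, and comparing the absolute value with 0.5, would reproduce the kind of screening summarized in Figure 1.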

3.2 Decision tree algorithm

The DT model is a commonly used machine learning algorithm, which is widely applied to classification and regression tasks (Chen et al., 2011). It simulates the human decision making process. By analyzing and judging the features of data, it gradually divides the data set into smaller subsets until each subset belongs to the same class as much as possible.

Step 1, in the feature selection stage, an optimal feature is selected from the current data set for partitioning the data set. Commonly used indicators include information gain. The larger the information gain, the better the feature can reduce the uncertainty of the data set when used for partitioning, that is, the greater the contribution of this feature to classification.

Suppose the data set $D$ contains $n$ classes, and the number of samples in the $i$-th class is $|C_i|$. Then the entropy of the data set is defined as in Equation 1:

$$H(D) = -\sum_{i=1}^{n} \frac{|C_i|}{|D|} \log_2\left(\frac{|C_i|}{|D|}\right) \tag{1}$$

Here, $|D|$ represents the total number of samples. When the data set $D$ is partitioned using the feature $A$, the information gain is defined as in Equation 2:

$$G(D, A) = H(D) - \sum_{j=1}^{v} \frac{|D_j|}{|D|} H(D_j) \tag{2}$$

Here, $v$ is the number of distinct values of feature $A$, $D_j$ is the subset of $D$ in which $A$ takes its $j$-th value, $|D_j|$ is the number of samples in $D_j$, and $H(D_j)$ is the entropy of $D_j$. The larger the information gain, the greater the contribution of feature $A$ to classification.

Step 2, based on step 1, partition the data set. Divide the current data set into several sub-data sets, with each sub-data set corresponding to one value of the feature.

Step 3, recursively construct sub-trees. For each sub-data set, repeat step 1 and step 2 until all samples in the sub-data set belong to the same class or no features remain for selection.
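The feature selection criterion in step 1 can be illustrated with a short sketch of Equations 1 and 2 for a categorical feature. The function names and the toy data are assumptions made for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy H(D) of a list of class labels (Equation 1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Information gain G(D, A) of splitting on one categorical feature (Equation 2)."""
    n = len(labels)
    # Group the labels by feature value: one subset D_j per distinct value.
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: four samples, two classes, two candidate features.
labels = [0, 0, 1, 1]
feature_a = ["low", "low", "high", "high"]  # separates the classes perfectly
feature_b = ["x", "y", "x", "y"]            # carries no class information

gain_a = information_gain(feature_a, labels)
gain_b = information_gain(feature_b, labels)
print(gain_a, gain_b)
```

In this toy case the first feature would be chosen for the split because its information gain equals the full entropy of the data set, while the second feature leaves the uncertainty unchanged.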

3.3 HHO algorithm

The HHO model is a novel metaheuristic optimization algorithm inspired by the cooperative hunting behavior of Harris hawks in nature (Heidari et al., 2019). It has shown remarkable performance in solving a wide variety of complex optimization problems due to its efficient exploration and exploitation capabilities. During the exploration phase, the hawks search for potential prey over a wide area, mimicking the global search behavior in optimization. In the exploitation phase, once the prey is spotted, they use different tactics to capture it, similar to the local search refinement in optimization algorithms.

Let $X_i^t$ denote the position vector of the $i$-th hawk in the $t$-th iteration. In the exploration phase, the position update rule is designed to encourage the hawks to explore different regions of the search space. The equation is given in Equation 3:

$$X_i^{t} = X_{rand}^{t} - r_1 \left| C \, X_{rand}^{t} - X_i^{t} \right| \tag{3}$$

Here, $X_{rand}^{t}$ is the position of a hawk randomly selected from the current population at iteration $t$, $r_1$ is a random number in the range [0, 1], and $C$ is a constant coefficient between 0 and 2.

During the exploitation phase, when the hawks have identified a promising region (prey location), the position update is more focused on local search. There are multiple strategies depending on the fitness of the hawks and the stage of the search. One of the commonly used equations is given in Equation 4:

$$X_i^{t} = X_{best}^{t} - 2 E_0 \left(1 - \frac{t}{T_{max}}\right) \left| J \, X_{best}^{t} - X_i^{t} \right| \tag{4}$$

Here, $X_{best}^{t}$ is the position of the best-performing hawk at iteration $t$, $E_0$ is the initial escape energy, usually set to a value between 1 and 2, $t$ is the current iteration number, $T_{max}$ is the maximum number of iterations, and $J$ is a random number between −1 and 1.
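The two update rules can be sketched in a few lines of Python. This is a simplified illustration of Equations 3 and 4 only, under stated assumptions: the right-hand side is read as producing the hawk's updated position, $r_1$ and $J$ are drawn once per update as scalars, and the rule for switching between the two phases is omitted because it is not described in this section. The function names are hypothetical.

```python
import random

def explore(x_i, x_rand, c, rng):
    """Exploration update (Equation 3): move relative to a randomly chosen hawk."""
    r1 = rng.random()  # random number in [0, 1)
    return [xr - r1 * abs(c * xr - xi) for xi, xr in zip(x_i, x_rand)]

def exploit(x_i, x_best, e0, t, t_max, rng):
    """Exploitation update (Equation 4): move relative to the best hawk."""
    j = rng.uniform(-1.0, 1.0)
    # The step scale shrinks linearly to zero as t approaches t_max.
    scale = 2.0 * e0 * (1.0 - t / t_max)
    return [xb - scale * abs(j * xb - xi) for xi, xb in zip(x_i, x_best)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
hawk = [0.5, 0.2]
other = [0.8, 0.1]
best = [1.0, -1.0]

explored = explore(hawk, other, c=1.5, rng=rng)
early = exploit(hawk, best, e0=1.5, t=1, t_max=10, rng=rng)
final = exploit(hawk, best, e0=1.5, t=10, t_max=10, rng=rng)
print(explored, early, final)
```

Under this reading, the factor $(1 - t/T_{max})$ makes the exploitation step vanish at the last iteration, so the hawk lands exactly on the best position found so far.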

This study introduced the HHO algorithm to optimize the hyperparameters of the DT algorithm, thereby enhancing the accuracy and reliability of the model and laying a solid foundation for subsequent stress classification.

The dataset was divided into a training set and a testing set at a ratio of 4:1. On the training set, the HHO algorithm was employed to optimize the hyperparameters of the DT and determine the optimal parameters of the model. The resulting optimal model was then used to predict the testing set to further validate its generalization ability. The workflow is shown in Figure 2.

FIGURE 2
Flowchart illustrating an HHO-DT classification model. Left panel: starts with student stress data, divides data into testing and training sets, processes data, applies HHO-DT model, outputs stress classification results, and ends. Right panel: initializes parameters (population size, spatial dimension, iterations), sets search range, satisfies termination conditions, obtains optimal parameters, calculates parameters and fitness value for each iteration. Arrows connect steps to show process flow.

Figure 2. Work flow of HHO-DT model.

3.4 Interpretive SHAP model

The SHAP model is a unified approach for interpreting the prediction results of machine learning models (Hancock et al., 2025). It is based on the concept of Shapley values in cooperative game theory and aims to provide a reasonable and unique attribution for the contribution of each feature to the model’s prediction. In this study, we applied the SHAP model to explain the student stress prediction model, in order to deeply understand the impact of each feature on the prediction results.

Specifically, for a dataset $X$ containing $n$ features and a machine learning model $f$, the Shapley value $\phi_i$ of feature $i$ is defined as in Equation 5:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \left[ f(S \cup \{i\}) - f(S) \right] \tag{5}$$

Among them, N represents the set of all features, S is an arbitrary subset of N that does not contain feature i, |S| indicates the size of subset S, and f(S) represents the prediction result of the model when only considering the features in subset S.

In the student stress prediction model, by calculating the SHAP value of each feature, we can quantify the degree of influence of each feature on predicting students’ stress levels. A positive SHAP value indicates that the feature tends to increase the predicted stress level, while a negative SHAP value means that the feature tends to decrease the predicted stress level. The larger the absolute value of the SHAP value, the greater the impact of the feature on the prediction result.
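To make Equation 5 concrete, the following sketch computes exact Shapley values by enumerating every feature subset. It is a brute-force illustration that is only practical for a handful of features, and it is not how the SHAP library computes values for tree models. The value function `toy_value` and its per-feature effects are hypothetical numbers chosen for the example, not results from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley value of each feature by subset enumeration (Equation 5)."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Hypothetical additive "model": each feature that is present shifts the
# predicted stress score by a fixed amount (illustrative numbers only).
EFFECTS = {"depression": 0.6, "social_support": -0.4, "blood_pressure": 0.3}

def toy_value(subset):
    return sum(EFFECTS[f] for f in subset)

phi = shapley_values(list(EFFECTS), toy_value)
print(phi)
```

For an additive value function such as this one, each Shapley value equals the feature's own effect, and the values sum to the difference between the prediction with all features and the prediction with none. The sign convention matches the text: the positive value pushes the predicted stress up and the negative value pushes it down.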

3.5 Model evaluation metrics

To comprehensively evaluate the performance of the student stress prediction model, this study selected a series of representative evaluation indicators: accuracy, precision, recall, and F1 score. Together, these indicators assess the model’s performance in classifying students’ stress levels from several angles, thereby establishing a robust basis for measuring the model’s reliability, stability, and generalization.

Accuracy intuitively reflects the overall correctness of the model’s predictions by calculating the proportion of correctly predicted samples in the total number of samples. The formula for accuracy is given in Equation 6:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

Here, TP is the number of positive samples correctly identified by the model, TN is the number of negative samples correctly identified by the model, FP is the number of negative samples misclassified as positive, and FN is the number of positive samples misclassified as negative.

Precision measures the proportion of true positive samples among the samples predicted as positive. It reflects the accuracy of the model in predicting positive examples, that is, how many of the positive examples predicted by the model are actually positive. Its calculation formula is given in Equation 7:

$$Precision = \frac{TP}{TP + FP} \tag{7}$$

Recall measures the proportion of actual positive examples that are correctly predicted as positive. It reflects the ability of the model to find all positive examples, that is, how many true positive examples are found by the model. The calculation formula is given in Equation 8:

$$Recall = \frac{TP}{TP + FN} \tag{8}$$

The F1 score is the harmonic mean of precision and recall. It integrates the two metrics of precision and recall, and is used to balance and comprehensively evaluate the performance of a model. The higher the F1 score, the better the overall performance of the model in terms of both precision and recall, as shown in Equation 9.

$$F1\ score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{9}$$
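Equations 6 to 9 can be checked with a small helper that takes the four confusion counts. The sketch below covers only the binary form given in the equations; how the per-class values are averaged for the three stress levels is not stated in this section and is left out. The counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion counts (Equations 6-9)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the precision (40/45) is higher than the recall (40/50), and the F1 score falls between the two, closer to the smaller value, as expected of a harmonic mean.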

4 Experimental verification and analysis

4.1 Evaluation of multiple machine learning algorithms

To comprehensively evaluate different models on this multi-class student stress classification task, nine machine learning models were compared: LR, DT, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), XGB, LightGBM (LGBM), Gradient Boosting (GBM), and Extra Tree (ET). The dataset was divided into a training set and a test set. We employed five-fold stratified cross-validation on the training set for hyperparameter tuning to determine the optimal parameters, and finally validated each model on the test set. The search ranges and optimal parameters for each algorithm are presented in Table 2.


Table 2. Grid search parameters and optimal values for each model.

The performance comparison of the nine machine learning algorithms is presented in Table 3. The DT model demonstrates the best performance in student stress prediction, achieving an accuracy of 0.8955, precision of 0.8967, recall of 0.8955, and F1 score of 0.8952, the highest values among the compared models. This indicates that DT effectively captures nonlinear relationships in the data through hierarchical feature partitioning, showing stronger feature selection capabilities in stress classification tasks. The SVM model follows closely with an accuracy of 0.8909 and an F1 score of 0.8909, approaching the performance of the DT model and demonstrating balanced classification ability for positive and negative samples. The LR and ET models achieve accuracies of 0.8864 and 0.8818, respectively, placing them at a moderate level. The LGBM and GBM models both achieve an accuracy of 0.8773, highlighting certain limitations of traditional gradient boosting methods in this task.

TABLE 3

Table 3. Evaluation metrics of multiple machine learning algorithms.

Because of the randomness in feature selection within the DT algorithm, its accuracy can vary between runs. To examine the comparison between the DT model and the other models further, we ran 20 rounds of experiments, recorded the accuracy of each model in each round, and applied a paired t-test between DT and each of the other models. The results are presented in Table 4. All p-values are below 0.05, indicating that the accuracy of DT is significantly higher than that of the other models. On small datasets, the ensemble characteristics and parameter sensitivity of XGB make it difficult for it to fully display its advantages, and its difference from DT is the most significant (t = 24.07, p = 3.85 × 10⁻¹⁵).

TABLE 4

Table 4. T-test results of DT with other models.
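The paired t statistic used for such a comparison can be sketched as follows. The function name `paired_t_statistic` and the accuracy values are hypothetical and serve only to show the calculation; obtaining the p-value additionally requires the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which is left out of this standard-library sketch.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: sd_diff is zero if all differences are identical; t is undefined then.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies from five paired runs (the study used 20 rounds).
dt_acc = [0.90, 0.89, 0.91, 0.90, 0.92]
other_acc = [0.88, 0.88, 0.89, 0.87, 0.90]
t, df = paired_t_statistic(dt_acc, other_acc)
print(round(t, 3), df)
```

Pairing by round matters because both models are evaluated under the same conditions in each round, so the test works on the per-round differences rather than on two independent samples.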

Table 5 shows the accuracy of the machine learning algorithms for each student stress category (Stress-0, Stress-1, Stress-2). In the Stress-0 category, RF performed best with an accuracy of 0.9211, and DT followed closely with an accuracy of 0.9079, indicating a strong ability to identify students in the low-stress group. For the Stress-1 category, the DT model excelled with an accuracy of 0.9178, outperforming the other models. Both the RF and LR models achieved an accuracy of 0.8493, ranking lowest and demonstrating limited identification capabilities for the moderate-stress group. In the Stress-2 category, SVM showed the best performance, while the DT and ET models both achieved an accuracy of 0.8451, indicating average identification capabilities.

TABLE 5

Table 5. Accuracy of different models for student stress categories.
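Per-category accuracy of the kind reported in Table 5 can be computed as sketched below, assuming it means the fraction of samples of each true class that are predicted correctly (equivalent to per-class recall; the text does not define it explicitly). The function name and the label vectors are hypothetical.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in sorted(total)}


# Hypothetical true and predicted stress categories for ten samples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 1, 2, 0]
print(per_class_accuracy(y_true, y_pred))
```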

4.2 Evaluation of multiple intelligent optimization algorithms

In this study, when optimizing the DT using multiple intelligent optimization algorithms (BES, SSA, GWO, WOA, HHO), the key parameters to be optimized are the values of max_depth, min_samples_leaf, max_features, min_weight_fraction_leaf, and ccp_alpha.

In DT, the max_depth parameter represents the maximum depth of the tree, that is, the number of nodes on the longest path from the root node to a leaf node. This parameter controls the complexity of the DT, and its search range is from 3 to 20. The min_samples_leaf parameter is the minimum number of samples required in a leaf node (given here as a fraction of the samples), with a search range from 0.01 to 0.2. The max_features parameter is the proportion of features considered at each node split, with a search range from 0.01 to 1. The min_weight_fraction_leaf parameter is the minimum weighted fraction of samples required in a leaf node, with a search range from 0.01 to 0.2. The ccp_alpha parameter controls cost-complexity pruning: it balances the complexity of the DT against the fitting error on the training data, and its search range is from 0.01 to 0.2. After 500 iterations, the fitness of the model tends to be stable. The optimal values of the parameters are shown in Table 6.

TABLE 6

Table 6. Comparison of key parameter values of DT.
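Population-based optimizers typically work on continuous position vectors, which then have to be mapped to concrete hyperparameter values. The sketch below shows one possible mapping from a vector in [0, 1]^5 to the five search ranges listed above. The names `SEARCH_SPACE` and `decode`, and the linear scaling itself, are assumptions made for illustration; the text does not describe how the optimizers encode candidate solutions.

```python
# Search ranges as stated in the text: name -> (lower, upper, is_integer).
SEARCH_SPACE = {
    "max_depth": (3, 20, True),
    "min_samples_leaf": (0.01, 0.2, False),
    "max_features": (0.01, 1.0, False),
    "min_weight_fraction_leaf": (0.01, 0.2, False),
    "ccp_alpha": (0.01, 0.2, False),
}


def decode(position):
    """Map a vector in [0, 1]^5 to decision tree hyperparameters."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position must have one entry per hyperparameter")
    params = {}
    for unit, (name, (low, high, is_int)) in zip(position, SEARCH_SPACE.items()):
        unit = min(max(unit, 0.0), 1.0)  # clip to the unit interval
        value = low + unit * (high - low)  # linear scaling (an assumption)
        params[name] = int(round(value)) if is_int else value
    return params


print(decode([0.0, 0.5, 1.0, 0.0, 1.0]))
```

The parameter names match the keyword arguments of scikit-learn's `DecisionTreeClassifier`, so a decoded dictionary could be passed to it directly if that library is used. The fitness of a candidate would then be the validation accuracy of the resulting tree.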

Figure 3 shows how the accuracy of the DT models based on different optimization algorithms (BES-DT, SSA-DT, GWO-DT, WOA-DT, HHO-DT) changes over 500 iterations. The models show different trends during the iteration process. Some quickly reach a high accuracy in the early stage and remain stable, while others go through a more tortuous growth process before stabilizing.

FIGURE 3
Line chart showing the fitness of accuracy over epochs for five algorithms: BES-DT (black), SSA-DT (red), GWO-DT (yellow), WOA-DT (green), and HHO-DT (blue). HHO-DT reaches the highest accuracy.

Figure 3. DT optimized by different intelligent optimization algorithms.

The accuracy of the HHO-DT model is relatively low in the early stage of iteration, at about 0.905. As the iterations progress, its accuracy gradually increases: it reaches about 0.918 at around the 30th iteration and subsequently stabilizes at 0.927. This indicates that HHO has potential for optimizing the hyperparameters of the DT. Although its early performance is weaker, it effectively improves the model through continued iteration and ultimately achieves a high accuracy. The final accuracies of the GWO-DT and BES-DT models are 0.923, and those of the SSA-DT and WOA-DT models are 0.918. We further performed 10 rounds of verification on the HHO-DT, BES-DT, GWO-DT, SSA-DT, and WOA-DT algorithms and calculated their average accuracies. As shown in Supplementary Table 1, the HHO-DT algorithm achieves the highest accuracy.

We also compared the optimization algorithms with a grid search (GS) strategy that tuned the same five core parameters. The parameter ranges were consistent with the previous description: max_depth was searched with a step size of 2, min_samples_leaf and min_weight_fraction_leaf with a step size of 0.01, max_features with a step size of 0.2, and ccp_alpha with a step size of 0.05.
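To give a sense of the size of this grid, the sketch below enumerates the candidate values for each parameter and multiplies the counts. It assumes that each grid starts at the lower bound of its range and includes every step that does not exceed the upper bound; the exact endpoints used by the authors are not stated, so the total is indicative only. The helper `grid` and the dictionary `GRID` are hypothetical names.

```python
import math


def grid(start, step, count, digits=2):
    """Evenly spaced values, rounded to avoid floating-point drift."""
    return [round(start + i * step, digits) for i in range(count)]


# Candidate values under the stated ranges and step sizes (endpoints assumed).
GRID = {
    "max_depth": list(range(3, 21, 2)),                # 3, 5, ..., 19
    "min_samples_leaf": grid(0.01, 0.01, 20),          # 0.01, ..., 0.20
    "min_weight_fraction_leaf": grid(0.01, 0.01, 20),  # 0.01, ..., 0.20
    "max_features": grid(0.01, 0.2, 5),                # 0.01, ..., 0.81
    "ccp_alpha": grid(0.01, 0.05, 4),                  # 0.01, ..., 0.16
}

n_combinations = math.prod(len(values) for values in GRID.values())
print(n_combinations)  # 72000
```

Under these assumptions the grid contains 72,000 parameter combinations, each of which requires fitting at least one tree, and halving any step size roughly doubles that count.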

Figure 4 shows that GWO-DT and WOA-DT have the shortest running times, 15.4 s and 15.6 s, respectively. These algorithms are suitable for scenarios requiring rapid parameter tuning. The running time of the HHO-DT algorithm is 26.9 s, which lies between that of the group search algorithm (BES-DT at 44.2 s) and the individual search algorithm (SSA-DT at 30.4 s). GS-DT takes 117.2 s, and the grid search time will increase further as the step size becomes smaller. HHO-DT keeps the running time within a reasonable range while ensuring optimization effectiveness, making it particularly suitable for small-dataset scenarios. In contrast, although grid search can exhaustively enumerate parameter combinations, its computational cost is excessively high.

FIGURE 4
Bar chart comparing runtimes in seconds for six algorithms: BES-DT at 44.2, SSA-DT at 30.4, GWO-DT at 15.4, WOA-DT at 15.6, HHO-DT at 26.9, and GS-DT at 117.1.

Figure 4. Comparison of running time of different optimization algorithms.

Based on the fitness iteration curves in Figure 3, GWO-DT, HHO-DT, SSA-DT, and DT are selected for a comparative analysis using confusion matrices, and the results are shown in Figure 5. The accuracy of GWO-DT is 0.9227, that of HHO-DT is 0.9272, that of SSA-DT is 0.9181, and that of DT is 0.8909. The DT models tuned by the optimization algorithms all achieve higher accuracy than the original model, and HHO-DT has the highest accuracy. As shown in Figure 6, the HHO-DT model misclassifies only 16 samples, the smallest number among the compared models, indicating that the HHO algorithm is effective in enhancing the classification performance of the DT model.

FIGURE 5
Confusion matrices for four models: GWO-DT with 92.27% accuracy, HHO-DT with 92.72% accuracy, SSA-DT with 91.81% accuracy, and DT with 89.09% accuracy. Each matrix compares true and predicted labels across three classes.

Figure 5. Confusion matrices in student stress classification.

FIGURE 6
Scatter plot showing class labels for samples. True labels are light blue, predicted labels are magenta, and misclassified samples are marked with green Xs. The X-axis represents samples from 0 to 225, and the Y-axis shows class labels 0, 1, and 2.

Figure 6. True and predicted labels with misclassified samples.
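The link between a confusion matrix, the overall accuracy, and the number of misclassified samples can be sketched as follows. The matrix entries are hypothetical: they were chosen so that the total is 220 samples with 16 errors, where 220 is a test-set size inferred from the reported accuracy and error count rather than stated in the text. They are not the values shown in Figure 5.

```python
def accuracy_from_confusion(matrix):
    """Return (accuracy, number of misclassified samples) for a square matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
    return correct / total, total - correct


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
matrix = [
    [70, 4, 2],
    [3, 68, 2],
    [2, 3, 66],
]
acc, errors = accuracy_from_confusion(matrix)
print(round(acc, 4), errors)
```

With 204 of 220 samples on the diagonal, the accuracy is about 0.927, which is consistent with the figure reported for HHO-DT under the assumed test-set size.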

4.3 SHAP analysis of stress level

Figure 7 is a bar chart of feature importance based on SHAP values, presenting the average impact of each feature on the predicted stress level. The average SHAP value of blood_pressure is the highest, indicating that it has the greatest impact on the model’s prediction and is a key factor associated with students’ stress levels. Features such as social_support, depression, and self_esteem also have relatively high average SHAP values, indicating that they play a relatively important role in the prediction. In contrast, features such as study_load and academic_performance have relatively low average SHAP values and a relatively small impact on the model’s prediction results.

FIGURE 7
Bar graph showing factors affecting model output magnitude, with blood pressure having the highest impact, followed by social support, depression, self-esteem, and extracurricular activities. Other factors include sleep quality, living conditions, and academic performance.

Figure 7. SHAP values of various features for predicting students’ stress levels.
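A global importance ranking of this kind is commonly obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below illustrates that aggregation on a small matrix of hypothetical SHAP values; the numbers and the function name `mean_abs_shap` are invented for illustration and do not come from the study.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across samples."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per sample, one column per feature.
features = ["blood_pressure", "social_support", "study_load"]
shap_rows = [
    [0.50, -0.30, 0.05],
    [-0.40, 0.20, -0.05],
    [0.60, -0.10, 0.02],
]
for name, value in mean_abs_shap(shap_rows, features):
    print(name, round(value, 3))
```

Taking the absolute value before averaging prevents positive and negative contributions from cancelling, so the ranking reflects the magnitude of influence rather than its direction.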

Figure 8 displays the SHAP values of the features for predicting the stress level in a specific case. The value f(x) = 5.168 on the far right of the figure is the final predicted value of the model after all features are taken into account. The values and arrows next to each feature indicate the impact of that feature on the prediction result. A positive value, such as +0.52 for blood_pressure, means that the feature increases the predicted value, that is, it drives the model toward predicting a higher stress level; a negative value, such as −0.02 for bullying, means that the feature decreases the predicted value, that is, it pushes the model away from predicting a higher stress level. The larger the absolute value, the greater the impact of the feature on the prediction result. This figure gives an intuitive view of the combined impact of the features on the prediction, which helps identify the factors that play a major role and provides a reference for subsequent research and intervention measures on students’ stress.

FIGURE 8
Bar graph showing SHAP values for factors affecting mental health. Positive influences, indicated in red, include blood pressure, social support, and depression. Negative factors, in blue, include peer pressure and future career concerns. Blood pressure has the highest positive SHAP value of +2.52.

Figure 8. SHAP values of various features in a specific case.
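The relationship between the per-feature values and the final prediction in such a plot follows from the additivity (local accuracy) property of SHAP values. In the notation below, which is introduced here for illustration, \phi_0 is the base value (the expected model output), \phi_i is the SHAP value of feature i for the case in question, and M is the number of features.

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i
```

Each positive \phi_i moves the prediction above the base value and each negative \phi_i moves it below, so the contributions shown for a single case sum to the difference between f(x) and the base value.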

As shown in Figure 9, features such as blood_pressure and social_support show relatively dark colors in a large number of instances, indicating that they have a relatively strong and widespread impact on the prediction of students’ stress levels across samples. For the “Sum of 11 other features,” the color distribution is relatively light and scattered, suggesting that these 11 features together have a relatively small impact on the model’s prediction.

FIGURE 9
SHAP summary plot showing the impact of various features on model output for over 700 instances. Features include blood pressure, social support, depression, self-esteem, and others. SHAP values range from negative to positive impacts, indicated by colors from blue to red.

Figure 9. SHAP values of various features for multiple cases.

To explore the role of social support in the mechanism underlying students’ stress, this study divided participants into a high social support group (score ≥ 3) and a low social support group (score ≤ 2) based on their social_support scores. An independent-samples t-test was used to compare feature differences between the two groups, with results presented in Table 7. At the conventional significance threshold of 0.05, statistically significant differences were observed in 12 features between the two groups. For instance, the high social support group exhibited significantly better living_conditions (t = 10.70, p < 0.001) and higher satisfaction with basic_needs (t = 5.77, p < 0.001) than the low social support group, while their noise exposure was significantly lower (t = 5.82, p < 0.001). These findings suggest a mutually reinforcing relationship between a favorable living environment and higher social support. However, no statistically significant differences were found in 7 features, including teacher_student_relationship, depression, and study_load (p > 0.05), highlighting the need for further investigation into the relationships between these factors and social support in future research.

TABLE 7

Table 7. T-test for feature differences between high and low social support groups.
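The group comparison described above can be sketched as follows: records are split by the social_support threshold and a two-sample t statistic is computed for one feature. The sketch assumes the classical pooled-variance (Student) form of the test, which the text does not specify, and the records and function name are hypothetical. The p-value would additionally require the t distribution with the returned degrees of freedom.

```python
import math
import statistics


def independent_t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2


# Hypothetical records: (social_support, living_conditions).
records = [(3, 4), (3, 5), (3, 3), (2, 2), (1, 1), (2, 3), (1, 2), (3, 4)]
high = [lc for ss, lc in records if ss >= 3]  # high social support group
low = [lc for ss, lc in records if ss <= 2]   # low social support group
t, df = independent_t_statistic(high, low)
print(round(t, 3), df)
```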

5 Discussion and Conclusion

5.1 Deep exploration of feature impacts

Features with high SHAP values, as shown in Figures 6–8, such as social_support and depression, provide important directions for formulating student stress intervention strategies. Students’ social support networks can be strengthened through various means, for example, by organizing social activities to promote communication and mutual assistance among students, and by establishing a good communication mechanism between teachers and students so that students can receive timely support and help when encountering problems. To address students’ mental health status, mental health education courses can be offered to improve students’ psychological adjustment ability, and psychological counseling services can be provided to detect and resolve students’ psychological problems in a timely manner. Improving these factors is expected to reduce students’ stress levels. Arya et al. (2024) identified through correlation analysis that self_esteem and sleep_quality exert a significant negative impact on students’ stress, while bullying has a significant positive impact. In contrast, the present study, based on quantitative analysis using the SHAP model, demonstrates that physiological factors are the most influential predictors of stress, followed by social_support and depression.

Elevated blood pressure aligns with the physiological outcomes of Lazarus’ transactional model (Obbarius et al., 2021). After primary appraisal (evaluating a stressor as threatening), secondary appraisal (assessing coping resources) triggers physiological responses like increased blood pressure, validating the theory’s focus on mind-body interactions in stress. The SHAP emphasis on social_support reflects Lazarus’ secondary appraisal process. Low social support signals insufficient coping resources, a key factor in stress evaluation. Individuals appraising their environment as lacking supportive relationships (e.g., friends, family) are more likely to perceive stressors as unmanageable.

In the SHAP analysis, study_load and academic_performance have the least impact on student stress in Nepal. This outcome is shaped by the interplay of educational systems, social culture, and economic conditions. Compared with countries such as China, Nepal’s low academic stress stems from a balance between limited educational resources and societal tolerance. Local communities prioritize basic education completion over individual academic excellence, with parents focusing more on children’s wellbeing than on exam results. Additionally, because access to higher education is limited, only the small fraction of students pursuing university education faces intense competition.

Based on the results of SHAP analysis, there are likely interactions between features. For example, good social support may reduce the impact of depression on students’ stress levels, while poor social support may exacerbate the negative impact of depression. Although the current research has revealed the individual impacts of each feature, the interactions between features have not been deeply explored. In the future, more complex data analysis methods, such as constructing interaction effect models, can be used to deeply study these interactions and further improve the understanding of the student stress prediction mechanism. By delving deeper into the complex relationships between features, more targeted and precise intervention measures can be developed to improve the intervention effect.

We also analyzed the misclassified cases. In one case, the true stress level is stress-2, with both depression and social support rated as 1, but the model misclassified it as stress-1. Although the low social support score on its own points to higher stress, the model classified the case as moderate stress once the depression score and the combination of other features were taken into account. In real data, the interaction between these two features and numerous other features may change the final prediction result. In the future, we can try to construct new composite features or transform the existing ones to better capture the potential information in the data, for example, by combining relevant physiological, psychological, and social features into more representative composite features that help the model judge students’ stress levels more accurately.

5.2 Measures to alleviate students’ stress

It is essential to establish a robust support system, wherein schools and families collaborate to create a strong network that provides necessary psychological counseling and support services (Ruiz and Lopez, 2024). Students should be guided to perceive stress accurately, encouraging them to analyze situations calmly and view stressful circumstances as opportunities to develop their abilities, rather than succumbing to fear or avoidance. Universities can use our prediction model’s results to identify students who are at high risk of stress and allocate mental health resources more effectively, focusing on the students who need them most.

Through targeted training and education, students can be equipped with effective coping strategies (English and Chi, 2020). Instruction in emotional regulation skills, such as deep breathing, relaxation techniques, and meditation, can enable students to swiftly regain composure during emotional turmoil, thereby preventing them from being overwhelmed by negative emotions and allowing them to manage pressure more rationally. Furthermore, through education on resilience, students can cultivate a tenacious character by facing repeated challenges. They should be taught that failure serves as a stepping stone to success, and that each setback is an opportunity to rise again and grow stronger, ultimately enhancing their psychological endurance.

To enhance educational outcomes, it is essential to improve teaching methods and evaluation systems, reduce unnecessary academic burdens, and foster a more relaxed learning atmosphere (Mahalingam et al., 2018; Tom, 2022). This can be achieved through the development of personalized learning plans that align with students’ individual learning rhythms, strengths, and areas for improvement. It is important to allocate study time effectively, emphasizing key concepts and challenging topics while avoiding rote practice of questions. This approach aims to enhance learning efficiency and alleviate academic pressure. Students should be encouraged to explore learning strategies that resonate with their preferences; for instance, visual learners might benefit from mind maps to bolster memory retention, while auditory learners could utilize audiobooks to facilitate their studies. Such tailored methods can stimulate interest in learning and improve academic performance. Furthermore, it is crucial for students to actively communicate any learning challenges to their teachers and provide timely feedback on their progress. This collaboration allows educators to adjust their teaching strategies effectively, fostering a mutually beneficial learning environment.

Regular mental health lectures and activities should be organized to bolster students’ psychological resilience (Arif et al., 2021). Additionally, a variety of team activities, such as group projects and club competitions, can promote communication and collaboration among students, helping them forge meaningful friendships, enhance social skills, and alleviate social anxiety through interaction. For students facing significant social challenges, timely intervention by professional psychological counselors can be invaluable. Through one-on-one tutoring, these counselors can help students navigate their emotions, rebuild confidence, and develop healthy interpersonal relationships. Finally, supporting students in exploring personal interests and hobbies—such as painting, music, and reading—can provide them with an immersive escape from their worries, enriching their spiritual lives and adding vibrancy to their experiences.

5.3 Comparison with deep learning-based approaches

The HHO-DT model offers a clear classification logic, yet its generality and accuracy need further verification with more data. Among the machine learning algorithms compared, the DT algorithm shows good accuracy. Meanwhile, artificial intelligence has made significant advancements in stress prediction, with traditional machine learning and deep learning methods each demonstrating distinct characteristics. Compared with traditional machine learning, deep learning approaches such as CNN (Tian, 2022) and LSTM (Hafeez and Shakil, 2023) show greater potential in processing complex multimodal data (e.g., physiological signals, text, behavioral logs) (Morgan, 2017). However, they rely on large-scale datasets and high computational resources while suffering from insufficient model transparency. In Arya et al. (2024), the Naive Bayes model performed best, achieving a test accuracy of 0.90, while the SVM model performed worst. In contrast, the HHO-DT model proposed in our study reaches a test accuracy of 0.927, an increase of 2.7 percentage points, which indicates an advantage in prediction accuracy.

Additionally, deep learning excels in automated feature extraction, capable of capturing latent patterns from raw data (e.g., heart rate variability, social media text), but faces challenges like ethical risks and high deployment costs. The core differences between this study and deep learning lie in data scale, interpretability, and applicable scenarios. The lightweight HHO-DT model is suitable for resource-constrained environments with transparent decision logic, facilitating educators’ intervention strategy formulation. In contrast, deep learning can uncover more complex nonlinear relationships with sufficient data but requires balancing performance and interpretability. Future research could explore hybrid models, for example, using CNN to extract physiological signal features before combining with optimized DT for classification, to integrate automation and interpretability. Ultimately, method selection should consider data characteristics, interpretability requirements, and resource constraints. This study provides an efficient and reliable solution for medium-scale scenarios, while deep learning still holds vast development potential in multimodal big data applications.

5.4 Future research directions

To improve the performance of models in student stress prediction, we can attempt to introduce new machine learning algorithms, such as neural network models in deep learning. These models have stronger non-linear fitting capabilities and can better capture the complex relationships in the data. We can also improve existing algorithms, for example, by adjusting the growth strategy of the DT or optimizing the combination method of ensemble learning models, to enhance the accuracy and stability of the models. Regarding the problem of sample imbalance, data sampling techniques such as oversampling, undersampling, or the synthetic minority over-sampling technique (SMOTE) can be adopted to balance the sample sizes of different stress categories and improve the model’s ability to identify minority classes.
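The simplest of these sampling techniques, random oversampling, can be sketched in a few lines: minority-class samples are duplicated at random until every class is as large as the largest one. The function name, seed, and toy data below are hypothetical. SMOTE differs in that it creates new synthetic samples by interpolating between neighbouring minority-class samples instead of duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes have equal size."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy imbalanced data: three, two, and one sample per class (hypothetical).
samples = [[1], [2], [3], [4], [5], [6]]
labels = [0, 0, 0, 1, 1, 2]
new_x, new_y = random_oversample(samples, labels)
print(Counter(new_y))
```

Resampling of this kind is normally applied to the training data only, so that the test set continues to reflect the original class distribution.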

Future research can further expand the exploration of features affecting students’ stress levels. More psychological factors, such as students’ personality traits and coping styles, can be considered, as well as environmental factors like family environment and school atmosphere. A variety of data analysis methods, such as structural equation modeling and Bayesian networks, can be used to study the interactions between features in depth. These methods can more accurately reveal the causal relationships and complex network structures among variables, providing stronger theoretical support for the formulation of intervention measures. At the same time, longitudinal research methods can be combined to track the changes in students’ stress levels, dynamically analyze the relationships between features, and improve the timeliness and accuracy of student stress prediction. Transformer models can also be applied to analyze long-sequence information such as students’ behavioral data and changes in psychological states. We can explore their potential in capturing complex features and long-term trends, and compare them with traditional machine learning models to evaluate their advantages and applicability. By integrating real-time monitoring technologies and leveraging Internet of Things (IoT) devices such as wearables, we can achieve dynamic monitoring and early warning of students’ stress levels. Additionally, we can develop or integrate existing mobile health applications to enable students to conveniently record information about their psychological feelings and daily activities. Through interaction with the prediction model, students can achieve self-monitoring and management of their stress.

In the future, we will actively collaborate with multiple schools and educational institutions to collect large scale student data covering students from different regions, age groups, and disciplines. By conducting experiments on data from different institutions, we can evaluate the model’s performance in different environments, identify the problems and limitations of the model, and then make improvements and optimizations.

6 Limitations of the data

The dataset sourced from Kaggle presents several limitations. First, the sample lacks representativeness and exhibits geographic and cultural bias. The data is predominantly collected from Nepal and does not include regions such as the United States or other countries. Second, the dataset only covers individuals aged 15–24, making it difficult to reflect characteristics of younger groups like elementary school students. Although we conducted subgroup analysis focusing on the social support dimension to explore potential differences in stress levels among groups with varying access to social support, the study still faces a key constraint: due to the lack of an explicit age variable in the dataset, we cannot perform more granular stratification by academic stages. This limitation prevents the full manifestation of differences in how social support moderates stress across different developmental stages. Additionally, critical variables are missing, including key social determinants of student health such as household income, parental education levels, and community resources, which leads the model to overlook the impacts of structural inequalities. Finally, the dataset lacks temporal dynamics, as it is cross-sectional and fails to capture longitudinal changes in student conditions over time.

Although the proposed model achieves an accuracy of 0.927 on the dataset used in this study, its generalization ability still requires further investigation, primarily because of differences across academic stages as well as regional and cultural disparities. Meanwhile, student stress prediction involves sensitive information such as psychological states and physiological indicators, and model decisions may affect the allocation of intervention resources. Attention must therefore also be paid to privacy protection and bias mitigation.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Ethics statement

This study was approved by Jiangsu Vocational Institute of Commerce (Approval No. 2023H038). The research process strictly adheres to relevant ethical guidelines, fully protecting the rights and interests of the research subjects. The research was conducted in accordance with established ethical principles for research involving human participants, including the Declaration of Helsinki. The data used in this study were obtained from a publicly available dataset collected by a previous research project. According to the documentation provided by the data repository, informed consent was obtained from all participants during the original data collection. For participants under the age of 18, written informed consent was also obtained from their parents or legal guardians. The original data have been anonymized and contain no personally identifiable information. This study strictly adheres to the terms of use for the original dataset, is conducted solely for academic research purposes, does not attempt any re-identification of participants, and does not include analyses beyond the scope of the original authorization.

Author contributions

CL: Writing – original draft, Writing – review & editing. SY: Formal analysis, Validation, Visualization, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Key Funded Project of Jiangsu Provincial Education Science Planning: Research on Monitoring and Early Warning of College Students’ Mental Health from the Perspective of Digital Intelligence Integration (B-a/2024/09).

Acknowledgments

We would like to express gratitude to the team from Jiangsu Vocational Institute of Commerce (Multi-dimensional Heterogeneous Elastic Network Engineering Technology and Application).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2025.1684529/full#supplementary-material

References

Alkhammash, E. H., Assiri, S. A., Nemenqani, D. M., Althaqafi, R. M. M., Hadjouni, M., Saeed, F., et al. (2023). Application of machine learning to predict COVID-19 spread via an optimized BPSO model. Biomimetics 8:457. doi: 10.3390/biomimetics8060457


Al-Rouq, F., Al-Otaibi, A., AlSaikhan, A., Al-Essa, M., and Al-Mazidi, S. (2022). Assessing physiological and psychological factors contributing to stress among medical students: Implications for health. Int. J. Environ. Res. Public Health 19:16822. doi: 10.3390/ijerph192416822


Alzakari, S. A., Alhussan, A. A., Qenawy, A.-S. T., and Elshewey, A. M. (2024). Early detection of potato disease using an enhanced convolutional neural network-long short-term memory deep learning model. Potato Res. 68, 695–713. doi: 10.1007/s11540-024-09760-x

Antoniadou, M., Manta, G., Kanellopoulou, A., Kalogerakou, T., Satta, A., and Mangoulia, P. (2024). Managing stress and somatization symptoms among students in demanding academic healthcare environments. Healthcare 12:2522. doi: 10.3390/healthcare12242522

Arif, S., Moran, K., Quinones-Boex, A., and El-Ibiary, S. (2021). Student stress management and wellness programs among colleges of pharmacy. Innov. Pharm. 12:14. doi: 10.24926/iip.v12i2.3478

Arya, S., Anju, A., and Azuana Ramli, N. (2024). Predicting the stress level of students using supervised machine learning and artificial neural network (ANN). Indian J. Eng. 21, 1–24. doi: 10.54905/disssi.v21i55.e9ije1684

Bartlett, M. L., Taylor, H., and Nelson, J. D. (2016). Comparison of mental health characteristics and stress between baccalaureate nursing students and non-nursing students. J. Nurs. Educ. 55, 87–90. doi: 10.3928/01484834-20160114-05

Calderon, A., Baik, S. Y., Ng, M. H. S., Fitzsimmons-Craft, E. E., Eisenberg, D., Wilfley, D. E., et al. (2024). Machine learning and Bayesian network analyses identifies associations with insomnia in a national sample of 31,285 treatment-seeking college students. BMC Psychiatry 24:656. doi: 10.1186/s12888-024-06074-7

Chen, J., Su, S., and Xu, H. (2011). “Decision tree construction algorithm based on multiscale rough set model,” in Proceedings of the 2011 International Conference on Computational and Information Sciences, (Piscataway, NJ: IEEE), 247–250. doi: 10.1109/ICCIS.2011.123

Chen, W. (2013). A meta-analysis of the relationships among students’ stress, coping strategies, and gender. J. Chinese Mental Health 26, 347–369. doi: 10.30074/FJMH.201309_26(3).0001

Chust-Hernández, P., Fernández-García, D., López-Martínez, L., García-Montañés, C., and Pérez-Ros, P. (2022). Female gender and low physical activity are risk factors for academic stress in incoming nursing students. Perspect. Psychiatric Care 58, 1281–1290. doi: 10.1111/ppc.12928

El Morr, C., Jammal, M., Bou-Hamad, I., Hijazi, S., Ayna, D., Romani, M., et al. (2024). Predictive machine learning models for assessing Lebanese university students’ depression, anxiety, and stress during COVID-19. J. Primary Care Community Health 15:21501319241235588. doi: 10.1177/21501319241235588

Elkenawy, E.-S. M., Alhussan, A. A., Khafaga, D. S., Tarek, Z., and Elshewey, A. M. (2024). Greylag goose optimization and multilayer perceptron for enhancing lung cancer classification. Sci. Rep. 14:23784. doi: 10.1038/s41598-024-72013-x

Elshewey, A. M., Alhussan, A. A., Khafaga, D. S., Elkenawy, E.-S. M., and Tarek, Z. (2024). EEG--BER metaheuristic algorithm. Sci. Rep. 14:24489. doi: 10.1038/s41598-024-74475-5

English, A., and Chi, R. (2020). A longitudinal study on international students’ stress, problem focused coping and cross-cultural adaptation in China. J. Int. Students 10, 73–86. doi: 10.32674/jis.v10iS(3).1774

Ghosh, S., Tripathi, K., Garg, A., Singh, D., Prasad, A., Bhavsar, A., et al. (2024). “Predicting stress among students via psychometric assessments and machine learning,” in Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, (New York, NY: ACM), 662–669. doi: 10.1145/3652037.3663949

Guo, R. (2024). The relationship between adolescent learning stress and emotional problems, regulatory factors and suggestions for school education. J. Educ. Humanities Soc. Sci. 34, 39–44. doi: 10.54097/2kx76d70

Guo, W. (2024). “Improved strategy based sparrow search algorithm for UAV 2D path optimization,” in Proceedings of the 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI), (Piscataway, NJ: IEEE), 1375–1380. doi: 10.1109/ICETCI61221.2024.10594594

Hafeez, M. A., and Shakil, S. (2023). EEG-based stress identification and classification using deep learning. Multimedia Tools Appl. 83, 42703–42719. doi: 10.1007/s11042-023-17111-0

Hancock, J. T., Khoshgoftaar, T. M., and Liang, Q. (2025). A problem-agnostic approach to feature selection and analysis using SHAP. J. Big Data 12:12. doi: 10.1186/s40537-024-01041-1

Harahap, N. R. A., Badrujaman, A., and Hidayat, D. R. (2022). Determinants of academic stress in students. Bisma J Counseling 6, 335–345. doi: 10.23887/bisma.v6i3.53548

Heidari, A. A., Mirjalili, S., Faris, H., Aljarah, I., Mafarja, M., and Chen, H. (2019). Harris hawks optimization: Algorithm and applications. Future Generation Comput. Syst. 97, 849–872. doi: 10.1016/j.future.2019.02.028

Hossain, M., Mahfuz, T., Latif, S., and Hossain, M. E. (2023). Determinants of financial stress among university students and its impact on their performance. J. Appl. Res. High. Educ. 15, 226–237. doi: 10.1108/JARHE-02-2021-0082

Jia, X., and Chu, W. (2024). Research on stress management and coping strategies for college students. Front. Bus. Econ. Manag. 14, 153–156. doi: 10.54097/yjps9d69

Kiupel, S., Kupfer, J., Kottlors, S., Gieler, U., Yosipovitch, G., and Schut, C. (2023). Is stress related to itch in German students? Results of an online survey. Front. Med. 10:1104110. doi: 10.3389/fmed.2023.1104110

Kong, W., Pei, Z., Guo, Z., Xu, R., and Zhao, J. (2025). Relationship matters: Using machine learning methods to predict the mental health severity of Chinese college freshmen during the pandemic period. J. Affect. Disord. 369, 392–403. doi: 10.1016/j.jad.2024.09.168

Liao, Z., Fan, X., Ma, W., and Shen, Y. (2024). An examination of mental stress in college students: Utilizing intelligent perception data and the mental stress scale. Mathematics 12:1501. doi: 10.3390/math12101501

Mahalingam, J., Khatri, C., and Fitzgerald, E. (2018). Pressure of academic publishing for medical students: A student’s perspective. Postgraduate Med. J. 94, 367–368. doi: 10.1136/postgradmedj-2017-135440

Mahoney, S., and Bussard, M. E. (2024). Relaxation as an educational strategy for stress management and resiliency. Nurse Educ. 49, E213–E217. doi: 10.1097/NNE.0000000000001568

Maykrantz, S. A., and Houghton, J. D. (2020). Self-leadership and stress among college students: Examining the moderating role of coping skills. J. Am. Coll. Health 68, 89–96. doi: 10.1080/07448481.2018.1515759

Morgan, B. M. (2017). Stress management for college students: An experiential multi-modal approach. J. Creativity Mental Health 12, 276–288. doi: 10.1080/15401383.2016.1245642

Nguyen, T. T. T., Seki, N., and Morio, I. (2018). Stress predictors in two Asian dental schools with an integrated curriculum and traditional curriculum. Eur. J. Dental Educ. 22, e594–e601. doi: 10.1111/eje.12358

Nguyen-Thi, T. (2024). Prevalence of stress and related factors among healthcare students: A cross-sectional study in Can Tho City, Vietnam. Annali Igiene Med. Prevent. E Comunità 36, 292–301. doi: 10.7416/ai.2023.2591

Obbarius, N., Fischer, F., Liegl, G., Obbarius, A., and Rose, M. (2021). A modified version of the transactional stress concept according to Lazarus and Folkman was confirmed in a psychosomatic inpatient sample. Front. Psychol. 12:584333. doi: 10.3389/fpsyg.2021.584333

Ragab, E. A., Dafallah, M. A., Salih, M. H., Osman, W. N., Osman, M., Miskeen, E., et al. (2021). Stress and its correlates among medical students in six medical colleges: An attempt to understand the current situation. Middle East Curr. Psychiatry 28:75. doi: 10.1186/s43045-021-00158-w

Rahman, M. A., and Kohli, T. (2024). Mental health analysis of international students using machine learning techniques. PLoS One 19:e0304132. doi: 10.1371/journal.pone.0304132

Ruiz, J., and Lopez, J. (2024). Student and societal pressures: Exploring causes, consequences and potential stress-reduction strategies. J. Student Res. 13:6358. doi: 10.47611/jsrhs.v13i1.6358

Saipanish, R. (2003). Stress among medical students in a Thai medical school. Med. Teach. 25, 502–506. doi: 10.1080/0142159031000136716

Schaab, B. L., Calvetti, P. Ü, Hoffmann, S., Diaz, G. B., Rech, M., Cazella, S. C., et al. (2024). How do machine learning models perform in the detection of depression, anxiety, and stress among undergraduate students? A systematic review. Cadernos de Saúde Pública 40:e00029323. doi: 10.1590/0102-311xen029323

Shin, Y. (2024). Students under academic pressure and their spillover effects on peers’ mental well-being. Labour Econ. 90:102564. doi: 10.1016/j.labeco.2024.102564

Shvetcov, A., Funke Kupper, J., Zheng, W.-Y., Slade, A., Han, J., Whitton, A., et al. (2024). Passive sensing data predicts stress in university students: A supervised machine learning method for digital phenotyping. Front. Psychiatry 15:1422027. doi: 10.3389/fpsyt.2024.1422027

Tang, X., Dong, P., Liu, X., Wang, Y., and Gan, X. (2024). “Research on high-precision landslide displacement prediction based on improved intelligent optimization algorithm and machine learning model,” in Proceedings of the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), (Piscataway, NJ: IEEE), 2152–2158. doi: 10.1109/AINIT61980.2024.10581586

Tarek, Z., Alhussan, A. A., Khafaga, D. S., El-Kenawy, E.-S. M., and Elshewey, A. M. (2025). A snake optimization algorithm-based feature selection framework for rapid detection of cardiovascular disease in its early stages. Biomed. Signal Process. Control 102:107417. doi: 10.1016/j.bspc.2024.107417

Tian, Y. (2022). Identification and modeling of college students’ psychological stress indicators for deep learning. Scientific Programming 2022, 1–9. doi: 10.1155/2022/6048088

Tom, S. (2022). Effect of perceived academic stress on college students. YMER Digital 21, 343–352. doi: 10.37896/YMER21.06/33

Xue, J., and Shen, B. (2020). A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 8, 22–34. doi: 10.1080/21642583.2019.1708830

Ye, Z., Xu, Y., He, Q., Wang, M., Bai, W., and Xiao, H. (2022). Feature selection based on adaptive particle swarm optimization with leadership learning. Comput. Intell. Neurosci. 2022, 1–18. doi: 10.1155/2022/1825341

Yildiz, A. R., Abderazek, H., and Mirjalili, S. (2020). A comparative study of recent non-traditional methods for mechanical design optimization. Arch. Comput. Methods Eng. 27, 1031–1048. doi: 10.1007/s11831-019-09343-x

Zaky, A. A., Fathy, A., Rezk, H., Gkini, K., Falaras, P., and Abaza, A. (2021). A modified triple-diode model parameters identification for perovskite solar cells via nature-inspired search optimization algorithms. Sustainability 13:12969. doi: 10.3390/su132312969

Zhang, L., Zhao, S., Yang, Z., Zheng, H., and Lei, M. (2024). An artificial intelligence platform to stratify the risk of experiencing sleep disturbance in university students after analyzing psychological health, lifestyle, and sports: A multicenter externally validated study. Psychol. Res. Behav. Manag. 17, 1057–1071. doi: 10.2147/PRBM.S448698

Keywords: student stress prediction, decision tree algorithm, harris hawks optimization, SHAP model, machine learning

Citation: Liu C and Yu S (2026) Students’ stress prediction and explainable analysis based on improved decision trees. Front. Psychol. 16:1684529. doi: 10.3389/fpsyg.2025.1684529

Received: 13 August 2025; Revised: 26 November 2025; Accepted: 27 November 2025;
Published: 02 January 2026.

Edited by:

Enrique H. Riquelme, Temuco Catholic University, Chile

Reviewed by:

Jayashree Mohanty, Chandigarh University, India
Nahumi Nugrahaningsih, University of Palangka Raya, Indonesia
Subhashree Darshana, Kiit University, India

Copyright © 2026 Liu and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Cheng Liu, 200022@jvic.edu.cn
