ORIGINAL RESEARCH article

Front. Public Health

Sec. Public Mental Health

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1515094

This article is part of the Research Topic "Mental Health of Vulnerable Groups: Predictors, Mechanisms, and Interventions".

A machine learning-based depression risk prediction model for healthy middle-aged and elderly people based on data from the China Health and Aging Tracking Study

Provisionally accepted
Fang Xia1, Jie Ren1, Linlin Liu1, Yanyin Cui2, Yufang He1*
  • 1Changchun University of Chinese Medicine, Changchun, China
  • 2School of Nursing, Zhejiang Chinese Medical University, Hangzhou, Zhejiang Province, China

The final, formatted version of the article will be published soon.

Background: Predicting depression risk in adults is critical for timely intervention and improved quality of life. Machine learning models that assess depression risk from longitudinal data are needed to provide a scientific basis for depression prevention.
Methods: Data from 2,331 healthy older adults who participated in the China Health and Retirement Longitudinal Study (CHARLS) from 2018 to 2020 were used to develop and validate the predictive model. Depression was assessed using the 10-item Center for Epidemiologic Studies Depression Scale (CES-D-10), with a score of ≥ 10 indicating depressive symptoms. Six machine learning algorithms (logistic regression, k-nearest neighbor, support vector machine, multilayer perceptron, decision tree, and XGBoost) were used to predict the 2-year risk of depression. The dataset was randomly split into a training set (70%) and a testing set (30%), and hyperparameters were tuned on the training set. Model performance was evaluated on the testing set using accuracy, sensitivity, specificity, the area under the receiver operating characteristic (ROC) curve, and the F1 score. SHapley Additive exPlanations (SHAP) were used to enhance model interpretability.
Results: A total of 563 participants (24.15%) developed depression during the 2-year follow-up period. Least absolute shrinkage and selection operator (LASSO) regression selected 12 key predictive features from an initial set of 26. Among the six models, XGBoost performed best, with the highest area under the ROC curve (0.774), accuracy (0.722), sensitivity (0.757), and F1 score (0.720), and a specificity of 0.687. Decision curve analysis (DCA) showed a net clinical benefit of the XGBoost model across most threshold probabilities. SHAP analysis identified cognitive ability, total income, life satisfaction, sleep quality, and pain as the five most influential factors in predicting depression risk.
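The evaluation quantities named in the Methods and Results can be made concrete with a short sketch. The Python below is a minimal, standard-library-only illustration, not the authors' code: it assumes a binary outcome coded 1 for depressive symptoms (CES-D-10 ≥ 10) and predicted probabilities from any fitted classifier, and all function names are hypothetical. AUC is computed through its rank interpretation, and net benefit follows the standard decision-curve definition, in which predictions are dichotomized at the same threshold probability pt being evaluated.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Counts:
    """Binary confusion-matrix counts; label 1 = depressive symptoms (assumed coding)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> Counts:
    """Classify as positive when the predicted probability is >= threshold (labels are 0/1)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return Counts(tp, fp, tn, fn)


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def classification_metrics(c: Counts) -> dict[str, float]:
    """Accuracy, sensitivity (recall), specificity and F1 at one threshold."""
    sensitivity = _safe_div(c.tp, c.tp + c.fn)
    specificity = _safe_div(c.tn, c.tn + c.fp)
    precision = _safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (ties count 1/2); O(P*N) pairs, fine for illustration."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of the model at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt), with predictions dichotomized at pt itself."""
    c = confusion_counts(y_true, y_prob, pt)
    return c.tp / c.n - (c.fp / c.n) * pt / (1 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference 'treat everyone as high risk' strategy shown on a decision curve."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

Evaluating `net_benefit` over a grid of pt values and plotting it against `net_benefit_treat_all` and the zero line (treat no one) yields a decision curve; a model has clinical value at thresholds where its curve lies above both references.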
Conclusion: Our findings support the feasibility of using machine learning-based models to predict the 2-year risk of depression in healthy older adults. Combining XGBoost with SHAP makes the model interpretable and offers insight into individual risk factors. This approach enables personalized risk assessment, which may help in developing targeted interventions for depression prevention in aging populations.
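SHAP attributions are Shapley values from cooperative game theory. As a hedged illustration of what they mean, the sketch below computes them exactly for a single prediction by enumerating every coalition of features, replacing "absent" features with one baseline vector. That single-baseline value function is an assumption made for simplicity: the shap library instead draws on background data or, for tree ensembles such as XGBoost, uses the much faster TreeSHAP algorithm. The function name and the toy model in the usage note are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

# Any callable mapping a feature vector to a score, e.g. a wrapper around a
# fitted classifier's predicted probability (hypothetical here).
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley attributions for one prediction.

    Features outside a coalition S take their baseline value, so
    v(S) = model(x on S, baseline elsewhere). Cost grows as 2^n model calls,
    so this brute-force version is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a toy model `f(v) = 2*v[0] + 3*v[1] + v[0]*v[2]` with `x = [1, 2, 3]` and an all-zero baseline, the attributions are `[3.5, 6.0, 1.5]`: the interaction term is split equally between features 0 and 2, and the values sum to `f(x) - f(baseline) = 11`. This efficiency property is what allows a SHAP plot to decompose an individual's predicted risk into per-feature contributions.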

Keywords: Prediction model, Depression, Machine learning, Older adults, China longitudinal study

Received: 22 Oct 2024; Accepted: 07 Jul 2025.

Copyright: © 2025 Xia, Ren, Liu, Cui and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yufang He, Changchun University of Chinese Medicine, Changchun, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.