AUTHOR=Xia Fang, Ren Jie, Liu Linlin, Cui Yanyin, He Yufang
TITLE=A machine learning-based depression risk prediction model for healthy middle-aged and older adult people based on data from the China health and aging tracking study
JOURNAL=Frontiers in Public Health
VOLUME=Volume 13 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1515094
DOI=10.3389/fpubh.2025.1515094
ISSN=2296-2565
ABSTRACT=Background: Predicting depression risk in adults is critical for timely interventions to improve quality of life. To develop a scientific basis for depression prevention, machine learning models based on longitudinal data that can assess depression risk are necessary. Methods: Data from 2,331 healthy older adults who participated in the China Health and Retirement Longitudinal Study (CHARLS) from 2018 to 2020 were used to develop and validate the predictive model. Depression was assessed using the 10-item Center for Epidemiologic Studies Depression Scale (CES-D-10), with a score of ≥10 indicating depressive symptoms. Several machine learning algorithms, including logistic regression, k-nearest neighbor, support vector machine, multilayer perceptron, decision tree, and XGBoost, were employed to predict the 2-year depression risk. The dataset was randomly split into a training set (70%) and a testing set (30%), and hyperparameters were optimized in the training phase. The models’ performance was evaluated in the testing set using accuracy, sensitivity, specificity, area under the receiver operating characteristic (ROC) curve, and F1 score. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP). Results: A total of 563 (24.15%) participants developed depression during the 2-year follow-up period. LASSO regression identified 12 key predictive features from an initial set of 26.
Among the six models tested, XGBoost exhibited the best predictive performance, achieving the highest area under the ROC curve (0.774), accuracy (0.722), sensitivity (0.757), and F1 score (0.720), with a specificity of 0.687. Decision curve analysis (DCA) confirmed the net clinical benefit of the XGBoost model across most threshold ranges. SHAP interpretation revealed that cognitive ability, total income, life satisfaction, sleep quality, and pain were the top five most influential factors in predicting depression risk. Conclusion: Our findings support the feasibility of using machine learning-based models to predict depression risk in healthy older adults over a 2-year period. The integration of XGBoost and SHAP enhances model interpretability, offering valuable insights into individual risk factors. This approach enables personalized risk assessment, which may help develop targeted interventions for depression prevention in aging populations.
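
The abstract labels outcomes with a CES-D-10 cutoff of ≥10 and reports accuracy, sensitivity, specificity, and F1 score on the test set. The sketch below illustrates how such labels and threshold-based metrics could be computed from binary predictions. It is not the authors' code: the function names (`has_depressive_symptoms`, `binary_metrics`) and the toy scores and predictions are hypothetical, chosen only to make the definitions concrete, and the area under the ROC curve is omitted because it requires predicted probabilities rather than hard labels.

```python
from typing import Dict, Sequence

# Cutoff stated in the abstract: a CES-D-10 score >= 10 indicates depressive symptoms.
CESD10_CUTOFF = 10


def has_depressive_symptoms(cesd10_score: int) -> bool:
    """Binary outcome label derived from a CES-D-10 total score."""
    return cesd10_score >= CESD10_CUTOFF


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the depressed class
    specificity = ratio(tn, tn + fp)   # recall on the non-depressed class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical follow-up scores and model predictions, for illustration only.
followup_scores = [4, 12, 10, 7, 15, 3, 9, 11]
y_true = [int(has_depressive_symptoms(s)) for s in followup_scores]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice these quantities would be computed on the held-out 30% test split, after feature selection and hyperparameter tuning have been confined to the training split.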