AUTHOR=Han Yuchen, Wang Shaobing TITLE=Disability risk prediction model based on machine learning among Chinese healthy older adults: results from the China Health and Retirement Longitudinal Study JOURNAL=Frontiers in Public Health VOLUME=Volume 11 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1271595 DOI=10.3389/fpubh.2023.1271595 ISSN=2296-2565 ABSTRACT=Background: Predicting disability risk among healthy older adults in China is essential for timely preventive intervention, for improving their quality of life, and for providing scientific evidence for disability prevention. Developing a machine learning model that can evaluate disability risk from longitudinal data is therefore crucial. Methods: We conducted a prospective cohort study of 2,175 older adults enrolled in the China Health and Retirement Longitudinal Study (CHARLS) between 2015 and 2018 to develop and validate this prediction model. Six machine learning algorithms (Logistic Regression, k-Nearest Neighbor, Naive Bayes, Multilayer Perceptron, Random Forest, and XGBoost) were used to assess the 3-year risk of developing disability. Optimal cut-off points and hyperparameters were explored in the training set, the prediction accuracy of the models was compared in the testing set, and the best-performing model was further interpreted. Results: During the 3-year follow-up period, a total of 505 (23.22%) healthy older adults developed disabilities. Among the 43 features examined, LASSO regression identified 11 as significant for model establishment. When the six machine learning models were compared on the testing set, the XGBoost model performed best across several evaluation metrics, with the highest area under the ROC curve (AUC, 0.803), accuracy (0.757), sensitivity (0.790), and F1 score (0.789); its specificity was 0.712.
The decision curve analysis (DCA) showed that XGBoost had the highest net benefit across most of the threshold range. Based on feature importance determined by SHAP (SHapley Additive exPlanations, a model interpretation method), the top five features were right-hand grip strength, depressive symptoms, marital status, respiratory function, and age. The SHAP summary plot illustrated the positive or negative impact of each feature on the XGBoost model's predictions, and the SHAP dependence plot explained how individual features affected the model's output. Conclusions: Machine learning models can accurately assess the probability that older adults will develop disability over a three-year period. Combining XGBoost with SHAP provides clear explanations for personalized risk prediction and a more intuitive understanding of how key features influence the model.