AUTHOR=Li Ziyi , Yang Na , He Liyun , Wang Jialu , Ping Fan , Li Wei , Xu Lingling , Zhang Huabing , Li Yuxiu TITLE=Development and validation of questionnaire-based machine learning models for predicting all-cause mortality in a representative population of China JOURNAL=Frontiers in Public Health VOLUME=Volume 11 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1033070 DOI=10.3389/fpubh.2023.1033070 ISSN=2296-2565 ABSTRACT=Background: Considering that the previously developed mortality prediction models have limited applications to the Chinese population, a questionnaire-based prediction model is of great importance for its accuracy and convenience in clinical practice. Methods: Two national cohorts, namely, the China Health and Nutrition Survey (CHNS) and the China Health and Retirement Longitudinal Study, were used for model development and validation. One hundred and fifty-nine variables were compiled to generate predictions. The Cox regression model and six machine learning (ML) models were used to predict all-cause mortality. Finally, a simple questionnaire-based ML prediction model was developed using the best algorithm and validated. Results: In the internal validation set, all the ML models performed better than the traditional Cox model, and the random survival forest (RSF) model performed best. The questionnaire-based ML model, which only included 20 variables, achieved a C-index of 0.86 (95%CI: 0.80–0.92). On external validation, the simple questionnaire-based model achieved C-indices of 0.82 (95%CI: 0.77–0.87), 0.77 (95%CI: 0.75–0.79), and 0.79 (95%CI: 0.77–0.81) in predicting 2-, 9-, and 11-year mortality, respectively. Conclusions: In this prospective population-based study, a model based on the RSF analysis performed best among all models. 
Furthermore, there was no significant difference between the prediction performance of the questionnaire-based ML model, which only included 20 variables, and that of the model with all variables (including laboratory variables). The simple questionnaire-based ML prediction model, which needs to be further explored, is of great importance for its accuracy and suitability for the general Chinese population.