AUTHOR=Song Wenzhu, Liu Yanfeng, Qiu Lixia, Qing Jianbo, Li Aizhong, Zhao Yan, Li Yafeng, Li Rongshan, Zhou Xiaoshuang

TITLE=Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province

JOURNAL=Frontiers in Medicine

VOLUME=Volume 9 - 2022

YEAR=2023

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.930541

DOI=10.3389/fmed.2022.930541

ISSN=2296-858X

ABSTRACT=Introduction: Chronic kidney disease (CKD) has a high incidence and an insidious onset. Because medical check-ups in China’s rural areas are inadequate and screening programmes typically target a single disease, CKD there can easily progress to end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by applying machine learning (ML) algorithms to easily accessible parameters collected in ten rural areas of Shanxi Province, thereby bringing treatment forward and improving patients’ quality of life.
Methods: From April 2019 to November 2019, opportunistic screening for CKD was conducted in ten rural areas of Shanxi Province. Demographic information, physical examination data, and blood and morning urine samples were first collected from 13,550 subjects. LASSO regression was then used to select explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance the two target datasets, i.e., albuminuria-to-creatinine ratio (ACR) and α1-microglobulin-to-creatinine ratio (MCR). Afterwards, Logistic Regression (LR), Bagging, Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) models were constructed to classify ACR outcomes and MCR outcomes, respectively.
Results: A total of 12,330 rural residents were enrolled in this study, with 20 explanatory variables. The numbers of patients with increased ACR and increased MCR were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in the two datasets. LR, Bagging, RF, and XGBoost performed well in classification, with AUCs of 0.74, 0.87, 0.87, and 0.89 for ACR outcomes and 0.75, 0.88, 0.89, and 0.90 for MCR outcomes. The five variables contributing most to classification were SBP, TG, TC, Hcy, and DBP for ACR outcomes and age, TG, SBP, Hcy, and FPG for MCR outcomes. Overall, the machine learning algorithms could serve as a warning model for CKD.
Conclusions: ML algorithms combined with indicators that are accessible in rural settings performed well in classification, which allows for an early warning model for CKD. This model could support large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve quality of life and reduce mortality.
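
The SMOTE balancing step named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `smote_oversample`, its parameters, and the toy rows are hypothetical and chosen only for illustration. The sketch shows the core idea of the technique, which is to create each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows built by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric feature rows that all
    belong to the minority class (hypothetical helper, illustration only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between `base` and the chosen neighbour.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature rows (two scaled measurements per subject).
minority_rows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_rows = smote_oversample(minority_rows, n_new=6, k=3)
```

In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred. Over-sampling is usually applied to the training split only, so that evaluation data contain no synthetic rows; whether the study followed that convention is not stated in the abstract.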