ORIGINAL RESEARCH article

Front. Cell Dev. Biol.

Sec. Cancer Cell Biology

Volume 13 - 2025 | doi: 10.3389/fcell.2025.1627355

This article is part of the Research Topic: Application of Novel Biomarkers and Natural Compounds in Precision Oncology

Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning

Provisionally accepted
Bingkun Zhou1, Hu Zhou1, Xiaodong Huang2, Shijie Liu1*
  • 1Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
  • 2Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou, China

The final, formatted version of the article will be published soon.

Background: Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. Besides routine laboratory indicators and medical history, risk prediction models can predict CKD outcome. However, CKD prognostic prediction models based on transcriptomics and machine learning are currently lacking.

Methods: Using weighted correlation network analysis (WGCNA) and a random forest algorithm in GSE137570, three core gene sets of different sizes were constructed. The gene sets were externally validated in GSE66494 and GSE180394, and their predictive performance was evaluated in GSE45980 with receiver operating characteristic (ROC) curves. Predictive models were built in GSE60861 using Cox regression, LASSO regression, and logistic regression. The reliability of the human CKD transcriptomic analysis and the feasibility of functional studies were validated in a mouse unilateral ureteral obstruction (UUO) model.

Results: Combining WGCNA with differential gene analysis identified 9 genes positively associated and 20 genes negatively associated with CKD occurrence and development. The random forest algorithm yielded three gene sets: a minimal gene set (CCL2, SUCLG1, ACADM), a medium gene set (CCL2, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1), and a maximal gene set (CCL2, MMP7, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1). In external validation, the maximal PLAGE score had the best classification performance for CKD in GSE66494 (AUC = 0.767) and in GSE180394 (AUC = 0.760), and the medium PLAGE score predicted CKD progression in GSE45980 (AUC = 0.758). In multivariate modeling, Cox regression produced a risk model containing only the minimal z-score, LASSO regression retained gender and the minimal z-score, and no multivariate logistic regression model could be constructed with any score. In the mouse UUO model, KEGG enrichment was highly similar between mouse CKD and human CKD, and the core genes related to the occurrence and progression of human CKD remained diagnostically valuable in mice.

Conclusions: This study provides a transcriptomics-based, machine-learning-derived risk prediction model for the occurrence and development of CKD and offers potential target genes for further experimental research on CKD.

Keywords: chronic kidney disease, bioinformatics, machine learning, predictive model
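
The abstract reports gene-set scores (z-score, PLAGE) evaluated by AUC but does not say how the scores were computed. As a rough illustration only, the sketch below assumes the z-score is a combined z-score: each gene is standardized across samples, and the per-gene values are summed and divided by the square root of the gene count. The function names, gene labels, and expression values are hypothetical and are not taken from the article or from any GEO series. PLAGE is not shown.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_z_score(expr, genes):
    """Return one gene-set score per sample.

    expr maps a gene label to its expression values (one value per sample).
    Each gene is standardized across samples, then the per-gene z-scores
    are summed and divided by sqrt(number of genes).
    """
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            raise ValueError(f"{gene} has zero variance across samples")
        for i, value in enumerate(values):
            totals[i] += (value - mu) / sd
    return [total / sqrt(len(genes)) for total in totals]


def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Made-up toy data: hypothetical gene labels, four samples, not real measurements.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 2.0, 4.0, 4.0],
}
toy_labels = [0, 0, 1, 1]  # assumed coding: 1 = case, 0 = control

toy_scores = combined_z_score(toy_expr, ["GENE_A", "GENE_B"])
print(roc_auc(toy_scores, toy_labels))
```

In this toy input every case scores above every control, so the printed AUC is 1.0. A plain sum treats all genes as pointing in the same direction. A set that mixes positively and negatively associated genes would presumably need a sign convention, which the abstract does not describe.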

Received: 12 May 2025; Accepted: 28 Jul 2025.

Copyright: © 2025 Zhou, Zhou, Huang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Shijie Liu, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.