AUTHOR=An Na, Lu Zhongwen, Li Yang, Yang Bing, Ji Shaozhen, Dong Xu, Ding Zhaoliang TITLE=An integrated machine learning framework for developing and validating diagnostic models and drug predictions based on ulcerative colitis genes JOURNAL=Frontiers in Medicine VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1571529 DOI=10.3389/fmed.2025.1571529 ISSN=2296-858X ABSTRACT=Ulcerative colitis (UC) is a chronic inflammatory bowel disease that causes intestinal inflammation and triggers autoimmune responses. This study aims to identify immune-related biomarkers for UC and explore potential therapeutic targets. First, we downloaded the expression profiles of datasets GSE87466, GSE87473, and GSE92415 from the GEO database and identified differentially expressed genes (DEGs) associated with UC. Using the WGCNA algorithm, we screened key module genes in UC, and we retrieved immune-related genes (IRGs) from the ImmPort database. We identified immune-related differentially expressed genes by intersecting the WGCNA module genes, the DEGs, and the IRGs. To build a diagnostic model for UC, we applied 113 combinations of 12 machine learning algorithms, with 10-fold cross-validation on the training set and external validation on the test set. The single-cell results presented the cell atlas of UC and indicated that the key genes were significantly associated with macrophages, epithelial cells, and fibroblasts. Quantitative polymerase chain reaction (q-PCR) was used to verify the expression levels of the core biomarkers selected by machine learning. 
We conducted enrichment analysis using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and gene set enrichment analysis (GSEA), which highlighted the biological processes and signaling pathways associated with UC. We also performed immune cell infiltration analysis based on CIBERSORT. In addition, we screened potential drugs from the DSigDB drug database and evaluated them with molecular docking and molecular dynamics simulations. The results suggested that compounds such as thalidomide and troglitazone are promising candidates for new UC drug development. Our findings provide insights into the pathogenesis of UC, its clinical treatment, and potential drug development.
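
As a rough illustration of two steps named in the abstract, the sketch below intersects three gene lists (DEGs, WGCNA module genes, and IRGs) and splits sample indices into 10 folds for cross-validation. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `immune_related_degs` and `k_fold_indices`, the placeholder gene symbols, and the sample count of 25 are all hypothetical, and the study itself likely relied on dedicated R or Python packages.

```python
from typing import List, Set


def immune_related_degs(
    degs: Set[str], wgcna_module_genes: Set[str], immune_genes: Set[str]
) -> List[str]:
    """Return genes present in all three sets, sorted for reproducibility."""
    return sorted(degs & wgcna_module_genes & immune_genes)


def k_fold_indices(n_samples: int, k: int = 10) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k disjoint, near-equal folds.

    Each fold serves once as the held-out set; the remaining folds form
    the training set for that round. No shuffling is done here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    return [list(range(i, n_samples, k)) for i in range(k)]


# Hypothetical placeholder gene symbols, not results from the study.
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
wgcna_module_genes = {"GENE_B", "GENE_C", "GENE_E"}
immune_genes = {"GENE_B", "GENE_C", "GENE_F"}

candidates = immune_related_degs(degs, wgcna_module_genes, immune_genes)
print(candidates)  # ['GENE_B', 'GENE_C']

# Hypothetical sample count, chosen only to show uneven fold sizes.
folds = k_fold_indices(n_samples=25, k=10)
print([len(fold) for fold in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

In practice the sample order would usually be shuffled (and often stratified by disease status) before splitting, which this sketch omits for brevity.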