AUTHOR=Chen Bin, Huan Lu, Lu Junyu, Yuan Jinhe TITLE=Shapley additive explanations based feature selection reveals CXCL14 as a key immune-related gene in predicting idiopathic pulmonary fibrosis JOURNAL=Frontiers in Medicine VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1608078 DOI=10.3389/fmed.2025.1608078 ISSN=2296-858X ABSTRACT=Background: Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease marked by excessive fibrous tissue accumulation in the lung interstitium, leading to a gradual deterioration of respiratory function and significantly impairing patients’ quality of life. Despite advances in understanding its etiology and pathogenesis, the exact mechanisms remain unclear, underscoring the need for novel biomarkers and therapeutic targets. Methods: We analyzed five publicly available datasets from the Gene Expression Omnibus (GEO), specifically “GSE15197,” “GSE53845,” “GSE135065,” “GSE185691,” and “GSE195770,” to identify gene expression changes associated with IPF. Data were annotated and normalized to minimize batch effects and technical variability. Principal Component Analysis (PCA) verified preprocessing efficacy. Differentially expressed genes (DEGs) were identified using linear modeling. Core DEGs were selected via integrative analysis across datasets. Results: Our analysis revealed DEGs that are substantially linked to crucial biological processes such as extracellular matrix organization and immune response regulation. Integrative analysis of five GEO datasets identified CXCL14, MMP7, and MDK as core differentially expressed genes in the final predictive model. Using Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest, we constructed a logistic regression model with robust predictive performance, achieving an AUC of 0.92 in the training cohort and 0.89 in the validation cohort, with sensitivity of 88% and specificity of 85%.
The Shapley Additive Explanations (SHAP) method identified CXCL14 (mean SHAP value = 0.38) as the most influential feature, followed by MMP7 and MDK. Functional enrichment analyses highlighted significant enrichment of TGF-β signaling, extracellular matrix organization, and chemokine signaling pathways. Immune infiltration analysis revealed positive correlations between CXCL14 expression and alveolar macrophage/activated fibroblast populations, while SHAP interaction analysis identified synergistic effects between CXCL14 and TGF-β1 in driving fibrosis. Conclusion: These findings substantiate the hypothesis that IPF pathogenesis is closely linked to extracellular matrix remodeling and immune dysregulation. Future investigations should examine the practical application of the identified biomarkers in the early diagnosis and management of IPF. Furthermore, the machine learning-based predictive model demonstrates strong clinical potential and merits further validation in prospective trials to assess its utility and therapeutic implications in real-world settings.
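
The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea behind the modeling and SHAP ranking steps, not the authors' pipeline. It assumes synthetic expression values with hypothetical effect sizes, omits the LASSO and Random Forest selection step, fits an unregularized logistic regression by plain gradient descent, and uses the closed-form SHAP values of a linear model in log-odds space, which hold under an assumption of independent features. All function names are illustrative.

```python
import math
import random

GENES = ["CXCL14", "MMP7", "MDK"]  # gene names from the abstract; all data below is synthetic


def make_synthetic(n=200, seed=0):
    """Draw fake expression values; the effect sizes are hypothetical."""
    rng = random.Random(seed)
    effects = [1.5, 1.0, 0.5]  # assumed mean shift in IPF samples, per gene
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        X.append([rng.gauss(e * label, 1.0) for e in effects])
        y.append(label)
    return X, y


def log_odds(w, b, row):
    """Linear model output in log-odds space."""
    return b + sum(wj * xj for wj, xj in zip(w, row))


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(log_odds(w, b, row)) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_shap(w, background, rows):
    """SHAP values of a linear model: w_j * (x_j - mean_j), assuming independent features.

    Returns the per-sample SHAP values and the background feature means.
    """
    d = len(w)
    means = [sum(r[j] for r in background) / len(background) for j in range(d)]
    return [[w[j] * (r[j] - means[j]) for j in range(d)] for r in rows], means


def rank_by_mean_abs_shap(shap_values):
    """Order genes by mean absolute SHAP value, largest first."""
    d = len(shap_values[0])
    importance = [sum(abs(r[j]) for r in shap_values) / len(shap_values) for j in range(d)]
    return sorted(zip(GENES, importance), key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    X, y = make_synthetic()
    X_train, y_train, X_val, y_val = X[:140], y[:140], X[140:], y[140:]
    w, b = fit_logistic(X_train, y_train)
    print("validation AUC:", round(auc([log_odds(w, b, r) for r in X_val], y_val), 3))
    shap_values, _ = linear_shap(w, X_train, X_val)
    for gene, score in rank_by_mean_abs_shap(shap_values):
        print(gene, round(score, 3))
```

For each sample, the SHAP values sum to the difference between that sample's log-odds and the log-odds at the background mean, which is the additivity property that makes a mean absolute SHAP value usable as a per-gene importance score. The numbers printed here reflect only the synthetic data and say nothing about the reported AUC or SHAP values.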