AUTHOR=Fang Dalang, Lin Jie, Wang Jin, Nong Qingxiao, Tao Shouwen, Lu Bimin, Yu Yanrong, Peng Hao, Tian Yingying, Su Qunying, Ma Yanfei, Huang Yuanlu
TITLE=CEACAM6 as a machine learning derived immune biomarker for predicting neoadjuvant chemotherapy response in HR+/HER2− breast cancer
JOURNAL=Frontiers in Immunology
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2025.1662004
DOI=10.3389/fimmu.2025.1662004
ISSN=1664-3224
ABSTRACT=Background: Hormone receptor-positive/human epidermal growth factor receptor 2-negative (HR+/HER2−) breast cancer is the most common subtype and is characterized by heterogeneous responses to neoadjuvant chemotherapy (NAC) and low pathological complete response (pCR) rates. Existing biomarkers have limited predictive accuracy, which hinders personalized treatment. This study aimed to identify predictive biomarkers for NAC response and to explore their therapeutic potential in HR+/HER2− breast cancer. Methods: We integrated 497 HR+/HER2− samples from TCGA and 956 from nine GEO datasets (training set: n=708; test set: n=248). Differentially expressed genes (DEGs) were identified between tumors and normal tissues (TCGA) and between pCR and residual disease (RD) groups (GEO). Overlapping DEGs were further screened using the LASSO, random forest, and SVM-RFE algorithms. Predictive models were constructed with 10 machine learning algorithms and interpreted using SHAP. Gene set enrichment analysis (GSEA), CIBERSORT-based immune infiltration analysis, and drug sensitivity prediction using oncoPredict and GDSC2 were performed. Immunohistochemistry (IHC) was conducted on paired pre/post-NAC samples (n=9). Clinical correlations were analyzed in a retrospective cohort of 106 HR+/HER2− NAC patients. Results: Thirty-eight overlapping DEGs were identified, and four key genes (CEACAM6, MELK, RARRES1, BIRC5) were selected. NeuralNet showed the best model performance (AUC=0.816).
CEACAM6 was the top-ranked SHAP feature; high expression predicted RD and was associated with poor survival (p=0.014). GSEA showed that CEACAM6-high tumors were enriched in drug resistance pathways (such as oxidative phosphorylation), whereas low expression correlated with immune activation. Immune infiltration analysis showed that pCR tumors had more effector cells (Tfh, γδ T cells, M1 macrophages), whereas RD tumors were enriched in Tregs and resting mast cells. CEACAM6 expression correlated positively with Tregs and naïve CD4+ T cells and negatively with CD8+ T cells and M1 macrophages. CEACAM6-high tumors had higher predicted IC50 values for six NAC-related drugs. IHC confirmed persistent CEACAM6 expression in RD tumors post-NAC. Clinically, pCR patients had higher lymphocyte counts and more frequent N2–N3 nodal status. Conclusion: CEACAM6 is a promising predictive biomarker in HR+/HER2− breast cancer and is associated with chemoresistance and immune suppression. Machine learning models integrating immune signatures and pathway features may optimize personalized NAC strategies.
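
The abstract describes two steps that are easy to illustrate in a few lines: narrowing candidate genes with several selectors, and scoring a classifier by AUC. The sketch below is illustrative only and is not the authors' pipeline. It assumes the three selectors (LASSO, random forest, SVM-RFE) are combined by simple intersection, which the abstract does not state. The function names `consensus_genes` and `roc_auc`, the placeholder genes `GENE_A` to `GENE_C`, and all labels and scores are hypothetical toy inputs, not study data.

```python
from typing import Iterable, List, Sequence


def consensus_genes(*selections: Iterable[str]) -> List[str]:
    """Return the genes chosen by every selector, sorted for stable output."""
    if not selections:
        return []
    sets = [set(s) for s in selections]
    return sorted(set.intersection(*sets))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical selector outputs; GENE_A..GENE_C are placeholders, not results.
lasso_pick = ["CEACAM6", "MELK", "RARRES1", "BIRC5", "GENE_A"]
rf_pick = ["CEACAM6", "MELK", "RARRES1", "BIRC5", "GENE_B"]
svm_rfe_pick = ["CEACAM6", "MELK", "RARRES1", "BIRC5", "GENE_C"]
key_genes = consensus_genes(lasso_pick, rf_pick, svm_rfe_pick)

# Toy labels (1 = RD, 0 = pCR) and made-up model scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

With these toy inputs, `key_genes` contains only the four genes shared by all three lists, and `toy_auc` is the fraction of RD/pCR pairs in which the RD sample receives the higher score. A real analysis would use an established library for both steps and evaluate on held-out data, as the training/test split in the abstract suggests.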