
ORIGINAL RESEARCH article

Front. Immunol., 27 January 2026

Sec. Cancer Immunity and Immunotherapy

Volume 17 - 2026 | https://doi.org/10.3389/fimmu.2026.1701978

This article is part of the Research Topic: Revolutionizing Cancer Care: AI and Technological Advances in Breast and Gynecological Oncology

Development and validation of an interpretable machine learning model identify the lactylation-related protein SUSD3 as a prognostic and therapeutic biomarker for breast cancer

  • Institute of Trauma and Metabolism, Zhengzhou Central Hospital Affiliated to Zhengzhou University, Zhengzhou, China

Background: Breast cancer is one of the most prevalent malignancies and a leading cause of cancer-related mortality among women. Lactylation, a recently recognized post-translational modification, has emerged as a significant factor in tumor biology, with increasing evidence linking it to cancer progression and immune modulation. However, the role of lactylation in tumorigenesis remains ambiguous. This raises questions about whether it serves as a primary driver or a secondary regulator during cancer development, as well as its influence on the tumor immune microenvironment and prognostic implications.

Methods: This study investigates the clinical relevance of lactylation-related genes (LRGs) in breast cancer through a comprehensive analysis of extensive genomic datasets, including single-cell RNA sequencing, bulk transcriptomic data, and spatial transcriptomics from established public databases such as TISCH, TCGA, and GEO.

Results: By using a combination of multiple machine-learning algorithms, we developed an effective lactylation-related signature that correlates with immune cell infiltration, chemokine expression, and tumor mutation burden. This signature proved useful in identifying breast cancer patients likely to respond to immunotherapy. Finally, we experimentally validated the quantified expression levels of hub genes in human breast samples and demonstrated the role of SUSD3.

Conclusion: These findings indicate that our lactylation risk model can be used to predict the malignant progression and immune evasion of breast cancer. It is expected to become a potential therapeutic target and a diagnostic marker for breast cancer. This model also provides insights into breast cancer therapy and an effective framework for developing gene screening models applicable to other diseases and pathogenic mechanisms.

Introduction

Breast cancer remains a leading cause of cancer-related deaths among women. The diverse molecular subtypes present considerable challenges for precision medicine, underscoring the need for tailored treatment approaches that account for tumor heterogeneity (1, 2). Despite the promise shown by recent advancements in immunotherapy, challenges such as drug resistance and treatment-related toxicities complicate the management of breast cancer, emphasizing the need for strategies that can distinguish patient responses to these therapies (1, 3–5).

Lactylation, identified by Zhao et al. in 2019, has garnered attention for its role in tumorigenesis (6–8). Studies indicate that lactylation is prevalent across various cancers, affecting processes like cell proliferation, metastasis, immune escape, drug resistance, and metabolic reprogramming of tumor cells (7). It has been reported that histone lactylation not only promotes proliferation, metastasis, and invasion but also contributes to targeted therapy resistance in clear cell renal cell carcinoma (ccRCC) (9), colorectal cancer (CRC) (10, 11), ocular melanoma (12), non-small cell lung cancer (13), breast cancer (14), and liver cancer (15, 16). Non-histone lactylation has also been validated to promote tumor progression and affect treatment resistance in hepatocellular carcinoma (17, 18), pancreatic ductal adenocarcinoma (19), prostate cancer (20), CRC (21–23), glioblastoma (24), breast cancer (25), and gastric cancer (10, 26, 27). Moreover, lactylation is involved in the regulation of tumor metabolism by promoting the expression of enzymes related to the TCA cycle, enhancing the glucose uptake ability of tumor cells, and further exacerbating metabolic disorders within tumors (7). While the significance of lactylation in cancer is increasingly recognized, its precise role in breast cancer, whether as a major driver or a minor regulator, remains poorly understood. At present, predictive models for assessing the prognostic significance of LRGs in breast cancer are still lacking. Thus, developing effective models of lactylation could provide a promising new approach for identifying potential biomarkers for the diagnosis and treatment of breast cancer.

In this study, we integrated scRNA-seq with multi-omics approaches to elucidate the cellular landscape of breast cancer and its association with lactylation-related genes. By identifying differentially expressed genes (DEGs) across lactylation clusters, we constructed a prognostic model that demonstrates improved accuracy compared to existing frameworks by utilizing machine learning algorithms. Our analysis reveals significant correlations between lactylation signatures and immune infiltration, clinical characteristics, and patient survival outcomes through cluster analysis, enrichment analysis, and survival analysis. Additionally, we assessed the potential of these predictive models to identify breast cancer patients who are likely to benefit from immunotherapy by examining the interactions among immune components within the context of breast cancer. Furthermore, we investigated the spatial relationship between SUSD3 and fibroblasts in breast cancer patients using spatial transcriptomics. Collectively, our work provides innovative and comprehensive insights into the role of lactylation in breast cancer and its implications for personalized treatment strategies, laying the groundwork for a deeper understanding of the influence of lactylation on clinical outcomes and the tumor microenvironment.

Materials and methods

Data collection and processing

Single-cell RNA sequencing (scRNA-seq) data specific to breast cancer were sourced from the TISCH database (http://tisch.comp-genomics.org/) with the following datasets: BRCA_GSE161529, BRCA_EMTAB8107, and BRCA_GSE148673. The R package “CellChat” was utilized to analyze intercellular communication by examining ligand-receptor interactions, which allowed for the prediction of potential communication networks within the tumor microenvironment.

Bulk RNA-seq data along with clinical information were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), specifically from the following datasets: GSE162228 (133 samples and 23361 genes), GSE20685 (327 samples and 23342 genes), GSE42568 (104 samples and 21989 genes), GSE58812 (107 samples and 23374 genes), and GSE88770 (117 samples and 23324 genes). Additionally, data from The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/ccg/research/genome-sequencing/tcga) were included, comprising 1072 samples and 41521 genes. To facilitate comprehensive analysis, the datasets were merged, and inter-batch differences were adjusted using the R packages limma and sva. This preprocessing resulted in a final dataset of 19,144 genes across 1,845 samples.

LRGs were identified from literature (PMID:37242427, PMID:35761067, and PMID:36092712). A total of 336 lactylation-related genes were identified, of which 206 were found to overlap with the gene expression profiles present in the RNA-seq datasets utilized.
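The overlap step described above amounts to a set intersection between the candidate gene list and the genes shared by all cohorts. The following Python sketch is illustrative only (the study used R packages); the function name `overlap_with_expression` and all gene symbols are hypothetical placeholders, not the study's gene lists.

```python
def overlap_with_expression(candidate_genes, cohort_gene_lists):
    """Return candidate genes that are measured in every cohort, sorted by symbol."""
    # Genes shared by all cohorts (a merged expression matrix can only keep these).
    shared = set(cohort_gene_lists[0])
    for genes in cohort_gene_lists[1:]:
        shared &= set(genes)
    # Keep only the candidates present in the shared gene space.
    return sorted(set(candidate_genes) & shared)


# Toy example with made-up symbols (assumption: not real data).
candidates = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
cohorts = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_X"],
    ["GENE_A", "GENE_C", "GENE_D", "GENE_X"],
]
kept = overlap_with_expression(candidates, cohorts)
print(kept)
```

In the study, the same kind of overlap reduced 336 candidate LRGs to the 206 genes present in the expression data.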

Scoring of lactylation across different cell types

To evaluate the lactylation score among different cell types, the R package GSVA (Gene Set Variation Analysis) was employed. This analysis categorized 13 distinct cell types into high- and low-score groups based on their lactylation scores.
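The scoring and grouping idea can be sketched in a few lines of Python. This is a simplified stand-in, not the GSVA algorithm: each cell is scored as the mean z-score of the gene-set members, and cells are then split at the median score. The function names, the toy expression values, and the rule that scores strictly above the median are labeled "high" are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def gene_set_scores(expression, gene_set):
    """Score each cell as the mean z-score of the gene-set members.

    `expression` maps gene -> list of values (one per cell). This is a
    simplified stand-in for GSVA, not the GSVA algorithm itself.
    """
    n_cells = len(next(iter(expression.values())))
    totals = [0.0] * n_cells
    used = 0
    for gene in gene_set:
        values = expression.get(gene)
        if values is None:
            continue  # gene not measured in this dataset
        sd = pstdev(values)
        if sd == 0:
            continue  # constant genes carry no information
        mu = mean(values)
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
        used += 1
    if used == 0:
        raise ValueError("no usable genes in the gene set")
    return [t / used for t in totals]


def split_by_median(scores):
    """Label each score 'high' if above the median, otherwise 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy data: 4 cells, 2 gene-set members and 1 unrelated gene (all hypothetical).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "OTHER": [9.0, 1.0, 9.0, 1.0],
}
scores = gene_set_scores(expr, ["G1", "G2"])
groups = split_by_median(scores)
print(groups)
```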

Following this, hallmark pathways were retrieved from the MSigDB database. The GSVA package was again utilized to assess both lactylation and hallmark pathway scores, facilitating an exploration of the correlation between lactylation and hallmark pathways. The results of this analysis are visually represented in a heatmap.

Further, the CellChat package was employed to analyze receptor-ligand signaling interactions between the high- and low-lactylation score groups, providing insights into the communication dynamics within the tumor microenvironment.

Construction of a patient-specific classification of breast cancer based on LPAGs

To identify significant lactylation prognosis-associated genes (LPAGs), univariate Cox regression analysis was performed using the R packages “survival” and “survminer”. A total of 35 genes were identified, and the R package “ConsensusClusterPlus” was utilized to classify breast cancer patients into 3 clusters. A heatmap generated using the R package “pheatmap” visually illustrates the correlation between clinical characteristics and gene expression across the 3 clusters. To assess functional differences among the 3 clusters, the R package “GSVA” was employed alongside hallmark pathways from the MSigDB database. The results of this functional analysis were also visualized in a heatmap created using the R package “pheatmap”.

Mutation analysis of different lactylation clusters

To investigate the mutation landscape across different lactylation clusters, the R package “maftools” was utilized to generate waterfall plots that visually represent the mutation frequencies and types in patients corresponding to each lactylation cluster.

Immune infiltration analysis

To assess immune infiltration levels across the different lactylation clusters, single-sample gene set enrichment analysis (ssGSEA) was performed. Additionally, the R package “IOBR” was utilized in conjunction with 7 established algorithms: MCPcounter, EPIC, CIBERSORT, IPS, quanTIseq, ESTIMATE, and TIMER. These tools were employed to evaluate the proportions of various immune cell types among the lactylation clusters.

Machine learning-based integration constructs a prognostic model of DEGs

Differentially expressed genes (DEGs) were identified across the 3 lactylation clusters with |logFC| > 0.5 and an adjusted P < 0.05, resulting in a total of 5,640 DEGs after intersection. Subsequent functional analyses, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), were conducted using the R package “clusterProfiler”.
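The thresholding step can be sketched as follows. The Python below is illustrative only (the study used R); the function name `filter_degs`, the gene labels, and the statistics are hypothetical. The text does not specify which gene sets were intersected, so the sketch assumes the intersection is taken over pairwise cluster comparisons.

```python
def filter_degs(results, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Return genes with |logFC| > lfc_cutoff and adjusted P < padj_cutoff.

    `results` maps gene -> (logFC, adjusted P).
    """
    return {
        gene
        for gene, (logfc, padj) in results.items()
        if abs(logfc) > lfc_cutoff and padj < padj_cutoff
    }


# Hypothetical results for two pairwise cluster comparisons (assumption).
a_vs_b = {"G1": (1.2, 0.001), "G2": (-0.8, 0.03), "G3": (0.4, 0.001), "G4": (0.9, 0.20)}
a_vs_c = {"G1": (-0.7, 0.01), "G2": (0.3, 0.01), "G3": (0.9, 0.01), "G4": (1.5, 0.001)}

degs_ab = filter_degs(a_vs_b)
degs_ac = filter_degs(a_vs_c)
shared_degs = degs_ab & degs_ac  # genes passing both comparisons
print(sorted(degs_ab), sorted(degs_ac), sorted(shared_degs))
```

Note that the filter uses the absolute value of logFC, so both up- and down-regulated genes pass the threshold.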

Univariate Cox regression analysis was performed on the identified 5,640 DEGs using data from several cohorts, including TCGA and multiple GEO datasets (GSE20685, GSE42568, GSE58812, GSE88770, and GSE162228). The analysis, conducted with the R packages “survival” and “survminer”, identified 14 genes with significant prognostic value (p < 0.05) that were selected for further investigation.

To develop a robust predictive model, 10 classical machine learning algorithms were integrated: random survival forest (RSF), least absolute shrinkage and selection operator (LASSO), gradient boosting machine (GBM), survival support vector machine (Survival-SVM), supervised principal components (SuperPC), ridge regression, partial least squares regression for Cox (plsRcox), CoxBoost, Stepwise Cox, and elastic net (Enet). Among these, RSF, LASSO, CoxBoost, and Stepwise Cox can perform dimensionality reduction and variable screening, and combining them with the other algorithms yielded 99 machine learning algorithm combinations. The TCGA dataset served as the training cohort, while the remaining five cohorts were utilized for validation purposes. Harrell’s concordance index (C-index) was calculated across all validation datasets, identifying RSF as the most effective model due to its highest average C-index.
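For readers unfamiliar with the metric, Harrell's C-index can be computed directly from survival times, event indicators, and risk scores. The Python sketch below is a minimal illustration, not the study's implementation (which relied on R packages); the function name and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an event strictly before
    subject j's observed time. It is concordant when subject i also has the
    higher risk score; ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical): event = 1 means death observed, 0 means censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.2, 0.4, 0.1]
c_index = harrell_c_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; model selection as described above would average this quantity over the validation cohorts.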

The training and validation cohorts were categorized into low- and high-risk groups based on the median risk score derived from the model. To assess the accuracy of the lactylation-associated genes prognostic index (LAGPI) and the predictive ability of the signature biomarkers, receiver operating characteristic (ROC) curves were generated and the area under the curve (AUC) was calculated using the R package “pROC”. Additionally, time-dependent ROC curves were plotted using the R package “timeROC” to evaluate the predictive efficacy of LAGPI for survival outcomes.
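Median-based risk stratification followed by a survival comparison can be sketched as below. This Python example is illustrative only; it uses a basic Kaplan-Meier estimator rather than the R packages named in the text, the data are hypothetical, and the choice to label scores strictly above the median as "high" is an assumption.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each patient 'high' if the risk score exceeds the median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if r > cutoff else "low" for r in risk_scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(ti for ti, ei in zip(times, events) if ei == 1)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort (hypothetical): follow-up times, event flags, and model risk scores.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

groups = stratify_by_median(risk)
km = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    km[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
print(groups, km)
```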

The SHapley Additive exPlanations (SHAP) method was employed to visually analyze the contribution and influence of each feature in the model’s predictions.
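The idea behind SHAP can be shown with exact Shapley values on a very small model. The Python sketch below enumerates all feature coalitions, which is feasible only for a handful of features; practical SHAP tools use approximations, and this is not the procedure applied to the RSF model in the study. The toy risk function, its coefficients, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only for small illustrative models.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk model with an interaction between features 0 and 1.
def toy_risk(features):
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * b - 1.0 * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that SHAP plots rely on.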

Finally, dependency scores for genes affecting cell viability were generated using the CERES algorithm available on the DepMap website (https://depmap.org/portal/).

Drug sensitivity analysis

The R package “oncoPredict” was utilized to calculate the half maximal inhibitory concentration (IC50) of various drugs in both high- and low-risk groups.

Spatial transcriptomics analysis

Spatial transcriptomics data for SUSD3 in breast cancer were obtained from the CROST database (https://ngdc.cncb.ac.cn/crost/home) and analyzed using the R packages “Seurat” and “semla”.

Statistical analysis

All experiments were performed with a minimum of 3 replicates. Data are presented as mean ± SD and were analyzed parametrically using GraphPad Prism (San Diego, CA). Statistical significance between two groups was assessed using the unpaired Student’s t‐test. Ordinary one‐way ANOVA was employed to compare more than two groups. Two‐way ANOVA was utilized to assess cell proliferation at different timepoints. Differences between means with P < 0.05 were considered significant.

Results

Large-scale integration of a single-cell RNA sequencing atlas reveals cell diversity in breast tumors

To elucidate the cellular composition and heterogeneity in breast cancer, we collected and analyzed scRNA-seq data from 3 datasets of breast cancer (BRCA_GSE161529, BRCA_EMTAB8107, BRCA_GSE148673) available in the TISCH database. Cells were categorized into 13 distinct cell types based on specific gene expression patterns, which included B cells, CD4 T cells, CD8 T cells, endothelial cells, epithelial cells, malignant cells, mast cells, monocytes/macrophages, NK cells, pericytes, plasma B cells, and prolif-T (proliferating T) cells (Figure 1A). Differential expression analysis was performed to identify differentially expressed genes for each cluster across all 13 cell types, as illustrated in Figure 1B. The heatmap depicted the top 3 marker genes for each cell population (Figure 1C). Functional enrichment analysis revealed significant differences among the cell types (Figures 1D, E).


Figure 1. Cell diversity and functional enrichment. (A) UMAP visualization was employed to illustrate 13 distinct cell types. (B) The FindAllMarkers function was utilized to identify differentially expressed genes across various cell types, and the top five genes with both high and low expression levels in each cell type were displayed. (C) Marker genes for each cell type were identified using the R package “COSG”, and a heatmap was generated to depict the expression levels of the top 3 marker genes. (D, E) Based on the results obtained from COSG, the top 100 genes for each cell type were selected for enrichment analysis using the R package “clusterProfiler”, with the results of the KEGG (D) and Reactome (E) pathways presented.

A total of 336 lactylation-related genes (LRGs) were identified from published literature (PMID:37242427, 35761067, 36092712). Following the intersection of these with the scRNA-seq gene expression profile, 206 genes were selected to evaluate lactylation scores across the various cell types (Figure 2A). Cells were categorized into high- and low-score groups based on the median lactylation scores (Figure 2B). As shown by the UMAP plot in Figure 2B, plasma B cells exhibited the lowest activity. Furthermore, significant differences in composition and distribution among cell types were observed between the high- and low-score groups (Figures 2C, D). Specifically, CD8 T cells, endothelial cells, epithelial cells, and pericytes displayed higher levels of lactylation, whereas CD4 T cells, fibroblasts, NK cells, and plasma cells exhibited comparatively lower lactylation levels.


Figure 2. Lactylation profiling of breast cancer patients based on single cell-sequencing. (A) Each cell was scored based on the lactylation gene set using the R package “GSVA,” assessing the expression levels of the gene set across different cell types. (B) UMAP showed the lactylation gene set scores across each cell type, and cells were divided into high-score and low-score groups. (C) The proportion of cells for each cell type was compared between the high- and low-score groups based on lactylation gene set scoring. (D) A UMAP plot was generated to visualize the distribution of cell types between the high- and low-score groups.

To further elucidate the key pathways related to lactylation, each cell was scored using the R package GSVA based on the MSigDB dataset, and the correlation between HALLMARK pathway scores and lactylation levels across various cell types was analyzed. We found that the “MYC TARGETS V”, “MTORC1 SIGNALING”, “E2F TARGETS”, and “G2M CHECKPOINT” pathways exhibited the strongest correlation across all cell types (Supplementary Figure S1).

Further analysis of cell communication revealed that the high-score group exhibited significantly greater intercellular interactions compared to the low-score group, with endothelial cells and monocytes/macrophages serving as central communicators (Supplementary Figures S2A, B). The strength of both incoming and outgoing signals was notably higher in the high-score group, as illustrated in Supplementary Figures S2C and S2D. The heatmaps showed the strength of incoming and outgoing signals for each signaling pathway among different cell types (Supplementary Figure S2E).

Screening and characterization of genes associated with lactylation in breast cancer

To identify genes associated with differential lactylation in breast cancer, 6 breast cancer-related datasets (GSE162228, GSE20685, GSE42568, GSE58812, GSE88770, and TCGA) were obtained. Differential analysis was conducted using the “limma” package, encompassing 1860 samples and 19144 genes. The intersection of these genes with 336 lactylation-related genes yielded a total of 206 genes. Further univariate Cox analysis identified 35 lactylation prognosis-associated genes (LPAGs) (Cox p<0.01), as shown in Supplementary Figure S3. Unsupervised consensus clustering was employed to develop a patient-specific classification of breast cancer based on the 35 LPAGs. Using the optimal classification with k = 3, breast cancer samples were clustered into 3 groups: cluster A (n = 837, 45%), cluster B (n = 582, 31.3%), and cluster C (n = 441, 23.7%) (Figure 3A). Principal component analysis clearly distinguished breast cancer samples across LPAG clusters A, B, and C (Figure 3B). Additionally, clinicopathological features and the expression levels of the 35 LPAGs were visualized in a heatmap, with most differentially expressed genes occurring in gene cluster B (Figure 3C). To uncover the biological pathways related to these 3 clusters, each sample was scored using the R package GSVA based on the MSigDB dataset, revealing that cluster C was markedly enriched in “MITOTIC SPINDLE”, “MTORC1 SIGNALING”, “UNFOLDED PROTEIN RESPONSE”, “G2M CHECKPOINT”, “E2F TARGETS”, “MYC TARGETS V1”, and “MYC TARGETS V2” (Figure 3D). Furthermore, the differential expression levels of the 35 LPAGs across the 3 clusters are presented in Figure 3E, indicating that, apart from AHNAK, ALDH1A1, DDX17, FUBP1, HNRNPA1, LAP3, LCP1, MNDA, RPL5, and ZRANB2, which were highly expressed in cluster C, the remaining genes exhibited higher expression in cluster B.


Figure 3. Consensus clustering analysis of patient-specific classification of breast cancer based on 35 LPAGs. (A) A heatmap described the unsupervised consensus clustering solution when k = 3. (B) A PCA plot showed the distribution of samples across the 3 clusters. (C) A heatmap showed the characterization of clinicopathologic features and the expression of LPAGs within the 3 lactylation clusters. (D) The GSVA comparison of biological pathways among the 3 lactylation clusters in breast cancer is presented. (E) Differences in the expression of 35 LPAGs across the various clusters are shown. Ns p>0.05, ***p<0.001.

Given the significant role of gene mutations in tumor progression, the R package “maftools” was utilized to investigate the top ten genes with the highest mutation rates across the 3 clusters. Notably, TP53 (cluster B, 49%; cluster A, 19%; cluster C, 14%) and TTN (cluster B, 19%; cluster A, 15%; cluster C, 13%) exhibited higher mutation rates in cluster B than in clusters A and C (Supplementary Figures S4A–C). The differences in gene mutations between each pair of clusters are illustrated in Supplementary Figures S4D, S4E, and S4F.

Landscape of immune cell infiltration and correlation

The differences in immune cell infiltration among the 3 clusters were assessed using the ssGSEA algorithm, which provided insight into the composition and functionality of immune cells. Initially, the immune infiltration scores of 28 immune cell types across the 3 clusters were compared. Cluster C exhibited the highest infiltration scores for 23 immune cell types, including activated B cell, activated CD8 T cell, activated dendritic cell, CD56bright natural killer cell, CD56dim natural killer cell, central memory CD4 T cell, central memory CD8 T cell, effector memory CD4 T cell, effector memory CD8 T cell, eosinophil, immature B cell, immature dendritic cell, macrophage, mast cell, memory B cell, myeloid-derived suppressor cell, natural killer cell, natural killer T cell, plasmacytoid dendritic cell, regulatory T cell, type 1 T helper cell, T follicular helper cell, and type 2 T helper cell. In contrast, the highest infiltration scores for neutrophils and Th17 cells were observed in cluster A, while the highest scores for activated CD4 T cell and gamma delta T cell were noted in cluster B (Supplementary Figure S5A). To minimize potential bias from the analytical algorithm, further evaluations of immune infiltration levels among the 3 subtypes were conducted using 7 different immune infiltration assessment methods, with the results presented in a heatmap (Supplementary Figure S5B).

Machine learning-based integration constructs a prognostic model of DEGs

Distinct sets of DEGs were identified among the 3 subtypes, resulting in a total of 5640 DEGs (with a threshold of |logFC|>0.5 and an adjusted p<0.05). To explore the associated biological pathways, Gene Ontology (GO) functional enrichment and KEGG pathway enrichment analyses were performed based on the DEGs. The GO enrichment analysis indicated that the DEGs were related to RNA splicing, DNA-binding transcription factor binding, spindle, and chromosomal region (Supplementary Figure S6A). The KEGG analysis revealed that the DEGs were enriched in the PI3K-AKT signaling pathway, MAPK signaling pathway, cell cycle, and cellular senescence (Supplementary Figure S6B).

To assess the prognostic value of the DEGs across the 3 clusters in breast cancer, a risk model was constructed. Univariate regression analysis of all 5640 subtype-related genes revealed that 14 of these genes were significantly associated with prognosis across at least 5 breast cancer datasets (Supplementary Figure S7).

These 14 genes were subsequently subjected to various machine learning algorithms to construct a lactylation-associated genes prognostic index (LAGPI). The TCGA dataset was utilized as the training cohort, while 5 other cohorts served as validation cohorts. The optimal model, identified through the average concordance index (C-index) across the 5 cohorts, was the random survival forest (RSF) model, achieving the highest average C-index of approximately 0.696 (Figure 4A). Following this, prognostic models were developed for the identified prognosis-related genes, demonstrating robust predictive capability in the RSF model (Figure 4B). The final prognostic model comprised 13 genes, as illustrated in Figure 5B, including SUSD3, G6PD, FBXL6, GAPDHP28, NDRG1, ELOVL1, SLC52A2, HGH1, THEM6, EIF4EBP1, MRPS18A, and DHCR7.


Figure 4. Machine learning-based integration constructs a prognostic model of DEGs. (A) A total of 10 classical algorithms were combined to create 99 machine-learning algorithm combinations for constructing a lactylation-associated genes prognostic index (LAGPI). The C-index for each model was calculated across all validation datasets, with the model developed using the random survival forest (RSF) method identified as the most effective. (B) The results of the selected RSF model are displayed, and scoring models were constructed based on the importance of the variables identified. (C–E) The prognostic score was predicted using RSF within the TCGA training set. (C) The ROC curve, where the X-axis represents “1-specificity” and the Y-axis represents “sensitivity”. Different predictors yield different sensitivities, specificities, and areas under the ROC curve (AUC), and the AUC reflects the predictive power of the score. (D) The time-dependent ROC curve and (E) the survival curve are also shown. (F–H) The prognostic score was predicted using RSF within the merged validation cohort: (F) the ROC curve, (G) the time-dependent ROC curve, and (H) the survival curve. (I) The survival curve for GSE7390. (J) The survival curve for GSE9893.


Figure 5. SHAP analysis identifies key DEGs affecting lactylation risk prediction models of breast cancer. (A) SHAP (SHapley Additive exPlanations) is a tool utilized to elucidate predictions made by machine learning models, aiming to assign an “importance” score to each feature, which reflects the feature’s contribution to the model’s predictions. Five of these genes were presented. (B) The distribution of SHAP values by feature was illustrated. (C) DepMap (The Cancer Dependency Map) analysis was employed to investigate the influence of genes on cell survival and function. The distribution of the effects of key genes was presented. Genes positioned further to the left exhibit a stronger inhibitory effect on cell proliferation following knockout.

To evaluate the prognostic accuracy of the model, receiver operating characteristic (ROC) analysis and time-dependent ROC analysis were performed using both the training and the merged validation cohorts. The area under the curve (AUC) for the TCGA group was estimated to be 0.971, 0.99, and 0.991 for 1, 3, and 5 years, respectively (Figures 4C, D). In contrast, for the merged validation cohorts, the AUC was estimated at 0.868, 0.795, and 0.804 for 1, 3, and 5 years, respectively (Figures 4F, G). Furthermore, survival curve analysis was conducted for both the training and merged validation cohorts, stratifying patients into high- and low-risk groups based on median scores. The survival curve for the training group indicated that the high-risk group experienced poorer survival outcomes, a finding corroborated by the merged validation cohorts (Figures 4E, H).

Additionally, we evaluated the prognostic performance of LAGPI in two additional breast cancer datasets (GSE7390 and GSE9893) that were not involved in model training, to assess its general applicability. In this analysis, the high-risk group again exhibited poorer survival outcomes, further supporting the robustness of the LAGPI in predicting the prognosis of breast cancer patients (Figures 4I, J).

SHAP analysis identifies key DEGs affecting breast cancer prediction models

The SHAP (SHapley Additive exPlanations) analysis was conducted to evaluate the significance and effects of the 13 DEGs identified through the RSF model (specifically, DHCR7, SUSD3, G6PD, FBXL6, GAPDHP28, NDRG1, ELOVL1, SLC52A2, HGH1, THEM6, EIF4EBP1, and MRPS18A) on the predictive performance of the machine learning model. Visualizations of SHAP values provided further insights into the contributions of these genes. The results revealed that SUSD3 was the most influential gene, followed by DHCR7, THEM6, GAPDHP28, and SLC52A2 (Figure 5A). Additionally, Figure 5B illustrated the distribution of SHAP values across features, highlighting a strong correlation between higher expression levels of SUSD3 and positive predictions, thereby underscoring its vital role in the model’s decision-making process. Using data from the DepMap (The Cancer Dependency Map) database, we investigated gene dependency across various cell lines through the Chronos score generated by CRISPR-Cas9 technology. The findings indicated that FBXL6, ELOVL1, EIF4EBP1, and DHCR7 were essential genes for cellular function and may promote cell proliferation. In contrast, SUSD3 and G6PD showed no significant effect on cell proliferation, while THEM6 and NDRG1 were associated with the inhibition of cell proliferation (Figure 5C). In summary, the SHAP analysis indicates that the DEGs identified by the RSF, particularly SUSD3, contribute substantially to the model’s predictions, highlighting their potential as biomarkers for diagnostic and therapeutic applications.

The relationships between clinicopathological features and risk scores

Given that various clinicopathological features exert differing effects on disease prognosis, we explored the distribution of these features within the RSF model to elucidate the relationship between risk scores and clinical characteristics. The results indicated a significant correlation between risk scores and tumor progression. Notable differences in risk scores were observed across various grade levels (Supplementary Figure S8A), M category (Supplementary Figure S8B), N category (Supplementary Figure S8C), overall survival (OS) (Supplementary Figure S8D), stage level (Supplementary Figure S8E), and T category (Supplementary Figure S8F). These findings indicate that the high-risk group is characterized by a more advanced TNM stage, higher tumor grade, and poorer survival outcomes.

Subsequently, we performed univariate and multivariate Cox regression analyses to assess the predictive efficacy of the risk score in conjunction with other clinical features in breast cancer patients. The univariate Cox regression analysis revealed that risk score, grade levels, T category, N category, and M category were significantly associated with the prognosis of breast cancer patients (Supplementary Figure S9A). Furthermore, the multivariate Cox regression analysis indicated that age and risk score were independent prognostic factors for breast cancer patients (Supplementary Figure S9B).

Functional enrichment analysis of the risk model

To further investigate the underlying mechanisms contributing to the disparate outcomes observed between high-risk and low-risk groups based on the risk score, we identified DEGs between these groups and analyzed the correlation between risk scores and DEGs. The heatmap in Supplementary Figure S10A illustrates the top 50 genes that are positively correlated with risk scores, while Supplementary Figure S10B displays the top 50 genes that are negatively correlated. Subsequently, GSEA analysis was conducted to explore GO, KEGG, and Reactome terms associated with the positively correlated top 50 genes. The significantly enriched GO terms included fat cell differentiation, regulation of phosphatidylinositol 3-kinase signaling, blood vessel development, and vasculature development. The significantly enriched KEGG terms comprised cytokine-cytokine receptor interaction, regulation of lipolysis in adipocytes, Ras signaling pathway, and Th1 and Th2 cell differentiation. Additionally, the significantly enriched Reactome terms included synthesis of DNA, G1/S transition, formation of the cornified envelope, and mitochondrial translation elongation (Supplementary Figure S10C).

The correlation between risk score and stemness

Stemness refers to the capacity of normal cells to differentiate from their origin into various cell types, contributing to the development of the human organism. The gradual loss of differentiation potential and the acquisition of stem cell-like characteristics are primary factors driving tumor progression (28, 29). This study further investigates the relationship between risk score and stemness features, which influence tumor immunogenicity and susceptibility to immunotherapy. Two independent indices, mDNAsi and mRNAsi, were used to quantify the degree of differentiation and stemness in cancer cells. The findings indicated that both mDNAsi (Supplementary Figure S11A) and mRNAsi (Supplementary Figure S11B) were significantly positively correlated with the risk score.

Additionally, an analysis of 26 stem gene sets was conducted to assess the distribution of stemness scores across high-risk and low-risk groups, revealing that the high-risk group exhibited higher stemness scores (Supplementary Figure S11C).

Relationships between risk score and mutations

To investigate the differences in genomic mutation frequencies between the high- and low-risk groups, mutation landscapes were illustrated for both groups. The results highlighted the most frequently mutated genes within each group. TP53 was identified as the most commonly mutated gene in the high-risk group, while PIK3CA was the most prevalent mutation in the low-risk group. Additionally, mutations in TP53, TTN, MUC16, MUC4, and HMCN1 were found to occur significantly more frequently in the high-risk group, whereas PIK3CA mutations were less common in the high-risk group compared to the low-risk group (Supplementary Figure S11D). The differences in gene mutations between the high-risk and low-risk groups were depicted in Supplementary Figure S11E, which demonstrated a higher mutation frequency in the high-risk cohort.
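The statistical test behind the group-wise mutation comparison is not stated in this section. As one common option for comparing the frequency of a mutated gene between two groups, a two-sided Fisher's exact test on a 2x2 table can be sketched as follows. The function name `fisher_exact_two_sided` and the counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b : mutated / wild-type counts in group 1
    c, d : mutated / wild-type counts in group 2
    The p-value sums the probabilities of all tables (with the same margins)
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability, kept as an exact integer.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Hypothetical counts: 8 of 10 high-risk vs 1 of 6 low-risk samples mutated.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

When many genes are tested, the resulting p-values would normally be adjusted for multiple comparisons.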

The relationship between risk score and tumor microenvironment

As a crucial component of the tumor microenvironment (TME), the distribution of immune cells significantly influences immunotherapy strategies for patients with breast cancer. Therefore, we investigated the correlation between the risk model and the level of immune cell infiltration using the MCPCOUNTER, EPIC, CIBERSORT, IPS, QUANTISEQ, ESTIMATE, and TIMER algorithms. The low-risk group showed markedly higher levels of immune cell infiltration, including B cells, CD4+ T cells, CD8+ T cells, and NK cells, than the high-risk group. The ESTIMATE algorithm was employed to assess the tumor immune microenvironment, revealing that the low-risk group had higher ESTIMATE, immune, and stromal scores, along with lower tumor purity (Figure 6A). Furthermore, the relationship between the risk score and chemokine signatures is illustrated in Figure 6B. The TIDE database was used to predict the likelihood of immunotherapeutic response in breast cancer patients. A significant difference in the predicted response rates to immunotherapy was observed between the two groups. The immune dysfunction score and TIDE score were higher in the low-risk group than in the high-risk group. Accordingly, the high-risk group showed a higher predicted response rate to immunotherapy, indicating that this group may be less susceptible to tumor immune escape and that patients within this group may exhibit greater sensitivity to immunotherapy (Figure 6C).

Figure 6
Heatmaps and box plots comparing high-risk and low-risk groups. Image A and B show gene expression heatmaps categorized by risk group, displaying color variations. Image C includes box plots illustrating dysfunction and TIDE scores, with statistical significance marked. A bar chart shows response percentages within each risk group.

Figure 6. Immune correlation analysis of different lactylation risk scores in breast cancer. (A) The comparison of immune infiltration between the high- and low-score groups was presented. (B) The comparison of chemokine levels across the high- and low-score groups was provided. (C) The TIDE database was used to predict the TIDE score of patients and assess the efficacy of immunotherapy. The upper panel demonstrated the comparison of immune dysfunction between the high- and low-score groups. The middle panel showed the comparison of the percentage of responders to immunotherapy within the risk groups. The lower panel exhibited the distribution of TIDE scores in the high- and low-score groups.

Correlation of the risk model with chemosensitivity and immunotherapy responses

The chemotherapeutic drug responses of patients in both groups to conventional and innovative anticancer agents were assessed. The low-risk group exhibited greater sensitivity to several agents, including Doramapimod_1042, Daporinad_1248, Dactinomycin_1811, BMS-754807_2171, AZD1332_1463, Teniposide_1809, Telomerase Inhibitor IX_1930, Sabutoclax_1849, PRT062607_1631, PRIMA-1MET_1131, PCI-34051_1621, Nutlin-3a (-)_1047, Niraparib_1177, Nelarabine_1814, Mitoxantrone_1810, LY2109761_1852, JQ1_2172, JAK_8517_1739, and Elephantin_1835. In contrast, a higher responsiveness to Lapatinib_1558 was observed in the high-risk group (Figure 7). These findings indicate that the risk score may serve as a predictor of sensitivity to anticancer therapy.

Figure 7
Twenty violin plots compare drug responses between high and low risk groups. Each plot displays data distribution and median values for different drugs, highlighting significant differences where marked. Orange represents high risk; purple, low risk.

Figure 7. Drug sensitivity analysis among subgroups of a lactylation risk model for breast cancer. The IC50 values for various anticancer drugs were analyzed, revealing differences between the high- and low-score groups. Higher IC50 values indicated reduced sensitivity to treatment.

SUSD3 was significantly positively associated with fibroblasts in breast cancer

Given the crucial roles of SUSD3 in the decision-making process of the model and its predictive accuracy, which underscores its potential as a biomarker for diagnostic and therapeutic applications, we conducted an exploration of the spatial distribution of SUSD3 in breast cancer using spatial transcriptomics analysis. It was observed that SUSD3 was distinctly positively correlated with fibroblasts (Figure 8), indicating that the elevated expression of SUSD3 in fibroblasts may significantly influence the progression and prognosis of breast cancer by regulating lactylation modification.

Figure 8
Five sets of visuals depicting fibroblast cell proportions, SUSD3 expression patterns, and scatter plots. Each set is identified by codes VISDS000450 to VISDS000554, showing correlation between fibroblast proportions and SUSD3 expression with varying Spearman's rho values and p-values. Each set contains a fibroblasts cell proportion map, a SUSD3 expression pattern map, and a scatter plot with trendlines.

Figure 8. Spatial transcriptomics overview of SUSD3 in 5 breast cancer samples. The left panels showed the spatial distribution of fibroblasts across the 5 samples. The middle panels exhibited the spatial expression of SUSD3 in these samples. The right panels presented the positive correlation between SUSD3 expression and fibroblast presence across the 5 samples.
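The scatter plots in Figure 8 report the SUSD3-fibroblast association as Spearman's rho. A minimal sketch of that statistic, computed as the Pearson correlation of tie-averaged ranks, is given below. The helper names and the per-spot values are hypothetical and serve only to show the calculation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rank correlation between two equally long sequences."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical per-spot values: fibroblast proportion and SUSD3 expression.
fibroblast_fraction = [0.1, 0.4, 0.4, 0.9]
susd3_expression = [1.0, 3.0, 2.0, 5.0]
rho = spearman_rho(fibroblast_fraction, susd3_expression)
print(round(rho, 3))
```

Because the statistic depends only on ranks, it captures monotonic association and is insensitive to the scale of the expression values.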

Experimental validation of hub LRGs

Experimental validation was conducted using immunohistochemistry (IHC) on clinical breast samples to confirm the differential expression and potential roles of SUSD3, G6PD, and FBXL6 in breast cancer pathogenesis. As shown in Figure 9A, we observed significantly increased expression levels of SUSD3, G6PD, and FBXL6 in tumor tissues compared to normal tissues. Subsequently, we investigated the global lactylation levels in human breast cancer tissues relative to control tissues. We found that breast cancer tissues exhibited significantly elevated lactylation levels compared to control tissues (Figure 9B). Then we selected SUSD3 for further exploration of its influence on lactylation modification and found that SUSD3 was markedly positively correlated with the lactylation levels (Figure 9C). Finally, the results of the CCK-8, migration, wound healing, and colony formation assays demonstrated that SUSD3 significantly enhanced the proliferation and migratory abilities of breast cancer cells (Figures 9D–G).

Figure 9
Composite image containing multiple panels. Panel A: Immunohistochemistry of normal and tumor tissues showing differential staining. Panel B and C: Western blots comparing Pan Kla and β-actin levels in normal versus tumor tissues and various conditions. Panel D: Line graphs displaying cell proliferation over 72 hours for SUSD3 and controls. Panel E: Migration assay images with bar graph showing migration cell number across different conditions. Panel F: Wound healing assay images at initial and 24-hour intervals with a bar graph indicating the migration index. Panel G: Colony formation assay images with a bar graph depicting relative colony numbers.

Figure 9. Experimental validation of hub LRGs. (A) The expression levels of SUSD3, G6PD, and FBXL6 in breast cancer tissues. (B) The different lactylation levels in breast cancer tissues and control tissues. (C) The lactylation levels of MCF-7 cells when SUSD3 was overexpressed or downregulated. (D) The CCK-8 assay of MCF-7 cells when SUSD3 was overexpressed or downregulated. ****p ≤ 0.0001. (E) The migration ability of MCF-7 cells when SUSD3 was overexpressed or downregulated. *p < 0.05, **p ≤ 0.01. (F) The wound healing assay of MCF-7 cells when SUSD3 was overexpressed or downregulated. **p ≤ 0.01. (G) The colony formation assay of MCF-7 cells when SUSD3 was overexpressed or downregulated. ***p ≤ 0.001.

Discussion

As a crucial post-translational regulatory mechanism, lactylation has garnered significant attention due to its role in cancer progression (6, 12, 30, 31). Lactylation can occur in both histone and non-histone proteins. In histones, lactylation regulates gene transcription through epigenetic mechanisms. However, the identity of the readers for histone lactylation remains unclear, despite the identification of the corresponding writers and erasers. Most studies have focused on specific gene sets, leaving the question of whether histone lactylation positively or negatively regulates gene expression unresolved (6, 32). In contrast, research on non-histone proteins has demonstrated that lactylation can enhance or suppress their functions (23, 33–35). Whether lactylation can confer entirely new functions to these proteins, however, remains unknown.

Numerous studies have elucidated the molecular mechanisms linking lactylation to cancer progression, including cell proliferation, metastasis, metabolism, and chemotherapy resistance (10, 17, 30, 36). These findings suggest that targeting lactylation could represent a promising strategy for cancer diagnostics and treatment, although many regulatory mechanisms remain to be fully understood. In comparison to other PTMs such as acetylation, phosphorylation, and ubiquitination, the specific role of lactylation in breast cancer remains largely unexplored, highlighting the need for further investigation into its therapeutic implications.

Multi-omics analysis integrates data from various genetic levels, including transcriptomics, genomics, and metabolomics, to comprehensively investigate tumor characteristics and the influences of lactylation on tumors. This approach enhances our understanding of the molecular mechanisms underlying tumors and contributes to the discovery of new biomarkers and drug targets, which may facilitate the development of predictive, preventive, and personalized medicine (37).

In the current study, distinct cell types within breast cancer were identified using scRNA-seq, highlighting the complexity of the TME. Notably, the assessment of lactylation scores across these cell types exhibited significant variability.

Furthermore, a comprehensive multi-omics analysis was conducted on publicly available data to evaluate the lactylation phenotype of breast cancer. This analysis aimed to clarify the differences in gene expression, survival prognosis, immune infiltration levels, functional enrichment, genomic mutations, stemness features, chemotherapeutic drug responses, and clinicopathological characteristics across different lactylation phenotypes. A LAGPI model was developed using 10 classical algorithms combined with 99 machine-learning algorithm combinations, with the RSF algorithm selected due to its highest C-index, demonstrating exceptional accuracy in predicting breast cancer prognosis in both training and test datasets. This underscores its potential for future clinical applications.
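Because model selection here rests on the C-index, a short sketch of Harrell's concordance index may help readers unfamiliar with it. It is a simplified illustration, not the implementation used in this study: the function name `concordance_index`, the toy follow-up data, and the risk scores are hypothetical, and pairs with tied survival times are simply skipped.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j. The pair is concordant
    when the patient who died earlier was assigned the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical data: follow-up (months), event indicator, predicted risk.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk=[0.9, 0.3, 0.5, 0.1],
)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the algorithm combination with the highest C-index is preferred.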

Based on the LAGPI, patients with breast cancer were stratified into high- and low-LAGPI groups, which were identified as independent prognostic risk factors through univariate and multivariate regression analyses. The high-risk group was associated with shorter survival times and worse prognoses, leading to adverse clinical outcomes. Subsequently, the LAGPI was integrated with multiple clinical features (TNM stage, grade, and OS) to construct a tumor predictive nomogram. The LAGPI exhibited effective prognostic prediction capabilities and clinical utility, potentially aiding in the timely identification of patients with poor prognosis for breast cancer and facilitating the formulation of early and targeted interventions to improve patient outcomes.

Immunotherapy has significantly altered the treatment landscape for breast cancer by targeting the signaling pathways that facilitate tumor evasion of immune surveillance. Nonetheless, not all patients derive sustained benefits from immunotherapy, emphasizing the need for precise identification of individuals who are more likely to respond to these treatments (1, 38). While it is recognized that lactate plays a role in drug resistance by influencing cellular metabolism and the acidification of the TME, the specific contribution of lactylation to this process remains inadequately understood. In our study, we used 7 algorithms to estimate immune cell infiltration in the high- and low-risk groups, revealing distinct immune responses and clinical outcomes for the two groups. These results highlight the crucial role of LRGs in chemoresistance, and suggest that the LAGPI model may serve as a valuable tool for identifying breast cancer patients who are more responsive to immunotherapy, thereby enhancing personalized treatment strategies. Recent research has indicated that metabolic reprogramming, including lactate accumulation, may contribute to the failure of cancer therapies; however, the underlying mechanisms require further investigation.

To improve the predictive accuracy of immunotherapy outcomes, it is crucial to incorporate multiple biomarkers, given the complexity of the immune system. TMB has been found to correlate with T cell infiltration, tumor neoantigen burden, and response to immune checkpoint inhibitors (ICIs) across various solid tumor types (39). Moreover, TMB serves as a predictive marker for immunogenicity, with elevated TMB levels leading to the production of more neoantigens, which correlates with a more favorable response to immunotherapy (40, 41). Our investigation revealed that patients in the high-risk group showed a higher number of mutations compared to those in the low-risk group; these additional mutations were linked to increased sensitivity to immunotherapy. Furthermore, we observed a negative correlation between the TIDE score and the LAGPI, signifying a reduced likelihood of tumor immune escape in the high-risk group. These findings suggest that the LAGPI may enhance the development of personalized immunotherapy strategies for breast cancer patients in the future.

Through RSF regression and SHAP analysis, SUSD3 was identified as the gene having the most significant impact on the predictive accuracy of our model. SUSD3 (Sushi Domain-Containing 3) is a cell surface protein characterized by extracellular, transmembrane, and cytoplasmic domains, with notably high expression in estrogen-sensitive tissues, particularly in breast cancer. Previous studies have reported that SUSD3 is involved in estrogen-dependent metastatic processes and serves as a potential biomarker for predicting both the occurrence and prognosis of breast cancer (42). To further investigate the spatial distribution of SUSD3 in breast cancer, we conducted spatial transcriptomics analysis, which revealed a notable positive correlation between SUSD3 expression and fibroblasts. This finding suggests that the elevated SUSD3 levels in fibroblasts may influence the progression and prognosis of breast cancer through the modulation of lactylation, a relationship that has not been previously reported.

The LAGPI developed in this study identified key immune-related features and their correlations with patient prognosis, suggesting that LRGs not only function as biomarkers in clinical settings but also play an indispensable role in immune modulation and chemoresistance. Additionally, we observed higher levels of lactylation, as well as increased expression of SUSD3, G6PD, and FBXL6, in breast cancer tissues compared to normal tissues. Furthermore, our findings demonstrated that SUSD3 promotes the proliferation and migration of breast cancer cells, highlighting its potential role in tumor progression.

The lactylation-related genes selected for model construction have been associated with distinct biological processes relevant to cancer progression. For instance, G6PD modulates NADPH levels, which in turn influence cellular proliferation, survival, and stress responses; however, its enzyme activity is inhibited by lactylation (43). Additionally, lactylation of H2BK58 mediated by LDHA regulates the expression of NDRG1 (44). In turn, NDRG1 stabilizes LDHA, enhancing glycolysis and lactate accumulation and promoting H3K18 lactylation (45). ELOVL1 may influence tumor cell proliferation by regulating the activation, proliferation, and metabolic reprogramming of CD8+ T cells (46). FBXL6 has been shown to activate aerobic glycolysis, thereby contributing to tumor malignancy (47). Zhao et al. identified the lactate-DHCR7 axis as a crucial biomarker involved in cisplatin resistance and its impact on the efficacy of immunotherapy in bladder cancer (48). Although previous studies have examined the relationship between lactylation signatures and breast cancer prognosis (49–51), our analysis using the lactylation-associated gene panel (LAGPI) identified new biomarkers alongside DHCR7 and G6PD. Notably, direct evidence linking SUSD3, HGH1, SLC52A2, and THEM6 specifically to lactylation modifications in breast cancer remains lacking. Furthermore, we conducted in vitro experiments that confirmed the role of SUSD3 in promoting the malignant phenotype of breast cancer cells, providing experimental evidence to support SUSD3 as a potential therapeutic target.

However, several limitations exist in this study. First, further experimental validation, both in vitro and in vivo, is needed to provide more robust support for the conclusions drawn. Second, the reasons why the high-risk group, which is characterized by a lower TIDE score and higher TMB, nevertheless exhibits lower immune infiltration and poorer prognosis than the low-risk group require further investigation. Finally, multicenter, large-scale studies are essential to validate the prognostic significance of the LAGPI.

In summary, we identified 14 LRGs that are correlated with the prognosis of breast cancer patients. By integrating these genes, we developed the LAGPI using multiple machine learning algorithms, which demonstrated significant predictive capabilities for distinguishing patient outcomes and response to immunotherapy. This suggests that targeting lactylation may present new therapeutic opportunities. Future research should focus on exploring the functional mechanisms of lactylation within the TME to fully assess its potential as a therapeutic target in breast cancer. We anticipate that the well-organized data presented in this study will facilitate future investigations into the underlying mechanisms of lactylation modification in breast cancer.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Zhengzhou Central Hospital Affiliated to Zhengzhou University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

LT: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study was supported by the Natural Science Foundation of Henan Province (232300420243) and the Research Fund of Zhengzhou Central Hospital (SR-0048).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2026.1701978/full#supplementary-material

References

1. Dvir K, Giordano S, and Leone JP. Immunotherapy in breast cancer. Int J Mol Sci. (2024) 25. doi: 10.3390/ijms25147517

2. Xiong X, Zheng LW, Ding Y, Chen YF, Cai YW, Wang LP, et al. Breast cancer: pathogenesis and treatments. Signal Transduct Target Ther. (2025) 10:49. doi: 10.1038/s41392-024-02108-4

3. Winer EP, Lipatov O, Im SA, Goncalves A, Munoz-Couselo E, Lee KS, et al. Pembrolizumab versus investigator-choice chemotherapy for metastatic triple-negative breast cancer (KEYNOTE-119): a randomised, open-label, phase 3 trial. Lancet Oncol. (2021) 22:499–511. doi: 10.1016/S1470-2045(20)30754-3

4. Li Y, Zhang H, Merkher Y, Chen L, Liu N, Leonov S, et al. Recent advances in therapeutic strategies for triple-negative breast cancer. J Hematol Oncol. (2022) 15:121. doi: 10.1186/s13045-022-01341-0

5. Cortes J, Cescon DW, Rugo HS, Nowecki Z, Im SA, Yusof MM, et al. Pembrolizumab plus chemotherapy versus placebo plus chemotherapy for previously untreated locally recurrent inoperable or metastatic triple-negative breast cancer (KEYNOTE-355): a randomised, placebo-controlled, double-blind, phase 3 clinical trial. Lancet. (2020) 396:1817–28. doi: 10.1016/S0140-6736(20)32531-9

6. Zhang D, Tang Z, Huang H, Zhou G, Cui C, Weng Y, et al. Metabolic regulation of gene expression by histone lactylation. Nature. (2019) 574:575–80. doi: 10.1038/s41586-019-1678-1

7. Chen J, Huang Z, Chen Y, Tian H, Chai P, Shen Y, et al. Lactate and lactylation in cancer. Signal Transduct Target Ther. (2025) 10:38. doi: 10.1038/s41392-024-02082-x

8. Yang Z, Yan C, Ma J, Peng P, Ren X, Cai S, et al. Lactylome analysis suggests lactylation-dependent mechanisms of metabolic adaptation in hepatocellular carcinoma. Nat Metab. (2023) 5:61–79. doi: 10.1038/s42255-022-00710-w

9. Yang JF, Luo L, Zhao CY, Li XY, Wang ZM, Zeng ZW, et al. A positive feedback loop between inactive VHL-triggered histone lactylation and PDGFRβ signaling drives clear cell renal cell carcinoma progression. Int J Biol Sci. (2022) 187(10):2375-92.e33. doi: 10.7150/ijbs.73398

10. Li W, Zhou C, Yu L, Hou Z, Liu H, Kong L, et al. Tumor-derived lactate promotes resistance to bevacizumab treatment by facilitating autophagy enhancer protein RUBCNL expression through histone H3 lysine 18 lactylation (H3K18la) in colorectal cancer. Autophagy. (2024) 20:114–30. doi: 10.1080/15548627.2023.2249762

11. Sun X, He L, Liu H, Thorne RF, Zeng T, Liu L, et al. The diapause-like colorectal cancer cells induced by SMC4 attenuation are characterized by low proliferation and chemotherapy insensitivity. Cell Metab. (2023) 35:1563–1579.e8. doi: 10.1016/j.cmet.2023.07.005

12. Yu J, Chai P, Xie M, Ge S, Ruan J, Fan X, et al. Histone lactylation drives oncogenesis by facilitating m(6)A reader protein YTHDF2 expression in ocular melanoma. Genome Biol. (2021) 22:85. doi: 10.1186/s13059-021-02308-z

13. Zhang C, Zhou L, Zhang M, Du Y, Li C, Ren H, et al. H3K18 lactylation potentiates immune escape of non-small cell lung cancer. Cancer Res. (2024) 84:3589–601. doi: 10.1158/0008-5472.CAN-23-3513

14. Liu J, Zhao L, Yan M, Jin S, Shang L, Wang J, et al. H4K79 and H4K91 histone lactylation, newly identified lactylation sites enriched in breast cancer. J Exp Clin Cancer Res. (2025) 44:252. doi: 10.1186/s13046-025-03512-6

15. Xu HY, Li LQ, Wang SS, Wang ZJ, Qu LH, Wang CL, et al. Royal jelly acid suppresses hepatocellular carcinoma tumorigenicity by inhibiting H3 histone lactylation at H3K9la and H3K14la sites. Phytomedicine. (2023) 118:154940. doi: 10.1016/j.phymed.2023.154940

16. Pan L, Feng F, Wu J, Fan S, Han J, Wang S, et al. Demethylzeylasteral targets lactate by inhibiting histone lactylation to suppress the tumorigenicity of liver cancer stem cells. Pharmacol Res. (2022) 181:106270. doi: 10.1016/j.phrs.2022.106270

17. Jin J, Bai L, Wang D, Ding W, Cao Z, Yan P, et al. SIRT3-dependent delactylation of cyclin E2 prevents hepatocellular carcinoma growth. EMBO Rep. (2023) 24:e56052. doi: 10.15252/embr.202256052

18. Yang Z, Yan C, Ma J, Peng P, Ren X, Cai S, et al. Lactylome analysis suggests lactylation-dependent mechanisms of metabolic adaptation in hepatocellular carcinoma. Nat Metab. (2023) 5:61–79. doi: 10.1038/s42255-022-00710-w

19. Chen M, Cen K, Song Y, Zhang X, Liou YC, Liu P, et al. NUSAP1-LDHA-Glycolysis-Lactate feedforward loop promotes Warburg effect and metastasis in pancreatic ductal adenocarcinoma. Cancer Lett. (2023) 567:216285. doi: 10.1016/j.canlet.2023.216285

20. Zhang XW, Li L, Liao M, Liu D, Rehman A, Liu Y, et al. Thermal proteome profiling strategy identifies CNPY3 as a cellular target of gambogic acid for inducing prostate cancer pyroptosis. J Med Chem. (2024) 67:10005–11. doi: 10.1021/acs.jmedchem.4c00140

21. Zong Z, Xie F, Wang S, Wu XJ, Zhang ZY, Yang B, et al. Alanyl-tRNA synthetase, AARS1, is a lactate sensor and lactyltransferase that lactylates p53 and contributes to tumorigenesis. Cell. (2024) 187. doi: 10.1016/j.cell.2024.04.002

22. Xie B, Zhang M, Li J, Cui J, Zhang P, Liu F, et al. KAT8-catalyzed lactylation promotes eEF1A2-mediated protein synthesis and colorectal carcinogenesis. Proc Natl Acad Sci U.S.A. (2024) 121:e2314128121. doi: 10.1073/pnas.2314128121

23. Chen Y, Wu J, Zhai L, Zhang T, Yin H, Gao H, et al. Metabolic regulation of homologous recombination repair by MRE11 lactylation. Cell. (2024) 187:294–311.e21. doi: 10.1016/j.cell.2023.11.022

24. Sun P, Ma L, and Lu Z. Lactylation: Linking the Warburg effect to DNA damage repair. Cell Metab. (2024) 36:1637–9. doi: 10.1016/j.cmet.2024.06.015

25. Cui ZL, Zheng CQ, Lin YY, Xiao ZZ, Li YH, Peng W, et al. Lactate dehydrogenase C4 accelerates triple-negative breast cancer progression by promoting acetyl-CoA acyltransferase 2 lactylation to increase free fatty acid accumulation. Adv Sci (Weinh). (2025) 12(40):e11849. doi: 10.1002/advs.202511849

26. Sun LH, Zhang Y, Yang BY, Sun SJ, Zhang PS, Luo Z, et al. Lactylation of METTL16 promotes cuproptosis via m6A-modification on mRNA in gastric cancer. Nat Commun. (2023) 14(1):6523. doi: 10.1038/s41467-023-42025-8

27. Duan Y, Zhan H, Wang Q, Li B, Gao H, Liu D, et al. Integrated lactylome characterization reveals the molecular dynamics of protein regulation in gastrointestinal cancers. Adv Sci (Weinh). (2024) 11:e2400227. doi: 10.1002/advs.202400227

28. Prasetyanti PR and Medema JP. Intra-tumor heterogeneity from a cancer stem cell perspective. Mol Cancer. (2017) 16:41. doi: 10.1186/s12943-017-0600-4

29. Seguin L, Desgrosellier JS, Weis SM, and Cheresh DA. Integrins and cancer: regulators of cancer stemness, metastasis, and drug resistance. Trends Cell Biol. (2015) 25:234–40. doi: 10.1016/j.tcb.2014.12.006

30. Li H, Sun L, Gao P, and Hu H. Lactylation in cancer: Current understanding and challenges. Cancer Cell. (2024) 42:1803–7. doi: 10.1016/j.ccell.2024.09.006

31. Pan RY, He L, Zhang J, Liu X, Liao Y, Gao J, et al. Positive feedback regulation of microglial glucose metabolism by histone H4 lysine 12 lactylation in Alzheimer’s disease. Cell Metab. (2022) 34:634–648.e6. doi: 10.1016/j.cmet.2022.02.013

32. Moreno-Yruela C, Zhang D, Wei W, Baek M, Liu W, Gao J, et al. Class I histone deacetylases (HDAC1-3) are histone lysine delactylases. Sci Adv. (2022) 8:eabi6696. doi: 10.1126/sciadv.abi6696

33. Ju J, et al. The alanyl-tRNA synthetase AARS1 moonlights as a lactyltransferase to promote YAP signaling in gastric cancer. J Clin Invest. (2024) 134. doi: 10.1172/JCI174587

34. Chen H, Li Y, Li H, Chen X, Fu H, Mao D, et al. NBS1 lactylation is required for efficient DNA repair and chemotherapy resistance. Nature. (2024) 631:663–9. doi: 10.1038/s41586-024-07620-9

35. Zong Z, Xie F, Wang S, Wu X, Zhang Z, Yang B, et al. Alanyl-tRNA synthetase, AARS1, is a lactate sensor and lactyltransferase that lactylates p53 and contributes to tumorigenesis. Cell. (2024) 187:2375–2392.e33. doi: 10.1016/j.cell.2024.04.002

36. Zhou J, Xu W, Wu Y, Wang M, Zhang N, Wang L, et al. GPR37 promotes colorectal cancer liver metastases by enhancing the glycolysis and histone lactylation via Hippo pathway. Oncogene. (2023) 42:3319–30. doi: 10.1038/s41388-023-02841-0

37. Lu M and Zhan X. The crucial role of multiomic approach in cancer research and clinically relevant outcomes. EPMA J. (2018) 9:77–102. doi: 10.1007/s13167-018-0128-8

38. Ye F, Dewanjee S, Li Y, Jha NK, Chen ZS, Kumar A, et al. Advancements in clinical aspects of targeted therapy and immunotherapy in breast cancer. Mol Cancer. (2023) 22:105. doi: 10.1186/s12943-023-01805-y

39. Barroso-Sousa R, Pacifico JP, Sammons S, and Tolaney SM. Tumor mutational burden in breast cancer: current evidence, challenges, and opportunities. Cancers (Basel). (2023) 15. doi: 10.3390/cancers15153997

40. Esteva FJ, Hubbard-Lucey VM, Tang J, and Pusztai L. Immunotherapy and targeted therapy combinations in metastatic breast cancer. Lancet Oncol. (2019) 20:e175–86. doi: 10.1016/S1470-2045(19)30026-9

41. Jardim DL, Goodman A, de Melo Gagliato D, and Kurzrock R. The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell. (2021) 39:154–73. doi: 10.1016/j.ccell.2020.10.001

42. Moy I, Todorovic V, Dubash AD, Coon JS, Parker JB, Buranapramest M, et al. Estrogen-dependent sushi domain containing 3 regulates cytoskeleton organization and migration in breast cancer cells. Oncogene. (2015) 34:323–33. doi: 10.1038/onc.2013.553

43. Meng QF, Zhang YH, Sun HH, Yang XZ, Hao SM, Liu B, et al. Human papillomavirus-16 E6 activates the pentose phosphate pathway to promote cervical cancer cell proliferation by inhibiting G6PD lactylation. Redox Biol. (2024) 71:103108. doi: 10.1016/j.redox.2024.103108

44. Li L, Dong JY, Xu CW, and Wang SQ. Lactate drives senescence-resistant lineages in hepatocellular carcinoma via histone H2B lactylation of NDRG1. Cancer Lett. (2025) 616:217567. doi: 10.1016/j.canlet.2025.217567

45. Wu GJ, Cheng HX, Yin JC, Zheng YS, Shi HC, Pan BY, et al. NDRG1-driven lactate accumulation promotes lung adenocarcinoma progression through the induction of an immunosuppressive microenvironment. Adv Sci (Weinh). (2025) 12(33):e01238. doi: 10.1002/advs.202501238

46. Pretto S, Yu Q, Bourdely P, Trusso Cafarello S, Van Acker HH, Verelst J, et al. A functional single-cell metabolic survey identifies as a target to enhance CD8 T cell fitness in solid tumours. Nat Metab. (2025) 7(3):508–30. doi: 10.1038/s42255-025-01233-w

47. Wu Y, Pan J, Wei J, Wang Z, Chen Q, Li M, et al. Elevated FBXL6 activates ATAD3A through K63-linked polyubiquitination and promotes the malignant progression of TNBC via metabolic reprogramming. Int J Biol Macromol. (2025) 329(Pt 2):147713. doi: 10.1016/j.ijbiomac.2025.147713

48. Zhao YQ, Xing Z, Zhao YQ, Xu HZ, Liu RL, Yang TJ, et al. Lactylation prognostic signature identifies DHCR7 as a modulator of chemoresistance and immunotherapy efficacy in bladder cancer. Front Immunol. (2025) 16:1585727. doi: 10.3389/fimmu.2025.1585727

49. Yang Q, Cai XF, Tang HC, Guo WH, Yu JB, Zhong MC, et al. Integrative single-cell and bulk RNA sequencing of lactate metabolism identifies as a prognostic biomarker in breast cancer. Int J Biol Macromol. (2025) 329(Pt 2):147910. doi: 10.1016/j.ijbiomac.2025.147910

50. Min SM, Zhang XN, Liu YL, Wang WQ, Guan JW, Chen YY, et al. Personalized treatment decision-making using a machine learning-derived lactylation signature for breast cancer prognosis. Front Immunol. (2025) 16:1540018. doi: 10.3389/fimmu.2025.1540018

51. Jiao YC, Ji FQ, Hou L, Lv YG, and Zhang JL. Lactylation-related gene signature for prognostic prediction and immune infiltration analysis in breast cancer. Heliyon. (2024) 10:e24777. doi: 10.1016/j.heliyon.2024.e24777

Keywords: breast cancer, lactylation, single-cell RNA sequencing, spatial transcriptomics, machine learning, SHAP

Citation: Tang L (2026) Development and validation of an interpretable machine learning model identify the lactylation-related protein SUSD3 as a prognostic and therapeutic biomarker for breast cancer. Front. Immunol. 17:1701978. doi: 10.3389/fimmu.2026.1701978

Received: 09 September 2025; Revised: 31 October 2025; Accepted: 05 January 2026;
Published: 27 January 2026.

Edited by:

Redhwan Ahmed Al-Naggar, National University of Malaysia, Malaysia

Reviewed by:

Yulou Luo, Xinjiang Medical University, China
Salvatore Cortellino, SSM Scuola Superiore Meridionale, Italy

Copyright © 2026 Tang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lina Tang, dGFuZ2xpbmFAenp1LmVkdS5jbg==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.