Role of Immune Cell-Specific Hypermethylation Signatures in Classification and Risk Stratification of Breast Cancer

Background: Epigenetic regulation, including DNA methylation, plays a major role in shaping the identity and function of immune cells. Innate and adaptive immune cells recruited into tumor tissues contribute to the formation of the tumor immune microenvironment (TIME), which is closely involved in tumor progression in breast cancer (BC). However, the specific methylation signatures of immune cells have not been thoroughly investigated yet. Additionally, it remains unknown whether immune cells-specific methylation signatures can identify subgroups and stratify the prognosis of BC patients. Methods: DNA methylation profiles of six immune cell types from eight datasets downloaded from the Gene Expression Omnibus were collected to identify immune cell-specific hypermethylation signatures (IC-SHMSs). Univariate and multivariate cox regression analyses were performed using BC data obtained from The Cancer Genome Atlas to identify the prognostic value of these IC-SHMSs. An unsupervised clustering analysis of the IC-SHMSs with prognostic value was performed to categorize BC patients into subgroups. Multiple Cox proportional hazard models were constructed to explore the role of IC-SHMSs and their relationship to clinical characteristics in the risk stratification of BC patients. Integrated discrimination improvement (IDI) was performed to determine whether the improvement of IC-SHMSs on clinical characteristics in risk stratification was statistically significant. Results: A total of 655 IC-SHMSs of six immune cell types were identified. Thirty of them had prognostic value, and 10 showed independent prognostic value. Four subgroups of BC patients, which showed significant heterogeneity in terms of survival prognosis and immune landscape, were identified. The model incorporating nine IC-SHMSs showed similar survival prediction accuracy as the clinical model incorporating age and TNM stage [3-year area under the curve (AUC): 0.793 vs. 0.785; 5-year AUC: 0.735 vs. 0.761]. Adding the IC-SHMSs to the clinical model significantly improved its prediction accuracy in risk stratification (3-year AUC: 0.897; 5-year AUC: 0.856). The results of IDI validated the statistical significance of the improvement (p < 0.05). Conclusions: Our study suggests that IC-SHMSs may serve as signatures of classification and risk stratification in BC. Our findings provide new insights into epigenetic signatures, which may help improve subgroup identification, risk stratification, and treatment management.


INTRODUCTION
Breast cancer (BC) is the most common cancer among women worldwide. Despite significant advances in locoregional therapies, endocrine therapies, chemotherapy, and molecular targeted therapy, BC remains the second leading cause of cancerrelated deaths among women (1,2). Immune evasion has recently been recognized as a hallmark of tumor progression. Tumors can induce local immune dysregulation by suppressing innate and adaptive immune responses in BC (3). The tumor immune microenvironment (TIME) is an important part of the tumor microenvironment. It is highly heterogeneous and plays an important role in tumor progression and disease prognosis in various cancers (4). Therefore, accurate classification of disease based on the TIME is crucial for the assessment of prognosis, as well as to aid in making treatment decisions.
The rapid development of high throughput technologies has enabled the identification of the transcriptomic signatures of immune cells. Several tools based on the use of information about the transcriptomic signatures of immune cells have been successfully used for the classification and risk stratification of various cancers, including BC (5-7). These tools include ESTIMATE, TIMER, and CIBERSORTx, which were created to assess the conditions prevailing in TIME, as well as indicators such as immune score and immune cell population. Unlike the transcriptomic signatures of immune cells, methylation signatures are heritable and reversible, and can be adjusted rapidly in the course of an immune response, to appropriately regulate the progression of immunity. However, the methylation signatures of immune cells have not been thoroughly investigated, and it is not clear whether they would be useful for classification and risk stratification in BC.
The primary tumor, regional lymph nodes, distant metastases (TNM) staging system, established by the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC), is the most commonly used risk stratification system in BC (8). Prognostic information provided by this system, although useful, is incomplete, and many studies have shown that incorporating additional clinicopathological and molecular characteristics, such as tumor differentiation grade, non-coding RNA, mutation status, immune score and microsatellite instability, may improve the accuracy of prediction of prognosis (9)(10)(11). Due to the rapid advances that have been made in methylation sequencing technology, single-base resolution has been achieved, and a large number of methylation signatures have been discovered and defined as biomarkers of prognosis in BC (12,13). Therefore, we speculated that immune-related methylation signatures may be useful for risk stratification.
DNA methylation is a dynamic epigenetic modification, which plays a prominent role in determining cell development and lineage identity, especially in the immune system (14). During the differentiation of hematopoietic stem cells into innate and adaptive immune cells, methylation events, including hypermethylation and hypomethylation, facilitate the commitment of these cells to a lymphoid or myeloid fate, thereby establishing the identities of differentiated cell types (15). Therefore, immune cell-specific methylation signatures may be closely associated with the function of the cells in immunity. Due to the association between methylation events and immune cells, which is compounded by the association between immune cells and tumor progression, specific methylation signatures of immune cells may play an important role in classification and risk stratification of cancers. In this study, we focused on the role of immune cell-specific hypermethylation signatures (IC-SHMSs) in the classification and risk stratification of BC. Specific hypermethylation signatures of six immune cell types were separately identified by analyzing methylome data. Unsupervised hierarchical clustering analysis and Cox proportional hazard models were used to explore the roles of these signatures in the classification and risk stratification of BC patients.

Data Collection
DNA methylation profiles of immune cells from eight datasets (GSE35069, GSE83156, GSE88824, GSE61195, GSE130030, GSE130029, GSE131989, and GSE87095) based on the Illumina HumanMethylation 450 (450K) platform were downloaded from the Gene Expression Omnibus (GEO) database. Immune cell samples from patients with immune-related diseases were filtered out, leaving only normal samples, and all these immune cell samples were isolated from peripheral blood. The Cancer Genome Atlas (TCGA) level 3 gene expression data normalized by fragments per kilobase of exon per million reads mapped (FPKM), DNA methylation data, and somatic mutation data (mutect2) from BC patients, which were downloaded from the National Cancer Institute's Genomic Data Commons Portal (GDC; https://portal.gdc.cancer.gov/); matched survival and clinical information were obtained from the University of Santa Cruz (UCSC) Xena database (http://xena.ucsc.edu/). Only primary BC samples marked with barcode 01A from TCGA database were retained for our study.

Identification of the IC-SHMSs
The methylation profiles of immune cells were quality controlled and normalized separately using the R package "ChAMP" and combined after correcting for batch effects between different datasets using the ComBat method associated with the "ChAMP" package (16). Principal component analysis (PCA) was used to test the quality of immune cell samples showing methylation profiles, using the R package "FactoMineR" (17). Using the champ.DMP function in the "ChAMP" package with the cutoff adjusted to p < 0.05 and deltabeta >0.2, five differential analyses were conducted between CD8 + T cells and the other five immune cell types separately. Five sets of significant hypermethylated probes of CD8 + T were identified, and the intersection of these five sets was defined as the IC-SHMSs of CD8 + T cells. In a similar manner, the IC-SHMSs of the other five types of immune cells were also identified separately. Moreover, we performed univariate and multivariate Cox regression analyses to identify the prognostic value of these IC-SHMSs, using the survival information and methylation profiles of BC.

Identification of the Subgroups of BC Patients
To identify subgroups of BC patients, we used the list of IC-SHMSs showing prognostic value to perform unsupervised hierarchical clustering of BC patients using the R package "cluster." The distances between BC samples were calculated using the Euclidean method, and clustering was accomplished via the ward.D2 method. In addition, we utilized the Calinski-Harabasz index (CHI) to evaluate the clustering significance using the R package "fpc." CHI is the ratio of the sum of between-clusters dispersion and inter-cluster dispersion for all clusters, the higher the score, the better the performances of clustering. To validate the stability of the clustering result, we utilized the bootstrap resampling method in the R package "boot" to randomly resample from the original dataset to generate resampling datasets with the same sample size 1,000 times and calculated the CHIs, respectively, in the same way. Then, a ttest was performed to identify if the difference between the CHIs from the resampling datasets and the CHI from the original dataset is statistically significant. The Kaplan-Meier (KM) curves and log-rank tests were used to identify survival differences between the subgroups. Finally, the optimal cluster number was decided by combining the clustering results and the clinical features.

Immune Landscape of the Subgroups
To estimate the immune infiltration levels of the 28 immune cell types for each BC sample, we performed single-sample gene set enrichment analysis (ssGSEA) to derive an enrichment score for each immune cell population using the R package "GSVA" (18), and the R code and the immune cell maker list were, respectively, shown in Supplementary Tables 1, 7. The enrichment scores were normalized using the Min-Max Normalization method, which turned the scores into values ranging from 0 to 1. The marker gene set for the 28 immune cell types was obtained from a previous study (19), which included data from innate and adaptive immune cells. We used the ESTIMATE algorithm to assess the tumor microenvironment of each BC sample by calculating immune scores, stromal scores, and tumor purity based on the specific gene expression signatures of immune cells and stromal cells.

Calculation of Tumor Mutation Burden
Tumor mutation burden (TMB) is defined as the total number of non-synonymous mutations per million bases. Missense, nonsense, splice-site, and frameshift mutations were considered nonsynonymous for the purposes of this study. We calculated the TMB score using the formula: TMB score = the number of nonsynonymous mutations/length of exons (38 million), for each sample (20). Wilcoxon rank-sum tests were used for comparisons between two groups, and Kruskal-Wallis tests were used for three or more groups.

Differential Analysis and Functional Annotation of the Subgroups
To clarify the functional differences between the subgroups, we first performed differential analyses to identify differentially expressed genes (DEGs) between the subgroups, using the R package "limma" and the gene expression profiles of BC (21). The criteria for identification of DEGs was an adjusted p < 0.05 and |Log2 (Fold Change)| > 1. These DEGs were subjected to functional annotation via gene ontology (GO) enrichment analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, using the R package, "clusterProfiler" (22).

Construction and Validation of the Prognostic Model of IC-SHMS
Multiple Cox proportional hazard models were constructed with different prognostic features using the R package "survival" (23,24) and the algorithm equation used in the package was displayed in Supplementary Table 2. IC-SHMSs showing independent prognostic values were used to construct the IC-SHMS model. To investigate the efficacy and stability of the IC-SHMS model, we randomly divided BC patients into a training set and a validation set, at a ratio of 7:3. The training set was used to construct the model, while the validation set was used to test the model. Subsequently, the R package "rms" was used to determine the optimal model, in which the algorithm was used to select the optimal combination of IC-SHMSs via Akaike's Information Criterion (AIC) in a stepwise approach. The IC-SHMSs included in the final model were used to establish the risk signature, and the risk score for each patient was calculated using the function "predict" supplied in the R package "stats." Patients were divided into high and low-risk groups based on the median of risk scores. Kaplan-Meier (KM) curves and logrank tests were used to identify survival differences between the two groups. The area under the curve (AUC) of time-dependent receiver operating characteristic (ROC) curve was calculated to evaluate the prediction accuracy for 3-and 5-year overall survival (OS) using the R package "survivalROC" (25), and the equation used in the package was also shown in Supplementary Table 2. Finally, the IC-SHMS model developed on the training set was applied to the validation set in order to quantify its efficacy and stability.

Comparison of the Prognostic Value of the IC-SHMSs and Clinical Features
To compare the prognostic value of the IC-SHMSs and clinical features, we used all BC patients' survival information and different prognostic features to, respectively, construct an IC-SHMS model and a clinical model. The IC-SHMS model was constructed using the IC-SHMSs with independent prognostic values as described before, and the clinical model was constructed using the clinical features of patient age and TNM stage. Timedependent ROC analyses were then performed to estimate the survival prediction accuracy of these two models separately, using 1,000 × bootstrap resampling. Finally, to identify whether the prognoses predicted by the IC-SHMSs could significantly improve the prediction accuracy of the clinical model, we integrated the IC-SHMSs with the clinical features to construct a combined model, and calculated the integrated discrimination improvement (IDI) compared with the clinical model, using the R package "survIDINRI" (26). In constructing each optimal prognostic model, a stepwise algorithm incorporating AIC was used to select the features.

Clinical Application of the Combined Model
The features in the combined model were used to establish a risk signature, and the risk scores of BC patients were calculated for risk stratification purposes. Kaplan-Meier (KM) curves and logrank tests were used to identify differences in OS between highand low-risk groups, defined based on the median of risk scores. A nomogram with weighted scores calculated using the features included in the combined model was used to predict the 3-and 5-year OS of BC patients. Calibration curves were created to test the accuracy of the survival prediction of the combined model compared with that of an ideal model.

Statistical Analyses
All statistical analyses were conducted using R software, version 3.63 (https://www.r-project.org/). Appropriate R packages and statistical methods were selected for different analyses. The R package "survival" was used for survival analysis, and only patients whose follow-up time was >30 d were included. Wilcoxon rank-sum tests were used for mean comparisons between two groups, and Kruskal-Wallis tests were used for three or more groups. Statistical significance was set at p < 0.05.

Data Preparation
A total of 325 samples obtained from six immune cell types (CD14 + monocyte: 45; CD19 + B cell: 117; CD4 + T cell: 94; CD8 + T cell: 41; CD56 + NK cell: 14; neutrophil: 14) found in eight DNA methylation datasets were used to identify the IC-SHMSs. A methylation beta value matrix containing 774 BC samples was used to detect the prognostic value of the IC-SHMSs and to perform cluster analysis. The gene expression matrix of 1,077 BC samples was used to assess the immune landscape.
Somatic mutation data containing 985 BC samples were used to calculate the tumor mutation burden (TMB) score. The entire workflow is shown in Figure 1.

Identification of the IC-SHMSs
Following quality control and correction for batch effects in the methylation profiles, the combined methylation beta value matrix contained a total of 325 immune cell samples and 361,262 methylation probes. The PCA plot clearly discriminated between these six types of immune cells (Figure 2A). A total of 655 IC-SHMSs was identified from the six immune cell types as follows: CD14 + monocytes, 34; CD19 + B cells, 270; CD4 + T cells, 109; CD8 + T cells, 54; CD56 + NK cells, 84; and neutrophils, 104 ( Figure 2B; Supplementary Table 3). PCA analysis of immune cells was performed based on the methylation profiles of these 655 IC-SHMSs. The discrimination between immune cells was more obvious than that based on the whole methylation profiles, indicating that IC-SHMSs may accurately represent these immune cells ( Figure 2C). The distributions corresponding to the gene region and CpG island of these IC-SHMSs are shown in Figure 3, which shows that IC-SHMSs were most frequently located in the region of gene body and CpG island opensea. In addition, univariate Cox regression analyses suggested that 30/655 IC-SHMSs (CD14 + monocyte: 4; CD19 + B cell: 12; CD4 + T cell: 5; CD8 + T cell: 1; CD56 + NK cell: 5; and neutrophil: 3), age, and TNM stage may have prognostic values. Those features were used in multivariate Cox regression analyses, and the results showed that 10/30 IC-SHMSs, age, and TNM stage may have independent prognostic values ( Table 1). This observation indicated that the IC-SHMSs which showed prognostic value may affect the prognoses of BC patients by regulating immune cell function.

Identification of the Subgroups of BC Patients
The 30 IC-SHMSs showing prognostic value were used as variables to perform clustering analysis and four subgroups of BC patients were identified. Besides, the results of the CHI analyses showed the clustering has a good significance when k = 4 (CHI = 98.23; Supplementary Table 4), and the t-test result indicated the difference between the CHI distribution from the resampling datasets and the CHI value from the original dataset is not statistically significant (p = 0.1662; Supplementary Figure 1). A heatmap of the beta values of the 30 IC-SHMSs is displayed in Figure 4. It indicates significant differences in the methylation levels of IC-SHMSs between different groups. In addition, survival analysis showed that patients in group 3 and group 4 had a significantly better OS than patients in group 1 and group 2 ( Figure 5B). Therefore, our findings indicate that IC-SHMSs can be used to effectively group BC patients.

Immune Landscape and TMB Score Distribution of the Four Subgroups
The enrichment scores of the 28 immune cell types were calculated for each BC sample using the ssGSEA method and were normalized using the Min-Max Normalization method. The distribution of enrichment scores of the 28 immune cell types Frontiers in Medicine | www.frontiersin.org between the four subgroups is shown in Figure 5A. Group 1 had the lowest immune response; group 2 showed moderate immune activation and relatively high immune suppression; group 3 displayed medium immune activation and relatively low immune suppression; group 4 showed the strongest immune response. Combined with the results of survival analysis, these results indicate that the ratio of immune activating cells (activated B cells and activated CD8 T cells) to immune suppressing cells [myeloidderived suppressor cells (MDSCs) and regulatory T cells] may play an important role in tumor progression and patient survival. A box plot depicting the ratio of immune-activating cells to immune-suppressing cells of the four subgroups strongly supported this hypothesis (Figure 5C).
On the other hand, the tumor microenvironment of each BC sample was assessed using immune scores, stromal scores, and tumor purity, using the ESTIMATE algorithm. The distribution of these features between the four subgroups showed significant differences (Figure 5D). TMB scores were calculated as previously described, and outliers detected by the quartile method were deleted. The box plot showed that the TMB scores of groups 2 and 4 were significantly higher than those of groups 1 and 3 ( Figure 5D). A box plot of the expression levels of seven immune checkpoint molecules (PD-1, PD-L1, CTLA-4, TIM3, LAG3, KIR, and VISTA) also showed significant differences between the four subgroups ( Figure 5E). Thus, our findings indicate that the four subgroups displayed heterogeneity in immune landscapes and somatic mutations.

Differential Analysis and Functional Annotation of the Subgroups
Based on differences in the immune landscape between the subgroups, we performed two differential analyses: group   Figure 2). GO and KEGG enrichment analyses indicated that the difference in immune response between group 4 and group 1 may be closely associated with the biological processes of lymphocyte activation and leukocyte cellcell adhesion, and that the signaling pathways of cytokinecytokine receptor interaction, viral protein interaction with cytokines and cytokine receptors and cell adhesion molecules may play an important role in this process (Figures 6A,B;  Supplementary Table 5). On the other hand, the difference in immune response between group 3 and group 2 may be closely related to the biological processes of cornification and regulation of hormone secretion, the neuroactive ligandreceptor interaction signaling pathway, the PPAR signaling pathway and the estrogen signaling pathway, all of which may play an important role in this process (Figures 6C,D;  Supplementary Table 6). Therefore, these results may clarify the differences in immune responses that were observed between groups, as well as provide clues that are useful for personalized treatment.

Construction and Validation of the Prognostic Model of IC-SHMS
To test the efficacy of the prognostic model of IC-SHMS, 751 BC patients with follow-up times >30 d were divided into two groups, at a ratio of 7:3, as follows: a training set (n = 523); and a validation set (n = 228). The training set was used to construct an IC-SHMS model, while the validation set was used to verify the IC-SHMS model. Ten IC-SHMSs showing independent prognostic values were used to construct the Cox proportional hazard model using the training set, and 9/10 IC-SHMSs (cg08708961, cg19473529, cg24088496, cg24536818, cg17124583, cg17988310, cg10639435, cg14084689, and cg07141504) were retained in the optimal model selected by the stepwise algorithm (Supplementary Figure 3). Based on these nine risk signatures in this model, risk scores were calculated for each BC patient in the training set and validation set separately. Time-dependent ROC analyses indicated that both the training set (3-year AUC: 0.780; 5-year AUC: 0.728) and the validation set (3-year AUC: 0.786; 5-year AUC: 0.722) of this model showed similarly good performances in terms of survival prediction accuracy (Figures 7A,B). Patients with highrisk scores had poorer prognoses than those with low-risk scores in the training set, a result which was verified using the validation set (Figures 7C,D). Based on the median risk scores, patients included in the four groups (TNM stages I, II, III, and IV) could be, respectively, stratified into two groups with significantly different prognoses, indicating that higher risk scores were associated with poorer survival (Figures 7E-H). Therefore, our findings demonstrated that the prognostic model based on IC-SHMSs showed good stability and accuracy of survival prediction in BC.

Comparison of Prediction Accuracy of the IC-SHMS Model, the Clinical Model, and the Combined Model
Using the survival information and different features of 751 BC patients, three Cox proportional hazard models were constructed as follows: the 10 IC-SHMSs with independent prognostic value were used to construct an IC-SHMS model, and 9/10 IC-SHMSs were selected for the optimal model using AIC (Supplementary Figure 4A); the clinical model was constructed using the clinical features of patient age and TNM stage, both of which features were retained in the optimal model (Supplementary Figure 4B); the combined model was constructed using the 10 IC-SHMSs, age, and TNM stage. 9/10 IC-SHMSs, age, and TNM stage were retained in the optimal model selected using AIC (Figure 9A). Time-dependent ROC analyses showed that the prediction accuracy for 3-and 5-year OS obtained via the IC-SHMS model (3-year AUC: 0.793; 5-year AUC: 0.735) was similar to that of the clinical model (3-year AUC: 0.785; 5-year AUC: 0.761), and considerably better than the prediction accuracy obtained using age alone or TNM stage alone. The combined model, with added IC-SHMSs, significantly improved upon the prediction accuracy of the clinical model (3-year AUC: 0.897; 5-year AUC: 0.856); (Figures 8A,B). The IDI results indicated that the improvement that was gained in the accuracy of predicting both 3-and 5-year OS using the combined model was statistically significant when compared with the clinical model (Figures 8C,D). Therefore, our results indicated that IC-SHMSs exhibit good prognostic value and show potential for use as a supplement in the current risk stratification system.

Clinical Application of the Combined Model
The 11 features (cg08708961, cg19473529, cg24088496, cg24536818, cg17124583, cg17988310, cg10639435, cg14084689, cg07141504, age, and TNM stage) that were included in the combined model were used to establish the risk signatures, following which the risk score of each BC patient was calculated for risk stratification purposes. The hazard ratios associated with these features are shown in Figure 9A. Kaplan-Meier curves and log-rank tests showed a significant difference in OS between the high and low-risk groups as defined based on the median of risk scores (Figure 9B). A nomogram plot was used to predict the 3-and 5-year OS of BC patients, based on the weighted scores of the 11 features incorporated into the combined model ( Figure 9C). The calibration curves showed that the combined model had good performance in predicting the 3-and 5-year OS, compared with the ideal model (Supplementary Figure 5).

DISCUSSION
DNA methylation plays a central role in immune cell differentiation and function, by stabilizing and driving gene activity (27). Both innate and adaptive immune cells in blood are able to infiltrate tumor tissues to form the tumor immune microenvironment (TIME). Previous studies have shown that the TIME, which is heterogeneous and consists of a variety of different immune cells, plays an important role in tumor progression and patient survival-outcomes which are closely associated with immune evasion (28). Tools such as ESTIMATE, TIMER, and CIBERSORTx, which are based on the transcriptomic signatures of immune cells, have been created to assess the status of the TIME. The indicators generated by them have been used for classification and risk stratification in various cancers (29)(30)(31). However, few studies have either investigated the methylation signatures of immune cells or linked these signatures with classification and risk stratification in cancer patients. The current study successfully identified the immune cell-specific hypermethylation sites (IC-SHMSs) of six immune cell types separately by analyzing methylome data, and subsequently, applied these IC-SHMSs to classification and risk stratification in BC. Our findings indicated that BC patients could be classified into four heterogeneous clusters based on the 30 IC-SHMSs with prognostic value. Our results indicated that incorporating IC-SHMSs with independent prognostic value into a prognostic model with age and TNM stage may significantly improve the accuracy of prediction of prognosis in BC patients. BC is a heterogeneous disease, and has significant variability in clinical presentation, response to treatment, and survival prognosis. At an individual level, accurate classification may provide valuable predictive information and guide the selection of patients for adjuvant hormonal therapy, chemotherapy, and radiotherapy (32). The classification of BC according to the expression of estrogen receptors (ER), progesterone receptors (PR), and HER2 is currently standard practice for histopathological examination of BC patients (33). However, this method of classification provides limited information for decision-making related to immunotherapy. Therefore, the development of better predictive biomarkers may facilitate the identification of particular subsets of patients that are most likely to benefit from immunotherapy, either alone or in combination with chemotherapy or other therapies. In this study, four heterogeneous clusters based on IC-SHMSs were identified in BC. These subgroups displayed obvious differences in immune landscapes and survival prognoses. Our findings suggested that the balance between immune activation (activated B cells and activated CD8 T cells) and immune suppression (MDSC and regulatory T cells) was closely related to survival prognosis. This conclusion was consistent with those of previous studies (34,35), which showed that the ratio of CD8 + T cells to regulatory T cells plays an important role in the survival prognosis and molecular subtypes of BC patients. Furthermore, our findings may provide important indicators for personalized immunotherapy. For instance, patients in group 1, characterized by the lowest immune response, lowest expression levels of all seven immune checkpoint molecules, highest tumor purity and poor prognoses, may benefit from a combination therapy of enhanced immunity and targeted tumor cell, while patients in group 2, characterized by medium immune activation, relatively high immune suppression, relatively high expression levels of immune checkpoint molecules of CTLA-4 and LAG-3 and poor prognoses, may benefit from a combination therapy of enhanced immunity and immune checkpoint inhibitors. Many studies have shown that, compared with monotherapy, a combination of immunotherapy with conventional therapies such as chemotherapy and radiotherapy, may significantly improve cancer therapeutic efficacy, possibly leading to the development of promising methods of treatment (36)(37)(38). We propose that combining the traditional classification method of BC with our classification method would lead to BC patients being offered more accurate and personalized treatment plans. In addition to exploring the differences in immune landscapes that were observed between subgroups, we performed GO and KEGG enrichment analyses to perform functional annotation of these differences with respect to immunity. Our results suggested that the differences in immune response between groups 4 and 1, which were associated with immune activation, centered on the processes of regulating lymphocyte activation and leukocyte cellcell adhesion, as well as cytokine-cytokine receptor interaction signaling. Our results also indicated that a few hormone-related biological processes and signaling pathways, such as hormone secretion and transport, and the estrogen signaling pathway, may be closely related to the differences in immune suppression between groups 3 and 2. Although estrogen and its metabolites are known to be important to the development of BC, the association and mechanism between estrogen and immunity remains unclear (39,40). Our findings regarding the regulatory relationship between hormones and immune suppression are consistent with those of several previous studies. For instance, Svoronos et al. (41) showed that estrogens facilitated tumor progression by driving MDSC mobilization and augmenting their immunosuppressive activity. Segovia-Mendoza et al. (42) reported that functions in, and responses of, infiltrating immune cells in BC are regulated by steroid hormones and their receptors. Thus, our study identified several major biological processes, as well as key signaling pathways associated with the variability in immune responses displayed by these subgroups, thereby providing important guidelines for personalized treatment.
As understanding of the association between the TIME and tumor progression improves, immunotherapy has also rapidly advanced. To date, immunotherapies using PD-1/PD-L1 or CTLA-4 antagonistic antibodies have shown promising outcomes in various cancers such as melanoma, liver cancer and nonsmall cell lung cancer, thereby establishing immunotherapy as one of the most promising new therapeutic approaches (43)(44)(45). However, due to the heterogeneity of the TIME, and the effects of various resistance mechanisms, <25% of patients treated with immunotherapy have shown a meaningful response (46). Therefore, efforts to identify other specific, effective biomarkers that may be used for immunotherapy are important. In this study, 30 IC-SHMSs with prognostic value, closely related to the identity and/or function of immune cells and potential targets of immunotherapy, were identified. The dynamic plasticity of the epigenome makes it susceptible to therapeutic operations, and the past few years have witnessed an unprecedented development of targeted epigenetic therapies. The most clinically advanced epigenetic therapies in cancers thus far are DNA hypomethylating agents, such as DNA methyltransferase inhibitors, which restore aberrant hypermethylation patterns to the normal phenotype, and thereby provide significant therapeutic advantages compared to genetic alterations (47). Preclinical studies suggest that DNA methyltransferase inhibitors have the greatest efficacy when combined with other cancer therapies. Although epigenetic therapy is undoubtedly a potential and powerful tool that is especially associated with immunity based cancer therapy, much work is required to produce satisfactory treatments.
Currently, risk stratification of BC patients is predominantly based on clinicopathological characteristics, and the TNM risk stratification system remains the gold standard. However, an increasing number of studies have shown that many other characteristics may be used to supplement the TNM system. For instance, Mavaddat et al. (48) assessed the role of genetic variants in risk stratification of BC patients. Lai et al. (49) incorporated novel miRNAs into a prognostic model for BC patients. CD8 + T cell and NK cell infiltration has been shown to serve as an independent prognostic biomarker of BC (34,50). An increasing number of tumor-related methylation markers are being discovered in BC (51), and Tao et al. (52) has proposed a seven DNA methylation signature-based prognostic model. The current study explored the role of specific methylation signatures of immune cells in the risk stratification of BC patients. Compared with Tao's seven methylation signature prognostic model, the IC-SHMS model with nine methylation signatures of immune cells used in our study yielded a more accurate survival prediction (AUC: 0.793 vs. 0.74). Besides, adding these IC-SHMSs to the clinical model of patient age and TNM stage significantly improved the accuracy of prognosis prediction (AUC: 0.897). Although our results indicated the IC-SHMS model almost has the same predictive efficacy for prognosis as the clinical model, it doesn't exceed. We think possible reasons leading to this result include: firstly, these clinical prognostic markers are selected through long-term clinical practice, so its predictive efficacy for prognosis is very reliable; secondly, the tumor microenvironment is a complex regulatory system composed of a highly heterogeneous population of cancer cells, as well as a large variety of resident and infiltrating host cells, extracellular matrix proteins, and secreted proteins. Although TIME plays an important role in tumor progression, it is just a small part of the whole tumor microenvironment. Therefore, the predictive efficacy for prognosis is limited to a certain extent by only using the IC-SHMSs as prognostic factors and many other molecular characteristics also should be considered. Therefore, our findings suggest that IC-SHMSs may be used not only for classification but also to supplement the TNM risk stratification system.
To our knowledge, ours is the first study to correlate immune cell hypermethylation signatures with classification and risk stratification in BC. Our finding of four subgroups and 30 IC-SHMSs with prognostic value may contribute to personalized treatment and targeted therapy in BC patients, and the new prognostic model constructed in this study is expected to increase the accuracy of risk stratification, thereby contributing to decisions pertaining to clinical treatment that may lead to improved outcomes for BC patients. However, the volume of data used in this study was limited, and these findings may require more clinical data and experiments in order to be fully validated.
Our future research entails testing additional clinical data and performing additional mechanistic experiments on the signatures that have been identified.
In conclusion, our study demonstrated that IC-SHMSs are well-suited to serve as signatures of classification and risk stratification in BC. Furthermore, this study provides new insights into the use of epigenetic signatures, which may help improve subtype identification, risk stratification, and the management of treatment.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Materials.