Immune Landscape of Invasive Ductal Carcinoma Tumor Microenvironment Identifies a Prognostic and Immunotherapeutically Relevant Gene Signature

Background: Invasive ductal carcinoma (IDC) is a clinically and molecularly distinct disease. Tumor microenvironment (TME) immune phenotypes play crucial roles in predicting clinical outcomes and therapeutic efficacy. Method: In this study, we depict the immune landscape of IDC by using transcriptome profiling and clinical characteristics retrieved from The Cancer Genome Atlas (TCGA) data portal. Immune cell infiltration was evaluated via single-sample gene set enrichment (ssGSEA) analysis and systematically correlated with genomic characteristics and clinicopathological features of IDC patients. Furthermore, an immune signature was constructed using the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm. A random forest algorithm was applied to identify the most important somatic gene mutations associated with the constructed immune signature. A nomogram that integrated clinicopathological features with the immune signature to predict survival probability was constructed by multivariate Cox regression. Results: The IDC were clustered into low immune infiltration, intermediate immune infiltration, and high immune infiltration by the immune landscape. The high infiltration group had a favorable survival probability compared with that of the low infiltration group. The low-risk score subtype identified by the immune signature was characterized by T cell-mediated immune activation. Additionally, activation of the interferon-α response, interferon-γ response, and TNF-α signaling via the NFκB pathway was observed in the low-risk score subtype, which indicated T cell activation and may be responsible for significantly favorable outcomes in IDC patients. A random forest algorithm identified the most important somatic gene mutations associated with the constructed immune signature. Furthermore, a nomogram that integrated clinicopathological features with the immune signature to predict survival probability was constructed, revealing that the immune signature was an independent prognostic biomarker. Finally, the relationship of VEGFA, PD1, PDL-1, and CTLA-4 expression with the immune infiltration landscape and the immune signature was analyzed to interpret the responses of IDC patients to immunotherapy. Conclusion: Taken together, we performed a comprehensive evaluation of the immune landscape of IDC and constructed an immune signature related to the immune landscape. This analysis of TME immune infiltration landscape has shed light on how IDC respond to immunotherapy and may guide the development of novel drug combination strategies.

Method: In this study, we depict the immune landscape of IDC by using transcriptome profiling and clinical characteristics retrieved from The Cancer Genome Atlas (TCGA) data portal. Immune cell infiltration was evaluated via single-sample gene set enrichment (ssGSEA) analysis and systematically correlated with genomic characteristics and clinicopathological features of IDC patients. Furthermore, an immune signature was constructed using the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm. A random forest algorithm was applied to identify the most important somatic gene mutations associated with the constructed immune signature. A nomogram that integrated clinicopathological features with the immune signature to predict survival probability was constructed by multivariate Cox regression.
Results: The IDC were clustered into low immune infiltration, intermediate immune infiltration, and high immune infiltration by the immune landscape. The high infiltration group had a favorable survival probability compared with that of the low infiltration group. The low-risk score subtype identified by the immune signature was characterized by T cell-mediated immune activation. Additionally, activation of the interferon-α response, interferon-γ response, and TNF-α signaling via the NFκB pathway was observed in the low-risk score subtype, which indicated T cell activation and may be responsible for significantly favorable outcomes in IDC patients. A random forest algorithm identified the most important somatic gene mutations associated with the constructed immune signature. Furthermore, a nomogram that integrated clinicopathological features with the immune signature to predict survival probability was constructed, revealing that INTRODUCTION Invasive ductal carcinoma (IDC) is a clinically and molecularly distinct disease. IDCs are typically of high histologic grade and high mitotic index. HER2 overexpression or amplification is detected in 20% of these tumors (1). IDC tends to metastasize to bone, liver, and lung, whereas invasive lobular carcinoma (ILC) has a higher tendency to metastasize in gastrointestinal and genital tracts, serosal cavities, and meninges (2). IDCs usually form glandular structures in contrast to the small clusters formed by ILCs. The loss of CDH1 leads to the discohesive morphology in ILCs, whereas IDCs maintain intact cell adhesion (3). Furthermore, the frequency of recurrently mutated genes and recurrent copy-number alterations often differs significantly between IDCs and ILCs (3). These features are generally associated with a poor prognosis. Taken together, these differences suggest that ILCs and IDCs are distinct cancer types and progress along different pathways.
Genetic and epigenetic changes contribute to the progression of tumor progression and recurrence in different cancer types. However, accumulated evidence indicates that the tumor microenvironment (TME) has clinicopathological significance in predicting survival outcomes and assessing therapeutic efficacy factors (4,5). TME cells constitute a vital element of cellular and non-cellular components in the tumoural niche, including extracellular matrix and cellular components, such as fibroblasts, adipose cells, immune-inflammatory cells, and neuroendocrine cells. Previous studies have revealed that immune cells in the TME modulate cancer progression and are attractive therapeutic targets (6,7). To date, the comprehensive landscape of immune cells infiltrating the TME of IDCs has not yet been elucidated. We propose that IDCs have a distinct immune landscape and that the immune landscape might lead to different prognoses and treatment responses. In this study, by applying several computational algorithms, we estimated the abundance of immune cells in the TME of IDCs and analyzed the correlation Abbreviations: TME, tumor microenvironment; IDC, invasive ductal carcinoma; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; ssGSEA, single-sample gene set enrichment; LASSO, least absolute shrinkage and selection operator; ILC, invasive lobular carcinoma; DEG, differentially expressed gene; WGCNA, weighted correlation network analysis; IRF4, interferon-regulatory factor 4; AICE, AP-1-IRF consensus element. of the immune landscape with genomic characteristics and pathological features of IDCs. Furthermore, we built an immune signature, which is a robust prognostic biomarker and predictive factor for the response to immunotherapy.

Data Download
TCGA RNA-seq datasets and clinical data for IDCs were downloaded by UCSC Xena browser (https://xenabrowser.net/). GSE20685 and GSE86948 were downloaded from the Gene Expression Omnibus (GEO) database.

Implementation of Single-Sample Gene Set Enrichment Analysis (ssGSEA)
We obtained the marker gene set for immune cell types from Bindea et al. (8). MDSC gene set was imported from MSIGDB gmt file from Broad institute. We used the ssGSEA program to derive the enrichment scores of each immune-related term. In brief, the infiltration levels of immune cell types were quantified by ssGSEA in the R package gsva (9). The ssGSEA applies gene signatures expressed by immune cell populations to individual cancer samples. The computational approach used in our study included immune cells types that are involved in innate immunity and adaptive immunity. Tumors with qualitatively different immune cell infiltration patterns were grouped using hierarchical agglomerative clustering (based on Euclidean distance and Ward's linkage).
The T cell infiltration score (TIS) was defined as the average of the standardized values for CD8 + T, central memory CD4 + T, effector memory CD4 + T, central memory CD8 + T, effector memory CD8 + T, Th1, Th2, Th17, and Treg cells. The obtained cytotoxic activity scores (CYT) score was calculated by the geometrical mean of PRF1 and GZMA (10). The CD8 + T/Treg ratio was the digital ratio of the ssGSEA scores for these two cell types. The correlation between risk score and immune cell ssGSEA score was calculated by Pearson correlation.

LASSO Regularization
LASSO (least absolute shrinkage and selection operator) is an important regularization in many regression analysis methods (e.g., COX regression and logistic regression) (10-12). The idea behind LASSO is that a L1-norm is used to penalize the weight of the model parameters. Assuming a model has a set of parameters, the LASSO regularization can be defined as: It can also be expressed as a constraint to the targeted objective function: An important property of the LASSO regularization term is that it can force the parameter values to be 0, thus generating a sparse parameter space, which is a desirable characteristic for feature selection. In our analysis, 19 genes which were highly associated with OS were used as the input. QRSL1, TIMM8A, IGHA1, BATF, KLRB1, SPIB, and FLT3LG were picked after the penalizing process. A risk score (RS) formula was established by including individual normalized gene expression values weighted by their LASSO Cox coefficients:

Differentially Expressed Gene (DEG) Analysis
DEG analysis was performed by the Limma package (13). The samples were separated into a high-risk score group and a lowrisk score group. An empirical Bayesian approach was applied to estimate the gene expression changes using moderated ttests. The Q-value (adjusted p-value) for multiple testing was calculated using the Benjamini-Hochberg correction. The DEGs were defined as genes with a Q-value < 0.05. The clusterProfiler R package was applied for the GO analysis (14). GSEA was applied with the GSEA software.

Co-expression Gene Network Based on RNA-seq Data
The Weighted correlation network analysis (WGCNA) was used to construct the gene co-expression network (15,16). The coexpression similarity s i, j was defined as the absolute value of the correlation coefficient between the profiles of nodes i and j: where, x i and x j are expression values of for genes i and j, and s i, j represent Pearson's correlation coefficients of genes i and j, respectively.
A weighed network adjacency was defined by raising the co-expression similarity to a power β: with β ≥ 1. We selected the power of β = 5 and scalefree R 2 = 0.95 as the soft-thresholding parameters to ensure a signed scale-free co-expression gene network. Briefly, network construction, module detection, feature selection, calculations of topological properties, data simulation, and visualization were performed. Modules were identified via hierarchical clustering of the weighting coefficient matrix. The module membership of node i in module q was defined as: where, x i is the profile of node i, and E(q) is the module eigengene (the first principal component of a given module) of module q. The module membership measure K (q) cor,i , lies in [−1, 1] and specifies how close node i is to module q, q = 1, · · ·, Q.
By evaluating the correlations between the immune infiltration status, immune signature of IDCs and the module membership of each module, a brown module was selected for further analysis.

Data Processing and Integration
The mutation datasets were download by R package TCGAbiolinks. The expression profiles of the most powerful prognostic features (QRSL1, TIMM8A, IGHA1, BATF, KLRB1, SPIB, and FLT3LG) were extracted from the whole transcriptome datasets. The immune infiltration status was calculated by the deconvolution algorithm and grouped using hierarchical agglomerative clustering. We summarized the clinic datasets, mutation datasets, expression profiles and immune infiltration status into an integrated dataset (Supplementary File 1).

Statistical Analysis
A random forest algorithm was applied to find the most important somatic mutation associated with the immune signature. Survival outcome analysis modeled the results in reference to the patient OS and RFS. P-values and Hazard ratios were obtained from univariate Cox proportional-hazards regression models using the R package survival. Multivariate Cox regression was used to calculate the coefficients in the nomogram. The nomogram was plotted by the rms package. The time-dependent AUC value was calculated by the survivalROC package.

RESULTS
Immune Phenotype Landscape in the TME of IDC Immune cell populations modulate diverse immune responses and lead to anti-tumour effects by infiltrating the IDC TME. The immune cell infiltration status was assessed by applying the ssGSEA approach to the transcriptomes of IDCs. Twentyfour immune-related terms were incorporated to deconvolve the abundance of diverse immune cell types in IDCs. The IDCs were clustered into 3 clusters (low infiltration: 208; intermediate infiltration: 430; and high infiltration: 130) in terms of immune infiltration by applying an unsupervised hierarchical clustering algorithm ( Figure 1A). By applying hierarchical cluster analysis and K-means clustering analysis, we constructed a TME cell network, depicting a comprehensive landscape of tumor-immune cell interactions and their effects on the OS of patients with IDC ( Figure 1B and Figures S1,  S2). The TME immune cells were clustered into 4 clusters, and the correlation among different immune cell types is shown in Figure 1B. The association of OS and RFS with different clusters of IDCs was analyzed by a pairwise log-rank test. The results indicated that the high infiltration group had a favorable survival probability compared with that of the low infiltration group (Figures 1C,D).

Construction of the Immune Signature
A total of 413 genes were involved in the 24 immune-related terms. We applied the univariate COX regression based on the survival datasets of patients with IDC and the expression profiles of the 413 genes. The 19 most significant genes were selected with the criteria of a p-value < 0.0005 (Figure 2A). The expression profiles of the 19 genes are shown in Figure 2B. LASSO Cox regression was performed on the 19 genes to identify the most important features in terms of predicting the survival of IDC patients (Figures 2C-E). By forcing the sum of the absolute value of the regression coefficients to be less than a fixed value, certain coefficients were reduced to exactly zero, and the most powerful prognostic features (QRSL1, TIMM8A, IGHA1, BATF, KLRB1, SPIB, and FLT3LG) were identified with relative regression coefficients. Cross-validation was applied to prevent over-fitting. A 7-gene immune signature was constructed according to the individual coefficients of the genes. Then, we calculated the risk score for each IDC patient and ranked them ( Figure 2F). Figure 2G shows the survival overview in the IDC patients. A heatmap showed that patients in the high-risk group tended to have increased QRSL1 and TIMM8A expression levels, as well as decreased expression levels of IGHA1, BATF, KLRB1, SPIB, and FLT3LG ( Figure 2H). The Kaplan-Meier curve and Cox regression suggested that patients with high risk scores had significantly worse OS and RFS than those with low risk scores (HR = 2.94, p < 0.0001 and HR = 2.28, p = 0.001, respectively) (Figures 2I,J). The effect of the seven genes on the OS and RFS of IDC patients is shown in Figures S3, S4, respectively. To confirm our findings in the IDC cohort, we validated the prognostic function of the immune signature in two independent GEO cohorts (GSE20685 and GSE86948). The risk score was calculated for each patient by using the same formula as in the IDC cohort. The GSE20685 and GSE86948 cohorts were used to predict the OS of BRCA patients based on our immune signature model. Consistent with our previous findings, the Kaplan-Meier curve suggested a significantly better overall survival in the low-risk group than in the high-risk group (Figures S5A,B).

The Low Risk Score Was Associated With Active Infiltration Status and High Cytotoxic Potential
High infiltration status showed a lower risk score than the intermediate infiltration status and low infiltration status showed ( Figure 3A). Similarly, patients with a low risk score had a higher proportion of high immune infiltration than patients with a high risk score ( Figure 3B). The presence of high immune infiltration in patients was linked to a low risk score and was associated with a favorable outcome ( Figure 3C). To compare cytotoxic function with the immune landscape and immune signature that we constructed, the associated signatures were identified for each patient. IDCs with high infiltration status and low risk score were associated with increased levels of immune activation. The TIS (p < 0.0001 and p < 0.0001, respectively) ( Figures 3D,G), interferon-γ signature (p < 0.0001 and p < 0.0001, respectively) (Figures 3E,H), and CYT (p < 0.0001 and p < 0.0001, respectively) (Figures 3F,I) were increased in the low-risk score group and high infiltration group. The ssGSEA score of DCs was higher in the low-risk score group than in the high-risk score group. The Kaplan-Meier curve showed that in the low-risk score group, the ssGSEA score of DC cells affected survival but did not affect the high-risk score group (Figures S6A-C). Furthermore, the correlation between MDSCs and risk score was analyzed ( Figure S7A). The ssGSEA score for MDSCs was positively associated with the OS of IDC patients in whole cohorts (p = 0.017) (Figure S7B). When we stratified the patients into low-risk score and high-risk score groups, the ssGSEA score of MDSCs showed opposite association with the survival of IDC patients (HR = 2.42 and 0.63, respectively) (Figures S7C,D). These data indicate that compared with highrisk score tumors, low-risk score tumors have a distinct immune phenotype, characterized by increased immune infiltration and increased levels of immune activation.

The Low-Risk Score Was Associated With Increased T Cell Infiltration
The association of risk score and immune-related cells was analyzed by Pearson correlation. Cytotoxic cells, CD8 + T cells, T cells and the 6 other most significant immune-related cell types are shown in Figure 4. A high level of correlation was found between the risk score and the T cell-mediated immune response. The ssGSEA scores of 24 immune-related terms in the low, intermediate, and high immune status and low-and highrisk score groups are shown in Figures S8A,C. The p-value and difference in the mean ssGSEA score from the high-and lowinfiltration status and low-and high-risk score groups are shown in Figures S8B,D

Functional Annotation and WGNCA of the Transcriptomes of IDC Patients
To identify the underlying biological characteristics of the constructed immune signature, DEG analysis was performed based on the high-risk score group and low-risk score group. The heatmap depicts the significant DEGs between the two groups ( Figure 5A). The GO analysis indicated that T cell activation, positive regulation of leukocyte cell-cell adhesion, and regulation of lymphocyte activation were the most significantly enriched biological processes between the high-risk score group and the low-risk score group (Figure 5B). The GSEA results showed that allograft rejection, IL-6/JAK/STAT3 signaling, the inflammatory response, interferon-α response, interferon-γ response, and TNF-α signaling via the NFκB pathway were the most predominantly upregulated pathways in the low-risk score group. In contrast, the E2F targets, G2M checkpoints, MTORC1 signaling, and protein secretion pathways were significantly   downregulated in the low-risk score group (Figures 5C,D).
To further identify the underlying biological characteristics in the immune signature, WGCNA was performed, and the correlation of risk score and immune infiltration status with module membership were analyzed. The soft threshold selection is shown in Figure S9. The module-trait heatmap illustrates that the brown module had a significant p-value with both immune signature and immune infiltration status ( Figure 5E); the coefficients were −0.64 and 0.8, respectively. The association between module membership and gene significance for each gene in the brown module is shown in Figure 5F. The genes from the brown module with a coefficient >0.5 were selected as hub genes, and GO enrichment analysis revealed that T cell activation and lymphocyte activation were the most significantly enriched biological processes, which further confirmed the results from the DEG analysis ( Figure 5G).

Mutation Load and Immune Signature
The spectrum of somatic mutations in patients with IDCs is known to be varied. We next investigated the distributions of somatic mutations and observed different patterns among IDCs in terms of total mutations. The risk score from the immune signature had a positive correlation with total mutations in IDC patients ( Figure 6A). By applying a random forest algorithm, we identified 35 highly variable mutated genes that were associated with the immune signature ( Figure 6B). TP53 was the predominant gene of the 35 identified genes.

Construction of a Nomogram to Predict Overall Survival in IDC Patients
We constructed a nomogram that integrated clinicopathological features with the immune signature to predict the survival probability of IDC patients (Figure 7A). The AUC(t) functions of the multivariable models were developed to indicate how well these features serve as prognostic markers. Compared to other features, such as signaturebased risk score, AJCC-TNM stage, and total mutation burden, the nomogram consistently showed the highest predictive power for overall survival in the follow-up period ( Figure 7B).

The Immune Signature Predicted the Immunotherapeutic Benefits in IDC Patients
VEGF-A, the main mediator in tumor angiogenesis, hinders T cell infiltration in the tumor microenvironment. Hence, we explored the correlation between VEGF-A expression and the T cell immune response in IDC tumors. Interestingly, the increased VEGFA expression significantly correlated with both decreased levels of activated CD8 + T cells and Th1 cell infiltration in the high immune infiltration tumor microenvironment but not in the low immune infiltration tumor microenvironment (Figures 8A,B). Furthermore, perforin, the molecular effector found in the granules of cytotoxic T lymphocytes and natural killer cells, also showed a negative correlation with VEGF-A expression ( Figure 8C). Finally, the positive correlation of VEGF-A and the risk score was identified ( Figure 8D). PD-1, PDL-1, and cytotoxic T lymphocyte antigen-4 (CTLA-4) are promising targets for the treatment of patients with breast and non-small cell lung cancer. PD-1, PDL-1, and CTLA-4 antibodies are undergoing studies for the treatment of breast cancer. We analyzed the correlation of PD-1, PDL-1, and CTLA-4 expression in the high-and low-infiltration groups (Figures 8E-P). The expression of PD-1, PDL-1, and CTLA-4 was more significantly correlated with CD8 + T cells, Th1 cell ssGSEA score, and perforin expression in the high-infiltration group than in the low-infiltration group. Furthermore, the mean expression of PD-1, PDL-1, and CTLA-4 was significantly increased in the highinfiltration group, indicating a potentially enhanced response to the corresponding anticancer antibody for IDCs with high immune infiltration status. In our constructed immune signature, the risk score showed a negative correlation with PD-1, PDL-1, and CTLA-4 expression, which implies a potentially enhanced effect of PD-1, PDL-1, and CTLA-4 antibodies in patients with low risk score. Lastly, we checked the correlation of the expression profiles of several immune checkpoint proteins, e.g., CD160, CD274, CD276, CTLA-4, LAG3, and PDCD1, risk score, and VEGF-A in the TCGA and GSE20685 cohorts ( Figure S10).

DISCUSSION
In this study, we depicted the immune landscape of IDC using a large cohort. The immune landscape might explain the differences in prognoses of patients with IDC and responses to PD1, PDL-1, and CTLA-4 antibodies. Based on the immune landscape, we constructed an immune signature that calculated the risk score per patient. The correlation of signature and immune landscape revealed that the T cell-mediated immune response played a crucial role in the signature. Patients with low risk scores had increased T cell infiltration scores, interferon-γ signatures, and cytotoxic activity scores, indicating active T cell immune responses and favorable survival probability. A random forest algorithm was applied to find the most important somatic mutation correlated with the immune signature. A nomogram was constructed based on the immune signature and other clinicopathological properties of IDCs. A time-dependent ROC analysis showed high accuracy of the immune signature and nomogram in terms of predicting the survival of IDC patients. Lastly, PD-1, PDL-1, and CTLA-4 expression was found to be highly associated with the risk score. The patients with low risk scores had increased expression levels of PD-1, PDL-1, and CTLA-4, indicating a potentially high response rate to PD-1, PDL-1, and CTLA-4 antibodies. In our analysis, the IDCs were clustered into three main clusters (low immune infiltration, intermediate immune infiltration, and high immune infiltration). The patients in the high-infiltration cluster had the best survival probability compared with patients in the low-and intermediate-infiltration clusters. The T cell immune response is the central event in antitumour immunity (17). T cells are divided into CD4 + (helper T cells, Th) and CD8 + (cytotoxic T cells, Tc) T cells. Their secretomes include IFN-γ, TNF-α, and IL17, which have antitumour effects. Hence, the increased T cell infiltration score, interferon-γ signature, and cytotoxic activity score may lead to an anti-tumor effect in the high-infiltration group. This finding could explain the different OS and RFS in the high-and lowinfiltration groups.
From the immune landscape in IDCs, we built an immune signature that included seven features (QRSL1, TIMM8A, IGHA1, BATF, KLRB1, SPIB, and FLT3LG). FLT3LG is a crucial cytokine that controls the development of DCs and is particularly important for CD8-positive classical DCs and their CD103-positive tissue counterparts. A clinical trial is currently underway to treat melanoma patients with a combination of immunostimulatory FLT3LG and a peptide-based vaccine targeting DCs (18). KLRB1, which encodes CD161, a surface marker on several T cell subsets and NK cells, has been found to be most frequently associated with favorable outcomes in many cancer types by enhancing innate immune characteristics (19). SPIB is a member of the ETS family and profoundly affects B cell functions. B cells that lack SPIB fail to proliferate in response to IgM cross-linking, exhibit limited capacity to respond to T-dependent antigens, and produce low levels of IgG1, IgG2a, and IgG2b (20). In addition, SPIB can activate enhancer elements in both Ig-λ and Ig-κ genes, increasing the expression of these two genes. BATF is an inhibitor of AP-1driven transcription. Recent studies have revealed that BATF can regulate positive transcriptional activity in dendritic cells, B cells and T cells (21). BATF leucine zipper motifs interact with interferon-regulatory factor 4 (IRF4) and IRF8 at AP-1-IRF consensus elements (AICEs), adding additional flexibility to the actions of IRF4 and IRF8, which were previously considered to interact with SPIB and PU.1 (22). The interaction of IRF4 and BATF in T helper 17 cells increases the production of IL-17, IL-21, IL-22, and IL-23 receptor. TIMM8A is involved in the import and insertion of hydrophobic membrane proteins from the cytoplasm to the mitochondrial inner membrane. The Bax/Bak complex mediates the release of DDP/TIMM8a and activates Drp1-mediated fission to promote mitochondrial fragmentation and subsequent elimination during programmed cell death (23). From the expression profiles of the seven genes above, we calculated the risk score for each patient and predicted the survival of IDC patients.
The risk score from the immune signature was most significantly correlated with the ssGSEA score of cytotoxic cells, CD8 T cells and T cells, indicating the important roles of the T cell immune response in the immune signature. Interestingly, DCs in the low-risk group played a more important role than DCs in the high-risk group. The increased proportion of DCs significantly correlated with favorable survival in the low-risk group but did not correlate with favorable survival of patients in the high-risk group. Th innate inflammatory cytokines, such as IL-1, IL-12, and IL-23 expressed by DCs, promote IFN-γ-secreting CD4 + T cell and cytotoxic T lymphocyte responses (24). The high proportion of DCs and T cells cooperate to achieve the antitumour effect in IDC patients with low risk scores. MDSCs were immunosuppressive population. Patients in the high-risk score group had lower infiltration status and poor survival compared with that of patients in the low-risk score group. This might explain why the patients in the high-risk score group with high MDSC score had a poor survival compared with that of patients with low MDSC score. Furthermore, the GSEA results revealed high levels of IFN-γ, TNF-α, and TNF-α secretion in the low-risk group, which contribute to the antitumour activity in IDC patients with low risk scores.
WGCNA revealed opposing directions of the risk score (cor = −0.64) and immune infiltration (cor = 0.8) with the brown module, indicating the high level of correlation of risk score (calculated by immune signature) and immune infiltration. The hub gene in the brown module plays an essential role in regulating immune infiltration. The GO analysis revealed that T cell activation was the most significantly enriched biological process, indicating that the T cell-mediated immune response is the central event in both immune infiltration and the immune signature.
The spectrum of somatic mutations varied in IDC patients. The different mutation burdens in IDCs led us to analyse whether the landscape of immune cells and the immune signature were associated with somatic mutations. The total mutations showed a positive correlation with the risk score in IDC patients. Furthermore, a random forest algorithm was performed to identify the most important variables correlated with the immune signature. TP53, SCN10A, PIK3CA, and 32 other genes were the most significant variables in the analysis. TP53 and PIK3CA mutations are the most common gene mutations in IDCs (44 and 33%, respectively). In the 35 gene variables, GATA3, a key regulator of ER activity, is a newly identified gene that is mutated in IDCs (5% in ILC vs. 13% in IDC, q = 0.03) (3). Mutations in GATA3 are more frequent in luminal A IDC and are mutually exclusive with FOXA1 events. The differential expression level and enrichment for mutations of GATA3 in IDCs and of FOXA1 in ILC indicates a preferential requirement for the distinct regulation of ER activity in ILC and IDC (3). Previous studies revealed that the GATA3 mutation correlates with increased expression, which is associated with the immune response (25,26). Our analysis further confirms the correlation of the GATA3 mutation with immune infiltration. In addition, we constructed a nomogram that integrated clinicopathological features with the immune signature to predict the survival probability of IDC patients. Compared with other clinicopathological features, the immune signature showed the best accuracy in predicting the survival of IDC patients at any time point and would therefore be helpful for the diagnosis and precise treatment of IDC patients.
There have been several studies for the treatment of breast cancer with immunotherapeutic antibodies. PD-1 is expressed by exhausted T cells. PD-1 and PD-L1 exhibit inhibitory receptorligand interactions, which are involved in the negative regulation of T cell activation and peripheral tolerance during immune responses by cancer cells. Despite demonstrated successes, only a proportion of patients benefit from PD-1 and PDL-1 antibody treatment. Hence, it is important to determine the mechanism that leads to the varied therapeutic effect of PD-1 and PDL-1 antibody treatment and thus improve individual diagnosis and precision medicine. PD-L1 expression, microsatellite instability and deficient mismatch repair are important biomarkers that predict the response to anti-PD-1/PD-L1 therapies (27)(28)(29). Among the three biomarkers, PD-L1 expression has been validated in nearly all tumor types for all approved anti-PD-1/PD-L1 therapies. In our analysis, the expression of PD1, PDL-1, and CTLA-4 was significantly increased in the high-infiltration group. Furthermore, the expression of PD1, PDL-1, and CTLA-4 had a significant correlation with CD8 + T cells, Th1 cell ssGSEA score, and perforin expression in the high-infiltration group, which provides a basis for PD-1/PD-L1 and CTLA-4 treatment. Similarly, the immune signature we constructed also indicated that high expression levels of PD1, PDL-1, and CTLA-4 correlated with low risk score. Therefore, patients with a low risk score could derive more benefit from immunotherapy than patients with a high risk score.
Some limitations should be acknowledged. First, this is a retrospective study, so the robustness of predictive value of the gene signature should be further validated in large prospective clinical trials. Second, experimental studies are required to further elucidate the biological functions underlying the gene signature in IDC.
In the current study, we performed a comprehensive evaluation of the immune landscape of IDC and constructed an immune signature related to the immune landscape. This analysis of TME immune infiltration patterns has shed light on how IDC respond to immunotherapy and may guide the development of novel drug combination strategies.

DATA AVAILABILITY
The datasets supporting the conclusions of this article are available in the Xena browser (https://xenabrowser.net/) repository.

AUTHOR CONTRIBUTIONS
XB and RS conceived and designed the experiments. XB, SX, and KZ analyzed the data. XB and YW wrote the paper. YZ, KZ, and RS reviewed the draft. All authors read and approved the final manuscript.

FUNDING
This work was partially supported by the Zhejiang Provincial Natural Science Foundation (No. LY16H020005).

ACKNOWLEDGMENTS
We would like to thank Dr. Michael Rosemann for helpful discussions and suggestions.