A large-scale screening and functional sorting of tumour microenvironment prognostic genes for breast cancer patients

Purpose The aim of this study was to systematically establish a comprehensive tumour microenvironment (TME)-relevant prognostic gene and target miRNA network for breast cancer patients. Methods Based on a large-scale screening of TME-relevant prognostic genes (760 genes) for breast cancer patients, the prognostic model was established. The primary TME prognostic genes were selected from the constructing database and verified in the testing database. The internal relationships between the potential TME prognostic genes and the prognosis of breast cancer patients were explored in depth. The associated miRNAs for the TME prognostic genes were generated, and the functions of each primary TME member were investigated in the breast cancer cell line. Results Compared with sibling controls, breast cancer patients showed 55 differentially expressed TME prognostic genes, of which 31 were considered as protective genes, while the remaining 24 genes were considered as risk genes. According to the lambda values of the LASSO Cox analysis, the 15 potential TME prognostic genes were as follows: ENPEP, CCDC102B, FEZ1, NOS2, SCG2, RPLP2, RELB, RGS3, EMP1, PDLIM4, EPHA3, PCDH9, VIM, GFI1, and IRF1. Among these, there was a remarkable linear internal relationship for CCDC102B but non-linear relationships for others with breast cancer patient prognosis. Using the siRNA technique, we silenced the expression of each TME prognostic gene. Seven of the 15 TME prognostic genes (NOS2, SCG2, RGS3, EMP1, PDLIM4, PCDH9, and GFI1) were involved in enhancing cell proliferation, destroying cell apoptosis, promoting cell invasion, or migration in breast cancer. Six of them (CCDC102B, RPLP2, RELB, EPHA3, VIM, and IRF1) were favourable for maintaining cell invasion or migration. Only two of them (ENPEP and FEZ1) were favourable for the processes of cell proliferation and apoptosis. Conclusions This integrated study hypothesised an innovative TME-associated genetic functional network for breast cancer patients. The external relationships between these TME prognostic genes and the disease were measured. Meanwhile, the internal molecular mechanisms were also investigated.


Introduction
Breast cancer is a rapidly growing public health problem of global concern (1). In developed countries, breast cancer has become the second most common cause of cancer death in women and is also the leading cause of cancer death in women in low-and middle-income countries (2). Several elements have been shown to be closely associated with breast cancer, such as hormonerelated elements, pregnancy-related elements, anthropometric index elements, physical condition, dietary elements, and environmental exposures (1). Clinically, breast cancer patients can be classified into different subtypes. To date, classifications have been based on different expression patterns of progesterone receptor (PR), estrogen receptor (ER), and human epidermal growth factor receptor-2 (HER-2). Traditional classification does not fully reflect the heterogeneity of breast cancer. For this purpose, breast cancer could also be divided into five different subtypes according to genetic profiling, which represent the heterogeneity of breast cancer (3). Clinically, the recommended diagnostic methods for breast cancer patients are MRI, mammography, ultrasonography, and PET (2). Among them, the recommended technique for the diagnosis of breast cancer is mammography, which, to date, represents the gold standard screening method for breast cancer patients (4). However, approximately 20% of breast cancers are missed by mammography (5). When mammography fails to detect breast cancer, ultrasound and magnetic resonance imaging can be used to detect breast cancer (3).
Tumour progression has long been thought to be associated with epigenetic changes in tumour cells (6). There is increasing evidence that the cells and matrix components surrounding tumour cells also play an indispensable role in the tumour process. Together, these components constitute the tumour microenvironment (TME) (7). The TME consists of various TME-related genes localised or secreted by immune cells and stromal cells, which play a critical role in tumour proliferation and metastasis (8). For example, some immune cells in the TME, such as M1 macrophages, have a significant effect on tumour inhibition by activating Th1 responses. At the same time, other cells such as tumour-associated endothelial cells, cancer-associated fibroblasts, and M2 macrophages in the TME can promote tumour growth and proliferation (9) (10). In addition, localised stromal cells, immune cells, and tumour cells in the TME can interact with each other through cytokines, which could effectively accelerate tumour growth and proliferation. For example, integrins and soluble factors (e.g., IL-6, SDF1, and HGF) could mediate the interaction between tumour cells and the stroma. Signalling pathways involving MAPK, PI3K/Akt, ERK1/2, and STAT have been shown to be highly activated in tumour cells. On the other hand, the activities of anti-apoptotic proteins (e.g., Bcl-2 and Bcl-XL) could be enhanced in the TME, which then initiate cancer development (11).
The miRNAs (short for microRNAs) are a super-family of small non-coding RNAs, most of which are approximately 21 nucleotides in length. Although these miRNAs cannot be translated, they can regulate transcription by binding to the 5' or 3' non-coding regions of mRNA (12). The miRNAs play an important role in tumour development and are thought to be key factors in TME. The miRNAs can affect tumour progression by regulating the cell cycle, manipulating programmed cell death, controlling cell invasion, and targeting angiogenesis (13).
Since TME has received more and more attention in cancer research, we have thoroughly analysed the expression of 760 TMErelevant prognostic genes and their prognostic values in breast cancer patients. In addition to the broad screening, the internal functions of each primary TME prognostic gene for breast cancer prognosis were also thoroughly investigated in this integrated study.

Data source
We used two independent data sources in this study. The breast cancer patients from the TCGA-BRCA database were considered as a constructing database. At the same time, the breast cancer patients from the GSE162228 chip were treated as a testing database.
For the constructing database, we downloaded the gene information profile and corresponding clinical information of breast cancer patients from The Cancer Genome Atlas (TCGA, https://tcgadata.nci.nih.gov/tcga/) database, which contains 1,082 breast cancer patients with complete survival information. In addition, for the testing database, we downloaded a dataset from the GEO database (GEO, https://www.ncbi.nlm.nih.gov/geoprofiles/?term=GSE162228), which contains 109 breast cancer patients with complete survival information. The genetic information of the breast cancer patients in the GEO dataset was analysed using the Affymetrix Human Genome U133A Array platform.
The final differentially expressed TME-relevant prognostic genes for breast cancer patients were selected and verified in two steps: first, a single-factor Cox regression analysis for breast cancer prognosis was developed to analyse the expression values of each TME-relevant prognostic gene in breast cancer samples, with a threshold of p< 0.05; second, bootstrapping was performed to test the genes that passed the initial filtering for robustness as follows: 70% of patients randomly selected from the cohort were tested for the survival impact of their genes.

LASSO Cox regression analysis
To avoid over-fitting the model, redundant prognosis-related molecules were removed from the dimension using LASSO Cox regression analysis. The model's penalty parameter (lambda value) was also determined using 10-fold cross-validation, with the smallest lambda value selected to remove redundant prognosisrelated molecules.
For this purpose, the glmnet function package of the R language was performed for LASSO Cox regression analysis to further infer the TME prognostic genes related to breast cancer prognosis. The following formula was established to calculate the risk score for each individual breast cancer sample: Coef i * X i In this formula, Coefi represents the risk coefficient of each factor estimated by the LASSO-Cox model. X i indicates the expression activity of each TME prognostic gene. The breast cancer patients of the constructing database and the testing database could be subdivided into a high-risk group and a lowrisk group based on the median of the corresponding calculated risk score.

Survival analysis
The survival and survminer packages in the R language were used to analyse the survival of each patient. The package was based on the Kaplan-Meier method (26).

Cell culturing
The mda-MB-453 cell line, purchased from ATCC Co., was selected as a breast cancer cell line in this study and cultured in L15 culturing medium with 10% fetal bovine serum plus 1% P/S.

Luciferase reporter assay
The 3' non-coding region of each TME gene was synthesised and inserted into the XhoI and NotI sites of the pCheck2 reporter luciferase vector downstream of the luciferase gene. The wild type (WT) or mutant plasmid and negative control or miRNA were transfected together into mda-MB-453 cells. Final luciferase analysis was performed using a dual luciferase reporter analysis system (Promega, USA).
The siRNA-transfected cells were harvested and the proliferation ability was measured using the CCK-8 method (Fisher, China). At the same time, the apoptosis ability of the cells was examined by the method of flow cytometry after Annexin V FITC/PI double staining. For transwell migration assays, 2.5×10 4 cells were seeded in the appropriate serum-free medium into the pre-coated upper chamber. The 500-ml complete medium was used as chemoattractant in the lower chambers. The incubation time was set at 48 h, after which cells without migration or invasion were removed. The final number of migrated or invaded cells was examined using Image-Pro Plus version 6.0 software. All experiments were repeated three times independently.

Statistical analysis
The multi-factor Cox regression model was built to analyze whether risk score could predict the survival of patients with breast cancer independently of all other factors. The statistical analysis was established by R software, with version number v4.2.2.

TME prognostic gene selections for breast cancer prognosis
First, the breast cancer patients from the TCGA-BRCA database were used as a constructing database. We focused on TME genetic profiling in breast cancer patients. The selection scope was based on a complete list of 760 TME genes, which are listed in Supplementary  Table S1. Differential analysis was performed using single-factor Cox regression analysis. The 760 TME-relevant prognostic gene expression values were treated as continuous variables in the regression analysis. The 55 differentially expressed TME prognostic genes were finally screened out as p-value< 0.05 (Table 1).
In the list, the genes with HR value less than 1 were favourable for breast cancer prognosis (protective TME prognostic genes). On the other hand, the genes with HR value greater than 1 were unfavourable for breast cancer prognosis (risk TME prognostic genes). For this purpose, 31 of the 55 genes were considered as protective genes, while the remaining 24 genes were considered as risk genes.
To remove redundant prognosis-related molecules, the 55 differentially expressed TME prognostic genes were then plotted in a LASSO Cox regression analysis. As shown in Figure 1A, the optimal number of differentially expressed TME prognostic genes was determined to be 15 ( Figure 1A, with the lowest lambda value). Therefore, the forest plot of the top 15 genes with the smallest pvalue among the 55 genes is shown in Figure 1B. The 15 TME prognostic genes were ENPEP, CCDC102B, FEZ1, NOS2, SCG2, RPLP2, RELB, RGS3, EMP1, PDLIM4, EPHA3, PCDH9, VIM, GFI1, and IRF1.
The expression values of 15 TME prognostic genes were then weighted with the regression coefficients of the LASSO Cox regression model to generate a risk score for predicting survival in breast cancer patients. Each patient's risk score was calibrated individually. Using the risk score as a further selection criterion, the 1,082 patients were divided into a high-risk group and a low-risk group based on the median of the risk score. According to the survival analysis, the patients in the high-risk sample showed an overall poor survival rate compared to the low-risk sample ( Figure 1C).
In addition, it could be shown that the AUC of the 1-year, 3year, and 5-year survival period of the breast cancer patients were 0.666, 0.721, and 0.681, respectively, obtained from the timedependent ROC ( Figure 1D). These results indicated that the risk model could accurately predict the prognosis of breast cancer patients. Meanwhile, the 15 TME prognostic genes were remarkably different when comparing the high-risk and low-risk groups ( Figure 1E), which further confirmed the specificity of the 15 TME prognostic genes as well as the efficiency of the risk score constructed by them.  Verification of potential TME prognostic genes and risk score using the testing database Previously, we selected primary TME prognostic genes from the constructing database and obtained the corresponding risk factor using the TME prognostic genes. We then sought to confirm the results using an independent testing database (GSE database). Out of 109 breast cancer patients (stage 0 patients were excluded) with complete clinical information, 102 were processed for the study. The age, TNM stage, and risk score of each individual breast cancer patient were all subjected to multivariate Cox regression analysis to decide whether the risk score was an independent prognostic indicator for breast cancer patients in the test database. As shown in Figure 2A, it could be demonstrated that the risk score was dramatically associated with the overall survival of the testing database, and the samples with a high risk score had a higher risk of death and were unfavourable for prognosis (HR = 3.8, 95% CI: 2.34-6, p< 0.001).
To investigate the prognostic value of the risk score established by 15 potential TME-relevant prognostic genes, we further regrouped patients from the testing database and performed Kaplan-Meier survival analysis. Patients were divided into group A ( Figure 2B, <60 years old) and group B ( Figure 2C, ≥60 years old). Patients were defined as high risk or low risk based on the risk score determined by 15 potential TME-relevant prognostic genes. Regardless of age, patients in the high-risk group had a significantly lower overall survival rate than those in the low-risk   Figures 2B, C). These results concluded that the risk score constructed by primary TME prognostic genes was an independently accurate indicator for predicting the prognosis of breast cancer patients. Furthermore, the risk factor was shown to be independent of TNM stage (Supplementary Figure 1).
Correlation between potential selected TME prognostic genes and prognosis of breast cancer patients To investigate the relationships between 15 TME prognostic genes and the prognosis of breast cancer patients in depth, the expressions of these TME prognostic genes were re-entered into the regression model with the survival probability of each individual patient from the construction database. As shown in Figure 3, the risk TME prognostic genes were negatively associated with prognosis ( Figures 3A-E, ENPEP, CCDC102B, FEZ1, NOS2, and SCG2), while the protective TME prognostic genes were positively associated with prognosis ( Figure 3F, RPLP2). The remaining TME prognostic genes are shown in Supplementary Figure 2. Based on the results of forest plot ( Figure 1B) and regression model analysis (Figure 3), ENPEP, CCDC102B, FEZ1, NOS2, SCG2, RGS3, EMP1, EPHA3, and PCDH9 were suggested as risk TME prognostic genes. In contrast, RPLP2, RELB, PDLIM4, VIM, GFI1, and IRF1 were suggested as protective TME prognostic genes.
Among them, there was an apparent linear internal relationship for CCDC102B ( Figure 3B), but non-linear relationships for others ( Figures 3A, C-F).

Prediction of potential miRNAs for TME prognostic gene of breast cancer prognosis
The 15 TME prognostic genes and associated potential target miRNAs were searched using six well-established search tools, namely, miRBase, TargetScan, miRanda, miRWalk, TarBase, and miRecords. The search results were sorted according to TarGetScore (the complete list is shown in Table 2).
In the list, the higher the score, the higher the confidence. The primary miRNAs with prediction scores above 80 were considered relatively reliable, while those with prediction scores below 60 were considered less reliable. The final network of 15 TME prognostic genes and targeting miRNAs was constructed using all primary miRNAs with prediction scores greater than 80 ( Figure 4A). Notably, all miRNAs with high scores interacted with EPHA3, suggesting that the miRNA-manipulated signalling cassette may play a pivotal role in the functions of this TME prognostic gene.
To further validate our predictive results, the direct interactions between potential miRNAs and targeted TME prognostic genes were supported by the luciferase reporter assay. As shown in Figure 4B, the WT-specific fragments of the 3' UTR of EPHA3 and mutant (MUT) were cloned into the luciferase reporter. Subsequently, hsa-miR-559 (with the highest predictive score for EPHA3) and non-targeting miRNA control were transfected into mda-MB-453 cells. According to the luciferase reporter assay results, hsa-miR-559 could directly bind to the 3'UTR of EPHA3 and inhibit its function (WT; Figure 4C). On the other hand, the inhibition was significantly reduced when the sequence of the identified interaction site was mutated (mutant; Figure 4C). Verification of potential TME prognostic genes and risk score using testing database.

Diverse functions of TME prognostic genes for breast cancer development
To date, it has been suggested that promoting cell proliferation, destroying cell apoptosis, promoting cell invasion, and promoting cell migration are the primary causes of cancerogenesis. Based on these hypotheses, we sought to investigate the functions of 15 TME prognostic genes. We used siRNA technology to knock down the expression of each TME prognostic gene. Knockdown of risk TME prognostic genes (ENPEP, CCDC102B, FEZ1, NOS2, SCG2, RGS3, EMP1, EPHA3, and PCDH9) could induce decreased cell proliferation, increased cell apoptosis, and impaired cell invasion or migration. In contrast, deletion of protective TME prognostic genes (RPLP2, RELB, PDLIM4, VIM, GFI1, and IRF1, shown in dark colour in each graph) showed the opposite pattern. Among the 15 TME prognostic genes, most of them (NOS2, SCG2, RGS3, EMP1, PDLIM4, PCDH9, and GFI1) were involved in all three entries, while 6 of them (CCDC102B, RPLP2, RELB, EPHA3, VIM, and IRF1) were favourable for maintaining cell invasion or migration. Furthermore, only two of them (ENPEP and FEZ1) were favourable for the processes of cell proliferation and apoptosis (Figures 5A-C).

Discussion
The TME generally refers to the non-cancerous cells and various components present in the tumour, consisting of immune cells and stromal cells as well as molecules (27). The constant connections between tumour cells and TME components play a pivotal role in Correlationship between potential selected TME prognostic genes and prognosis of breast cancer patients. The relationship analysis between prognosis of breast cancer patients and ENPEP (A), CCDC102B (B), FEZ1 (C), NOS2 (D), SCG2 (E), and RPLP2 (F), respectively.  tumour initiation, progression, development, and metastasis (28). Functionally, the TME components could harbour tumour cells through direct interaction with surrounding cells via the lymphatic and circulatory systems to ultimately influence cancer development (8). Thus, the TME has served as a potential therapeutic target for cancer treatment and has attracted basic research and clinical interest (29). In breast cancer, various preclinical and clinical studies have provided ample evidence that TME genes are involved not only in breast cancer progression but also in determining therapeutic response (30). Furthermore, some of the TME genes have shown a prominent value for the existing predictive and prognostic marker panels. The significant alterations in the TME genes could be recognised as a critical element in the development of breast cancer.  From the 760 TME-relevant prognostic genes, we narrowed down our selection to 15 potential TME prognostic genes with a notable differential expression pattern in breast cancer patients. Among these, we proposed ENPEP, CCDC102B, FEZ1, NOS2, SCG2, RGS3, EMP1, EPHA3, and PCDH9 as risk TME prognostic genes. RPLP2, RELB, PDLIM4, VIM, GFI1, and IRF1 were identified as protective TME prognostic genes ( Figure 2). As every coin has two sides, the protective TME factors from stromal cells or immune cells could abolish cancer cell metastasis or facilitate the immune system defence mechanism. On the other hand, risk TME factors from suppressive immune cells, together with the extracellular matrix (ECM) element, could work together to exhibit anti-tumour immunity and promote breast cancer development. Either enhanced risk TME gene expression or suppressed protective TME prognostic gene expression, caused by dysfunctional or aberrant specific signalling pathways, could contribute to the development of breast cancer. However, there are some limitations to this study. For example, due to the complicated and dynamic status of TME, it is quite difficult to show the exact expression for each individual gene in TME. Our identification of TME-relevant prognostic genes in breast cancer patients was based on previous established literature searches and summaries. At the same time, we did not claim that these genes are only functional in TME, and we only focused on their functional transition from TME to tumour progression. In the future, it may be useful to show where (which cell types) and how (the expression level) the TME-relevant prognostic gene is expressed in the breast cancer environment.
Despite the many breakthroughs in cancer research, there is still a lack of solid evidence for the cancer process due to the extremely complicated molecular mechanisms that underlie the disorders. In breast cancer, we prefer the "seed-and-soil" hypothesis, which suggests that the primary localised breast cancer cells represent "seed" cells (31). The permissive secondary tissues could be considered as "soil" for migrative or invasive cancer cells. The permissive secondary tissues could be considered as "soil" for migrative or invasive cancer cells. Based on this hypothesis, the TME could be considered as a "fertiliser" consisting of ECM, various inflammatory cytokines, and other remodelling enzymes (32). The cancer cells themselves interact with the surrounding functions and components of the TME to initiate or suppress cancer development. Our parallel data demonstrated that CCL, a critical chemokine (C-C) motif ligand of the TME, exerts both anti-cancer properties dependent on the recruitment of anti-cancer tumour-infiltrating lymphocytes (TILs), which destroy cancer cells, and pro-cancer functions correlating with the recruitment of cells functional to stimulate tumour growth, enhance tumour cell migration, and block the activities of tumour suppressors (article in preparation). These were similar to previously published data (33)(34)(35).
Using a broad screening approach, we identified 15 attractive TME genes. Furthermore, we testified the functions of each individual TME from three aspects, namely, cell proliferation, cell apoptosis, and cell invasion or migration manipulation. From the results, the enhancement of cell proliferation and the attenuation of cell apoptosis could be closely related, and none of the TME prognostic genes showed a single function for them. Therefore, cell proliferation and cell apoptosis could be combined in this study. As expected, seven of the TME prognostic genes (NOS2, SCG2, RGS3, EMP1, PDLIM4, PCDH9, and GFI1) showed dual functions  in breast cancer progression. The rest had only a single function in breast cancer development ( Figure 6). Some of the potential members are "old hands" in breast cancer research. For example, ENPEP has been shown to be a key factor involved in breast cancer cell proliferation through the function of inducing G2/M cell cycle arrest and reducing anchorageindependent cell growth of mammary origin (36). Si and colleagues claimed that coiled-coil domain containing 102B (CCDC102B) was apparently increased in metastatic lesions in lymph nodes of breast cancer patients (37). Increased expression of CCDC102B was required for breast cancer metastasis. Here, the functions of CCDC102B may be achieved through the regulation of NF-kB pathway components. The FEZ1 gene was mapped to chromosome 8p22 (a common aberration in human tumours). Mutations in the FEZ1 gene have been found in a variety of cancers (38). EMP1, which stands for epithelial membrane protein gene 1, together with EMP2 and EMP3 belong to the PMP22 (peripheral myelin protein 22-kDa) gene family. There are six members of this gene family, namely, brain cell membrane protein 1, MP20, EMP1, EMP2, EMP3, and PMP22. In lung cancer, EMP1 has been implicated as a biomarker for gefitinib resistance. EMP1, EMP2, and EMP3 have been reported as novel therapeutic targets in human cancer (39). Previously, the PDLIM4 gene was identified as a tumour suppressor. In breast cancer cells, the PDLIM4 gene encodes an adaptor protein that functions as a key regulator of stress fibre assembly, actin cytoskeleton remodelling, and epithelial-mesenchymal transition (40). The IRF1 gene has been shown to have essential functions during the epithelialmesenchymal transition process. However, IRF1 is also required for the maintenance of epithelial differentiation. This dual role of A B C FIGURE 5 Diverse functions of TME prognostic genes for breast cancer development. The cell proliferation analysis (A), the cell apoptosis examination (B), and the cell invasion or migration measurement (C) comparing untreated control and 15 TME prognostic gene siRNA-transfected breast cancer cells. * as an indication of P < 0.05 compared with internal control.
IRF1 is context-dependent, particularly for the modulation of epithelial-mesenchymal plasticity, which may be of interest for future breast cancer treatment (41).
Overall, we have identified several interesting TME prognostic genes involved in breast cancer progression through a large-scale screening approach. The targeted miRNAs and the molecular mechanisms highlighted were validated both in vivo and in vitro, opening a new window for future studies.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/, GSE162228.