Prognostic Risk Model of Immune-Related Genes in Colorectal Cancer

Purpose We focused on immune-related genes (IRGs) derived from transcriptomic studies, which had the potential to stratify patients’ prognosis and to establish a risk assessment model in colorectal cancer. Summary This article examined our understanding of the molecular pathways associated with intratumoral immune response, which represented a critical step for the implementation of stratification strategies toward the development of personalized immunotherapy of colorectal cancer. More and more evidence shows that IRGs play an important role in tumors. We have used data analysis to screen and identify immune-related molecular biomarkers of colon cancer. We selected 18 immune-related prognostic genes and established models to assess prognostic risks of patients, which can provide recommendations for clinical treatment and follow-up. Colorectal cancer (CRC) is a leading cause of cancer-related death in human. Several studies have investigated whether IRGs and tumor immune microenvironment (TIME) could be indicators of CRC prognoses. This study aimed to develop an improved prognostic signature for CRC based on IRGs to predict overall survival (OS) and provide new therapeutic targets for CRC treatment. Based on the screened IRGs, the Cox regression model was used to build a prediction model based on 18-IRG signature. Cox regression analysis revealed that the 18-IRG signature was an independent prognostic factor for OS in CRC patients. Then, we used the TIMER online database to explore the relationship between the risk scoring model and the infiltration of immune cells, and the results showed that the risk model can reflect the state of TIME to a certain extent. In short, an 18-IRG prognostic signature for predicting CRC patients’ survival was firmly established.


INTRODUCTION
Colorectal cancer (CRC) ranks among the top causes of cancerrelated deaths worldwide that endangers human health. The GLOBOCAN data in 2018 released by the International Cancer Research Agency showed that each year there were approximately 1.85 million new CRCs and more than 880,000 deaths worldwide. The morbidity and mortality of CRC rank third and second, respectively, in malignant tumors, in which the morbidity accounts for approximately 10% of the total cancer incidence, and the mortality accounts for 9% of the total deaths due to cancer (Bray et al., 2018). It was predicted that the number of cases will increase by more than 60% in 2030, with 2.2 million new cases and 1.1 million deaths (Arnold et al., 2017). Surgical resection is the main treatment option for CRC patients. With the application and popularity of colonoscopy, early treatment work has been improved. The clinical outcomes of CRC patients in many countries have improved significantly over the past few decades (Atkin et al., 2017). Despite the complete surgical resection, many CRC patients eventually relapsed and developed metastatic disease (Angenete, 2019). In clinical practice, a more effective prognostic evaluation system is urgently needed to provide personalized medicine for CRC patients and improve patient outcomes.
It is noteworthy that after Fearon and Vogelstein proposed the model of CRC genetic basis, researchers have begun to understand the heterogeneity of CRC (Fearon and Vogelstein, 1990). Patients with different genetic backgrounds had different outcomes after receiving the same treatment (Fearon and Vogelstein, 1990). Some researchers believed that it was attributed to immunity-related factors (Becht et al., 2016). As we knew that the immune system plays an important role in the development of a variety of cancers, including CRC (Gentles et al., 2015). A recent study found that immunological data (such as type, density, and location of immune cells in tumor samples) can predict patient survival better than the current histopathological characteristics used for CRC patients (Galon et al., 2006). Immune cells are important parts of the tumor microenvironment and affect the development and metastasis of CRC (Pages et al., 2005). Tumor-infiltrating macrophages and dendritic cells in CRC are related to local regulatory T cells and systemic T-cell responses to tumor-associated antigens and have an impact on patients' survival (Nagorsen et al., 2007). In addition, studies have shown that immunerelated genes (IRGs) in colon cancer are closely related to the occurrence and development of colon cancer. However, there is currently no prognostic model based on IRGs to predict the overall prognosis of CRC patients and systematically assess the immune environment of CRC (Ge et al., 2019). Therefore, constructing an immune-based prognostic model that can effectively predict the prognosis of CRC has a very important clinical application prospect.
In this study, we screened differentially expressed IRGs that are closely related to CRC through bioinformatics analysis of The Cancer Genome Atlas (TCGA). Next, the IRGs that were significantly associated with prognosis were further screened. Differentially expressed tumor-associated transcription factors (TFs) were searched, and a correlation network was constructed to reveal the relationship between TFs regulating immune genes. Then, immune-related prognostic models were constructed by integrating IRGs of CRC. Besides, we verified that the risk model can be used as an effective independent prognostic indicator.

Patient Data Collection
Colorectal cancer patients (adenocarcinomas) with gene expression profiles and clinical information were obtained from TCGA data portal 1 . Processed RNA-Seq FPKM data of 398 CRC and 39 adjacent normal tissues were downloaded for further analyses.

IRGs and Cancer-Related Transcription Factors
The comprehensive list of IRGs was downloaded from the Immunology Database and Analysis Portal (ImmPort) database 2 , which shares immunology data and provides a list of IRGs for cancer researchers (Bhattacharya et al., 2014). The IRGs that actively participated in the immune process were identified. To investigate the regulatory mechanism of IRGs, we extracted cancer-related transcription factors (CRTFs) for subsequent research. The CRTF data were downloaded from the Cistrome Cancer database 3 , which is a useful database for biomedical and genetic research and includes 318 CRTFs (Mei et al., 2017).

Differential Gene Expression Analysis
To select the IRGs and TFs that contributed to the development and progression of CRC, differentially expressed genes (DEGs) between tumor samples and normal samples were screened using the limma R package. Differential expression analysis was conducted, with an adjusted false discovery rate < 0.05 and | log2(fold change)| > 1 as the thresholds. Differentially expressed IRGs were identified as overlaps between the IRG list and the DEG list. Differentially expressed TFs were identified as overlaps between the TFs list and the DEG list. Heatmaps were generated using the "pheatmap" R package, and volcano plots were also displayed using the "ggplot2" R package.

CRTF-IRG Regulatory Network
In order to evaluate how differentially expressed CRTFs regulate prognosis-related IRGs, we studied the correlation between them. The core method is the Pearson test. The critical standard is set to a correlation coefficient > 0.4, P < 0.001. This step is performed using the Cor. test function in R, and the correlation coefficient and P value are calculated by Cor. test. To make the situation clearer, Cytoscape was used to build a visual regulatory network.

PPI Network Construction and Module Analysis
The PPI network was predicted using an online database search tool STRING 4 (Franceschini et al., 2013). Analyzing functional interactions between proteins may provide insights into the mechanisms of CRC development and progression. In this study, prognostic-related PPI networks of IRGs and CRTFs were constructed using the STRING database, and interactions with composite scores > 0.4 were considered statistically significant. Kyoto Encyclopedia of Genes and Genomes signaling pathways 4 http://string-db.org, version 11.0 and biological functions of genes were analyzed using functional clustering carried by STRING.

Gene Set Enrichment Analysis
Gene Set Enrichment Analysis (GSEA) 5 was used to analyze the GO term of the genes that make up the signature.

Construction of the Immune-Related Signature for CRC
To control the quality of the data, after excluding patients who lacked survival information or survived for less than 90 days, 5 https://pypi.org/project/gseapy/ 334 samples were subsequently analyzed. Transcriptomic analysis of RNA measured by FPKM values was performed using log2based conversion. Based on the differentially expressed IRGs, Kaplan-Meier analysis was first performed to screen prognostic immune genes. Then, immune-related prognostic signature (IPS) was constructed by multivariate Cox regression to calculate the risk score for each patient. Risk scores were acquired based on expressions of genes multiplied by a linear combination of regression coefficients obtained from the multivariate Cox regression analysis. P < 0.01 was regarded as significant.

Survival Analysis
According to the optimal cutoff value obtained by the "survminer" R package, CRC patients were classified into low risk and high risk according to their risk scores. To investigate the prognostic value of the prognostic model in CRC patients, univariate Cox analysis was implemented by the "survival" R package and "survminer" R package. A time-dependent receiver operating characteristic (ROC) curve was plotted to assess sensitivity and specificity using the "timeROC" R software package (Heagerty et al., 2000). The area under the curve was calculated from the ROC curve.

Association Analysis Between 18-IRGs and Clinical Parameters
Association analysis of clinical characteristics of 18 key prognostic IRGs in the model was performed using the t test. To transform the data types into binary variables, 398 CRC patients were grouped according to different clinical characteristics. In terms of age, 65 years old was chosen as the cutoff point. The stage was divided into stages I and II and stages III and IV. The T stage was divided into T1-2 and T3-4. M stage was divided into M0 or M1. N stages N0 and N1-2.
TIMER Database Analysis of the Correlation Between Immune-Related Markers and Immune Cell Infiltration TIMER database 6 is a comprehensive resource for systematical analysis of immune infiltrates across different cancer types (Li et al., 2017). The abundance of six immune infiltrates was estimated by the TIMER algorithm (B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic 6 https://cistrome.shinyapps.io/timer/   cells). We used the TIMER database to analyze the correlation between the prognostic model of CRC patients and six tumorinfiltrating immune cells.

Statistical Analysis
Overall survival (OS) was defined as the main outcome. Univariate cox regression analysis and multivariate cox regression analysis were performed to evaluate the prognostic effect of the immune signature and various clinicopathological features including age, clinical stage, grade, and TNM stage. Statistical analyses were performed using R software (version 3.5.1). The heatmap was generated using the "pheatmap" R package. Unless otherwise specified, a two-sided P < 0.05 was considered statistically significant.

Differentially Expressed IRGs and CRTFs in CRC
Compared with normal tissues, there were 5,938 DEGs in CRC tissues, of which 3,936 were up-regulated and 2,002 were down-regulated in these samples. The difference between tumor tissue and normal tissue can be seen through the heatmap and the volcano map ( Figures 1A,B). Compared with normal tissues, a total of 484 IRGs (173 up-regulated and 311 downregulated) and 71 CRTFs (46 up-regulated and 25 downregulated) were differentially expressed in CRC tissues. The heatmaps showed that CRC samples can be distinguished from normal samples based on the differentially expressed IRGs and CRTFs (Figures 1C,D). The volcano plots showed the distribution of differentially expressed IRGs and CRTF between CRC samples and normal controls (Figures 1E,F).

The Mechanism of Prognosis-Related IRGs and CRTF-IRG Regulatory Network
We explored the potential regulatory mechanisms of 18 prognostic-related IRGs, which may reflect the regulatory mechanisms of these gene sets. We selected 30 prognosticrelated IRGs and 71 differential CTRFs for correlation analysis to explore the regulatory mechanism of prognostic-related IRGs. The Cor. test function is used to test the correlation between each CRTF and each IRG. The core method is Pearson test. The correlation coefficient filter is 0.4, and the P value filter is 0.001. The regulatory relationship between these CRTFs and IRGs is revealed in the regulatory network ( Figure 3A). As shown in Figure 3A, NR3C1, MYH11, RUNX1, MAF, CCB7, LMO2, FOXP3, and EPAS1 regulate most of the IRGs related to prognosis and dominate the regulation network. This transcriptional regulatory network reveals the regulatory relationship between these IRGs and CRTFs. Table 2 shows the correlation between IRGs and CRTFs after screening. The PPI network of IRGs and CRTFs was constructed, and the most significant module was obtained. The functional analyses of genes involved in this module were analyzed. Enrichment analysis shows that the genes in this module are mainly involved in cell proliferation and metabolic processes ( Figure 3B).

Hub Gene Selection and Analysis in CRC
Using Cytohubba in Cytoscape, we filtered 33 hub genes that were identified by filtering according to the criterion of degrees > 10 criteria (each node had more than 10 interactions), and the 10 most central genes in the immune gene regulatory network according to node degree were MAF, A2M, CBX7, MYH11, EPAS1, CXCL12, LMO2, S1PR1, FOXP3, and NR3C1 (Figures 4A,B). Gene Ontology (GO) enrichment analysis of genes in the immune gene regulatory network related to prognosis was conducted to explore which signaling pathways were activated. The analysis of the biological processes (BPs) of the central genes using BiNGO in Cytoscape is shown in Figures 4C,D. GO analysis showed that the changes in the BPs of these genes were significantly enriched in the immune process, cell proliferation, immune organ development, and hemopoiesis. Changes in molecular function were mainly focused on TF activity, cytokine activation, and molecular binding. Afterward, the functional enrichment analysis of the key genes of the IRG set was performed by GSEA. Figure 4E shows that the changes in the BP of these genes are significantly enriched in the immune process, cell proliferation, immune organ development, and hematopoiesis. The results of Kaplan-Meier analysis of these hub genes are in the Supplementary Figure 1.

Construction of the Immune-Related Signature for CRC
Multivariate Cox analysis was performed on 30 prognostic IRGs, and 18 genes were finally selected to establish a prognostic model ( Table 3). The risk score is based on the gene expression level multiplied by its corresponding regression coefficient. The regression coefficient was calculated by multivariate Cox regression. The risk score is related to not only the expression level of these genes but also the correlation coefficients. The risk score of each patient is the sum of all the 18 risk prognostic genes in Table 3 multiplied by the corresponding risk factors. The 398 CRC samples were then divided into high-risk groups (n = 199) and low-risk groups (n = 199) based on the median risk score ( Figure 5A). Survival overview and gene expression heatmaps are presented in Figures 5B,C. Survival analysis showed that the OS of patients in the high-risk group was significantly lower than that in the low-risk group (P < 0.0001; Figure 5D). The 5-year survival rate of the high-risk group was 51.1%, and the 5-year survival rate of the low-risk group was 81.4%. The areas under the ROC curves at 1, 3, and 5 years of OS are 0.811, 0.711, and 0.734, respectively, which indicated that the prognostic model showed good sensitivity and specificity ( Figure 5E). In addition, as shown in Supplementary Figure 2, the model after excluding genes with P ≥ 0.05 has advantages in the short-term prognosis (1 year), but the model is not effective in predicting the long-term prognosis.

Immune-Related Prognostic Signature Was an Independent Predictive Marker of OS for CRC Patients
Three hundred ninety-eight CRC patients with clinical information of age, gender, pathological stage, TNM stage, and risk score were selected for further analysis. Univariate and multivariate Cox regression analyses were performed to assess the independent predictive power of immune-related prognostic markers. Univariate analysis showed that pathological The biological process analysis of hub genes was constructed using BiNGO. P < 0.05 was considered statistically significant. (E) Ten-hub-gene enrichment plots from Gene Set Enrichment Analysis (GSEA).   stage (P < 0.001), TNM stage (P < 0.001), and immunerelated prognostic risk score (P < 0.001) were significantly correlated with OS (Table 4 and Figure 6A). After multivariate analysis, the immune-related prognostic risk score was the only independent prognostic factor related to OS (P < 0.005; Table 5 and Figure 6B).

Association Between 18 IRGs, Clinical Parameters, and Prognostic Risk Scores
We analyzed the association between the expression of 18 key prognostic related IRGs in the patient's tumor tissue and the patient's clinical characteristics. The association between NGF, TRDC, CXCL3, CD1B, VIP, F2RL1, FABP4, OXTR, UCN, NOX4, ADIPOQ, and clinical characteristics was found ( Table 6 and Figure 7). NGF is negatively correlated with age, and NGF expression is generally higher in advanced patients. Patients with higher VIP expression generally have higher T and N stages. On the other hand, TRDC, CXCL3, and FRL1 are highly expressed in patients in the early stage and patients with N0 stage.

TIMER Database Analysis
The relationships between the risk score model and immune cell infiltration were studied. The characterization of immune infiltration is very important for exploring the state of the immune microenvironment and studying the interaction between tumors and immunity. We applied the TIMER tool to identify potential relationships between IPS and infiltrating immune cells, including B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells. As shown in Figure 8, the proportions of tumor-infiltrating CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells were closely related to our prognostic risk score (p < 0.05).

DISCUSSION
In recent years, the genetic characteristics of mRNA in cancer patients have attracted people's attention, and studies have revealed its great potential in the prognosis of CRC. In this study, based on the analysis of the TCGA data set, 484 differentially expressed IRGs were screened from 389 HCC and 39 normal tissues. By univariate regression analysis of differentially expressed IRGs, 30 genes were detected to be significantly correlated with OS. To further study the regulatory mechanisms of prognostic IRGs, a tumor-related TF-mediated network was established to reveal key TFs that can regulate these IRGs. Studies have shown that CBX7 played an important role in gastric and pancreatic cancer (Ni et al., 2017(Ni et al., , 2018. In recent years, studies have found that CBX7 was a component of polycomb repressive complex 1, maintaining the stem celllike characteristics of gastric cancer cells by activating the AKT pathway and down-regulating p16 (Ni et al., 2018). MYH11 (also known as SMMHC) encodes a smooth muscle myosin heavy chain, which plays a key role in smooth muscle contraction. The inversion of the MYH11 locus is one of the most common chromosomal aberrations in acute myeloid leukemia (Alhopuro et al., 2008). The MYH11 gene has a single-nucleotide repeat sequence (C8) in the coding sequence, which may be a mutation target for cancer that exhibits microsatellite instability (MSI). The study found that compared with the low microsatellite unstable group, the incidence of MYH11 frameshift mutation was higher in patients with high microsatellite-unstable (MSI) gastric cancer and CRC (Jo et al., 2018). Among these major hub genes, the study of CXCL12 is more comprehensive. It has been reported that the CXCL12/CXCR4 axis is related to tumor progression, angiogenesis, metastasis, and survival (Teicher and Fricker, 2010). Recent studies have found that the    activation of LMO2 is essential for the development of T-cell acute lymphoblastic leukemia (T-ALL) leukemia (Morishima et al., 2019). The SP1PR1 gene plays a role in regulating tumors. Targeting the SphK1/S1P/S1PR1 axis with specific drugs can reduce tumor progression caused by key proinflammatory cytokines, macrophage infiltration, and obesity (Nagahashi et al., 2018). FOXP3 is one of the key TFs controlling the development and function of regulatory T cells. FOXP3 has been extensively studied in human tumors, which is closely related to tumor immunity, and its correlation with T cells in tumors has recently been reported (Wing et al., 2019). The relationship between several other genes and CRC is still unclear. Among them, the role of MAF, A2M, EPAS1, and NR3C1 in CRC is worthy of further investigation. A study of breast cancer showed that the enhanced expression of MAF can mediate bone metastasis of breast cancer, which can be used as a risk index for bone metastasis in breast cancer patients (Pavlovic et al., 2015). The proteins encoded by A2M are protease inhibitors and cytokine transporters. It can inhibit a variety of proteases, as well as inflammatory cytokines, thereby destroying the inflammatory  cascade. Xu's team found that EPAS1 gene is dysregulated in non-small cell lung cancer, which encodes hypoxia-inducible factor 2α and plays an important role in the progression of non-small cell lung cancer (Xu et al., 2018). It is known that EPAS1 is regulated by DNA methylation transcription in CRC (Rawluszko-Wieczorek et al., 2014), but its role in CRC remains to be studied. NR3C1 encodes a glucocorticoid receptor, which can act both as a TF that binds to the glucocorticoid response element in the promoter of the glucocorticoid response gene to activate its transcription and as a regulator of other TFs. Further experimental evidence on the function of these genes in CRC may be of great help to our understanding of the progress of CRC. In recent years, Xie et al. (2017) established a 20-gene prognosis model, which has a good predictive function for CRC prognosis. Another study also constructed a novel four-gene signature for CRC OS prediction based on gene expression data from TCGA, COAD, and READ data sets (Ahluwalia et al., 2019). A recent study exploring the prognostic value of immune cells in the CRC tumor microenvironment determined that tumorinfiltrating immune cells is highly correlated with the progression and prognosis of CRC (Ge et al., 2019). However, these studies do not fully explore the relationship between immune genes and the prognosis of CRC. Our study has the following advantages. First, we used a specialized immunological database to analyze as many IRGs as possible. To our knowledge, this is the first study to explore the relationship between a large number of IRGs and the prognosis of patients with CRC. Second, we obtained some immune-related prognostic genes and established a novel prognostic model related to immunity. This prognostic model showed excellent performance in the prediction of OS based on the TCGA database. According to the in-depth analysis, the immune-related prognostic model was demonstrated to be an independent prognostic indicator after adjusting for other clinical factors. These results indicated that the immune-related prognosis model can be used as an effective marker for the prognosis of CRC patients.
The characterization of immune infiltration is of great significance for studying the interaction between tumor and immunity. Therefore, we explored the relationship between immune-related prognostic models and immune cell infiltration to reflect the state of the immune microenvironment. According to the TIMER database, we found that high-risk patients had higher levels of CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells of infiltration. These results confirmed and extended the discovery that the heterogeneity of immune infiltration is important for the progression of CRC. A recent study reported that the colonic cancer microenvironment uses dendritic cells' plasticity to support cancer progression by enhancing the release of the inflammatory chemokine CXCL1 (Hsu et al., 2018), which is consistent with our results. Neutrophils contribute to the activation, regulation, and effect of immune cells (Mantovani et al., 2011). Existing research reported that tumor-associated neutrophils in CRC produce matrix metalloproteinase 9 vascular endothelial growth factor and hepatocyte growth factor to promote tumor invasion and angiogenesis. In addition, neutrophils also promote the spread of tumor cells by capturing tumor cells in the circulation, thereby promoting their migration to distant places (Mizuno et al., 2019). Studies have reported that macrophages are associated with CRC progression (Wei et al., 2019). Tumor-associated macrophages (TAMs) can induce EMT processes to enhance CRC migration, invasion, and circulating tumor cell (CTC)-mediated metastasis (Wei et al., 2019). The immune model can indicate the infiltration of immune cells to some extent. It may be a promising way to cure CRC by broadening the relationship between immune cells and tumor progression.
Current research provides novel insights into the CRC immune microenvironment and immunotherapy. We conducted functional studies on selected genes to confirm their clinical value. However, the limitation of this study is that it is a retrospective study. Therefore, further prospective research is needed. On the one hand, the predictive capability of this model in CRC requires further testing with the goal of better prognostic stratification and treatment management. On the other hand, we need to further study the biological functions of the 18 IRGs through a series of experiments.
In short, through comprehensive analysis, many IRGs were found to be significantly related to the prognosis of CRC. Besides, we constructed a novel immune-related prognosis model as an independent prognostic indicator of CRC. This prognostic model can also indicate the infiltration of immune cells and prove its key role in the TIME. The current research has deepened our understanding of IRGs in CRC and provided new potential prognostic and therapeutic biomarkers.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.