Molecular Subtypes and CD4+ Memory T Cell-Based Signature Associated With Clinical Outcomes in Gastric Cancer

Background CD4+ memory T cells are an important component of the tumor microenvironment (TME) and affect tumor occurrence and progression. Nevertheless, there has been no systematic analysis of the effect of CD4+ memory T cells in gastric cancer (GC). Methods Three datasets obtained from microarray and the corresponding clinical data of GC patients were retrieved and downloaded from the Gene Expression Omnibus (GEO) database. We uploaded the normalize gene expression data with standard annotation to the CIBERSORT web portal for evaluating the proportion of immune cells in the GC samples. The WGCNA was performed to identify the modules the CD4+ memory T cell related module (CD4+ MTRM) which was most significantly associated with CD4+ memory T cell. Univariate Cox analysis was used to screen prognostic CD4+ memory T cell-related genes (CD4+ MTRGs) in CD4+ MTRM. LASSO analysis and multivariate Cox analysis were then performed to construct a prognostic gene signature whose effect was evaluated by Kaplan-Meier curves and receiver operating characteristic (ROC), Harrell’s concordance index (C-index), and decision curve analyses (DCA). A prognostic nomogram was finally established based on the CD4+ MTRGs. Result We observed that a high abundance of CD4+ memory T cells was associated with better survival in GC patients. CD4+ MTRM was used to stratify GC patients into three clusters by unsupervised clustering analysis and ten CD4+ MTRGs were identified. Overall survival, five immune checkpoint genes and 17 types of immunocytes were observed to be significantly different among the three clusters. A ten-CD4+ MTRG signature was constructed to predict GC patient prognosis. The ten-CD4+ MTRG signature could divide GC patients into high- and low-risk groups with distinct OS rates. Multivariate Cox analysis suggested that the ten-CD4+ MTRG signature was an independent risk factor in GC. A nomogram incorporating this signature and clinical variables was established, and the C-index was 0.73 (95% CI: 0.697–0.763). Calibration curves and DCA presented high credibility for the OS nomogram. Conclusion We identified three molecule subtypes, ten CD4+ MTRGs, and generated a prognostic nomogram that reliably predicts OS in GC. These findings have implications for precise prognosis prediction and individualized targeted therapy.


INTRODUCTION
Gastric cancer (GC) is the fifth most diagnosed malignancy and is the third highest cause of cancer mortality worldwide (1). The incidence of GC greatly varies among regions, with more than 70% of cases occurring in developing countries, mainly in Eastern Asia (2). The prognosis of GC patients is still not optimistic owing to genetic heterogeneity and the difficulty of early-stage screening, especially in China (3). Therefore, the identification of effective biomarkers is of great importance to better evaluate tumor progression, predict overall survival and enhance therapeutic efficacy.
After years of in-depth research, the scientific understanding of tumor progression has become more comprehensive and recognizes single malignant cells and the very complex niche called the tumor microenvironment (TME). The TME has a considerable impact on the occurrence and development of GC (4). Disorders of the immune system can enable tumor cells to evade immune surveillance. Molecular profiles of immune cells and immune-related genes (IRGs) within their TME represent promising candidates for predictive and prognostic biomarkers (5,6). Recently, T cell immunity has been an area of active research. T cells progressively lose function and become exhausted during cancer; however, effective T cell responses are essential to ultimately controlling tumors (7). Several studies have revealed a relationship between T cell immunity and tumor development. This has been found in lung cancer (8), breast cancer (9), and ovarian cancer (10). CD4 + memory T cell has been reported to be an important role in TME. In colorectal cancer, it has been suggested in more infiltrated than normal tissue (11); in triple-negative breast cancer, CD4 + memory T cell enrichment score seem higher in invasive tumors (12); in lung adenocarcinoma, it seemed relative hypometabolism and favorable prognosis (13), but the relationship is not yet clear in the case of GC. Subsequently, through the bioinformatics tool of CIBERSORT and Kaplan-Meier survival curves, we explored the relationship between immune infiltration and outcome of patients with gastric cancer according to the gene expression profiles from the GEO database and found that prognosis were closely associated to memory CD4 + T cells.
Collectively, in this study, we investigated the effects of CD4 + memory T cells in GC patients. We explored the potential role of CD4 + memory T cells, CD4 + memory T cell-related genes, and molecular subtypes in GC using bioinformatics models. The results will contribute to the development of precision therapy strategies for gastric cancer patients.

Datasets and Patients
The public microarray data sets and corresponding clinical data of GC patients were retrieved and downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm. nih.gov/geo/). Clinical data contained the age, gender, pathologic TNM stage, and survival information. Three datasets were selected to merge into a single cohort for further analysis: GSE34942, GSE57303, and GSE62254, for a total of 426 gastric cancer samples. Another three datasets (GSE26899, GSE84437, and GSE26901) and the Cancer Genome Atlas (TCGA) transcriptome data which was downloaded from https://portal.gdc.cancer.gov/repository, were performed as external validation. Series matrix files and data tables of the microarray platform were downloaded from the GEO website. The preprocessing of data was as follows: (1) GC samples without clinical survival information were removed; (2) data on normal GC tissue samples were removed; (3) and only the expression profile of immune-related genes was included.

Estimation of Immune Cell Type Fractions
Batch effects and noise were inherent in the three datasets (GSE34942, GSE57303, and GSE62254), which were from multiples studies spanning diverse cell lines and different platforms; therefore, combat normalization in the "sva" R package was used to co-normalize the three datasets into a single cohort. To quantify the proportion of immune cells in the GC samples, we uploaded the normalize gene expression data with standard annotation to the CIBERSORT web portal (https://cibersort.stanford.edu/), and the algorithm was run using 1000 permutations and the LM22 gene signature as previously described (14). Only samples that had a CIBERSORT output of P < 0.05 were considered in the subsequent Kaplan-Meier (K-M) analysis (Supplementary Figure 2).

WGCNA Network Analysis
The WGCNA R package was used to build a gene co-expression network to mine their module membership associated with immune cells. Immune-related gene (IRG) data were obtained from the InnateDB (https://www.innatedb.ca/). The soft threshold parameter was first applied to ensure a scale-free network. To further identify the functional blocks in the co-expression network of the immune-related genes, the topological overlap measure (TOM) was then performed to calculate the correlation between genes. Finally, connecting modules with the immune cells identified the key module that was most associated with OS.

Bioinformatics Analysis
After the key module had been identified, the genes were input to Metascape (https://metascape.org/) for gene annotation and gene list enrichment analysis (15), which included biological process (BP), cellular component (CC), molecular function (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The top 20 terms were selected for visualization; however, more than 20 GO terms or pathway annotations were identified. Unsupervised clustering analysis via the "ConsensusClusterPlus" R package was used to perform consensus molecular subtyping of immune subtypes (16). Differentially expressed gens (DEGs) analysis of the three clusters which took the intersection after comparing the two groups separately including intersection of up-regulated IRGs and intersection of down-regulated IRGs, was carried out using the "limma" R package, according to the thresholds of |log2-fold-change| > 0.5 and an adjusted false discovery rate (FDR) P-value of < 0.01. The differentially expressed IRGs (DEIRGs) were obtained by intersecting the list of previously acquired immune-related genes with the list of DEGs. The "clusterProfiler" R package was used to perform Gene Ontology function enrichment, which has been descried in detail in a previous study (17,18).

Construction and Validation of Immunoscore Prognostic Model
Univariate Cox regression analysis was used to calculate the hazard proportions for genes of the yellow module; these were considered statistically significant at p<0.05. Least absolute shrinkage and selection operator (LASSO) regression analysis was performed with the "glmnet" R package (19) to select the most useful gens with the best predictive performance using 10-fold cross validation. An immunoscore model of GC patients was then established based on linearly combining the multiplication of the Cox coefficient (b) derived from the LASSO regression analysis by its scale expression value, i.e., immunoscore = S Cox coefficient of gene Xi × scale expression value of gene Xi. Patients were classified into high-or low-risk groups according to the optimal cutoff value. The cutoff value was determined based on the association with overall survival (OS) using the "survminer" R package. Time independent receiver operating characteristic (ROC) curves were used to depict the sensitivity and specificity of the survival prediction based on the immunoscore, with quantification of the area under the curve (AUC) using the "timeROC" R package.

Construction and Validation of Nomogram Model
Multivariate Cox regression analysis was employed via the "rms" R package to determine independent prognostic factors, resulting in an immunoscore-based prognostic nomogram with three factors (p < 0.05); only patients with complete clinical data were included. The predictive accuracy of the nomogram was measured and compared through the AUC of the ROC curve, Harrell's concordance index (C-index), and decision curve analysis (DCA).

Composition of Immune Cells in GC
The proportions of 22 immune cell types in GC assessed by the CIBERSORT algorithm are displayed in a box plot ( Figure 1A) and heatmap ( Figure 1B). The 22 tumor immune cell types were weakly or moderately correlated with each other ( Figure 1C).

Identification and Functional Enrichment Analysis of CD4 + MTRM
The K-M survival curves with log-rank tests for the 22 immune cell types are presented on Supplementary Figure 2. These results indicated that high CD4 memory resting T cells, low CD4 memory activated T cells and low M1 macrophages were significantly associated with poorer OS, p = 0.002, p < 0.001, p = 0.029, respectively ( Figure 2A). WGCNA was then performed to select modules of highly correlated genes related to external sample characteristics (prognostic immunocytes). After placing the IRGs with similar expression patterns into modules by average linkage clustering and determining the soft threshold parameter, we finally identified 8 modules; each colour represents a different module ( Figures 2B, C). In addition, the relationships between genes and immunocytes in each modules were identified and are displayed in a heatmap ( Figure 2D). The yellow module is identified as the CD4 + memory T cell-related module (CD4 + MTRM). This module was most negatively associated with resting CD4 + memory T cells, while it was most positively associated with activated CD4 + memory T cells and M 1 macrophages, as assessed by the heatmap and scatter plot scores ( Figure 3A). Interaction network of gene in the CD4 + MTRM were shown in Figure 3B. These results suggested that the genes in CD4 + MTRM were closely related with OS; therefore, CD4 + MTRM was chosen for further analysis. We loaded the genes in the CD4 + MTRM into Metascape to explore the underlying biological processes. The Metascape results revealed functional categories mainly related to immune regulation, such as the negative regulation of the immune system process and antigen processing and presentation ( Figures 3D, F). Moreover, the enriched processes were highly associated and clustered into an intact network ( Figures 3C, E).

CD4 + MTRM-Based Clusters Were Significantly Associated With Prognosis, Clinicopathological Characteristics, and Immunocytes
The abundance of tumor-infiltrating immunocytes varied considerably by individual. To gain greater insight into the molecular heterogeneity of GC, we performed unsupervised consensus analyses with the k-means algorithm of the yellow module patient samples based on the immunocyte proportion ( Figure 4B). Three distinct molecular clusters were identified (kmeans = 3, Figure 4D). The cell proportions of each immune subtype are shown in Figure 4A. To further elucidate the clinical significance of the identified clusters, we explored the correlation between the cluster and clinicopathological features. The clusters were related with distinct patterns of survival in the Kaplan-Meier analysis ( Figure 4C). For example, cluster 1, defined by high levels of activated CD4 + memory T cells and M 1 macrophages and low levels of CD4 + memory resting T cells (all p < 0.001), was significantly associated with a better prognosis compared with the other clusters (p = 0.0056).

Construction and Validation of Immunoscore Prognostic Model
Each gene in the CD4 + MTRM was analyzed by univariate Cox regression analysis. We identified 105 genes that were significantly associated with the prognosis of gastric cancer ( Figure 6A and Supplementary Table 1) that were defined as CD4 + memory cell-related genes (CD4 + MTRGs). In the subsequent LASSO regression analysis and multivariate Cox analysis ( Figures 6B, C), 10 CD4 + MTRGs were identified ( Table 1), which indicated that these CD4 + MTRGs would be selected to establish a prognostic model. The downregulated expression of IL-1, CTSW, NR1H3 and CCR8 (with HR < 1) indicated these molecules as tumor suppressors, whereas the upregulated expression levels of LY86, RABGEF1, CYFIP2 and SERINC3 (with HR > 1) indicated these molecules as oncogenes.
The patients were separated into training and validation cohort in a ratio of 1:1 using the stratified randomization method. We constructed a prognostic model in training cohort that divided patients into high-and low-risk groups based on the optimal cutoff value of the immune-risk score that was calculated by the "survminer" R package ( Figure 6D) Figure 7A).
To determine if the immune-risk score model is solid in different populations, the same formula was applied to the validation cohort and also to the entire cohort. The patients were then divided into high-or low-risk groups using the cut off value obtained from training cohort. Consistent with the findings in the training cohort, in both the validation cohort and entire cohort the K-M curves illustrated that the high-risk group was associated with a notably poorer prognosis than the low-risk group (all p < 0.001). The results revealed that the predictive potential of the immune-risk score model is applicable in different populations. The prognostic accuracy of the immunoscore in the validation cohort and the entire cohort was also evaluated; the AUC achieved 0.  Figures 7B, C). Furthermore, three external datasets GSE26899, GSE84437, and GSE26901 and the TCGA transcriptome data were used to confirm the association between the immunoscore prognostic model and survival outcomes in GC patients. Immune-risk scores were calculated with the same formula for each patient. Patients were divided into high-and low-risk groups according to the optimal cutoffs identified for each dataset. The KM survival curves revealed significant difference in OS between groups in both datasets. High-risk groups had markedly poorer outcomes than low-risk groups (Supplementary Figure 5).
All these suggested that the 10 DEIRG signature had high sensitivity and accuracy and could be used for monitoring survival. The expression of five immune checkpoint genes (PD-L1, CTLA4, HAVCR2, IDO1, PD1, PD-L2, and TIGIT) were assessed in the high-and low-risk groups (Supplementary Figure 3B). The patients in the low-risk group were more enriched in these immune checkpoint genes (all p < 0.05) and therefore might be more responsive to immunotherapy.

Relationship Between Immune Infiltration and the Prognostic Signature
To further explore whether CD4 + MTRGs reflected the status of GC TME, an association analysis was used to evaluate the relationship between CD4 + MTRGs in the prognostic model and immune cell infiltration. The risk factors based on the model were positively associated with naive B cells (r = 0.296, P < 0.001), activated mast cells (r = 0.137, P = 0.005), M2

Construction and Validation of CD4 + MTRG-Clinical Nomogram Model
Several clinicopathological variables are independent features in GC patient prognosis; these include tumor site, tumor size, and TNM stage (20,21). Therefore, we used clinicopathological variables and the risk score to construct a nomogram to obtain a more accurate prediction of GC prognosis. First, we used univariate and multivariate Cox regression analysis to identify three independent OS factors: Lauren histologic type, stage, and risks score ( Figure 8A). A CD4 + MTRG-clinical nomogram was then developed based on these independent factors for OS are shown in (Figure 8C) Figure 8B). The C-index was 0.73 (95% CI: 0.697-0.763) for the OS nomogram. We also established calibration curves and the DCA of the nomogram at 1, 3, and 5 years ( Figures 8D, E). These results presented high credibility.

DISCUSSION
Gastric cancer is a prevalent malignant tumor with high recurrence. The prognosis of GC exhibits a wide range, from less than 5 months to over 10 years (1,22). Precise prognosis prediction and risk stratification can help to determine which patients would benefit from more radical treatment, such as immunotherapy. The efficacy of adjuvant therapy for GC was controversial for many years until Macdonald and Smalley et al. (23,24). found that postoperative chemotherapy and chemoradiotherapy was a rational standard therapy strategy for GC. Unfortunately, due to tumor heterogeneity, the prognosis can vary widely among patients of the same GC stage who undergo the same adjuvant therapy (25,26).  1 years = 0.774 (68.75%−86.1%)   AUC at 3 years = 0.774 (71.14%−83.68%)   AUC at 5 years = 0.806 (74.31%−86.8%    Therefore, a sensitive and reliable prognostic signature is desired to identify patients who might benefit from adjuvant treatment. Tumor progression develops in the complex tissue microenvironment on which they rely to support their proliferation, infiltration and metastasis (27)(28)(29). Different from cancer cells, stromal cell types of TME are stable at the genetic level, which indicates that the TME is a potential therapeutic target (30). Furthermore, accumulating research has confirmed that the TME plays a significant role in predicting patient prognosis (31)(32)(33). Owing to this particular insight of the TME, this study selected key genes by screening for gene modules that were significantly correlated with prognosticrelated immunocytes. Immune subtypes were then identified and classified, and a risk score model was established on the basis of   10 CD4 + MTRGs in the key gene module to evaluate patient prognosis. The efficacy of this prediction strategy was proved in two internal datasets. Nomograms can fit several independent prognostic factors, including molecular and clinicopathological features, which are widely used to assess prognosis in clinical oncology (34). With the ability to calculate an individual numerical probability of a clinical event, nomograms may be more effective than individual prognostic factors for prognosis prediction. Three immune molecular subtypes were identified in the present study, cluster 1/2/3, by unsupervised clustering analysis based on the expression of the most aberrant immune genes belonging to the yellow module. There were significant differences among the three immune subtypes, including differences in immune status, biological processes and prognosis. Patients in the cluster 1 subtype with the best prognosis suffered from a hyperactivated immunocompetent status. Functional enrichment analyses were applied to the DEIRGs among the three subtypes to elucidate underlying mechanisms. The DEIRGs were found to be significantly associated with T cell activation and cytokinecytokine receptor interaction, consistent with previous reports. CD8 + cytotoxic T cells bind MHC I-presented antigens; these antigens enable the T cells to target tumor cells. Additionally, CD4 + T cells have complex and important biological functions in the TME. Clinical research has indicated that combined immunotherapy strategies are effective in treating metastatic cancers because they promote T cell activation (35). Gastric cancer is associated with immune system evasion. Immune checkpoints are inhibitory pathways that can prevent tissue damage by controlling the intensity of the physiological immune response. Especially when the immune system is fighting an infection, these inhibitory pathways are essential for maintaining self-tolerance and physiological homeostasis. In addition, immune checkpoint pathways may also cause immune escape of cancer cells (36). We evaluated the distribution of immune checkpoint gene expression among the three clusters. The subtype with the highest expression of programmed death ligand 1 (PD-L1) provided the best prognosis, and a similar conclusion was obtained in previous research (37,38). However, others have indicated that programmed cell death protein-1 (PD-1)/PD-L1 expression in cancer cells is significantly associated with poor prognosis (39,40), and Kawazoe et al (41) found that PD-L1 had no effect on gastric cancer prognosis. To achieve precision therapy, a larger cohort and more controlled research should be applied to clarify the function of PD-L1 in GC.
A novel CD4 + MTRG signature that can predict OS in GC patients was constructed through the use of univariate Cox regression analysis and LASSO regression analysis. This CD4 + MTRG signature was an independent prognostic factor of GC. Patients with a high immune risk score had significantly poorer outcomes than those in the low-risk group. IL-7, CTSW, NR1H3 and CCR8 were downregulated; these were considered to be protective genes. LY86, RABGEF1, CYFIP2 and SERINC3 were upregulated and were related with a poor outcome. IL-7 has been found to play an anti-tumor role in melanoma (51). By contrast, others have shown that IL-7 might have a pro-tumor function. By limiting p27kip, IL-7 was shown to promote lung cancer proliferation (52) and accelerate bladder cancer invasion and migration (53). Nathan suggested that blocking the CCR8-CCL1 interaction, alone or combined with other immune checkpoint inhibitors, was a therapeutic strategy for malignant diseases (54). However, the role of IL-7 and CCR8 in gastric cancer development has not yet been clarified. Furthermore, there are few studies of the other eight genes in the ten-gene signature. The potential biological function of these genes in GC requires further clarification through experimental research.
A nomogram is a practical and intuitive evaluation approach. Here, we established a nomogram with meaningful AUC values based on the expression levels of genes in a selected panel. The nomogram was used to evaluate the deterioration and outcome of patients in this study, which is more economical and clinically practical than whole-genome sequencing. DCA and calibration curves showed the efficacy of this nomogram. A ten-gene signature and clinicopathological variables were integrated into the graphical scoring system, which was easy to understand. This scoring system could be used to facilitate individual treatment and determine a medical strategy. To be the best of our knowledge, this is the first study to report the feasibility and accuracy of a risk assessment model based on the identified ten CD4 + MTRGs for predicting GC prognosis. These findings provide novel ideas for risk assessment in gastric cancer patients.
Despite the significant results obtained in this study, several limitations must be acknowledged. First, all data series were obtained from the GEO database in which the population distribution is mainly Caucasians, Africans, and Latinos. Therefore, caution should be taken in extrapolating the findings to patients with Asian heritage. In addition, several clinical features were not available from these databases, including severity, complications, and details of the individual treatment of each patient; therefore, we did not conduct a longitudinal analysis. Furthermore, it will be necessary to validate the tengene signature using external datasets. Finally, as a retrospective bioinformatic analysis study, the potential functional mechanisms of these molecular subtypes and CD4 + MTRGs need to be further verified in basic experimental research and clinical trials.