Relationship Between CNVs and Immune Cells Infiltration in Gastric Tumor Microenvironment

Gastric cancer (GC) is a highly fatal and common malignancy of the digestive system. Recent therapeutic advancements have significantly improved the clinical outcomes in GC, but due to the unavailability of suitable molecular targets, a large number of patients do not respond to the immune checkpoint inhibitors (ICI) therapy. To identify and validate potential therapeutic and prognostic targets of gastric cancer, we used the “inferCNV” R package for analyzing single-cell sequencing data (GSE112302) of GC and normal epithelial cells. First, by using LASSO, we screened genes that were highly correlated with copy number variations (CNVs). Therefrom, five gene signature (CPVL, DDC, GRTP1, ONECUT2, and PRSS21) was selected by cross-validating the prognosis and risk management with the GC RNA-seq data obtained from GEO and TCGA. Moreover, the correlation analyses between CNVs of these genes and immune cell infiltration in gastric cancer identified CPVL as a potential prognostic marker. Finally, CPVL showed high expression in gastric cancer samples and cell lines, then siRNA-mediated silencing of CPVL expression in gastric cancer cells showed significant proliferation arrest in MGC803 cells. Here, we conclude that CNVs are key regulators of the immune cells infiltration in gastric TME as well as cancer development, and CPVL could potentially be used as a prognostic and therapeutic marker in gastric cancer.


INTRODUCTION
Gastric cancer is categorized as the fourth most common and third cause of cancer-related mortality worldwide (Bray et al., 2018). In recent years, the survival rate in many cancers has been improved due to the application of neoadjuvant chemotherapy, but in GC it is still worst because of its diagnosis at advanced stages, drug resistance, recurrence (Pernot et al., 2015;Ilson, 2017;Zurleni et al., 2018), high tumor heterogeneity, and poor immune cell infiltration in TME, which seriously hinder the prognosis and therapeutic outcomes (Meacham and Morrison, 2013;Weinberg, 2014;Van Cutsem et al., 2016). Therefore, it is crucial to thoroughly explore the heterogeneity of the GC, the mechanism of immune cell infiltration, and the new therapeutic targets for gastric cancer.
Currently, transcriptomic analyses revealed a variety of biomarkers that provide a base to determine the complexity and heterogeneity of tumors and identify new therapeutic targets (Cieslik and Chinnaiyan, 2018). The large-scale copy number variations (CNVs) through single-cell RNA-seq (scRNA-seq) have distinguished malignant cells from the normal cells (Patel et al., 2014;Müller et al., 2016;Tirosh et al., 2016). In general, the CNVs may regulate the function of somatic cells thus it is useful to use as the potential tumorigenesis markers (Beroukhim et al., 2010) and may affect immunotherapy (Kacew et al., 2019). On the other hand, the regulation of immune cell infiltration in TME plays an important role in the occurrence, progression, therapeutics and prognosis of cancers including GC (Siegel et al., 2017;Waniczek et al., 2017). However, the role of CNVs in immune cell infiltration and tumor development specifically in gastric cancer is not well illustrated.
Certainly, the scRNA-seq is an effective method for analyzing the heterogeneity of the complex biological systems such as TME (Chung et al., 2017;Cochain et al., 2018). Therefore, we quantified CNVs from scRNA-seq data of the gastric cancer and surrounding normal tissues to differentiate malignant cells from the normal epithelial cells. The differentially expressed genes (DEGs) in GC tissues were screened by the least absolute shrinkage and selection operator (LASSO) Cox regression model (McEligot et al., 2020). The expression of the hub genes and their CNVs in gastric cancer were further assessed for their relationship with immune cell infiltration and prognosis in the gastric tumor. Lastly, we determined the expression of levels of all hub genes in gastric cancer tissues and cell lines, then based on the expression pattern of the genes we selected CPVL to explore its role in cellular proliferation in gastric cancer cell lines.

Data Mining and Processing
The scRNA-seq data of a total of 707 tumors and paired normal cells (GSE112302), and RNA-seq data (GSE84437) of 357 gastric cancer tissues were downloaded from Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/geo/) database. The transcriptome sequencing data of 407 individuals (32 normal and 375 gastric cancer tissue samples) along with complete clinical information were downloaded from The Cancer Genome Atlas (TCGA: https://portal.gdc.cancer.gov/). The complete study design and sample information are shown in Figure 1. The Seurat (Loraine et al., 2015) in RStudio (Stuart et al. , 2019) was used to filter and standardize the tumor samples (355 tumor cells in total) in GSE112302 single-cell sequencing data. Genes expressed in less than three samples, and mitochondrial (MT) genes expressing >5% samples were filtered out. Principal component analysis (PCA) and tSNE were used for the dimensional reduction (Satija et al., 2015), and different cells' clusters were identified and annotated according to the marker genes (resolution = 0.5). The raw data of GSE84437 and the gastric cancer data from TCGA were standardized using the "limma" package (Smyth et al., 2005).

Epithelial Cell Reanalysis
The CNVs in macrophages (n = 23) and epithelium (n = 332) were evaluated by "infercnv" R package (Patel et al., 2014). The genomic locations of the differential genes with a high rate of copy number variations (CNVs) were determined and compared their average expression level with their relative expression. The "limma" package was used to find the differentially expressed genes among tumor epithelial cells with high and low CNVs (|log FC|> 2, p < 0.05).

LASSO Regression Analysis
We constructed a prediction model based on the CNV scores of five signature genes by performing a LASSO regression analysis using the "glmnet" package in R. LASSO regression was applied to the module(s) with p-value of cnv-scores less than 0.05 to determine the signature gene, and the analyses were repeated 1,000 times to cross-validate and to select the best lambda value of the hub gene.

Construction of the Prognostic Model
The risk scores of the potential genes from both TCGA transcriptome and GSE84437 data of gastric cancer were calculated by the formula Risk score = βgene1 × exprgene1 + βgene2 × exprgene2 + · ···· + βgene n × exprgene n. By considering the median risk score as a critical value, the samples were divided into high-risk groups and low-risk groups. Overall survival was determined by using R packages "survival" and "survminer" while R package "survival ROC" was used to draw the receiver operating characteristic curve (ROC), and Area Under Curve (AUC) to show the accuracy of the prognostic model. Singlefactor COX analysis and multi-factor COX analysis were used to analyze the relationship between risk scores and clinical characteristics. R packages "rms" were used to build a nomogram model to estimate the factors related to overall survival and to calculate the risk scores. The sum of risk scores was used to predict 3-years and 5-years survival rates.

Immune Relevance of Hub Genes
Kaplan-Meier plotter (http://kmplot.com/analysis/) is an online database for analyzing the survival rate of mRNA or miRNA in various cancers. Using this, we obtained the survival curve of the hub genes in gastric cancer, and their correlation with the infiltration of the immune cells (neutrophils, macrophages, dendritic cells, B cells, and CD4/CD8 T cells) in the tumor microenvironment was analyzed by TIMER (https://cistrome. shinyapps.io/timer/) (Li et al., 2017). The expression levels of the hub genes were also analyzed in the pan-cancer to find the difference between normal tissues and various cancers. The differences in the genes' expression and the relationship of hub genes with immune cell infiltration and immune checkpoints (PDCD1, CD274, PDCD1LG2, and CTLA4) were also determined.

Gastric Cancer Tissue Sampling, RNA Extraction and Q-RT-PCR Analyses
A total of seven gastric cancer tissue and adjacent normal tissue samples were obtained from patients undergoing partial gastrectomy at the Fifth Affiliated Hospital of Zhengzhou University. Informed written consent was obtained from all patients before recruitment into the study, and all sampling and experimental procedures were approved by the ethics committee of the fifth affiliated hospital of Zhengzhou University. Fresh tissues were washed thrice with PBS, crushed and treated with TRIzol (Invitrogen, Carlsbad, United States) for RNA extraction. The cDNA samples were prepared by using the Rever Tra Ace qPCR RT Kit (Osaka, Japan) and mRNA expression was performed using SYBR Premix Ex Taq (Tokyo, Japan) and specially designed primers ( Table 1). GAPDH was used as internal control and the relative transcriptional expression level was calculated according to the 2^ΔCt approach.

Correlation of CPVL With Immune and Matrix Scores
The "Estimate" package was used to evaluate the immune score and matrix score in pan-cancer. The correlation between CPVL, immune, and matrix scores was calculated by spearman text and the "ggplot2" package was used to draw scatter plots.

siRNA-Mediated Gene Knockdown
The siRNA targeting CPVL and scrambled negative control siRNA were provided by Shanghai GenePharma (Shanghai, China), the sequences of siRNA are listed in Table 1. The MGC803 cells were transfected with siRNA using Lipofectamine 2000 (Invitrogen, Carlsbad, CA, United States) at a final concentration of 80 nM. The efficiency of siRNA knockdown was subsequently confirmed using qPCR.

Cell Proliferation Assay
Cell Counting Kit-8 (DOJINDO, Kumamoto, Japan) was used according to the manufacturer's instructions. After 24-48 h of transfecting cells with siRNA-CPVL,~5,000 cells were seeded into each well of the 96-well plate, followed by adding 10 μL of

Primer Name
Sequence

The Prediction of CPVL Related miRNA and Upstream lncRNA
The StarBase (http://starbase.sysu.edu.cn/) is an online database for studying correlation and predicting the regulatory relationships between mRNA, miRNA, and lncRNA in tumors (Li et al., 2014). First, we predicted the candidate miRNAs associated with CPVL by taking |R|> 0.1 and p-value <0.05 as a baseline, however, miRNAs showing higher significance were chosen and validated by qPCR in gastric cancer tissues and cell lines.

Statistical Analysis
R package "Seurat" and "infercnv" were used to perform quality control on single-cell sequencing data, and to filter and calculate cell cnv-scores. The expression levels of genes in transcriptome data were normalized by log2 transformation. COX regression analysis was used to establish the relationship between gene expression and risk score, R package "survival ROC" for survival prediction models, and the "rms" package was used to build nomogram models. The "Estimate" package was used to evaluate the immune score and matrix score of pan-cancer, and the Spearman test was used to calculate the correlation between CPVL and the immune score and matrix score. p < 0.05 was considered as significant.
The experimental data were statistically analyzed and plotted by GraphPad 8.0 software, and other measurement data were described by means ± standard deviation, all using the student t-test.

CNV and Related Genes Within the Tumor Cell Through scRNA-Seq Analyses
First of all, the quality checks were performed for the scRNA-seq GSE112302 dataset (Supplementary Figures S1A,C), and selected the top 1,500 genes fulfilling our basic selection criteria (Supplementary Figure S1B). For 355 tumor cells, we performed principal component analysis (PCA) and clustered the cells using graph-based clustering (n = 20) (Supplementary Figures S1D-G). Furthermore, we annotated the cells including epithelial cells and macrophages showing marker genes (Figures 2A,B). It has been known that the transformation of normal cells to cancer cells is closely related to large-scale chromosomal aberrations (Beroukhim et al., 2010). Therefore, the CNVs in different genes were analyzed to differentiate between normal epithelial cells and cancer cells, and we found that cluster2 and cluster3 have comparatively low CNVs than other clusters 0, 1 and 4 ( Figure 2C). Furthermore, we analyzed the differences between the tumor cells expressing high and low CNVs, showing a large number of differential genes in both groups ( Figures 2D,E). These results indicate that CNVs are one of the key contributors to GC heterogeneity.

Hub Genes Correlated With CNV and Prognosis
To find out the role of hub genes in the overall survival of gastric cancer we performed a multivariate Cox regression analysis, among 21 hub genes, the eight genes (KCNJ3, CPVL, ONECUT2, SLC19A3, DDC, PRSS21, F12, and GRTP1) were designated as independent indicators of poor prognosis. Based on these findings we constructed a prediction model for the clinical prognosis of GC patients and validated it by the ROC curve [0.625 as an area under the curve (AUC)] ( Figures 4A,B). Next, the survival information of hub genes and clinical features of the patients were combined to construct the nomogram ( Figure 4C). The total score of each prognostic factor was represented as 1, 3, and 5-years survival rates of the patients. To find out whether the risk signatures were independent predictors of prognosis, the Cox univariate analysis was performed showing that the age (p = 0.033), stage (p = 0.002), N stage (p = 0.022), and risk score (p < 0.001) were significantly linked with OS. While in Cox multivariate analysis only age (p < 0.001) and risk score (p < 0.001) were significantly linked with OS ( Figures 4D,E).
Using the GSE84437 dataset, we identified five genes (CPVL, ONECUT2, DDC, PRSS21, and GRTP1) as independent indicators of poor prognosis. Based on that we constructed and validated a prediction model for clinical prognosis, ROC curve (AUC = 0.626) ( Figures 4F,G), and the relationship between multiple factors and survival rate was analyzed by nomogram ( Figure 4H). The single factor analysis in GSE84437 showed that age (p = 0.041), T stage (p < 0.001), N stage (p < 0.001), and risk score (p < 0.001) were significantly associated with OS. However, the Cox multivariate analysis showed a significant association between the T3-4 stage (p < 0.001), N1-3 stage (p = 0.002), and risk score (p < 0.001) with OS ( Figures 4I,J). A venn diagram was drawn for these five survivalrelated genes (Supplementary Figure S2A), expressed in all cell clusters (Supplementary Figure S2B). Next, we plotted the relationship between the top five hub genes and survival in gastric cancer on the Kaplan-Meier plotter (http://kmplot.com/ analysis/) ( Figure 5A) showing a significant correlation between low expression of CPVL, ONECUT2, and PRSS21 with prolonged survival of the patients. On the other hand, the low expression of GRTP1 showed an association with poor prognosis, and the expression of DDC did not affect the prognosis of GC patients.

Correlation Between Hub Genes and Immune Cells Infiltration
The copy number variations are one of the well-known factors affecting the infiltration of immune cells in tumor microenvironment (Campos-Carrillo et al., 2020). Therefore, we studied the correlation between five hub genes and the infiltration of immune cells such as neutrophils, macrophages, dendritic cells, B cells and CD4/CD8 T cells in GC TME. Among all genes, CPVL and GRTP1 showed a significant positive and negative correlation with immune cell infiltration, respectively. Other genes did not affect the infiltration of immune cells in GC TME ( Figures 5B-F).
PD1/PD-L1 and CTLA-4 are important immune checkpoints for tumor immune escape, playing significant roles in tumor immunotherapy (Guo et al., 2021). Among hub genes, only CPVL was found to have a positive and ONECUT2 negative correlation with the immune checkpoints (Supplementary Figure S3).

The Relationship Between Hub Genes, Copy Number Variation and Immune Cell Infiltration
Next, we explored the correlation between the expression of the genes and their CNVs with immune cell infiltration in gastric TME. For this purpose, we analyzed the somatic mutations of the hub genes through TIMER [deep deletion (-2), arm-level deletion (-1), diploid/normal (0), arm-level gain (1), and high amplification (2)] and the infiltration of the multiple immune cells. Interestingly, we observed that the copy number variations (CNVs) in hub genes significantly affect the infiltration of immune cells (Figures 6A-E). Apart from DDC, all other genes including CPVL, ONECUT2, PRSS21, and GRTP1 positively regulate the infiltration of the immune cells. These findings revealed that CNVs are also major players in the immune regulation of the gastric cancer TME.

Effect of CPVL on Proliferation of Gastric Cancer Cells
We performed the expression of five hub genes in gastric cancers and normal tissue samples which showed significantly high expression of CPVL in cancer samples, and cell lines ( Figures  7A-C) which was consistent with the RNA sequences obtained from TCGA. To further explore the role of CPVL in gastric cancer, two different siRNAs were used to silence the expression of CPVL in MGC803 cells ( Figure 7D). Subsequently, we examined the effect of the CPVL-knockdown on the proliferation of MGC803 cells. The results showed a significant reduction in the proliferation of MGC803 cells after silencing CPVL ( Figure 7E).

Relationship of CPVL With Immunity in Pan-Cancer
Among five hub genes, we focused on CPVL because it showed the highest degree of correlation with immune cell infiltration and immune checkpoints PD-1/PD-L1 and CTLA-4 in gastric cancer. Thus, we comprehensively explored the CPVL in pan-cancer, and its high expression showed a strong association with the poor prognosis in a variety of cancers (Figure 8). Total out of 33 pancancers, CPVL was found to have high expression in 29 cancers (Supplementary Figure S4), also showing a significant correlation with the immune and matrix scores. These findings show that CPVL has a universal effect on the immune cell infiltration and immune function in TME of various cancers.

The Prediction of CPVL Related miRNA and Upstream lncRNA
In starBase (http://starbase.sysu.edu.cn/), we have obtained four potential upstream miRNAs (hsa-miR-196a-5p, hsa-miR-7-5p, hsa-miR-196b-5p, and hsa-miR-561-5p) correlated with CPVL ( Table 2) and among them, we selected hsa-miR-196b-5p as a potential target because of its highest degree of negative correlation with CPVL. Moreover, we predicted the upstream lncRNAs of hsa-miR-196b-5p, and obtained 68 possible lncRNAs, and five lncRNAs showing a negative correlation with hsa-miR-196b-5p and positive correlation with CPVL in gastric cancer were filtered out (Table 3). Furthermore, the expression levels of selected miRNAs and lncRNAs were determined in gastric cancer cell lines. The results showed that lncRNAs AL158207.2 and AL122035.2 were significantly reduced in gastric cancer samples, while the expression levels of the four miRNAs in gastric cancer cell lines were significantly higher than the normal controls (Supplementary Figure S5). Thus, based on these results, we believe that these lncRNAs and miRNAs could be used to construct ceRNA networks and studying other functions.

DISCUSSION
Gastric cancer is considered as one of the most deadly cancers, ranked as the third leading cause of cancer-associated deaths worldwide (Chen et al., 2014). Exploring the heterogeneity within the tumor can help to deeply understand the microenvironment of gastric cancer (Yan et al., 2018). Recently, it has been validated that CNVs (deletion, insertion or SNPs) in genes are one of the important factors affecting cellular functions (Nakamura, 2009). In a variety of tumors, including gastric cancer, a large number of CNVs are potentially involved in the occurrence and development of tumors (Leary et al., 2008;Despierre et al., 2014;Horpaopan et al., 2015;Xu et al., 2015). A study using 183 gastric cancer samples identified drugs' targeted genes exhibit a high ratio of copy number gains (CNG) (Labots et al., 2014). It is well-known fact that the CNVs play integral roles in the screening of new prognostic FIGURE 8 | The risk rate of CPVL gene in pan-cancer. CPVL is associated with patients' survival in BLCA, KIRC, LCC, LIHC, MESO, SKCM, STAD, UCEC, and UVM (p < 0.05). The hazard ratio greater than 1 indicates a high-risk gene(s), meaning that patients with high expression of such gene(s) have a worse prognosis; the hazard ratio less than 1 indicates a low-risk gene(s) for particular cancer, meaning that patients with high expression have a better prognosis.  targets and individualized treatment for patients based on the heterogeneity of their tumors (Hou et al., 2016). Believing that, single-cell sequencing of tumors could serve as a useful tool for exploring tumor heterogeneity (Hou et al., 2016). In the current study, we considered the CNV-score of the cells to differentiate the malignant cells from epithelial cells in GC. Abnormally high expressing genes with CNV-score in the tumor cells were screened out, and top genes were analyzed for their roles in overall survival in gastric cancer patients. Furthermore, we also verified that the level of CNVs and highrisk scores related to the poor prognosis of gastric cancer patients.
The tumor immune process and the host's immune protection mechanism are divided into three phases: elimination, equilibrium, and escape phases (Lazar et al., 2018). Specifically, at the early stages of tumorigenesis and development, the immune system can kill tumor cells to inhibit the development and progression of cancers. With the progression of the tumor, the surviving cancer cells become smarter and can change some of their characteristics to avoid killers' immune cells thus proliferating and metastasizing. We found that high expression of CPVL, ONECUT2, DDC, PRSS21, and GRTP1 increase the risk of gastric cancer. Interestingly, the expression of CPVL, ONECUT2, PRSS21, and GRTP1 was linked with the higher ratio of CNVs which further suppressed the level of immune cell infiltration. These results partially explain that the low degree of immune cell infiltration in GC may affect the efficiency of ICIs. High expression of ONECUT2 and PRSS21 have been reported to be associated with poorer prognosis in various cancers including prostate cancer, hepatocellular carcinoma, and gastric cancer Pollan et al., 2021;Shen et al., 2021;Sun et al., 2021).
In the current study, CPVL showed higher expression than the other genes, thus we selected it for further analyses. CPVL is a serine carboxypeptidase that was first characterized in human macrophages (Mahoney et al., 2001). Though, the function of CPVL remains unclear in a variety of tumors. So far CPVL has not been deeply studied in gastric cancer. However, CPVL induced apoptosis in glioma cells through the IFN-γ/STAT1 signaling pathway . In gastric cancer cell lines, we showed for the first time that CPVL on cell proliferation, suggesting that CPVL may be a new prognostic target for gastric cancer. Further comprehensive studies are required to explore its deep function and underlying mechanisms in regulation of gstric cancer.

CONCLUSION
The high risk-scores of CPVL, ONECUT2, DDC, PRSS21, and GRTP1 could be used to determine the degree of malignancy and prognosis in gastric cancer patients. The increased expression of CPVL, ONECUT2, PRSS21, and GRTP1 may indicate the advanced stage of GC, as well as a low level of immune cell infiltration in gastric TME. In gastric cancer cell lines, we determined that CPVL regulate the cellular proliferation, showing that CPVL could be an important predictor in gastric cancer development.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

ETHICS STATEMENT
The sampling and data collection procedures from human participants were reviewed and approved by The ethical committee of the Fifth Affiliated Hospital of Zhengzhou University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FL: Conceptualization of the study and prepared first draft. HW: Analyzed the data, and revised the first draft. IB: Revised and reviewed manuscript, logical reasoning and language corrections. CG, FR, YT, and BL: Drafted the manuscript. PZ and YM: Conceived idea of the study, supervised overall work and reviewed and revised final draft. All authors read and approved the submission of the final manuscript.

ACKNOWLEDGMENTS
We greatly acknowledge the participation of the patients into our study, and doctors and paramedic staff of the fifth affiliated hospital of zhengzhou university for their help in provision of samples. We are also highly thankful to our colleagues who are not listed as authors of this manuscript, for their support in the completion of this study. We also acknowledge the role of institutes and departments involved in this study.