Single-Cell Transcriptome Analysis Reveals RGS1 as a New Marker and Promoting Factor for T-Cell Exhaustion in Multiple Cancers

T-cell exhaustion is one of the main reasons of tumor immune escape. Using single-cell transcriptome data of CD8+ T cells in multiple cancers, we identified different cell types, in which Pre_exhaust and exhausted T cells participated in negative regulation of immune system process. By analyzing the coexpression network patterns and differentially expressed genes of Pre_exhaust, exhausted, and effector T cells, we identified 35 genes related to T-cell exhaustion, whose high GSVA scores were associated with significantly poor prognosis in various cancers. In the differentially expressed genes, RGS1 showed the greatest fold change in Pre_exhaust and exhausted cells of three cancers compared with effector T cells, and high expression of RGS1 was also associated with poor prognosis in various cancers. Additionally, RGS1 protein was upregulated significantly in tumor tissues in the immunohistochemistry verification. Furthermore, RGS1 displayed positive correlation with the 35 genes, especially highly correlated with PDCD1, CTLA4, HAVCR2, and TNFRSF9 in CD8+ T cells and cancer tissues, indicating the important roles of RGS1 in CD8+ T-cell exhaustion. Considering the GTP-hydrolysis activity of RGS1 and significantly high mRNA and protein expression in cancer tissues, we speculated that RGS1 potentially mediate the T-cell retention to lead to the persistent antigen stimulation, resulting in T-cell exhaustion. In conclusion, our findings suggest that RGS1 is a new marker and promoting factor for CD8+ T-cell exhaustion and provide theoretical basis for research and immunotherapy of exhausted cells.

INTRODUCTION T-cell exhaustion (Tex), a hyporesponsive state of T cells with increased inhibitory receptors, decreased effector cytokines, and impaired cytotoxicity, was originally described in CD8+ T cells during chronic lymphocytic choriomeningitis virus (LCMV) of mice (1). In recent years, the phenomenon of Tex has also been found in cancers (2,3), which is one of the main reasons of tumor immune escape (4). It has been reported that exhausted T cells in cancers share many similarities with that in chronic infection (5) and play a significant role in tumorigenesis (6). Studies show that exhausted T cells can be used as one of the main targets of immunosuppression therapy to save T cell from exhaustion and reactivate the cytotoxicity of T cells, providing a new opportunity for clinical immunotherapy (7). Nevertheless, due to the complexity and heterogeneity of cancers, the concrete mechanisms and molecules of T-cell exhaustion in cancers have not been fully elucidated.
Currently, single-cell RNA sequencing (scRNA-seq) has clearly revealed some new mechanisms and phenomena of cancer with the advantages of high accuracy and reproducibility (8)(9)(10). Using single-cell transcriptome profiling, we can identify new types of immune cells which cannot be revealed at the original tissue level and can construct a developmental trajectory for immune cells which can reveal the heterogeneity (11). These new findings are useful to better understand the immune system and its mechanism of action on tumors. Notably, this technology makes it possible to explore complicated tumor microenvironment including tumorinfiltrating lymphocytes (TILs) in melanoma, head and neck cancer, breast cancer, and glioblastoma cancer (12)(13)(14)(15). Thus, using advantage of scRNA-seq to analyze T cells and obtain the hallmarks of exhausted T cells can bring a new therapeutic strategy on clinical cancer treatment.
Due to the vital role of CD8+ T cells in eliciting antitumor responses (16), we integrated single-cell transcriptome data from colorectal cancer (CRC), hepatocellular cancer (HCC), and nonsmall cell lung cancer (NSCLC) to analyze CD8+ T cells in various cancers in the present study. Focusing on CD8+ T-cell exhaustion-associated clusters, we identified RGS1 as a new marker and promoting factor for T-cell exhaustion in multiple cancers with poor prognosis and showed highly positive correlation with the well-known genes associated with T-cell exhaustion. RGS1 protein highly expressed in tumor tissues was also verified in the immunohistochemistry (IHC) experiment. Our findings could facilitate in understanding the mechanism during the formation and development of T-cell exhaustion and provide theoretical basis for research and immunotherapy of exhausted cells.

Data Acquisition
The single-cell gene expression matrices including raw count and TPM data were obtained from the GEO database: GSE108989 (CRC), GSE99254(HCC), and GSE98638(NSCLC), and we isolated CD8+ T cells from peripheral blood (P), adjacent normal (N), and tumor tissues (T).

Quality Control and Data Processing
The raw count expression matrices were processed by R package Seurat v3.2.0 (http://satijalab.org/seurat/). To filter out the lowquality cells, we excluded cells with <600 and >10,000 detected genes (17). Counts were log normalized and scaled by linear regression against the number of reads with function NormalizeData and ScaleData. The highly variable genes (HVGs) were generated with FindVariableFeatures. Principal component analysis (PCA) was performed on the top 2,000 HVGs using function RunPCA. The appropriate PCs were selected for graph-based clustering with functions FindNeighbors and FindClusters. For visualization of clustering analysis, we performed uniform manifold approximation and projection for dimension reduction (UMAP) using RunUMAP function in Seurat. To eliminate the obvious effect from different patients, we performed standard normalization and variable feature selection after acquiring the data. Next, the function FindIntegrationAnchors was performed to find a set of anchors, which is used to integrate the data by the function IntegrateData. Also, the function AddModuleScore was used to calculate scores of gene list in different cell types.

Cell Type Annotation
Differentially expressed genes (DEGs) of each cluster were identified based on Wilcoxon rank-sum test using function FindAllMarkers compared with the rest of the clusters. In brief, for each cluster, only genes that met these criteria were considered cluster-specific DEGs (1): log2FC >0.25; (2) expressed >25% in either of the two groups of cells; (3) adjusted p-value <0.05. The top DEGs were selected to annotate each cluster based on the canonical markers from previous studies; also, the CellMarker databases (18) and R package SingleR v1.4.0 (19) were performed to further improve the accuracy of the annotation.

Trajectory Analysis
To explore the potential functional changes of CD8+ T cell of different clusters for each cancer, we performed development trajectory analysis by R package Monocle v2.18.0 (20) with the cluster-specific genes of each cluster. Dimensional reduction and cell ordering were performed using reduceDimension and orderCells functions with default parameter.

Weighted Gene Coexpression Network Analysis
In order to identify the highly linked genes in specific clusters, weighted gene coexpression network analysis (WGCNA) was performed with functions in the R package WGCNA (23). To attenuate the effects of noise and outliers, we constructed pseudocells (24) which were calculated as averages of 10 cells randomly chosen within each cluster. The function pickSoftThreshold was used to calculate the soft power parameter and blockwiseModules to construct coexpression network. Finally, corPvalueStudent was used to mine modules related to specific cell types.

Survival Analysis
According to the median of gene expression values, cells were divided into high and low groups, then the survival curve was shown using the Kapla-Meier curve with a log-rank test by GEPIA2 (http://gepia2.cancer-pku.cn/) to illustrate the relationship between differential genes and overall patient survival. The terms with p-value <0.05 were identified as significant. Additionally, multiple hypothesis testing (FDR) was performed to the significant p-value using the function p.adjust.

Gene Set Variation Analysis
For generated gene set, we performed gene set variation analysis (GSVA) to the data in the present study and validation dataset using R package GSVA (25) with the default parameter, to evaluate the effect of distinguishing Tex cells.

The Basic Expression of mRNA and Protein in Normal and Cancer Tissues
The RNA-seq data were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) and then calculated into TPM value. According to the annotation, samples were grouped into normal and four stages of cancer groups. The differential expression of mRNA in the different groups was performed by Wilcoxon test with p-value <0.05.
The protein expression level in normal and cancer tissues was analyzed using the Human Protein Atlas (HPA) database (https://www.proteinatlas.org). Immunohistochemistry pictures were downloaded from the Tissue Atlas and Pathology Atlas.

Immunohistochemistry
Immunohistochemistry was carried out on human liver cancer tissue microarray (Shanghai Outdo Biotech Co., Ltd., Shanghai, China). The tissue sections were first dried at 63°C for 1 h, dewaxed, and rehydrated before epitope retrieval by heating at 100°C in 10 mM sodium citrate (pH 6.0) for 5 min in EDTA solution for 20 min. The sections were cooled down to room temperature for 30 min. The tissue sections were treated with 3% hydrogen peroxide for 20 min to eliminate the endogenous peroxidase and alkaline phosphatase activity in the tissue. After cooling down to room temperature, the sections were treated by blocking agents for 10 min. The sections were incubated with individual primary antibody (RGS1, Invitrogen, Waltham, MA, USA; Product #PA5-86730) diluted 1:1,000,800 overnight at 4°C, followed by secondary antibodies at room temperature for 30 min. 3,3′-Diaminobenzidine (DAB) was then applied as a substrate to reveal the antigen. Hematoxylin was used for counterstaining. The stained images were counted by ImageJ software (26), and the optical density (OD) value was used for quantification.

Single CD8+ T-Cell Transcriptome Landscape
We obtained single-cell transcriptome data of human T cells from the GEO database, including 12 patients from CRC (27), six patients from HCC (28), and 14 patients from NSCLC (29). After strict quality control and filtration, we collected 4,010, 1,752, and 4,439 CD8+ T cells from peripheral blood (P), adjacent normal (N), and tumor tissues (T) (Supplementary Figures S1A, B; Supplementary Table S1).
To further explore the function of each cluster, we performed GO enrichment analysis using the cluster-specific genes ( Figure 1D). In all three cancers, Tn cells were enriched in Tcell activation, T-cell differentiation, and ribosome biogenesis; Teff cells were enriched in cellular defense response, positive regulation of cytokine production, and T-cell-mediated cytotoxicity; Tex cells were enriched in negative regulation of immune system process, T-cell apoptotic process, and response to hypoxia; Trm cells were enriched in antigen receptormediated signaling pathway and leukocyte chemotaxis; MAIT cells were enriched in interferon-gamma production; and Pre_exhaust cells were enriched in terms associated with Teff and Tex cells, including cellular defense response and negative regulation of immune system process, showing the transitional characteristics during T-cell exhaustion.

Establishment of Coexpression Network
Tex cells, as shown above, played a negative role in immune system process, and it was suggested that Pre_exhaust cells were a transitional stage from Teff to Tex cells. To find out the highly linked genes associated with T-cell exhaustion, we used the R package WGCNA to construct the weighted coexpression network in Teff, Pre_exhaust, and Tex cells (Figure 2A Table S3), 35 genes were found in three Tex cellrelated modules and overexpressed in Tex cells ( Figure 2B; Table 1), which were defined as "Candidate" gene set, including exhaustion marker PDCD1, CTLA4, HAVCR2, TOX, and TIGIT. It is noteworthy that the housekeeping gene glyceraldehyde-3phosphate dehydrogenase (GAPDH), which can catalyze an important energy-yielding step in glycolysis metabolism, was also upregulated and enriched in the highly linked DEGs, thus we speculated that the glycolysis progress was disordered in Tex cells.
As shown in Figure 2C, poor overall survival was correlated with higher Candidate gene set expression in multiple cancers including liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), skin cutaneous melanoma (SKCM), thymoma (THYM). In addition, the GSVA scores of Candidate gene set in Tex cells were much higher than those of the other clusters in CRC, HCC, and NSCLC, so were in the other independent validation datasets ( Figure 2D). GSE116390 dataset (30) was composed of four types of tumor-infiltrating CD8+ T cells, which were exhausted, memory-like, naïve, and effector memory-like (EM-like) subsets, from B16 melanoma tumor-bearing mice, and GSE123813 dataset (31) contained six types of tumor-infiltrating CD8+ T cells, which were from 11 patients with advanced basal cell carcinoma, showing higher score in the exhausted related clusters. As indicated above, the Candidate gene set was enriched in exhausted CD8+ T cells with poor prognosis and was able to distinguish Tex cells from the other CD8+ T cells in different cancers, indicating that the GSVA score of these 35 genes might be an effective prognostic marker or a marker to identify Tex cells.  Figure 3A), which may contribute to the origin of T-cell exhaustion, including the canonical exhaustion marker PDCD1 which encodes an inhibitory receptor (32), CD69 with the capability to mediate the cell retention (33), Cbl Proto-Oncogene B(CBLB) whose deletion can inhibit CD8+ T-cell exhaustion (34), hypoxia-inducible factor-1 (HIF1A) which can stably be expressed in hypoxia condition, consistent with the biological process of Tex cells ( Figure 1D). Obviously, regulator of G protein signaling 1 (RGS1) showed the greatest fold change in Tex cells across three cancers and also across different patients, showing the greatest fold change in Pre_exhaust cell in HCC and NSCLC ( Figure 3A; Supplementary Figure S2), illustrating its potential roles during T-cell exhaustion. RGS1 was expressed highly in whole Tex cells compared with Teff cells (Figure 3B), eliminating that the high fold change of RGS1 in Pre_exhaust and Tex cells was not caused by partial cells with abnormally high expression value but high expression in whole cells. In order to evaluate the role of RGS1 in tumorigenesis, we analyzed the expression levels of RGS1 between tumor and normal tissues in the TCGA database ( Figure 3C). RGS1 expression was significantly upregulated in multiple cancers including BRCA, cholangiocarcinoma (CHOL), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), HNSC, kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), LUAD, rectal carcinoma (READ), and stomach adenocarcinoma (STAD) when compared with the normal samples. We also added and compared the expression levels of RGS1 across different cancer stages and found that RGS1 expression was associated with stage in some cancer types, such as RGS1 expression was higher in stages II, III, and IV vs. stage I in STAD. This result revealed that RGS1 was likely a key tumorigenesis regulator in multiple cancers and may be associated with prognosis. Conspicuously, the prognosis analysis was analyzed in 33 TCGA cancer types (Supplementary Table S4). RGS1 expression was significantly correlated to poor prognosis in seven cancers, including LIHC, adrenocortical carcinoma (ACC), pancreatic adenocarcinoma (PAAD), sarcoma (SARC), SKCM, STAD, and THYM ( Figure 3D), suggesting that RGS1 was a potential prognostic factor in the survival of the above cancers. Apart from that, the protein level of RGS1 in HPA database showed the immunohistochemical (IHC) staining of RGS1 was negative staining in normal tissues and positive in liver cancer tissues, demonstrating that RGS1 was significantly expressed in cancer tissues than in normal liver tissues ( Figure 3E). Additionally, we performed IHC verification to quantify the protein expression of RGS1 using the local clinical samples of liver cancer and normal tissues ( Figures 3F, G), similarly, RGS1 protein displayed stronger staining in hepatocarcinoma, in line with the statistical result (p = 7.1e−5). The clinicopathological information of the patient samples and protein expression value are provided in Supplementary Table S5. These results showed that RGS1 was highly expressed in Tex cells in cancers, upregulated in tumor tissues in mRNA and protein level, and with poor prognosis in multiple cancers, which indicated its potential key role in T-cell exhaustion or cancer progress, and RGS1 might be an effective prognostic marker or a marker to identify Tex cells.

Correlation Between RGS1 and Candidate Gene Set
To discover the relationship between RGS1 and Candidate gene set as their potential roles in exhausted T cells, we calculated the correlation coefficient in single cell and tissue level (Figure 4). In three-cancer single-cell datasets, it was obvious that RGS1 showed positive correlation with 35 genes (up to 0.3~0.8), indicating that consistency of coexpression patterns between these genes. In the bulk RNA sequencing datasets, the correlation coefficients were almost positive, especially high with PDCD1, CTLA4, TNF receptor superfamily member 9 (TNFRSF9), HAVCR2, TOX, and TIGIT, which further confirmed the potential key roles of RGS1 in Tex cells. The negative correlations were mainly with the One interpretation of the above inconsistent result is that the bulk transcriptome change (which mixes many immune and nonimmune cells together) likely reflects an overall expression in these cells and does not discriminate specific T-cell states, which further highlights the advantages in defining and studying T-cell exhaustion state by using single-cell sequencing data.

DISCUSSION
T-cell exhaustion is characterized by loss of effector functions, continuously high expression of numerous inhibitory receptors, epigenetic and transcription profile changes, and dysregulated metabolism. Exhaustion CD8+ T cells are associated with suppressive immune microenvironment and poor overall survival in various cancer types, such as, in invasive bladder cancer (35) and clear-cell renal cell carcinoma (36). In addition, increased exhausted CD8+ T-cell subpopulations predict PD-1 blockade resistance response in melanoma (37). Accumulating evidences support exhausted T cells are possible to be rescued in cancer immunotherapy. Anti-PD1 antibodies, including atezolizumab and nivolumab, can renew the activity of exhausted CD8 T cells through preventing PD-1-mediated attenuation of proximal TCR cascades (38,39) and can affect metabolic reprogramming to reinvigorate T cells (40). However, one study found an association between increased accumulation of one CD8+ T-cell exhaustion phenotype and clinical benefit, suggesting exhausted T cells may comprise heterogenous cell population with distinct responsiveness to intervention and the standard definition of exhaustion cells is unclear in the context of treatment (41). Thus, understanding molecular mechanism of Tcell exhaustion and comprehensively exploring potential markers associated with T-cell exhaustion is essential to precisely define T-cell exhaustion and establish rational immunotherapeutic interventions.
In this study, combining with single-cell RNA sequencing, which can facilitate to detect the transcriptome on the level of single cell (42), we can shed light on the complication of tumorinfiltrating T cells. In order to explore the key genes associated with T-cell exhaustion in multiple cancers, we performed transcriptomic analysis of single CD8+ T cells isolated from three cancers, including CRC, HCC, and NSCLC and identified different cell types, thereunto, Pre_exhaust and Tex cells overexpressed exhaustion markers and enriched in the negative regulation of immune progress. In the comparison with Teff cells, RGS1 showed almost the greatest fold change in Pre_exhaust and Tex cells of three cancers with poor prognosis and displayed highly positive correlation with the well-known genes associated with T-cell exhaustion.
In the WGCNA analysis, we identified a Candidate gene set consisting of 35 DEGs, including exhaustion markers such as PDCD1, CTLA4, HAVCR2, TOX, and TIGIT. Apart from that, genes involved in cell cycle and DNA replication also included, MCM7, MCM5, MCM3, PCNA, stathmin 1 (STMN1), titin (TTN), and TBC1 domain family member 4 (TBC1D4), suggesting that exhausted cells still retained the ability of proliferation, which was also observed in chronically infected models (43). Functionally, the Candidate gene set was able to distinguish Tex cells from the other subtypes of CD8+ T cells in different cancers, and higher GSVA scores of Candidate gene set showed poor prognosis in multiple cancers.
In addition, several DEGs appeared in Pre_exhaust and Tex cells, suggesting the role in the formation and development of Tcell exhaustion, in which RGS1 showed almost the greatest fold change. RGS1 encodes a member of the regulator of G-protein signaling family, which can act as a GTPase-activating protein (GAP), increasing the rate of conversion of the GTP to GDP, driving G-protein into its inactive GDP-bound form, hence attenuating or turning off G-protein-coupled receptor signaling   (44). RGS1 is highly expressed in immune cells including T cells (45), B cells (46), natural killer (NK) cells (47), dendritic cells (48), and monocytes (49), suggesting a role for RGS1 in immune cell regulation. RGS1 inhibits the chemokine-induced lymphocyte migration (50) because chemokine-dependent activation of G-protein-coupled receptors can cause the activation of heterotrimeric G-protein subunits resulting in enhanced cell migration and adhesion (51), which has been found in Treg cells (45). In the present study, RGS1 was highly expressed in tumor tissues and correlated with shorter overall survival, which also appeared in several previous studies, including multiple myeloma (52), melanoma (53,54), nonsmall cell lung cancer (55), gastric cancer (56), diffuse large B-cell lymphoma (57), and so on. However, the role of RGS1 in CD8+T cells especially in Tex cells has not been reported. In addition, RGS1 protein, located at the cytoplasm and membrane, is enriched in tumor tissues compared with normal tissues according to the IHC staining in the HPA database and verification experiment, further verifying its pathogenicity.
Considering the ability to block cell migration of RGS1, we speculate that RGS1 can mediate the cell retention to lead to the persistent antigen stimulation of T cells, which resulted in T cell exhaustion with the overexpression of inhibitory genes such as PDCD1 and HAVCR2 (58). Additionally, RGS1 was identified as a HIF-dependent hypoxia target that dampens cell migration and signal transduction (59), indicating its role in exhausted T cells might be caused by hypoxia condition ( Figure 1B). RGS1, the most upregulated gene in Pre_exhaust and Tex cells and a potential marker for T-cell exhaustion, was excluded from the Candidate gene set. This phenomenon happened in other T cell-exhaustion-related genes as well, such as CD69 and CBLB. CD69, upregulated in Pre_exhaust and Tex cells, is an early activation marker of T cells (60). It can mediate the cell retention via the interaction with sphingosine-1-phosphate receptor 1 (S1PR1) which acts as a central mediator of lymphocyte output (61), leading to the persistent antigen stimulation of T cells, which resulted in T-cell exhaustion by overexpression of PDCD1 and HAVCR2 (62). CBLB, also upregulated in Pre_exhausted and Tex cells, whereby its deletion can inhibit CD8+ T cell exhaustion and promote chimeric antigen receptor T-cell function (34). Considering the positive correlation between RGS1 with Candidate gene set in single cells and tissues and its high expression in Tex cells, it is important and necessary to further study RGS1 mechanism in T-cell exhaustion.
In summary, our findings suggest that the GSVA score of the 35 Candidate gene set could be an effective prognostic marker or a marker to identify Tex cells. RGS1, as the most upregulated gene in Pre_exhaust and Tex cells, might play key roles in T-cell exhaustion or cancer progress. As a HIF-dependent hypoxia target, RGS1 might be upregulated by hypoxia, and further mediate the cell retention by inhibiting chemokine-induced lymphocyte migration. The current study could provide theoretical basis for research and immunotherapy of exhausted cells, while further studies are essential to fully elucidate the concrete mechanism of RGS1 during CD8+ T-cell exhaustion.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
All authors contributed significantly to the work and the preparation of the manuscript. HD conceived the study. YB analyzed the data with assistance from ZC and wrote the manuscript. MH collected the data and performed IHC experiment. JW reviewed the manuscript. All authors contributed to the article and approved the submitted version.