Development and Validation of a Prognostic Signature Based on Immune Genes in Cervical Cancer

Background Cervical cancer is one of the most common types of gynecological malignancies worldwide. This study aims to develop an immune signature to predict survival in cervical cancer. Method The gene expression data of 296 patients with cervical cancer from The Cancer Genome Atlas database (TCGA) and immune-related genes from the Immunology Database and Analysis Portal (ImmPort) database were included in this study. The immune signature was developed based on prognostic genes. The validation dataset was downloaded from the Gene Expression Omnibus (GEO) database. Result The immune signature namely immune-based prognostic score (IPRS) was developed with 229 genes. Multivariate analysis revealed that the IPRS was an independent prognostic factor for overall survival (OS) and progression-free survival (PFS) in patients with cervical cancer. Patients were stratified into high IPRS and low IPRS groups, and those in the high IPRS group were associated with better survival, which was validated in the validation set. A nomogram with IPRS and stage was constructed to predict mortality in cervical cancer. Conclusions We developed a robust prognostic signature IPRS that could be used to predict patients’ survival outcome.


INTRODUCTION
With an estimated 570,000 new cervical cancer cases and 311,000 deaths worldwide in 2018, cervical cancer is one of the most common types of gynecological malignancies and ranks as the fourth most frequently diagnosed cancer and the fourth leading cause of cancer-related death in women (1). There are two primary histological types of cervical cancer: cervical squamous cell carcinoma and cervical adenocarcinoma. In less developed countries, cervical cancer among women is the leading cause of cancer death, and nearly 90% of cervical cancer deaths occurred in developing countries (1). Although the incidence is gradually decreasing owing to the identification of HPV as an etiologic factor and the introduction of a specific vaccine, the prognosis of advanced stage disease is extremely poor (2). Various biomarkers, especially genetic markers, have been shown to be closely related to prognosis (3)(4)(5)(6)(7). Thus, identifying patients with poor prognosis and high mortality is an important basis for additional clinical therapy.
The immune system has been found to be a determining factor during cancer initiation and progression and immunotherapy has shown great promise for some cancers. Meanwhile, evidence has shown that immunotherapy plays an important role in cervical cancer because the immune reaction to HPV may inhibit further progression in early-stage cancer (2). Further evidence has preliminarily confirmed that several immune prognostic signatures could be used to predict the prognosis of cervical cancer (8,9). Thus, immune prognostic signatures have therapeutic potential in cervical cancer.
In this study, we aimed to develop a new immune signature with immune prognostic genes. The new immune signature was developed in a training set and validated in a testing set. In an independent cohort, we proved the signature stability and reliability. Further, we used subgroup analysis and sensitivity analysis to validate the new immune signature. Additionally, with the new immune signature and clinical characteristics, we built a nomogram for patient prediction for clinical research.

Study Population and Eligibility Criteria
We collected the cervical cancer gene expression profiles of primary tumor tissue samples from public datasets, including a cohort GSE44001 from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) and another from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/). Only patients with follow-up duration and status were included. In this study, the outcomes were overall survival (OS) and progression-free survival (PFS). Finally, 596 cervical cancer cases including 296 from TCGA and 300 from GEO were included. The study design is presented in Figure 1.

Gene Expression Data Preprocessing
For the GEO dataset, gene microarray data and clinical information were downloaded from GEO. The missing gene expression was filled by k-nearest neighbors with R package "impute" (10). For the TCGA dataset, RNA sequencing data (FPKM value) of gene expression and clinical information were also downloaded. Then, the gene expression of each gene in the two datasets was transformed into a z-score.

Immune-Related Genes Definition
The comprehensive list of immune-related genes was download from the Immunology Database and Analysis Portal (ImmPort) database (https://immport.niaid.nih.gov), which includes 17 immune categories about molecular function (11). By matching with the variables from GSE44001 and TCGA, a total of 1267 immune-related genes were included in this study.

Development of the Immune-Based Prognostic Signature for Cervical Cancer
We developed the signature called immune-based prognostic score (IPRS) through a two-stage strategy. First, the univariable Cox proportional hazards regression analysis was used to assess the association of 1267 immune genes with OS in TCGA. Those genes with significant prognosis were extracted for further analysis. Then, we defined the immune-based prognostic signature similar to GGI (12): where i is the risky gene whose Cox coefficient is positive, and j is the protective gene whose Cox coefficient is negative.

Validation of the IPRS
To obtain a uniform cut-off value to divide patients into highscore and low-score groups, the cut-off of IPRS was determined by using the " surv-cutpoint" function of the R package "survmier," which repeatedly tested all potential cut-off points to determine the maximum rank statistic (13). Then, the IPRS was further validated in GSE44001.

Functional Enrichment Analyses
To further investigate the gene molecular mechanisms in IRRS, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses was performed by using R package "clusterProfiler". The P values adjusted by False-Discovery Rate (FDR) <0.05 were considered significant.

Construction and Validation of a Predictive Nomogram
We used a multivariate cox proportional hazard model to determine independent prognostic factors that were used to establish a nomogram with the R package "rms" (14). The calibration curves were used to determine whether the nomogram was suitable for clinical use.

Statistical Analysis
Categorical measurements were described as count and percentage, while continuous measurements were presented as mean ± SD. Kaplan-Meier (KM) survival curves were drawn using "survmier" and compared between subgroups using the log-rank test. Multivariate cox proportional hazard models were used to estimate the hazard ratios of variables and determine independent prognostic factors. The C-index was estimated using "survival" in the R package. All statistical analyses were performed in R version 3.3.4 (http://www.r-project.org/). A twosided P<0.05 was considered to indicate statistical significance.

Development and Definition of the IPRS
According to the inclusion criteria, a total of 596 cervical cancer patients including 296 patients from TCGA and 300 patients from GSE44001 were included in this study ( Table 1). In the training set, 143 protective genes and 86 risky genes among the 1267 immune-related genes were associated with OS (Table S1).
Then, we used these genes to develop IPRS whose cut-off was -74.30 to stratify patients into high IPRS and low IPRS groups in the study.

Validation of the IPRS
For the TCGA cohort, the KM curves indicated that the high IPRS group was associated with better OS, while the low IPRS group was associated with poor OS (Figure 2A). After adjusting for age, stage, and grade, IPRS (HR: 4.07, 95% CI: 2.29-7.23) remained an independent prognostic factor in the multivariable Cox model (Table S2). In addition, for GSE44001, adjusting for stage and low IPRS showed a 2.67-fold (HR: 2.67, 95% CI: 1.11-6.39) higher risk than the high IPRS group ( Figure 2B, Table S3).

Subgroups and Sensitivity Analysis for IPRS
To evaluate the prognostic value of IPRS, we performed a sensitivity analysis according to age, stage, grade. We found that IPRS was still significant in all subgroups and the prognosis of high IPRS is better than low IPRS. These results indicated that IPRS was potentially prognostic factor ( Figure 3). In addition, the survival curves of PFS in the TCGA cohort also showed that high IPRS was associated with better prognosis, although some information of time and status related to PFS is missing ( Figure S1, Table S4).  Pathway Enrichment Analysis Enrichment analysis of the 229 genes identified 94 significant KEGG pathways, and the top six most significant enriched pathways by these prognostic genes were: cytokine-cytokine receptor interaction, natural killer cell mediated cytotoxicity, T cell receptor signaling pathway, rheumatoid arthritis, type I diabetes mellitus, and Th1 and Th2 cell differentiation (Figure 4).

Comparison With Clinical Characteristics and Constructing Nomogram
We compared the predictive accuracy of IPRS with the clinicopathological characteristics including age, stage, and grade in TCGA. We found that compared with age (0.556), stage (0.629), and grade (0.514), IPRS had the highest C-index (0.685). In addition, stage was also a prognostic factor for predicting cervical cancer survival. We constructed a nomogram with IPRS and stage by which clinicians predict mortality in cervical cancer patients ( Figure 5A). Furthermore, calibration curves indicated good predictive performance of the nomogram ( Figure 5B, C).

DISCUSSION
Numerous evidence has shown the association between immune system or immune-related genes and patient prognosis in several solid tumors including breast cancer, non-small cell lung cancer, and ovarian cancer (15)(16)(17). In recent years, researchers have studied the survival benefits of immunotherapy in cervical cancer (2). In addition, some evidences also demonstrated the immune related signatures could predict the prognosis of cervical cancer (18). In this study, we established IPRS, a robust prognostic signature, based on 229 immune genes. Our results indicated that IPRS was significantly associated with cervical cancer patients' OS in the TCGA cohort. The patients were stratified into high IPRS and low IPRS groups based on cut-off -74.30 of IPRS. High IPRS was associated with better survival, which suggests that clinicians should pay more attention to low IPRS. After adjusting for clinicopathological factors, IPRS remained an independent prognostic factor in the multivariable Cox model. In addition, we found that IPRS was also significantly correlated with PFS, and the prognosis of high IPRS group was also better than that in the low IPRS group. Further, we validated the predictive value of IPRS in the validation set and concluded that patients in the high IPRS group had better survival, which was similar to that of the training set. Further, the IPRS could further stratify patients in different clinically defined groups into subgroups with different survival outcome. These results indicated that IPRS is a robust prognostic signature. Moreover, we performed a KEGG enrichment analysis for the 229 immune prognostic immune genes, which proved the FIGURE 4 | The X-axis represents the Percent of Genes. The Y-axis represents the KEGG pathway terms and the KEGG pathway terms were assigned to four KEGG categories. association between cancer and clinical application potential. Cytokine-cytokine receptors are associated with inflammation, angiogenesis, and chemotaxis processes, and they inhibit tumor development and progression in addition to being effective in cancer treatment (3,8). The natural killer cell-mediated cytotoxicity pathway is associated with NK cell activation, which controls tumor growth and kills tumors (19,20). T cell receptor signaling pathway was related to the activation of T-cells in response to a cancer antigen (21). In addition, many studies showed that rheumatoid arthritis could increase the incidence of cervical cancer, and the prevalence of HPV infection was higher among women with autoimmune disease (22,23). The incidence of cervical cancer was higher among women with Type I diabetes mellitus (24). MAPK signaling pathway and PI3K-Akt signaling pathway both lead to increased cancer cell invasiveness and facilitated cancer progression (25,26).
Further, we found that IPRS and stage were independent prognostic factors. To improve the clinical application value of IPRS, a nomogram was constructed to predict mortality and instruct clinicians to adopt effective treatment measures.
Our study has several strengths. First, we developed an IPRSbased immune prognostic gene that proved to be an independent prognostic factor. Second, based on IPRS, patients could be stratified into high and low IPRS groups to benefit from different treatment. Third, we constructed a nomogram to predict mortality more accurately so that cervical cancer patients accept more immediate treatment. However, our study also has some limitations. First, the sample size used in the study is not large, which led to some related results such as cut-off values to change when reproducing results in other data. Second, we used public datasets for this study, and although IPRS was an independent prognostic factor, we could not validate its value in the actual dataset.

CONCLUSION
We developed a robust prognostic signature IPRS, which could be used to predict patients' survival outcome. Further studies are necessary to validate the prognostic value of IPRS in cervical cancer.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.