Development of an Immune-Related Gene Signature for Prognosis in Melanoma

Melanoma remains a potentially deadly malignant tumor. The incidence of melanoma continues to rise. Immunotherapy has become a new treatment method and is widely used in a variety of tumors. Original melanoma data were downloaded from TCGA. ssGSEA was performed to classify them. GSVA software and the "hclust" package were used to analyze the data. The ESTIMATE algorithm screened DEGs. The edgeR package and Venn diagram identified valid immune-related genes. Univariate, LASSO and multivariate analyses were used to explore the hub genes. The "rms" package established the nomogram and calibrated the curve. Immune infiltration data were obtained from the TIMER database. Compared with that of samples in the high immune cell infiltration cluster, we found that the tumor purity of samples in the low immune cell infiltration cluster was higher. The immune score, ESTIMATE score and stromal score in the low immune cell infiltration cluster were lower. In the high immune cell infiltration cluster, the immune components were more abundant, while the tumor purity was lower. The expression levels of TIGIT, PDCD1, LAG3, HAVCR2, CTLA4 and the HLA family were also higher in the high immune cell infiltration cluster. Survival analysis showed that patients in the high immune cell infiltration cluster had shorter OS than patients in the low immune cell infiltration cluster. IGHV1-18, CXCL11, LTF, and HLA-DQB1 were identified as immune cell infiltration-related DEGs. The prognosis of melanoma was significantly negatively correlated with the infiltration of CD4+ T cells, CD8+ T cells, dendritic cells, neutrophils and macrophages. In this study, we identified immune-related melanoma core genes and relevant immune cell subtypes, which may be used in targeted therapy and immunotherapy of melanoma.


INTRODUCTION
Melanoma still remains a potentially deadly malignant tumor at the beginning of the 21st century. The incidence of melanoma unfortunately continues to rise, while the incidence of many tumor types is declining (1). Melanoma is mainly seen in young and middle-aged people, and the median age at diagnosis is 57 years old. It has been observed that the incidence increases linearly from 25 to 50 years old and then slows down, especially in women (2). Although most patients have localized disease at the time of diagnosis and are treated by surgically removing the primary tumor, many patients develop metastasis (3). It is generally understood that the normal function of a healthy immune system can protect and prevent the development of malignant tumors, and people with a genetically compromised immune system may have increased susceptibility to tumors (4). Immunotherapy has become a new treatment method and is widely used in a variety of tumors, such as gastric and esophageal cancer, pancreatic cancer and ovarian cancer (5)(6)(7). Experiments have shown that immune stimulation can participate in the treatment of melanoma (8). Targeted therapy for specific genes is also a research hotspot (9). Combining targeted therapy and immunotherapy is an important strategy to treat melanoma (10)(11)(12). Therefore, screening immunerelated biological targets has become particularly important.

MATERIALS AND METHOD Data Collection
RNA sequence and clinical data of melanoma were collected from TCGA (13). We downloaded the expression profiles of mRNAs (level 3) in cases including tumor tissues and normal tissues from TCGA database (http://cancergenome.nih.gov/) on april 15, 2019. The sequenced data were obtained from Illumina HiSeqRNASeq. The corresponding clinical information of patients was also downloaded from TCGA database. ssGSEA groups TCGA melanoma transcriptome data. From the results of Bindea et al (14), we used a set of marker genes for immune cell types. We utilized 29 immune data sets (including immunerelated pathways, immune cell types and immune-related functions) and the ssGSEA method with the R software gene set variation analysis (GSVA) package to operate the related expression pathways, penetration levels of different immune cells and Activity of immune-related functions. The melanoma The immune cells were highly expressed in the high immune cell infiltration group (Immunity_H), and the low expression in the low immune cell infiltration group (Immunity_L). The Tumor Purity, ESTIMATE Score, Immune Score and Stromal Score were illustrated along with the grouping information. (B) There is a statistical difference of the Tumor Purity, ESTIMATE Score, Immune Score and Stromal Score between the high immune cell infiltration cluster and the low immune cell infiltration cluster (C, D) The expression of HLA family genes, TIGIT, PDCD1, LAG3, HAVCR2, and CTLA4 in the high immune cell infiltration cluster (red) were significantly higher than that of the low immune cell infiltration cluster (green) (E) The statistical graph shows the difference in the proportion of each immune cell between the high immune cell infiltration cluster (red) and the low immune cell infiltration cluster (green). (F) Survival difference between high immune cell infiltration cluster and low immune cell infiltration cluster. *P < 0.05, **P < 0.01, ***P < 0.001. samples from TCGA were divided into low-and high-immune cell infiltration cluster by "hclust" package (15). GSE15605 from the GEO database including 58 melanoma samples was recruited for external validation.

Verification of Effective Immune Grouping
The ESTIMATE algorithm was for identification of the differentially expressed genes (DEGs) in the melanoma expression profile data. The ESTIMATE algorithm was used to analyze the Immune Score, Stromal Score, Tumor Purity and ESTIMATE Score, and cluster heat maps and statistics were drawn for effective grouping.

Selection of Immune-Related Genes in Melanoma
TCGA data was divided into high-and low-immune cell infiltration cluster. According to the standards of p <0.05 and| log2FC |> 2, we used the edgeR package to analyze DEGs. We used the same criteria to perform differential analysis on cancer groups and para-cancer groups to screen immune-related cancerous genes. The Venn diagram identified real immunerelated genes from the above two analyses.

Screen Prognostic Genes and Tap Their Characteristics
We utilized Univariate, lasso and multivariate analysis to dig out the correlation between the OS of patients and the expression level of immune-related genes. We calculated the regression coefficient and hazard ratio (HRs) of each gene, and finally the satisfactory mRNAs was identifed.

Construct a Prognostic Model of Immune-Related Genes
The prognostic risk scoring model of melanoma patients in training cohort is a collection of each optimal prognosis mRNA expression level and relative regression coefficient weights calculated from the multivariate model as the following method: Risk Score(patient) Relying on the median risk score, all patients in the cohort were classified into high-and low-risk groups. Kaplan-Meier survival curves of the two groups were completed. We proposed ROC curves (16) to evaluate the specificity and sensitivity of the model. We also conducted a multivariate analysis of several clinical characteristics of melanoma patients to check the independence of the prognostic models without their clinical characteristics.

Verify the Effect of Prognostic Models
With the cut-off values calculated from the training cohort, we compared the risk scores from the testing and entire cohort and then patient can be classified into high-or low-risk groups. Kaplan-Meier curve, Time-dependent ROC and Cox multivariate analysis were all conducted. Based on the clinicopathological characteristics, we conducted a stratification analysis of the entire cohort samples.

Confirmation of Hub Immune Related Genes
The "rms" package established the nomogram and calibrate curve, checking the accuracy and the consistency index between the predicted probability and the actual observation frequency. We next displayed the results in the calibration curve, in order to represent the performance of nomogram.

Analysis of Correlation With Immune Cell Infiltration
Immune infiltration data can be obtained from the tumor obtained from immune estimation resource (TIMER) database (17). We rely on the Pearson correlation coefficient to calculate the degree of correlation between immune infiltration and risk score. Meanwhile, we used the tumor-immune system interactions and drugbank (TISIDB) database to investigate the expression of these core immune-related genes in different molecular subtypes of cutaneous melanoma (18).

Differentiated Grouping of Melanoma Tissue
We obtained melanoma samples and normal skin tissue samples from the TCGA database. We used ssGSEA to analyze the transcriptome data of melanoma tissue samples to assess the immune cell infiltration state. After controlling for the enrichment of multiple immune cell types, melanoma samples were divided into high and low immune cell infiltration clusters according to the degree of immune infiltration ( Figure 1A). To test the authenticity of the above grouping scheme, we used the ESTIMATE algorithm to analyze the expression profile of melanoma and calculated the immune score, ESTIMATE score, stromal score, and tumor purity. The results suggested that the tumor purity of the high immune cell infiltration group was lower than that of the low immune cell infiltration cluster. In contrast, the values of the ESTIMATE score, immune score and stromal score were higher in the high immune cell infiltration cluster than in the low immune cell infiltration cluster ( Figure 1A). The box chart shows that the high immune cell infiltration cluster had significantly higher immune score, ESTIMATE score and stromal score and lower tumor purity than the low immune cell infiltration cluster ( Figure 1B). There were more immune components in the high immune cell infiltration cluster than in the low immune cell infiltration cluster, but the tumor purity of the high immune cell infiltration cluster was lower, and the expression levels of TIGIT, PDCD1, LAG3, HAVCR2, CTLA4 and the HLA family were also higher in the high immune cell infiltration cluster ( Figures 1C, D). The CIBERSORT method was used to analyze the above two clusters and showed that there were more types of immune cells in the high immune cell infiltration cluster ( Figure 1E). Survival analysis demonstrated that patients from the low immune cell infiltration cluster had worse prognosis than patients in the high immune cell infiltration cluster ( Figure 1F).

Analysis of DEGs With High and Low Immune Cell Infiltration
Based on the cutoff, which was |log2FC| > 2 and FDR < 0.05, we identified 1120 DEGs between the low and high immune cell infiltration clusters, which included 1116 upregulated DEGs and 4 downregulated DEGs (Figure 2A). We conducted a Venn analysis based on the immune genes from the import database and the DEGs from the high and low immune cell infiltration clusters. Then, we found 388 overlapping genes ( Figure 2B), which were considered to be real DEGs.

Prognosis Models of Immune Cell Infiltration-Related DEGs
After integrating clinical information into gene expression profiles, we obtained 453 samples. We randomly selected 228 samples as the training cohort and the remaining 225 samples comprised the test cohort. All the samples together are referred to as the entire cohort. Then, we built a prognostic model with each cohort. In the training cohort, based on p < 0.05, univariate Cox regression analysis identified 171 genes ( Table 1). The LASSO Cox regression algorithm was performed next (Figures 3A, B). Finally, multivariate Cox proportional hazards regression analysis was conducted, and the risk scores were calculated ( Figure 3C). IGHV1-18, CXCL11, LTF and HLA-DQB1 were identified as immune cell infiltration-related DEGs. The risk score was calculated using the following formula: -0.000600085×IGHV1-18-0 . 0 3 2 2 4 2 1 8 3 × C X C L 1 1 + 0 . 0 0 3 7 7 6 3 9 4 × L T F -0 . 0 0 7 8 9 3899×HLA-DQB1. The survival status and risk score calculated by the prognostic model are illustrated in Figure  4A. Samples were classified into low-and high-risk clusters according to the median risk score. Survival analysis indicated      that low-risk patients had significantly longer overall survival times than high-risk patients ( Figure 4B). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.72, 0.72, and 0.696 according to the 1-, 3-, and 5-year survival of the area under the receiver operating characteristic curve (AUC) value, respectively ( Figure 4C). For the testing cohort, the risk score and survival status indicated by the prognostic model are displayed in Figure  4D. Samples were divided into low-and high-risk clusters according to the median risk score. Survival analysis indicated that low-risk patients had significantly longer overall survival times than high-risk patients ( Figure 4E). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.669, 0.622, and 0.599 according to the 1-, 3-, and 5-year survival of the area under the AUC value, respectively ( Figure 4F). For the entire cohort, the risk score and survival status are illustrated in Figure 4G. Samples were classified into low-and high-risk clusters according to the median risk score. Survival analysis indicated that low-risk patients had significantly longer overall survival times than high-risk patients ( Figure 4H). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.694, 0.67, and 0.647 according to the 1-, 3-, and 5- year survival of the area under the AUC value, respectively ( Figure 4I). The univariate model of the training, testing and entire cohorts is shown in Figures 5A-C, while the multivariate model of the training, testing and entire cohorts is shown in Figures 5D-F. The results all demonstrated that the prognostic model has independent and moderate prognostic power for immune cell infiltration. Taking the median risk score as the standard, we divided the sample of the entire cohort into a highrisk cluster and a low-risk cluster. Based on different clinical factors, we conducted a survival analysis of the two groups of samples. In the subgroup analysis stage II, stage III, stage IV, age ≤ 60, age > 60, female, male, with tumor and free of tumor, patients in the high-risk group had shorter overall survival times than those in the low-risk group ( Figure 6).

Construction of the Predictive Nomogram
To predict the survival rate of melanoma patients from a clinical point of view, we constructed a nomogram using TCGA data to estimate the likelihood that the OS will last for 1, 3, and 5 years. We used the following six independent prognostic factors to predict the nomogram: age, AJCC stage, grade, histological type, risk score and tumor status ( Figure 7A). The calibration chart shows that the effectiveness of the nomogram was very good, and the 45°line represents the best predicted case. ( Figure 7B).

Validation of the Screened Genes by qRT-PCR and External Melanoma Database
Compared with the normal melanocytes, IGHV1-18, CXCL11 and HLA-DQB1 were highly expressed in melanoma cell line A375,  Because the 4 genes IGHV1-18, CXCL11, LTF and HLA-DQB1 are associated with tumor immunity, we used the TIMER database to analyze the correlation between the prognosis of these 4 genes and the infiltration of immune cell subtypes in melanoma ( Figure 10). The correlation value of B cells with the risk score was −0.241, and the correlation value of CD4+ T cells with the risk score was −0.235. The correlation value of CD8+ T cells with the risk score was -0.422. The correlation values of dendritic cells with the risk score was −0.511. The correlation value of macrophages with the risk score was −0.255, and the correlation value of neutrophils with the risk score was −0.442. The above results suggest that the prognosis of melanoma is significantly negatively correlated with infiltration by these immune cell subtypes. In addition, compared with the normal control, the expression of IGHV1-18, CXCL11 and HLA-DQB1 were higher in the patients with cutaneous melanoma, while the expression of LTF was lower ( Figure S2). We divide cutaneous melanoma into four subtypes (BRAF-mutant, NF1-deficient, NRAS-mutant and triple wild-type). We found that the expression of CXCL11 (P = 0.1), LTF (P = 0.28), and HLA-DQB1 (P = 0.67) had no significant relation to the subtypes of cutaneous melanoma through TISIDB database ( Figure S3).

DISCUSSION
Melanoma is the most invasive form of skin cancer, and the incidence continues to rise worldwide. Although intense intermittent sun exposure is the main risk factor for melanoma, family history of melanoma, genetic susceptibility, environmental factors, and immunosuppression are other factors that affect the incidence (19). In recent years, immunotherapy and targeted therapy of specific factors have been increasingly investigated the immune infiltration and tumor mutation burden in different risk groups (22). In this study, we focused on the immune infiltrating status in melanoma and selected IGHV1-18, CXCL11, LTF and HLA-DQB1 from immune cell infiltration cluster as immune cell infiltration-related DEGs through the analysis of differences in melanoma samples and the construction of prognostic models. In addition, we further explored the correlation of the immune cell infiltration-related DEGs with the specific immune cell subtypes, which may provide more details for the exploration of the mechanisms by which DEGs regulate the development and prognosis of melanoma.
The CXCL9, -10, -11/CXCR3 axis is involved in inflammatory responses, leukocyte trafficking, adaptive resistance, hematopoiesis, cancer cell transfer and angiogenesis. Tokunaga et al. found that the CXCL9, CXCL10, and CXCL11/CXCR3 axis can be used as novel tumor treatment targets (23). C-X-C motif chemokine 11 (CXCL11) is regarded as the dominant CXCR3 agonist and can be induced by IFN-g and type I interferons (24). CXCL11 has been found uniquely expressed in the melanoma with rich lymphocyte, and may play a potential role in the construction of tumor microenvironment by recruiting activated T-cells (25). Kremenovic et al. revealed that CXCL11, as a myeloid activation (MA) signature gene, had a positive correlation with the presence of M1 macrophages, mature dendritic cells (DC) and CD8 + T cells in cutaneous melanoma patients (26).
The lactoferrin (LTF) gene, located at 3p21.3, acts as a tumor suppressor gene in diverse tumors. Zhang et al. demonstrated that LTF is dysregulated in nasopharyngeal carcinoma cell lines (27). Yi HM and others discovered expression, genetic and epigenetic alterations of the LTF gene in nasopharyngeal carcinoma cell lines (28). Wei et al. found that in B16-F10 melanoma metastasis model, the metastatic rate was higher in the LTF knockout mice (29). LTF may play a protective role in melanoma metastasis by inducing differentiation and apoptosis of myeloid-derived suppressor cells (MDSCs) and up-regulating TLR9 expression.
Polymorphisms of human leukocyte antigen (HLA) genes are thought to be associated with the susceptibility to a variety of malignancies and involved in the progress of carcinogenesis, tumor proliferation and immune escape (30). HLA-DQB1 is more extensively studied in gastric cancer and cervical cancer (31,32). HLA-DQB1 * 0301 has been reported to be closely associated with the risk of melanoma development and progression (33). As far as we know there are indeed few reports on IGHV1-18 in melanoma. IGHV1-18 is commonly expressed in normal B cells, and the tumor or inflammatory conditions can affect B cells, which may result in mutations in the heavy chain clone gene and influence the antibody gene family usage preference (34,35). Although IGHV1-18 has not been reported in melanoma, current studies suggest that the dynamic balance of B cells and antibodies may be related to the occurrence, development and prognosis of melanoma. In  melanoma, B-cells can be polarized to produce IgG4, which has low anti-tumor efficacy and may represent a possible mechanism of tumor escape (36). In addition, although it is generally believed that Ig is produced only by B lymphocytes, recent studies have reported that IgG can also be produced by non-B cells, such as epithelial cancer cells. For example, compared with normal epithelial cells, IgG from cancer cells often show unique V(D)J rearrangement or mutation hotspots (37). Therefore, further research on IGHV1-18 changes in melanoma patients may be helpful for the diagnosis and prognosis of melanoma. We have included this part of discussion in our revised manuscript accordingly. Immunotherapy, along with surgery, radiation therapy and chemotherapy, is rapidly becoming the standard treatment for cancer. In recent years, it has been demonstrated in a variety of tumor types that the level of immune cell infiltration is inversely related to tumor purity but positively correlated with responsiveness to immune checkpoint inhibitors, which results in better prognosis and immune response (38,39). Our results showed that the status of overall increased infiltrating immune cells in melanoma has the potential to predict clinical prognosis. Melanoma could be divided into "hot" and "cold" status (enrich in or lack of immune cells infiltration), and the hot status is likely to correlate with antigen processing and higher expression of interferons, TNF and chemokines pathways (40). We further analyzed the infiltrating immune cell subtypes which correlated with the prognosis of melanoma. CD8+ cytotoxic T lymphocytes (CTLs) are the preferred tool for targeting tumors, and effective antitumor immunity also requires CD4+ T cells (41). Experiments have shown that CD8+ T cells and CD4+ T cells play a role in the treatment of breast cancer, colon cancer, etc. (42,43), especially in melanoma (44,45). Enhanced dendritic actin network formation is clearly proven to have an effect on melanoma (46). Samaniego R and others found that macrophage expression can predict human primary cutaneous melanoma progression (47). Protumor activities of macrophages have also been detected in the progression of melanoma (48). Forsthuber A and others found that CXCL5 played a role as a regulator of neutrophil function in cutaneous melanoma (49). Soler-Cardona A and others also confirmed that this mechanism is related to lymph node metastasis (50). The above results indicate that our screening and prediction about immune cell subtypes are reliable, which is beneficial to further research on melanoma immunotherapy.
Nevertheless, our study remains certain limitations. First, the data on which the prediction model was established were obtained from available public databases, though we validated it in melanoma cell lines through qRT-PCR and other external datasets, the immunohistochemistry staining of the protein level associated with DEGs and infiltrating immune cell in tumor tissues also deserves further validation. In addition, the immune cell types were identified by marker genes, but the expression level of them may not constant per cell, and hence, the cell number may be incompletely relevant to the expression level of marker genes (51). Further, a more comprehensive analysis of more types of immune cells and the stromal cells should be a focus of future research.