Edited by: Katherine Chiappinelli, George Washington University, United States
Reviewed by: Shuji Sumitomo, Kobe City Medical Center General Hospital, Japan; Arutha Kulasinghe, Queensland University of Technology, Australia
*Correspondence: Zhenfang Xiong,
†These authors have contributed equally to this work and share first authorship
This article was submitted to Cancer Immunity and Immunotherapy, a section of the journal Frontiers in Oncology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
To find new immune-related prognostic markers for non-small cell lung cancer (NSCLC).
We found GSE14814 is related to NSCLC in GEO database. The non-small cell lung cancer observation (NSCLC-OBS) group was evaluated for immunity and divided into high and low groups for differential gene screening according to the score of immune evaluation. A single factor COX regression analysis was performed to select the genes related to prognosis. A prognostic model was constructed by machine learning, and test whether the model has a test efficacy for prognosis. A chip-in-chip non-small cell lung cancer chemotherapy (NSCLC-ACT) sample was used as a validation dataset for the same validation and prognostic analysis of the model. The coexpression genes of hub genes were obtained by pearson analysis and gene enrichment, function enrichment and protein interaction analysis. The tumor samples of patients with different clinical stages were detected by immunohistochemistry and the expression difference of prognostic genes in tumor tissues of patients with different stages was compared.
By screening, we found that LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 is closely related to prognosis. After machine learning, we constructed the immune prognosis model from these 5 genes, and the model AUC values were greater than 0.9 at three time periods of 1, 3, and 5 years; the total survival period of the low-risk group was significantly better than that of the high-risk group. The results of prognosis analysis in ACT samples were consistent with OBS groups. The coexpression genes are mainly involved B cell receptor signaling pathway and are mainly enriched in apoptotic cell clearance. Prognostic key genes are highly correlated with PDCD1, PDCD1LG2, LAG3, and CTLA4 immune checkpoints. The immunohistochemical results showed that the expression of COPG2IT1 and HLA.DQA1 in stage III increased significantly and the expression of LYN, C3, and TNFRSF17 in stage III decreased significantly compared with that of stage I. The experimental results are consistent with the previous analysis.
LYN, C3, COPG2IT1, LA.DQA1, and NFRSF17 may be new immune markers to judge the prognosis of patients with non-small cell lung cancer.
Non-small cell lung cancer is the most common type of lung cancer, accounting for about 80–85% of the total lung cancer (
The gene expression profile data set GSE14814 was obtained from the National Biotechnology Information Center (NCBI) GEO (Gene Expression Omnibus) database, which contained 62 samples of untreated cancer patients with non-small cell lung cancer and 62 patients with non-small cell lung cancer after chemotherapy. Samples of 62 untreated patients with non-small cell lung cancer were evaluated for tumor purity scores and immune scores using the Estimate software, and the distribution of immune score was plotted using the ggstatsplot software package in R language. The NSCLC-OBS patients were divided into high and low groups based on the immune scores, using the R language limma package to compare the two groups to obtain differentially expressed genes. “Ggplot2” “heatmap” package in the R software package was used to draw the volcano map and heat map of the difference gene expression.
For the clinical information of patients with differential genes combined, the unidimensional COX regression analysis was used to reduce the dimension for the first time, and then the second dimension reduction was carried out by 1000 minimum depth method in random forest to obtain the marker with the highest prognostic value gene. Random forest is a method of machine learning, and its purpose is to obtain prognostic value. The largest variable has more accurate results than the conventional multi-factor COX hazard ratio model. The model kernel is a non-linear model, which is closer to the biological reality than the linear model kernel of multi-factor COX regression. In order to obtain a wider range of random forest explanatory variables, and due to the inaccurate biological fitting of single-factor COX regression, the first dimension reduction result was selected, with a
Then, GSEA analysis of prognostic marker was performed, and GO and KEGG data sets on the GSEA official website were used for functional enrichment and pathway enrichment. The significant enrichment criteria are that: the absolute value of NES is greater than 1, and the NOMp value is less than 0.05.
According to the previous calculation results of COX, GSEA, and random forest model, the immune prognosis model construction and risk assessment were performed for the patients in the OBS group, and the risk scores were ranked. The survminer package in R language was used to find out the best cutting point, and the patients were divided into high and low risk groups for subsequent evaluation of the model efficiency. The expression of five key prognostic genes in the high-risk group and the low-risk group was analyzed and compared, and a heat map was drawn. Based on the results of the high and low risk grouping, the prognostic difference of the K-M survival curve by comparing high and low risk group was plotted, and then the ROC analysis was performed to verify the above results.
The constructed prediction model was applied to a sample of NSCLC-ACT patients, and the score calculations were performed on ACT patients. The expression of five key prognostic genes in the high-risk group and the low-risk group was analyzed and compared, and a heat map was drawn. K-M survival analysis and ROC calculation model accuracy were performed respectively.
The GSVA package was used to score the relative infiltration of 24 immune cells in NSCLC-ACT patients, and the NSCLC-ACT patients were divided into high and low risk groups according to the risk grouping results of the random forest model. The infiltration of 24 immune cells in high and low risk groups was compared. Hubgene co-expressed genes were also calculated based on pearson analysis (screening method
100 patients diagnosed with stage I and stage III non-small cell lung cancer were selected from our hospital and were divided into two groups. The selected patients had underwent surgery. The general conditions of the two groups of patients are shown in
Basic information of patients in the stage I lung cancer group and stage III lung cancer group.
Group | Gender | Age | Cancer type | Stage | TNM | Radiotherapy or chemotherapy | |
---|---|---|---|---|---|---|---|
Male | Female | ||||||
Stage I lung cancer group | 26 | 24 | 52.8. ± 1.7 | 22 cases of LUSC and 28 cases of LUAD | I | T1N0M0 | None |
Stage III lung cancer group | 27 | 23 | 54.6 ± 3.2 | 24 cases of LUSC and 26 cases of LUAD | IIIA | T1N2M0 | None |
|
>0.05 | >0.05 | >0.05 | / |
Data analysis was performed using SPSS 20.0 statistical software. Comparison of gene expression levels in different tissues was performed using the χ2 test and independent sample t test, and clinical pathological characteristics were analyzed using the χ2 test. Survival analysis was performed using Kaplan-Meier method.
We divided 62 NSCLC-OBS samples into high and low score groups according to their tumor purity and immune scores using the estimation software and the “ggstatsplot” package in R. The violin distribution map of the immune scores (
A single factor COX regression analysis was performed on the differential genes based on the patient’s clinical information, and a total of 95 variables were obtained. After including them in the secondary dimension reduction analysis, we obtained LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 (
According to the previous calculation results, we identified LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 as genes that affected immune prognosis. According to the difference in the expression of these five genes, we divided 62 NSCLC-OBS samples into high and low risk groups (
This prediction model was applied to 62 NSCLC-ACT samples. The results showed that the high and low-risk groups divided by the model had significant differences (
The NSCLC-ACT patient samples were evaluated to obtain the relative infiltration scores of 24 immune cells in the NSCLC-ACT patient group. According to the prediction model, the ACT patients were divided into high and low risk groups. After comparing the 24 types of immune cell infiltration in the high and low risk groups, we found that hubgene of the prediction model was most expressed in fibroblasts, but there was no significant difference in immune infiltration between the high and low risk groups in 24 immune cells (
According to pearson calculation, 100 genes co-expressed with LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 were obtained (p < 0.05, sorted from small to large). Functional enrichment and pathway enrichment analysis of the five hubgenes and 100 co-expressed genes showed that the co-expressed genes were mainly involved in B cell receptor signaling pathway. And these genes were mainly enriched in biological processes such as apoptotic cell clearance, Leishmaniasis, Hematopoietic cell lineage and Intestinal immune network for IgA production (
Except for the staging of lung cancer, there was no significant difference in general information between the two groups of patients (
Expression of five prognostic related genes in the stage I lung cancer group and stage III lung cancer group.
Gene | Stage I lung cancer group | Stage III lung cancer group |
|
---|---|---|---|
LYN | 9.3 ± 0.62 | 2.5 ± 0.36 | <0.01 |
C3 | 8.9 ± 0.37 | 2.1 ± 0.35 | <0.01 |
COPG2IT1 | 3.6 ± 0.59 | 7.3 ± 0.26 | <0.05 |
HLA.DQA1 | 4.3 ± 0.87 | 10.5 ± 1.12 | <0.05 |
TNFRSF17 | 7.8 ± 0.86 | 3.2 ± 0.36 | <0.05 |
Expression levels of five key prognostic genes in tumor tissues of two groups of patients with NSCLC.
Non-small cell lung cancer (NSCLC) accounts for about 80–85% of all lung cancers, and the main pathological types include adenocarcinoma and squamous cell carcinoma. For the most part, the treatment of NSCLC depends on the stage at which it is treated. Patient’s surgery resection rate is about 25% (
We performed immune assessment on untreated NSCLC patient samples and grouped them for differential gene analysis, single-factor COX regression analysis, Receiver Operating Characteristic (ROC) analysis and survival analysis. We identified LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17, these five genes may be important immune prognostic genes in NSCLC. We modeled these five genes and applied this model to chemotherapy-treated NSCLC samples, and the results showed that the survival and survival of patients identified as high risk group were significantly lower than those of low-risk group, and ROC analysis of the model shows that AUC values are greater than 0.9, which proves that the model has great credibility. Some literatures show that LYN in allergic airway inflammation can reduce inflammatory cell infiltration and levels of IL-13 and IL-4 and downregulate allergen-induced airway inflammation (
The B cell receptor (BCR) pathway has been identified as a potential therapeutic target in a number of common B cell malignancies. We found that screened five prognosis-related genes were all closely related to immunity response and mainly involved in B cell receptor signaling pathway. Tyrosine protein kinase Lyn is a protein encoded by the LYN gene in humans and a member of the Src protein tyrosine kinase family. Among various hematopoietic cells, Lyn has become a key enzyme involved in the regulation of cell activation. In these cells, a small amount of LYN is associated with cell surface receptor proteins, including B cell antigen receptor (BCR), CD40 or CD19. Yang et al. (
Yamamoto et al. (
Mutations in the HLA class II genes could lead to loss of expression of HLA-DR and HLA-DQ in diffuse large B-cell lymphoma (
In conclusion, our research indicates that LYN, C3, COPG2ITL, HLA.DQAL, and TNFRSFL17 are potential prognostic markers for non-small cell lung cancer. The results of immunohistochemical experiments of patients’ pathological tissues are consistent with this conclusion. These prognosis-related genes are mainly enriched in B cell receptor signaling pathways and are highly related to PDCD1, PDCD1LG2, LAG3, and CTLA4 immune checkpoints; this suggests that immunotherapy may improve the prognosis of non-small cell lung cancer patients by regulating these prognosis-related genes.
Publicly available datasets were analyzed in this study. This data can be found here: the Gene Expression Omnibus (GSE14814).
The studies involving human participants were reviewed and approved by The Ethics Committee of the First Affiliated Hospital of Nanchang University. The patients/participants provided their written informed consent to participate in this study.
JX research the design and drafted the manuscript. HN experiment implementation. JH: help modify articles. XW: literature search. KL experiment implementation. LT help modify articles and collate references. ZX review and revision of the manuscript and writing guidance. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.