Analysis of the Prognostic Value and Gene Expression Mechanism of SHOX2 in Lung Adenocarcinoma

Background: Detection of SHOX2 methylation has been used to assist in the early diagnosis of lung cancer in many hospitals as SHOX2 may be important in the tumorigenesis of lung cancer. However, there are few studies on the mRNA expression, methylation, and molecular mechanism of SHOX2 in lung cancer. We aimed to explore the role of SHOX2 in lung adenocarcinoma (LUAD). Methods: First, we examined the differential expression of SHOX2 mRNA and methylation in cancerous and normal tissues using databases. Second, we analyzed the relationship between SHOX2 expression and common clinical parameters in LUAD patients. Third, we further explored the methylated level and its specific location of SHOX2 and the mainly factors of SHOX2 gene expression. Finally, we screened the correlatively expressed genes to analyze the pathways from the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes using DAVID. Results: We found that the mRNA expression of SHOX2 was higher in multiple cancers, including LUAD and lung squamous cell carcinoma (LUSC), than in normal tissues. Among LUAD patients, SHOX2 expression was higher in patients of middle–young age, with smoking history, in advanced stages, and with nodal distant metastasis. In addition, our results showed that patients with high expression of SHOX2 are prone to recurrence, poor differentiation, and poor prognosis. Thus, we identified that SHOX2 might be an oncogene for LUAD progression. The main factor influencing the high expression of SHOX2 mRNA may be DNA methylation, followed by copy number variation (CNV), but not by gene mutations in LUAD. Unexpectedly, we found that SHOX2 undergoes hypomethylation in the gene body instead of hypermethylation in the promoter. Additionally, SHOX2 has cross talk in the PI3K–Akt signaling pathway and ECM–receptor interaction. Conclusion: SHOX2 is highly expressed in most cancers. SHOX2 gene expression might be mainly regulated by methylation of its gene body in LUAD, and its high expression or hypomethylation indicates poor differentiation and poor prognosis. SHOX2 could be involved in PI3K–Akt and other important cancer-related signaling pathways to promote tumorigenesis.


INTRODUCTION
Much progress has been made in the research field of lung cancer occurrence and progression. However, epidemiological investigations have shown that lung cancer has higher morbidity and mortality than other malignant cancers in China (Chen et al., 2016;Wu et al., 2019). The alarming statistical results of lung cancer remain mainly attributed to the late detection, late diagnosis, late treatment, and unsatisfactory therapeutic effect. In lung cancer, driver gene alterations are critical in the whole process of tumorigenesis, recurrence, and metastasis. More representative driver genes as biomarkers for accurate early diagnosis and prognostic evaluation and therapeutic targeted molecules could improve treatment efficiency and increase mortality. Therefore, the identification of these molecular biomarkers for lung cancer is of great clinical significance.
Short stature homeobox 2 (SHOX2) is highly orthologous to murine Shox2 and human SHOX (Blaschke et al., 1998;De Baere et al., 1998;Marchini et al., 2016;Hu et al., 2018). According to current studies on Shox2, SHOX, and SHOX2, SHOX2 is considered to play a critical role in idiopathic short stature and various types of cardiac arrhythmias (Mortensen et al., 2012;Marchini et al., 2016). However, in recent decades, many researchers have found that SHOX2 also plays an important role in multiple cancers, including lung cancer (Schneider et al., 2011;Hong et al., 2014;Sun et al., 2019). Interestingly, methylation detection of SHOX2 in sputum, blood, alveolar lavage fluid, and tissue of patients has been used to screen early lung cancer patients (Zhao et al., 2015;Weiss et al., 2017;Zhang et al., 2017;Shi et al., 2020). It is worth noting that there is little research on the potential prognostic influence and molecular mechanism of SHOX2 in lung cancer, especially in non-small-cell lung cancer (NSCLC) [mainly lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC)]. The clinical significance of SHOX2 research as an early diagnostic gene is beyond doubt.
In our research, data from The Cancer Genome Atlas (TCGA) were mined using several online analysis tools to evaluate the SHOX2 expression profile in NSCLC. The main factors of the SHOX2 gene expression mechanism, including mutation, copy number, and methylation, were then investigated. The relationship between SHOX2 expression and the clinical characteristics of LUAD was determined using publicly accessible databases. Finally, potential co-expressed genes and functional networks were analyzed to provide direction for further investigation into the mechanism of how SHOX2 works in lung cancer.

Oncomine Database Analysis
Oncomine is a large oncogene chip database that can be used to analyze and compare gene expression between tumors and normal tissues, as well as gene mutation profiles and their correlation with clinical characteristics (Rhodes et al., 2007).
We used the Oncomine database to determine differences in the mRNA expression of the SHOX2 gene between tumors and normal tissues in various cancers. The results of this analysis are addressed with a p-value of 0.05, a fold change of 2, all gene rankings, and mRNA data type.

GEPIA
Gene expression profiling interactive analysis (GEPIA) is a web tool that provides key interactive analysis and customization capabilities, including tumor/normal differential expression profilometry, profile mapping, pathological staging, patient survival analysis, similar gene assay analysis, and dimensionality reduction analysis (Tang et al., 2017). We used GEPIA to address the differential expression of SHOX2 in all common cancers. We applied the following cut-off criteria: using the ANOVA method, |log2 FC| > 1, p-value < 0.01, and log2(TPM +1) for log-scale, matching TCGA and GTEx normal data, and adding all cancer tissue names.

GEO Microarray Analysis
The Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih. gov/geo/) is a database that stores chips, second-generation sequencing, and other high-throughput sequencing data worldwide (Edgar et al., 2002). An mRNA expression dataset (GSE33532) for early stage NSCLC was downloaded from the GEO database. GSE33532 stored mRNA expression information of 80 NSCLC tissue samples and 20 matched distant-normal samples from 20 patients. Microarray expression profiling was normalized by log2 transformation before analysis. These 20 patients were included in the validation dataset. The histological types of these NSCLC samples included adenocarcinoma, squamous cell carcinomas, and mixed carcinomas. Finally, we used GraphPad Prism 7 software to analyze and plot the collected data using unpaired t-test statistical methods. The p-value < 0.05 was considered statistically significant.

SurvivalMeth Analysis
SurvivalMeth (http://bio-bigdata.hrbmu.edu.cn/survivalmeth/) is a web server that is freely available to address the prognostic information of cancer-associated methylation, based on TCGA, CCLE, and GEO . We used SurvivalMeth to investigate the effect of SHOX2 DNA methylation-related functional elements on protein expression and LUAD prognosis. We applied the following cut-off criteria: the chosen disease of lung adenocarcinoma, the chosen experimental platform of 450 K (Illumine Infinium HumanMethylation450 BeadChip), the chosen gene symbol of SHOX2, transcript-related elements including "1st exon, 3′UTR, 5′UTR, gene body, TSS1500, and TSS200," CpG island-related elements including "Island, N_Shelf, N_Shore, S_ Shelf, and S_Shore," unrestricted repeat element-related element and CTCF-binding region, statistical methods using t-test, a threshold value of 0.05, and median group strategy. database of genomic, transcriptomic, and clinical and phenotypic data (Kent et al., 2002;Zweig et al., 2008;Navarro Gonzalez et al., 2021). To investigate the dominant factors influencing SHOX2 expression in LUAD, UCSC Xena was used to analyze the relationship between the methylation level, copy number, mutation, Kaplan-Meier survival analysis, and mRNA expression of SHOX2 in TCGA LUAD samples. We set the following conditions: sample types including solid tissue normal and primary tumor, gene expression using RNAseq-IlluminaHiSeq, DNA methylation using Methylation 450K, copy number using (gene-level)-gistic2, and somatic mutation using (SNP and INDEL)-MC3 public version. Additionally, we used this tool to analyze SHOX2 expression patterns in different pathological types.

UALCAN Analysis
UALCAN (http://ualcan.path.uab.edu) is an online database that provides free visual figures of gene expression, survival analysis, correlation analysis, and gene DNA promoter region methylation data, grouped by clinicopathological features between normal and tumor tissues, based on TCGA data (Chandrashekar et al., 2017). To explore the relationship between SHOX2 and the clinicopathological features of NSCLC patients, the expression and methylation levels of SHOX2 in NSCLC and adjacent normal tissues were identified using UALCAN.

MethSurv Analysis
MethSurv (https://biit.cs.ut.ee/methsurv/) used methylation group data from the "Cancer Genome Map" to perform survival analysis on the methylation patterns of CpG to achieve a preliminary assessment of tumor biomarkers based on methylation (Modhukur et al., 2018). We used MethSurv to further explore the relationship between SHOX2 methylation and several clinical characterizations in a heat map.

cBioPortal Database Analysis
The cBioPortal database (v3.5.4) (http://www.cbioportal.org) was used to perform the cancer genomic analysis (Gao et al., 2013;Jing et al., 2019). We chose two datasets, "Lung Adenocarcinoma (TCGA, Firehose Legacy)" and "Lung Adenocarcinoma (TCGA, Nature 2014)," to analyze the co-expressed SHOX2 genes. We then set the following parameters: select genomic profiles [mutations, putative copy number alterations from GISTIC, and mRNA expression z-scores relative to diploid samples (RNA Seq V2 RSEM)], select patient/case set (all samples), and enter genes (user-defined list, SHOX2). Subsequently, we entered the co-expression interface and acquired the genes correlated with SHOX2 in satisfactory samples from these datasets. These data were downloaded, and the next operating steps were followed.

Using the Venn Diagram Tool
We selected the first 300 SHOX2 correlated genes from the previous steps with a higher Spearman correlation coefficient and p < 0.05. According to the applied cBioPortal database, one gene list was named "TCGA, Firehose Legacy" and the other gene list was named "TCGA, Nature 2014." We input the two lists into the upload lists. Then, we acquired the Venn diagram (http:// bioinformatics.psb.ugent.be/webtools/Venn/) of SHOX2 coexpressed related genes.

Functional and Pathway Enrichment Analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) (version 6.8; https://david.ncifcrf.gov/) is a biological information database that integrates biological data and analysis tools to provide systematic and comprehensive annotated biological function information for large-scale gene or protein lists to help users extract biological information from them (Huang da et al., 2009). We used DAVID to perform Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the selected common SHOX2 co-expressed related genes screened by the two datasets. GO enrichment analysis consisted of cellular component (CC), biological process (BP), and molecular function (MF) analyses. Following these steps, we uploaded these genes, selected the identifier "OFFICIAL_GENE_SYMBOL," selected "Homo spaiens," selected "Gene List," submitted the list, and finally obtained functional annotation charts. We then acquired the GO functional enrichment and KEGG pathway enrichment results. Calculated via Fisher's exact test, a p-value < 0.05 and a count >2 were considered statistically significant.

SHOX2 mRNA Expression Profile in Multiple Cancers
To determine the expression level of SHOX2 in various types of tumors, we examined the differential expression of SHOX2 between tumor tissue and paired normal tissue using Oncomine and GEPIA online analysis tools. The Oncomine database has a total of 320 unique analyses for SHOX2. There are 52 significant unique analyses among them that showed SHOX2 upexpression, while 15 analyses showed SHOX2 downexpression ( Figure 1A). SHOX2 expression was higher in most cancers, including sarcoma, brain and CNS cancer, head and neck squamous carcinoma (HNSC), and lung cancer. Meanwhile, it was downregulated in breast cancer, leukemia, and many other cancers ( Figure 1A). For further verification, we used the GEPIA web tool to compare the differential expression of SHOX2 between cancerous and normal tissues. As shown in Figure 1B, higher SHOX2 expression was observed in glioblastoma (GBM), HNSC, and LUSC. In addition, lower SHOX2 expression was observed in breast invasive carcinoma (BRCA), acute myeloid leukemia (LAML), and testicular germ cell tumors (TGCTs). In summary, the results from the two databases were consistent, supporting the evidence that SHOX2 should be an oncogene.

Analysis of SHOX2 Expression in LUAD/ LUSC and Normal Tissues
In order to explore SHOX2 mRNA expression and its relationship with clinical features, we addressed the GSE33532 data using GraphPad Prism 7 and used UALCAN to analyze the relative clinical data of LUAD/LUSC patients from TCGA. The SHOX2 mRNA expression profiling dataset of NSCLC showed that SHOX2 is highly expressed in both LUAD and LUSC tissues compared to paired normal tissues ( Figure 2A). The difference in SHOX2 expression was statistically significant. This result was verified by an expanded sample size analysis using the online analysis database UALCAN. As shown in Figure 2B, we obtained consistent results.

The Relationship of SHOX2 Expression With the Clinical Characteristics of LUAD Patients
In recent decades, the incidence of LUAD has been higher than that of LUSC. Moreover, there are few reports on the relationship between SHOX2 expression and the clinicopathological features of LUAD. Therefore, we used the UALCAN online analysis tool to address SHOX2 mRNA expression in different subgroups having multiple clinicopathological features from LUAD samples from TCGA. The SHOX2 mRNA expression level was significantly higher in LUAD patients than in healthy people in subgroup analyses based on sex, age, smoking habits, stage, nodal metastasis, and TP-53 mutation status ( Figures 3A-F). In addition, subgroup analyses showed that SHOX2 expression was higher in patients aged 20-40 years, with a history of smoking, in stage 2, and with nodal distant metastasis, compared to other clinical features ( Figure 3). The pathology of disease can help clinicians to clearly diagnose the disease and take reasonable and effective treatment measures. UCSC Xena was used to analyze the changes of SHOX2 expression in different histological subtypes of invasive LUAD from a cohort of TCGA LUAD (n 706) ( Figure 3G). The results showed that SHOX2 expression varied significantly among different histological subtypes of LUAD (one-way ANOVA, p 0.0001271, f 4.098). SHOX2 expression was higher in subtypes prone to lymph node metastasis, prone to recurrence, with poor differentiation, and with poor prognosis.
Furthermore, to confirm the prognostic value of SHOX2 in LUAD, UCSC Xena was searched to investigate the effects of SHOX2 expression, methylation, and CNV on overall survival (OS) and disease-specific survival (DSS). It was confirmed that high SHOX2 mRNA expression was significantly associated with decreased OS and DSS time in LUAD (p < 0.05); DNA methylation and copy number variation had no significant effect on OS (p > 0.05), and a higher copy number indicated a shorter DSS (p < 0.05). The SHOX2 expression level and DNA methylation level had no significant correlation with DSS (p > 0.05) (Figure 4).
Therefore, these results suggest that SHOX2 might be a driving factor in the development of LUAD and may act as a potential diagnostic and prognostic indicator in LUAD. Influencing-Factor Analysis of SHOX2 Gene Expression in LUAD To determine the main influencing factors of SHOX2 gene expression, we used several online tools performing mutation, methylation, and CNV alteration analysis. We used three methylation analysis tools (UCSC Xena, SurvivalMeth, and MethSurv) to probe the SHOX2 methylation level in LUAD patients, from different perspectives. The methylation level of SHOX2 was examined in TCGA LUAD patients, based on age, race, ethnicity, alive or dead events, SHOX2 mRNA expression, CpG island-related elements, and transcript-related elements. From Figure 5 and Table 1, we can conclude that DNA methylation of SHOX2 was elevated in LUAD tissues compared with that in normal lung tissues. As shown in Figure 5, SHOX2 DNA was only locally methylated. To explore the DNA-methylated specific site of SHOX2 and verify the acquired results, we conducted an in-depth study using TCGA LUAD patient data in SurvivalMeth and MethSurv. In LUAD tissues, SHOX2 DNA methylation mainly occurred in the gene body, at the hypomethylation level (p < 0.05) ( Figures 5 and  6, Tables 1 and 2). According to previous research results, hypomethylation of the gene body leads to the high expression of oncogenes, which is opposite to the hypermethylation of the gene promoter for tumor suppressive genes (Yang et al., 2014). Therefore, hypomethylation of the SHOX2 gene body might be one of the main factors causing mRNA expression. In conclusion, in LUAD tissue, the mRNA expression of SHOX2 might be correlated with the DNA methylation level, followed by CNV, but not with gene mutation ( Figure 5).

Enrichment Analysis of Correlated Genes
With SHOX2 in LUAD Finally, to explore the possible signaling pathways from the genes correlated with SHOX2, we analyzed mRNA sequencing data from 816 LUAD patients in two TCGA research cohorts: Firehose Legacy (586 samples) and Nature 2014 (230 samples). As shown in the Venn diagram ( Figure 7A), 139 significantly correlated genes with SHOX2 appeared in both cohorts (p < 0.05). This suggests that SHOX2 has a widespread impact on the transcriptome. We then performed functional enrichment analysis of SHOX2 and these 139 common significantly correlated genes using the DAVID database. The top 10 GO terms of these genes, according to the gene counts, are shown in Figure 7B, including biological processes (BPs), cellular components (CCs), and molecular functions (MFs). The BP of SHOX2 was mainly associated with positive regulation of cell migration, cell division, skeletal system development, and mitotic nuclear division. The CC of SHOX2 was mainly located in the extracellular space. The MF of SHOX2 mainly focuses on the structural constituents of the extracellular matrix and serine-type endopeptidase activity. Surprisingly, the significant KEGG pathways included the PI3K-Akt signaling pathway, ECM-receptor interaction, protein digestion and absorption, and amebiasis (p < 0.05).

DISCUSSION
Early detection, early diagnosis, and early treatment are of key importance in preventing disease development. At present, conventional diagnostic methods for lung cancer include computed tomography imaging and cellular/histopathological examination. Emerging molecular diagnostic methods are being increasingly applied in clinical practice because they are more sensitive and objective. DNA methylation alteration is considered an early predictor of cancer and can be detected during the early stages of tumorigenesis Dirks et al., 2016;Wang et al., 2018;Srisuttee et al., 2020). Because of the hypermethylation of SHOX2 in lung cancer, methylation detection has been applied to assist in the early diagnosis of unknown pulmonary nodules (Kneip et al., 2011;Konecny et al., 2016;Weiss et al., 2017;Shi et al., 2020). Many studies have reported that SHOX2 is highly expressed in many cancers. The specific role of SHOX2 in lung cancer patients and the molecular mechanism of its action are unknown. As reported in detail in our previous review , we synthesized the results of existing studies on SHOX2 and its related homonymous genes in mice, lung cancer and other tumors and concluded that SHOX2 might play an important role in tumorigenesis, metastasis, and recurrence in lung cancer. Here, we used bioinformatics analysis to verify the above conclusions.
In our study, we found that the mRNA expression of SHOX2 was higher in multiple cancers than in paired normal tissues, including LUAD and LUSC tissues. Among the LUAD patients, SHOX2 expression was higher in patients of middle-young age, with smoking history, in advanced stages, and with nodal distant metastasis. In addition, our results showed that patients with high expression of SHOX2 are prone to recurrence, poor differentiation, and poor prognosis. Here, we identified that SHOX2 plays a negative role in LUAD progression and could act as an oncogene. Next, we performed SHOX2 methylation analysis. Finally, we found that SHOX2 undergoes hypomethylation in the gene body. The main factors for SHOX2 mRNA expression are the DNA methylation and CNV, but not gene mutations in LUAD. Additionally, SHOX2 has cross talk in the PI3K−Akt signaling pathway and in ECM-receptor interactions.
The potential role of SHOX2 in patients with lung cancer may have been explained by many previous studies. We identified that the mRNA expression of SHOX2 could assist in indicating subsets of LUAD with worse prognosis and survival. Zhang and Zhou also reported that SHOX2 was an indicator for identifying subgroups with worse prognosis in lower-grade gliomas, which is consistent with our result (Zhang Y.A. et al., 2016). Additionally, there have been other similar reports of SHOX2 methylation as a predictor of malignancy (Leiro et al., 2019;Xu et al., 2020).
Tumor cell genomes showed global hypomethylation and local hypermethylation. Local hypermethylation may occur on the promoter CpG island of a tumor suppressor gene to inhibit the expression of this gene, or on the gene body CpG island of an oncogene to induce high expression of this gene (Yang et al., 2014;Gagliardi et al., 2018). Downregulated TSGs or upregulated oncogenes are critical in tumorigenesis. According to recent research, SHOX2 is considered an oncogene. Unexpectedly, the present majority of related reports declare that SHOX2 methylation occurs on the promoter CpG island, which can seem incomprehensible. To explain the discrepancy of high methylation of SHOX2 in cancer tissue and to identify whether methylation occurs at the gene promoter, we conducted further research to try to explain this phenomenon. We used UCSC XENA, SurvivalMeth, and MethSurv to obtain methylation data for all methylation sites of SHOX2 and then analyzed them statistically using Excel. Interestingly, we observed that the SHOX2 methylation is in a hypomethylated state, and methylation mainly occurs at the gene body instead of at the gene promoter. In summary, SHOX2 is highly expressed, and its genomes are hypomethylated in LUAD, which might be the main mechanism of gene expression. Therefore, the current description of SHOX2 promoter hypermethylation may not be rigid. The regulation of gene expression is the molecular basis of cell differentiation, morphogenesis, and ontogenesis in vivo. The influencing factors of gene expression include one or two of the following alterations in most cases: local mutation, CNV, DNA methylation, and the expression level of master transcription factors on another chromosome. We found that hypomethylation of the SHOX2 gene body may be one of the main mechanisms driving the upregulation of SHOX2, and CNV is also one of them. Schneider et al. (2011) and may agree with our findings.
According to our enrichment analysis of SHOX2, the BP of SHOX2 was mainly associated with positive regulation of cell migration, cell division, skeletal system development, and mitotic nuclear division, and the significant KEGG pathways included the PI3K−Akt signaling pathway, ECM−receptor interaction, protein digestion and absorption, and amebiasis. These findings are FIGURE 6 | Heat map of methylation levels of the SHOX2 methylated probe. Red to blue: high expression to low expression. Various colorful side-boxes are used to characterize the event, age, race, ethnicity, UCSC_refGene_Group, and relation to UCSC_CpG_island. Note: DNA methylation level is represented by data value (0 represents unmethylated, 0.25-0.3 represents hypomethylated, 0.5-0.7 represents hypermethylated, and 1 represents completely methylated).
Frontiers in Molecular Biosciences | www.frontiersin.org June 2021 | Volume 8 | Article 688274 consistent with the fact that SHOX2 is considered an oncogene and a diagnostic and predicted biomarker. This is critical for understanding how SHOX2 expression alter and how SHOX2 lead to cancers such as LUAD.
In this study, we combined multiple online bioinformatics data analysis platforms and tools to provide a statistical analysis of the correlation between SHOX2 expression and clinical characteristic parameters, clinical prognosis, DNA methylation, CNV, mutations, co-expressed genes, and related signaling pathways in LUAD. SHOX2 is highly expressed in most cancers. SHOX2 gene expression may be mainly regulated by methylation of the gene body in LUAD, and its high expression or hypomethylation indicates poor differentiation and poor prognosis. SHOX2 is involved in PI3K−Akt and other important cancer-related signaling pathways to promote tumorigenesis.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
NL, YZ, MT, and JH conceptualized and designed the idea and analyzed and interpreted the data. JH provided administrative support. NL, YZ, MT, BL, DZ, YL, XR, XZ, LL, and HW contributed to the provision of study materials or patients and collected and assembled the data. NL and YZ wrote the manuscript. All authors approved the final manuscript.

FUNDING
This work was supported by the National Natural Science Foundation of China (81572610) and the "Yangfan Plan" for Outstanding Scholars in Guangdong Province (4YF16002G) which all are from JH.

ACKNOWLEDGMENTS
Thanks are due to Bo Zhao (Official Wechat Account: SCIPhD) of ShengXinZhuShou for suggestions on the manuscript.