Long Noncoding RNA SNHG7 Is a Diagnostic and Prognostic Marker for Colon Adenocarcinoma

Numerous studies have shown that long noncoding RNAs (lncRNAs) play a critical role in the malignant progression of cancer. However, the involvement of lncRNAs in colon adenocarcinoma (COAD) remains poorly understood. In this study, the expression of lncRNA SNHG7 in colon cancer tissues and its correlation with clinical characteristics were analyzed using data from The Cancer Genome Atlas (TCGA) database. SNHG7 was found to be highly expressed in 17 types of cancer, including COAD. Next, TCGA data were further investigated to identify differentially expressed genes, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses were performed. In addition, the relationship between SNHG7 expression and clinical features was analyzed. SNHG7 expression was found to be a potentially valuable indicator for COAD diagnosis and prognosis. Finally, gene set enrichment analysis showed that SNHG7 expression was associated with the lupus erythematosus and Reactome cellular senescence gene sets, which may influence the prognosis of patients with COAD. Altogether, these results suggest that SNHG7 may be associated with the occurrence and development of COAD and may have diagnostic and prognostic value.


INTRODUCTION
Colon adenocarcinoma (COAD) is the second most lethal malignancy worldwide and is currently treated with surgery, chemotherapy, and/or radiotherapy (1). Although the overall survival (OS) rate has improved, invasion and metastasis remain the main causes of death among patients with COAD (2). Extensive studies have shown that tumor biomarkers can be highly sensitive and specific tools for diagnosing and monitoring tumors (3). Therefore, there is a critical need to identify new diagnostic and prognostic biomarkers and to develop novel therapeutic strategies for COAD.
Long noncoding RNAs (lncRNAs) are a class of noncoding RNA transcripts longer than 200 nucleotides. The dysregulation of lncRNAs is closely related to various major diseases, including cancer (4). Many studies have shown that cancer-associated lncRNAs are involved in regulating tumor proliferation, invasion, and metastasis, and they are thus considered a class of candidate biomarkers for cancer diagnosis and therapy (5). For example, the lncRNA HOTAIR is an oncogene that is upregulated in breast cancer tissues and is closely related to poor prognosis and tumor metastasis (6). MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is a lncRNA that was originally found to be abundantly expressed in metastatic carcinoma cells and is significantly upregulated in various types of cancer, such as breast cancer (7) and non-small cell lung cancer (8); it has been suggested as a prognostic biomarker and potential therapeutic target for metastatic cancers (9). H19 is an estrogen-regulated lncRNA whose aberrant expression is closely associated with cell proliferation and migration in a variety of cancers, such as gastric, gallbladder, and pancreatic cancers (10). Although lncRNAs are broadly recognized as important regulators in human cancers, few have been shown to function in COAD, and their mechanisms remain largely unknown.
Small nucleolar RNA host genes (SNHGs) are a recently recognized group of lncRNAs with oncogenic roles in various cancers (11). Members of the SNHG family have been shown to regulate cellular proliferation, apoptosis, invasion, and migration in multiple cancers (12). LncRNA SNHG7 is closely linked to the occurrence, development, and carcinogenic potential of numerous cancers, including lung, gastric, and cervical cancers, as well as renal cell carcinoma and hepatocellular carcinoma (13)(14)(15). Nevertheless, few reports have explored the role of SNHG7 in COAD. This study aimed to investigate the relationship between SNHG7 expression and COAD prognosis using bioinformatics tools.

MATERIALS AND METHODS
Data Collection
RNA sequencing (RNA-seq) data from 521 COAD samples and the associated clinical information were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). An additional RNA-seq dataset of 698 COAD samples with clinical information was included for validation. RNA-seq data were converted from fragments per kilobase of transcript per million mapped reads (FPKM) to transcripts per million (TPM) and analyzed together with the corresponding clinicopathological information. Because all data were publicly available, informed consent and ethical approval were not required.
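The FPKM-to-TPM conversion is a per-sample rescaling, TPM_i = FPKM_i / (sum over all genes j of FPKM_j) × 10^6. A minimal Python sketch, assuming one sample's gene-level FPKM values are held in a plain dictionary; the function name `fpkm_to_tpm` and the toy values are illustrative, not the pipeline used in this study:

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's gene-level FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so the TPM values of a sample sum to 1e6.
    The sum must run over all genes in the sample, not only the genes of interest.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no positive expression values")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Toy sample with three genes (illustrative values only).
sample = {"SNHG7": 30.0, "GAPDH": 60.0, "ACTB": 10.0}
tpm = fpkm_to_tpm(sample)
print(tpm)
```

Because every sample's TPM values sum to the same total, TPM values are more directly comparable across samples than FPKM values.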

Clinical Significance and Correlation of SNHG7 Expression in COAD Patients
To clarify the association between SNHG7 expression and the clinical features of COAD, the Wilcoxon signed-rank test and logistic regression were performed. The detailed clinicopathological characteristics of the patients with COAD are listed in Table 1.
To assess the diagnostic potential of SNHG7 for COAD, SNHG7 expression in COAD and normal tissues was compared using receiver operating characteristic (ROC) analysis. COAD and corresponding normal tissue data were obtained from the TCGA database. The analysis was performed using the R package "pROC" (version 1.17.0.1), and the results were visualized using "ggplot2" (version 3.3.3).
Kaplan-Meier analysis and univariate and multivariate Cox regression analyses were used for prognostic analysis. Nomograms were created using the R packages "rms" (version 6.2-0) and "survival" (version 3.2-10). All statistical analyses were conducted in R (v3.6.3; R Foundation for Statistical Computing, Vienna, Austria), and p-values below 0.05 were considered significant.
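The ROC area under the curve (AUC) equals the probability that a randomly chosen tumor sample has higher expression than a randomly chosen normal sample (the Mann-Whitney formulation, with ties counted as 1/2). The plain-Python sketch below computes the AUC both ways to show this equivalence; it is not the pROC implementation used in the study, and the expression values are hypothetical:

```python
def roc_points(tumor, normal):
    """(FPR, TPR) pairs from sweeping the expression cutoff from high to low.

    A sample is called "COAD" when its expression is >= the cutoff.
    """
    cutoffs = sorted(set(tumor) | set(normal), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(x >= c for x in tumor) / len(tumor)    # sensitivity
        fpr = sum(x >= c for x in normal) / len(normal)  # 1 - specificity
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mann_whitney_auc(tumor, normal):
    """AUC as P(random tumor value > random normal value), ties counted as 1/2."""
    wins = sum(1.0 if t > n else 0.5 if t == n else 0.0
               for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))


# Hypothetical SNHG7 TPM values (not study data).
tumor = [5.1, 4.8, 6.0, 3.9]
normal = [3.0, 4.0, 2.5]
print(trapezoid_auc(roc_points(tumor, normal)), mann_whitney_auc(tumor, normal))
```

pROC additionally reports confidence intervals for the AUC (by default using the DeLong method), which this sketch omits.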
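The Kaplan-Meier estimator multiplies, at each distinct death time t_j, the conditional probability of surviving that time, 1 − d_j/n_j, where d_j is the number of deaths and n_j the number of patients still at risk. A minimal plain-Python sketch follows; the R "survival" package used in the study provides the full implementation, including confidence intervals, and the follow-up data below are hypothetical:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months for six patients.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```

Comparing the curves of the high- and low-SNHG7 groups (for example, with a log-rank test) is a separate step not shown here.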

Screening of Differentially Expressed Genes (DEGs), and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Analyses
COAD gene expression data in HTSeq-TPM format were obtained from TCGA for analysis. Genes coexpressed with SNHG7 were screened using Pearson correlation coefficients (|r| > 0.4 and p < 0.001). To explore the biological functions and signaling pathways potentially affected by SNHG7, GO and KEGG analyses of the coexpressed genes were performed using the R package "clusterProfiler", with p < 0.05 considered statistically significant. GO analysis covered biological process (BP), cellular component (CC), and molecular function (MF) terms.
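GO and KEGG over-representation analysis, as implemented in tools such as clusterProfiler, is commonly based on a one-sided hypergeometric test: given a background of N genes, K of which are annotated to a term, how surprising is it that k of the n query genes carry that term? A minimal sketch using only the Python standard library; the function name and gene counts are hypothetical, and the multiple-testing correction that clusterProfiler also applies is omitted:

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: background genes, K: background genes annotated to the term,
    n: query genes (e.g. SNHG7-coexpressed genes), k: query genes with the term.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical term: 200 of 20,000 background genes, 15 of 500 query genes.
p = hypergeom_upper_tail(k=15, n=500, K=200, N=20000)
print(f"expected overlap = {500 * 200 / 20000:.1f}, p = {p:.2e}")
```

Exact integer arithmetic avoids the underflow that direct floating-point evaluation of large binomial coefficients would cause.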

Gene Set Enrichment Analysis (GSEA)
GSEA is a computational method used to determine whether an a priori defined gene set shows statistically significant, concordant differences between two biological states (16). In the present study, GSEA was used to explore the biological differences underlying the survival differences between the high- and low-SNHG7 expression groups. Gene set permutations were performed 1,000 times for each analysis. SNHG7 expression was used as the phenotype label. The nominal p-value and normalized enrichment score (NES) were used to identify the pathways enriched in each phenotype.
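The core of GSEA is a weighted running-sum statistic computed down a ranked gene list: it steps up at genes in the set (weighted by the ranking metric) and down at genes outside it, and the enrichment score (ES) is the maximum deviation from zero. The plain-Python sketch below follows that scheme with a gene-set permutation null (random gene sets of the same size), corresponding to the gene set permutations described above, and normalizes ES by the mean of same-sign null scores to obtain an NES. The ranking metric, gene names, and function names are illustrative assumptions; this is not the GSEA software itself, and FDR estimation is omitted:

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (ES).

    ranked_genes: genes ordered from most positive to most negative ranking metric.
    scores: the ranking metric for each gene, in the same order.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at gene-set members, step down at all other genes.
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero
    return es


def gsea_gene_set_permutation(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Return (ES, NES, nominal p) using random gene sets of the same size as the null."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = len(set(gene_set) & set(ranked_genes))
    null = [enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)))
            for _ in range(n_perm)]
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    if not same_sign:
        raise ValueError("no permutations with the same sign as the observed ES")
    nes = observed / (sum(abs(e) for e in same_sign) / len(same_sign))
    p_nominal = sum(abs(e) >= abs(observed) for e in same_sign) / len(same_sign)
    return observed, nes, p_nominal


# Toy ranked list: G0 is the most positively associated with SNHG7 expression.
genes = [f"G{i}" for i in range(20)]
metric = [1.0 - 0.1 * i for i in range(20)]
es, nes, p = gsea_gene_set_permutation(genes, metric, {"G0", "G1", "G2", "G3"})
print(round(es, 3), round(nes, 2), p)
```

A gene set concentrated at the top of the ranking, as in the toy example, yields the maximal positive ES and a small nominal p-value.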

RESULTS
Expression Profiles of SNHG7 in Pan-Cancer Datasets
Based on TCGA data, SNHG7 was upregulated in 17 of the 33 cancer types investigated, including cholangiocarcinoma (CHOL), prostate adenocarcinoma (PRAD), and thyroid carcinoma (THCA) (Figure 1A). Further analysis showed that SNHG7 expression was significantly higher in COAD tissues than in normal tissues (p < 0.001, Figure 1B). These findings suggest that SNHG7 may play a significant regulatory role in the progression of COAD.

Clinical Correlation Analyses
Clinical information for 521 COAD patients, including sex, age, race, T stage, N stage, residual tumor, perineural invasion, lymphatic invasion, OS, and disease-specific survival (DSS) (Table 1), was obtained from the TCGA database. SNHG7 expression was significantly correlated with race (p < 0.05) and residual tumor (p < 0.05), as well as with OS (p < 0.05) and DSS (p < 0.01). No correlation was observed between SNHG7 expression and the other clinicopathological characteristics.

Diagnostic Value of SNHG7 in COAD Patients
ROC curves were used to evaluate the ability of SNHG7 expression to distinguish patients with COAD from normal controls. Table 3). N stage (HR: 2.933, 95% CI: 0.218-39.407, p = 0.021) was an independent risk factor for DSS. Based on the significant prognostic factors identified in the Cox regression analysis, prognostic nomograms were constructed. Age, pathological stage, perineural or lymphatic invasion, and SNHG7 expression were included in the nomograms predicting OS (C-index = 0.836) (Figure 3C) and DSS (C-index = 0.875) (Figure 3D). These results indicate that SNHG7 expression was not only significantly upregulated in COAD but also had prognostic value, suggesting that SNHG7 may have important regulatory functions in this type of cancer.
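The C-index reported for the nomograms measures how often, among comparable patient pairs, the patient with the higher predicted risk experiences the event first (0.5 corresponds to random ordering, 1 to perfect ordering). A minimal plain-Python sketch of Harrell's C follows; skipping pairs with tied survival times is a simplifying assumption, the implementation used with the rms package may treat ties differently, and the data are hypothetical:

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and
    times[i] < times[j]; it is concordant when risk[i] > risk[j].
    Ties in predicted risk count 1/2; pairs with tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical nomogram risk scores: higher score = higher predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.3, 0.1]
print(harrell_c_index(times, events, risk))
```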
Coexpressed Genes of SNHG7 and Functional Annotation of SNHG7-Associated DEGs in COAD
To screen for genes coexpressed with SNHG7, thresholds of |r| > 0.4 and p < 0.001 were applied to the Pearson correlation coefficients. The top 20 positively and negatively correlated coexpressed genes of SNHG7 are displayed as a heatmap (Figure 4).
Next, we performed GO and KEGG analyses of SNHG7-associated DEGs in COAD. GO analysis demonstrated that the genes were significantly enriched in the BP terms viral gene expression, viral transcription, establishment of protein localization to the endoplasmic reticulum (ER), and protein targeting to the ER. For GO-CC terms, the genes were mainly located in the cytosolic part, ribosome, ribosomal subunit, and cytosolic ribosome. In GO-MF analysis, the genes were enriched in structural constituent of ribosome, 5′-3′ RNA polymerase activity, DNA-directed 5′-3′ RNA polymerase activity, and RNA polymerase II activity (Table 4; Figure 5A). As shown in Figure 5B and Table 5, KEGG pathway analysis indicated that the top pathways were mainly associated with Huntington's disease, ribosome, amyotrophic lateral sclerosis, spliceosome, and RNA polymerase.
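The coexpression filter can be sketched as follows. The code computes Pearson's r and the corresponding t statistic, t = r·sqrt((n − 2)/(1 − r²)), for each gene against the SNHG7 profile and keeps genes with |r| > 0.4. Converting t to a p-value requires the t distribution, which is not in the Python standard library, so the p < 0.001 condition is left out of this sketch; with roughly 500 samples, |r| = 0.4 already corresponds to t ≈ 9.7, far beyond that threshold. Function names, gene names, and values are hypothetical:

```python
from math import copysign, inf, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression vector")
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy)))  # clamp rounding error


def coexpressed_genes(target, expression, r_min=0.4):
    """Return {gene: (r, t)} for genes with |r| > r_min against the target profile."""
    n = len(target)
    kept = {}
    for gene, values in expression.items():
        try:
            r = pearson_r(target, values)
        except ValueError:
            continue  # skip genes with no variance across samples
        if abs(r) > r_min:
            t = r * sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else copysign(inf, r)
            kept[gene] = (r, t)
    return kept


snhg7 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical SNHG7 TPM across five samples
expr = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with SNHG7
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with SNHG7
    "GENE_C": [2.0, 1.0, 2.0, 1.0, 2.0],   # unrelated
    "GENE_D": [3.0, 3.0, 3.0, 3.0, 3.0],   # constant, skipped
}
result = coexpressed_genes(snhg7, expr)
print(result)
```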

GSEA of SNHG7
To further clarify the biological functions of SNHG7 in COAD, GSEA was performed comparing the high- and low-SNHG7 expression groups. Significant enrichment (false discovery rate < 0.25, adjusted p < 0.05) was observed for gene sets in the MSigDB collection (c2.cp.v7.2.symbols.gmt). The most markedly enriched signaling pathways were selected based on their NES (Figures 6A, B). The results illustrated that

Validation of Differential Expression, Prognostic and Diagnostic Value of SNHG7 in Other Independent Cohorts of COAD
To validate the prognostic robustness and clinical reproducibility of SNHG7, an independent cohort of 698 samples available in the TCGA database was also analyzed. As shown in Figure 7A, SNHG7 expression was significantly upregulated in the tumor group compared with normal tissues (p < 0.001). ROC curve analysis likewise indicated that SNHG7 had high diagnostic value in COAD (AUC = 0.911, 95% CI: 0.879−0.943) (Figure 7B). Analysis of OS (Figure 7C; HR: 1.55, p = 0.014) and DSS (Figure 7D; HR: 2.04, p = 0.002) in these patients further suggested that high SNHG7 expression was correlated with poor prognosis in COAD. These results were consistent with those obtained in the 521-sample cohort, indicating that the diagnostic and prognostic value of SNHG7 in COAD is credible and reproducible.

DISCUSSION
Currently, the 5-year survival rate of early-stage COAD reaches 70−90%; nonetheless, outcomes for advanced COAD remain unsatisfactory, mainly because of high rates of recurrence and metastasis (17,18). Therefore, the development of biomarkers that aid early differential diagnosis and predict COAD progression is of major importance for both research and therapy (19). LncRNAs have been established as potential diagnostic and/or prognostic markers for clinical applications; in particular, many lncRNA biomarkers have been reported for colorectal cancer (20,21). SNHG7, a member of the SNHG family, is differentially expressed in various malignant tumors (13)(14)(15). Notably, recent studies have revealed that SNHG7 has a regulatory role in colorectal cancer. For example, SNHG7 has been described as an oncogenic biomarker in COAD that interacts with miR-193b (22), and it positively regulates GALNT1 levels by sponging miR-216b in colorectal cancer (23). Moreover, SNHG7 and FAIM2 are upregulated in colorectal cancer tissues compared with adjacent normal tissues (24). However, the clinical significance and prognostic/diagnostic value of SNHG7 in COAD remain unclear. New and effective biomarkers for the prognosis and early diagnosis of COAD would therefore help improve the treatment and outcomes of patients.
To gain a comprehensive understanding of the role of SNHG7 in COAD, we first examined SNHG7 expression in publicly available pan-cancer data. SNHG7 was differentially expressed in multiple tumors; in particular, its expression was significantly upregulated in COAD compared with normal tissues. These findings suggest that the differential expression of SNHG7 may be tissue-specific and that SNHG7 may have an important regulatory role in COAD.
To further test this hypothesis, we analyzed the clinical correlates of SNHG7 expression in COAD and performed univariate and multivariate Cox regression analyses. We found a strong association between SNHG7 expression and the race, residual tumor status, OS, and DSS of COAD patients, with SNHG7 expression appearing higher in patients with certain characteristics, such as a specific race or the presence of residual tumor. Moreover, high SNHG7 expression was associated with significantly shorter OS and DSS in COAD patients and was also an independent risk factor for OS and DSS. Histopathological characteristics such as tumor stage, perineural invasion, and lymphatic invasion have been implicated as prognostic predictors (25). Our results also confirmed that these three predictors were closely related to poor OS and DSS in COAD patients with high SNHG7 expression. Notably, in line with our SNHG7-based predictions, univariate Cox regression analysis showed that high CEA levels, which are an independent prognostic factor and can be used for TNM staging of COAD, reflected poor prognosis in COAD (26). Hence, the predictive ability of SNHG7 expression suggests its potential as a prognostic biomarker of poor survival in COAD.
In addition, we explored the potential functions and underlying mechanisms of SNHG7 in COAD. Based on the functional annotation of SNHG7-related DEGs, GO and KEGG analyses revealed that ribosome- and RNA polymerase-related functions were closely associated with SNHG7. These results further support an association between SNHG7 expression and COAD.
The accuracy of a diagnostic tool can be summarized by the area under the ROC curve (AUC); the closer the AUC is to 1, the better the diagnostic performance of the tool (27). Our results consistently showed that high SNHG7 expression was associated with advanced COAD and that SNHG7 expression had high sensitivity and specificity for COAD diagnosis. Assessment of an independent COAD cohort further confirmed the differential expression of SNHG7 and its diagnostic and prognostic value in COAD, indicating that SNHG7 is a reliable and reproducible prognostic and diagnostic biomarker for COAD.
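For reference, the hazard ratios (HRs) discussed here come from Cox proportional hazards models. In standard notation (the symbols below are generic, not taken from the study), the hazard for a patient with covariates x_1, …, x_p and the HR for a one-unit increase in covariate x_k, with all other covariates held fixed, are:

```latex
h(t \mid x_1,\dots,x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right),
\qquad
\mathrm{HR}_k = \frac{h(t \mid x_k + 1,\ \text{others fixed})}{h(t \mid x_k,\ \text{others fixed})} = e^{\beta_k},
\qquad
\text{e.g. } \mathrm{HR} = 1.55 \;\Rightarrow\; \beta = \ln 1.55 \approx 0.438 .
```

For a binary high/low SNHG7 indicator, an HR of 1.55 (as reported for OS in the validation cohort, if obtained from a model of this form) means an estimated hazard 55% higher in the high-expression group at any time point, under the proportional hazards assumption. In the multivariate model, holding the other covariates fixed is what allows a factor to be called an independent risk factor.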
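Sensitivity and specificity depend on the cutoff applied to SNHG7 expression, whereas the AUC summarizes all possible cutoffs. One common way to choose a single cutoff is to maximize Youden's index, J = sensitivity + specificity − 1. The study does not state which cutoff, if any, it used, so the plain-Python sketch below is illustrative only; the function names and expression values are hypothetical:

```python
def sens_spec(tumor, normal, cutoff):
    """Sensitivity and specificity when expression >= cutoff is called COAD."""
    sensitivity = sum(x >= cutoff for x in tumor) / len(tumor)
    specificity = sum(x < cutoff for x in normal) / len(normal)
    return sensitivity, specificity


def youden_cutoff(tumor, normal):
    """Return (J, cutoff, sensitivity, specificity) maximizing Youden's J.

    Candidate cutoffs are the observed values; the lowest cutoff wins ties in J.
    """
    best = None
    for c in sorted(set(tumor) | set(normal)):
        se, sp = sens_spec(tumor, normal, c)
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, c, se, sp)
    return best


# Hypothetical SNHG7 TPM values (not study data).
tumor = [5.1, 4.8, 6.0, 3.9]
normal = [3.0, 4.0, 2.5]
best = youden_cutoff(tumor, normal)
print(best)
```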

CONCLUSIONS
In conclusion, this study demonstrated that high SNHG7 expression is associated with COAD and that SNHG7 may serve as a reliable biomarker for the diagnosis and prognosis of COAD. These findings may provide a foundation for developing improved diagnostic and prognostic strategies for COAD.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the TCGA database. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
MH: design. CJ and SQ: acquisition of data. CJ, SQ, and TL: analysis and interpretation of data. CJ and MH: writing, review, and/or revision of the manuscript. SQ and MH: study supervision. All authors contributed to the article and approved the submitted version.