SNRPD1/E/F/G Serve as Potential Prognostic Biomarkers in Lung Adenocarcinoma

Objectives: Sm proteins (SNRPB/D1/D2/D3/E/F/G), involved in pre-mRNA splicing, were previously reported in the tumorigenesis of several cancers. However, their specific role in lung adenocarcinoma (LUAD) remains obscure. Our study aims to feature abnormal expressions and mutations of genes for Sm proteins and assess their potential as therapeutic targets via integrated bioinformatics analysis. Methods: In this research, we explored the expression pattern and prognostic worth of genes for Sm proteins in LUAD across TCGA, GEO, UALCAN, Oncomine, Metascape, David 6.8, and Kaplan-Meier Plotter, and confirmed its independent prognostic value via univariate and multivariate cox regression analysis. Meanwhile, their expression patterns were validated by RT-qPCR. Gene mutations and co-expression of genes for Sm proteins were analyzed by the cBioPortal database. The PPI network for Sm proteins in LUAD was visualized by the STRING and Cytoscape. The correlations between genes for Sm proteins and immune infiltration were analyzed by using the “GSVA” R package. Results: Sm proteins genes were found upregulated expression in both LUAD tissues and LUAD cell lines. Moreover, highly expressed mRNA levels for Sm proteins were strongly associated with short survival time in LUAD. Genes for Sm proteins were positively connected with the infiltration of Th2 cells, but negatively connected with the infiltration of mast cells, Th1 cells, and NK cells. Importantly, Cox regression analysis showed that high SNRPD1/E/F/G expression were independent risk factors for the overall survival of LUAD. Conclusion: Our study showed that SNRPD1/E/F/G could independently predict the prognostic outcome of LUAD and was correlated with immune infiltration. Also, this report laid the foundation for additional exploration on the potential treatment target’s role of SNRPD1/E/F/G in LUAD.


INTRODUCTION
Lung cancer is the most well-known sort of malignant tumor worldwide and is the major cause of cancer mortality (Sung et al., 2021). Lung adenocarcinoma (LUAD) has been the most common subtype of Non-small cell cancer (NSCLC) (Anonymous, 2015). LUAD is characterized by a lack of early clinical symptoms, a high rate of distant metastasis and drug resistance, which pose serious challenges to clinical treatment. Currently, treatment methods for LUAD mainly include surgical resection, radiotherapy, chemotherapy, immunotherapy, and molecular targeted therapy (Reck and Rabe, 2017). The best treatment for lung cancer is surgical resection in the early stages of lung cancer. However, the early symptoms of the disease are not obvious, easy to be ignored, which leads to a late diagnosis. The treatment of advanced lung adenocarcinoma is limited, and molecular targeted therapy is a promising choice, as well as immunotherapy. However, because of the lack of effective molecular targets, most drugs remain ineffective in the treatment of LUAD patients, of whom the 5-year survival rate is just 15% . Hence, it is absolutely necessary to recognize effective and dependable biomarkers to determine poor prognoses and direct treatment strategies.
The spliceosome is a ribonucleoprotein (RNP) with a complex ring-shaped structure, which mainly consists of small nuclear ribonucleoproteins (snRNPs), which is involved in splicing the pre rna into mature mRNA. Seven Sm proteins (SNRPB, SNRPD1, SNRPD2, SNRPD3, SNRPE, SNRPF, and SNRPG) and a small eponymous small nuclear RNA (snRNA) compose snRNPs (Chari et al., 2008). Accurate splicing is essential for normal cellular functions like cell proliferation, apoptosis, migration, and invasion. Sm proteins, involved in the formation of anti-Sm antibodies, were significant diagnostic biomarkers in autoimmune diseases, such as systemic lupus erythematosus (SLE). Furthermore, priors works have illustrated that the aberrant expressions of genes for Sm proteins are related to some human cancers, including cervical cancer (Zhu et al., 2020), glioblastoma (Correa et al., 2016), breast cancer (Dai et al., 2021), and hepatocellular carcinoma (Zhan et al., 2020).
Up to now, there have been limited studies investigating the connection between the abnormal expressions of genes for Sm proteins and LUAD. For instance, a report (Liu et al., 2019) showed that SNRPB down-regulation inhibited the growth and metastasis of NSCLC cells via RAB26 down-regulation. In the study directed by Valles et al. (2012), SNRPB and SNRPE were proved to be related to poor survival in LUAD patients. However, the precise functional roles of Sm proteins in LUAD are unclear yet. Our study aimed to present abnormal expressions and mutations of genes for Sm proteins and assess their potential as therapeutic targets via integrated bioinformatics analysis, which may be helpful to the treatment of LUAD patients.

Comparison of Genes for Sm Proteins Expression Levels Between LUAD and Corresponding Normal Tissue
The Oncomine database (https://www.oncomine.org) (Rhodes et al., 2004) was used to obtain the data of mRNA expressions for Sm proteins in various cancers, and these data were analyzed via Student's t-test. And we defined the following threshold: p-value<0.001, fold change >2 and 10% of most highly ranked genes. Furthermore, we compared expression levels of genes for Sm proteins between lung adenocarcinoma samples and normal samples in GSE40791. Moreover, the transcription levels of these genes were validated in the GEPIA 2 database (http://gepia2.cancerpku.cn) (Tang et al., 2017), including tumor and normal samples from the TCGA and the Genotype-Tissue Expression Project (GTEx) database (https://gtexportal.org/home/) and the TCGA-LUAD data performed by R software.
Cell Culture and RT-qPCR BEAS-2B cell line (human bronchial epithelial cell line) was purchased from the American Type Culture Collection (Manassas, VA, United States), A549 cell line was purchased from the Type Culture Collection of the Chinese Academy of Sciences, Shanghai, China. Two types of cells were cultured in DMEM with 10% FBS, penicillin (50 U/ml), and streptomycin (50 U/ml). The cells were incubated at 37°C with 5% CO 2 .
Total RNA of the above cells was extracted using TRIzol (Thermo, United States), and first-strand complementary DNA (cDNA) synthesis from total RNA was carried out using the GoScript Reverse Transcription System (Promega, United States). RT-qPCR was conducted using an AriaMx Real-Time PCR machine (Agilent Technologies, United States) with TB GreenPremix ExTaq II (Takara Bio, Japan). The primer sequences are shown in Supplementary Table S2. RT-qPCR cycle conditions: 3 min at 95°C, 40 cycles of 15 s for 95°C and 60 s for 60°C. Data was normalized to the house-keeping gene GAPDH. The relative gene expression was performed by the 2 −ΔΔCt method (Livak and Schmittgen, 2001).

Correlations Between Genes for Sm Proteins and Clinicopathological Parameters
In our study, the transcription levels of genes for Sm proteins in TP53 mutation of LUAD patients were analyzed with the lung adenocarcinoma dataset using the UALCAN database (http://ualcan. path.uab.edu) (Chandrashekar et al., 2017). Moreover, we explored the correlation between genes for Sm proteins gene expressions and tumor stages via the "Expression DIY module" of GEPIA 2 and selected 50 neighboring genes related to genes for Sm proteins using the "Similar Genes Detection" module.

Survival Analysis
The Kaplan-Meier plotter (http://kmplot.com) (Gyorffy et al., 2013;Nagy et al., 2021), including mRNA expression data and patients' clinical information from GEO and TCGA, is a powerful online tool to further verify the prognostic value of genes in several cancers. The overall survival (OS), free progression (FP), and post-progression survival (PPS) curves of genes for Sm proteins in LUAD were shown via the Kaplan-Meier plotter. Meanwhile, we also explored if the dyregulation of genes for Sm proteins had any impacts on the OS of LUAD patients with smoke history by using the Kaplan Meier plotter. In addition, univariate and multivariate cox regression analysis of the TGCA-LUAD data was performed using the "survival" and the "survminer" R package.

Gene Mutations and Co-Expression Analysis
In this study, the lung adenocarcinoma dataset (TCGA, Firehose Legacy), including data from 230 complete samples of 586 patients, was selected and visualized as the map of gene mutations, expression heatmap, and co-expression map of genes for Sm proteins using the cBioPortal database (http:// www. cbioportal.org) (Cerami et al., 2012;Gao et al., 2013). The z-Score threshold was set to ±1.8.

Constructed Protein-Protein Interaction Network and Selected Hub Genes
The tool of STRING (https://string-db.org/) (Szklarczyk et al., 2019) was used to construct the protein-protein interaction (PPI) network between the seven genes for Sm proteins and their 50 frequently neighboring genes. All PPI pairs with a combined score of >0.4 were extracted. Then, we used the "CytoHubba" plugin (v0.1) (Chin et al., 2014) of Cytoscape (v3.8.2) (https://cytoscape.org) (Otasek et al., 2019) to identify hub genes in that PPI network. Furthermore, the online tool of Metascape (http://metascape.org) (Zhou et al., 2019) was used to analyze the MCODE components of that PPI network.

GO and KEGG Enrichment Analyses
Functional enrichment analyses, including gene ontology (GO) analysis comprising cellular component (CC), molecular function (MF), and biological process (BP), and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis, were performed via the tool of DAVID 6.8 (https:// david.ncifcrf.gov) (Huang da et al., 2009). Subsequently, the specific enriched terms for GO and KEGG enrichment analysis were visualized using the "clusterProfiler" package (Yu et al., 2012) in R software. Moreover, we presented the network of enriched terms colored by cluster ID using the online tool of Metascape.

Associations Between Genes for Sm Proteins and Immune Infiltration
Finally, we explore the associations between genes for Sm proteins and 24 immune cell types in TGCA-LUAD. We investigated dendritic cell (DC), activated dendritic cell (aDC), immature dendritic cell (iDC),plasmacytoid dendritic cell (pDC),

Statistical Analysis
One-way Analysis of Variance (ANOVA) in the GEPIA 2 database, the log-rank test in Kaplan-Meier survival analysis, and the Cox proportional risk models in univariate and multivariate analyses, were used in statistical analysis. R software (v4.0.2) (http://www.r-project.org) was used for analysis in this study. RT-qPCR data was performed in GraphPad Prism (v9.0.2) (San Diego, CA, United States) and presented as the mean ± S.D. Student's T-test was used for statistical analyses between the data pairs where appropriate. *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001 were considered to represent statistical significance.

RESULTS
The mRNA Levels of Sm Proteins Were Significantly Increased in Lung Cancer Tissues and LUAD Cell Lines From Figure 2A and Table 1, we could see that the mRNA levels of SNRPB/D1/D2/E/G were higher in lung cancer than in non-cancer tissues. The results were also consistent with the mRNA levels of Sm proteins were upregulated in LUAD compared to normal lung tissues ( Figure 2B). Moreover, the relative mRNA level of all Sm proteins in A549 cell line is higher than the relative mRNA level of them in normal lung cell lines, BEAS-2B ( Figure 2C). The transcription levels of genes for Sm proteins considerably increased in data of GEPIA 2 (TCGA-LUAD & GTEx), TCGA-LUAD data and pair LUAD data (Figures 3A-C) match the above results.

The Expressions of Genes for Sm Proteins Were Related to Clinicopathological Parameters of LUAD Patients
We research the correlations between the expressions of genes for Sm proteins, TP53 mutation and cancer stages via the UALCAN database and the GEPIA2 database. From Figure 4A, we could find that the mRNA expressions of genes for Sm proteins in LUAD with TP53 mutation were higher than those without TP53 mutation ( Figure 4A). Moreover, as can be seen in Figure 4B, the mRNA levels of SNRPE varied significantly across different LUAD tumor stages (F value = 3.68, Pr (>F)<0.05), whereas SNRPB/D1/ D2/D3/F/G had no significant association with LUAD stages (all Pr (>F) > 0.05).

The Upregulation of Genes for Sm Proteins is Related to Poor Survival Outcomes in LUAD Patients
Using Kaplan-Meier Plotter, we assessed the prognostic values of genes for Sm proteins in LUAD patients. From Figure 5A, we could find out high expressions of genes for Sm proteins were associated with short OS (all p < 0.05). In addition, high mRNA level of SNRPB/D1/D2/F (HR > 1, p < 0.05) were correlated with poor FP and PPS (all p < 0.05). However, highly expressed SNRPD3 was not obviously related to the FP of LUAD patients (p > 0.05).
As is well known, smoking is a common risk factor leading to lung cancer (Jassem et al., 2009). Hence, we explored if the upregulation of genes for Sm proteins had any impacts on the overall survival of LUAD patients. From Figure 5B, in the smoking group, the high expression of SNRPB significantly reduced the OS of LUAD patients. However, in the nonsmoking group, the expression of SNRPB had no statistical significance for the overall survival of patients with lung adenocarcinoma. This indicates that SNRPB and smoking factors together could promote the development of lung adenocarcinoma. However, the upregulation of SNRPD1 (HR: 2.36, p < 0.05 vs HR: 1.63, p < 0.05), SNRPD2 (HR: 3.29, p < 0.05 vs HR: 2.52, p < 0.05), and SNRPF (HR: 3.27, p < 0.05 vs HR: 2.18, p < 0.05) had worsening impacts on the OS of LUAD patients without smoking history. Noteworthy, only high expression of SNRPE (HR: 1.28, p < 0.05) was correlated to poor OS of lung squamous cell carcinoma (LUSC) (Supplementary Figure S1). More research is still needed to interpret potential mechanisms of the above-described events.
Moreover, we attempted to evaluate the independent prognostic values of genes for Sm proteins for OS of LUAD patients. The microarray data and clinical information (Supplementary

Constructed PPI Network and Selected Hub Genes
To explore the possible genes for Sm proteins protein-mediated biological pathways in lung adenocarcinoma, we constructed a protein interaction network based on the seven Sm proteins ( Figure 6E) and their 50 frequently neighboring proteins ( Figure 6F) via the STRING database. Meantime, we used Cytoscape software to find out these hub genes related to genes for Sm proteins-mediated biological pathways. As was shown in Figure 6G, eight hub genes, including PCNA, NUF2, UBE2T, CCDC59, ERCC1, CDCA3, QPCTC, and TROAP, were tightly correlated with the alterations of genes for Sm proteins. Furthermore, we extracted the three most meaningful MCODE components from the PPI network between Sm proteins and their 50 frequently neighboring proteins. As was shown in Figure 6H, Sm core complex, spliceosome, U2-SnRNP, chromosome segregation, chromosome, centromeric region, cell division, and nucleotide excision repair were related to biological function.

GO and KEGG Enrichment Analyses in LUAD Patients
Afterward, genes for Sm proteins and their 50 neighboring genes were analyzed via the tool of DAVID 6.8 for GO and KEGG functional enrichment analysis. The results were shown in Figure 7 and Supplementary Figure S3. Biological processes (BP) included spliceosomal snRNP assembly, histone mRNA metabolic process, negative regulation of nuclear division, nuclear division, and mitotic spindle assembly checkpoint ( Figure 7A). Cellular components (CC) suggested that SNRP genes existed mainly in methylosome, U4 snRNP, U2-type catalytic step 2 spliceosome, SMN-Sm protein complex, U1 snRNP, U2 snRNP, and U12-type spliceosomal complex ( Figure 7A). Molecular function (MF) indicated that SNRP genes were related to the structural constituents of catalytic activity, acting on DNA, ribonucleoprotein complex binding, DNA N-glycosylase activity, histone kinase activity, and ubiquitin-protein transferase regulator activity ( Figure 7A). KEGG pathway enrichment analysis suggested that spliceosome, base excision repair, nucleotide excision repair, cell cycle, dna replication, and mismatch repair were significantly associated with the alterations of sm proteins. The complex process of mRNA/RNA splicing was shown in Supplementary Figure S4. In addition, the KEGG chord plot showed that PCNA was related to 5 vital KEGG pathways. Thus, PCNA may be a very critical gene ( Figure 7B). To better understand the association among different enriched terms, the KEGG network of enriched terms, colored by cluster ID, was constructed via the online tool of Metascape ( Figure 7C).

Associations Between Genes for Sm Proteins and Immune Infiltration
High genes for Sm proteins expression was considerably related to abundance infiltration of immune cell in the tumor immune microenvironment (Figure 8). Thus, further research of genes for Sm proteins in 24 immune cell populations manifested that genes for Sm proteins had positive correlations to Th2 cells, Tgd cells, but an adverse correlation to Th1 cells, mast cells, and NK cells.

DISCUSSION
Lung cancer has been the major cause of tumor occurrence and mortality around the world (Sung et al., 2021). LUAD accounts for half part of lung cancer with an extremely low survival rate . Although the research on the biological characteristics of lung adenocarcinoma has made some progress in the past few decades, the tumor progression mechanism of lung adenocarcinoma is still unclear. Through literature review and online database bioinformatics analysis and experimental verification, this study verified that genes for Sm proteins are highly expressed in lung adenocarcinoma and have a significant impact on the prognosis of lung cancer. SnRNPs, mainly responsible for splicing the pre-RNA into mRNA, are composed of seven genes for Sm proteins and a small eponymous snRNA (Chari et al., 2008). Accurate splicing is essential to ensure normal cellular function like cell proliferation, apoptosis, migration, and invasion. This study covered many aspects, including differential expression analysis, association with clinicopathological parameters, survival analysis, gene mutations, co-expression, functional enrichment analysis, and correlation with immune infiltration, revealing the roles of SNRPB/D1/D2/D3/E/F/G in LUAD.
Previous study reported that SNRPB promotes the tumor formation of NSCLC by regulating RAB26 and SNRPB may predict response to cisplatin-based chemotherapy for NSCLC patients (Liu et al., 2019;Liu et al., 2021). Here, the mRNA expression of SNRPB was found to be upregulated in LUAD compared to non-cancer tissues. Also, SNRPB is highly expressed in cervical cancer (Zhu et al., 2020), glioma (Correa et al., 2016), and hepatocellular cancer (Zhan et al., 2020), which is a prognostic factor. The mechanistic study also reported that overexpressed SNRPB played a carcinogenic role in the progression of LUAD and was mediated by c-Myc (Peng et al., 2020). Moreover, SNRPB was highly correlated with Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 13 | Article 813285 8 TP53 mutation, tumor grades, and cancer stages. Interestingly, our study revealed that SNRPB up-regulation was associated with poor OS in LUAD patients, especially in patients with smoking history. Taken together, our study revealed that SNRPB could promoted tumor progression in LUAD.
It has been generally accepted that SNRPD1 is related to SLE (Riemekasten et al., 1998). However, only a few studies have reported the role of SNRPD1 in human cancers. SNRPD1 downregulation was related to poorer survival in patients with ovarian cancer (Bao et al., 2020). SNRPD1 up-regulation contributed to breast cancer cell proliferation (Dai et al., 2021). Here, we found that SNRPD1 expression was significantly up-regulated in LUAD compared to normal tissues or normal cell lines. Furthermore, SNRPD1 expression was correlated with TP53 mutation, tumor grades, and cancer stages. Overexpression of SNRPD1 could be relevant to the poor OS in LUAD patients. Multivariate Cox regression analysis indicated that SNRPD1 was an independent prognostic factor for poorer OS of LUAD patients. Taken together, all of the above findings support the opinion that SNRPD1 plays an essential role in the carcinogenic effect of LUAD.
SNRPD2 is found closely associated with several cancers, including triple-negative breast cancer (TNBC) (Koedoot et al., 2021) and hepatocellular carcinoma. Interestingly, SNRPD2 and SNRPG were not only major pathogenic genes of Alzheimer's disease but also bridge genes (Tao et al., 2020). However, it is not yet clear to us the specific role that SNRPD2 plays in LUAD. Here, we report SNRPD2 was found overexpressed in LUAD. Additionally, high SNRPD2 expression was correlated with TP53 mutation, tumor grades, and cancer stages. SNRPD3 (smD3), like SNRPD2, also plays a vital role in TNBC. Moreover, several studies have shown that SNRPD3 expression is relevant to breast cancer (Koedoot et al., 2021) and NSCLC (Blijlevens et al., 2020). For instance, it was investigated that silencing SNRPD3 was able to promote TP53 expression and kill NSCLC cells effectively (Olst et al., 2017). In this study, higher SNRPD3 expression was found in LUAD. Moreover, SNRPD3 expression was associated with TP53 mutation, tumor grades, and cancer stages. Multivariate Cox regression analysis suggested a strong association between overexpressed SNRPD3 with the poor OS of LUAD patients, which seemed congruous with the oncogenic role of SNRPD3. SNRPE protein is a core component of Sm proteins, which has been reported in some malignancies, including bladder cancer (Tapak et al., 2015), prostate cancer (Anchi et al., 2012), hepatocellular carcinoma (Jia et al., 2011), and nonsmall cell lung cancer (Valles et al., 2012). As an example, SNRPE was a deregulated RNA metabolism-related genes of LUAD by bioinformatics analysis (Valles et al., 2012). Similarly, in our report, higher SNRPE expression was found in LUAD than in normal lung tissues. Furthermore, SNRPE expression was relevant to TP53 mutation and tumor grades. Overexpressed SNRPE was an independent prognostic factor for poorer OS in LUAD patients by cox analysis. Multivariate Cox regression analysis indicated that SNRPE was an independent prognostic factor for poorer OS of LUAD patients, suggesting that SNRPG played a carcinogenic role in LUAD. Taken together, our study revealed that SNRPE could promoted oncogenesis in LUAD.
SNRPF was a novel biomarker of colorectal cancer via complex gene interaction networks (Sun et al., 2016). However, the role of SNRPF was rarely reported in LUAD. In our study, highly expressed SNRPF was found in LUAD compared to non-cancer tissues. Moreover, abnormal SNRPF expression could be relevant to TP53 mutation, tumor grades, and cancer stages. Multivariate Cox regression analysis indicated that SNRPDF was an independent prognostic factor for poorer OS of LUAD patients. Given the above results, our study showed that SNRPF could play a carcinogenic role in LUAD. Downregulated SNRPG could inhabit glioblastoma cells proliferation by p53 signaling pathway (Lan et al., 2020). In this paper, higher mRNA expression of SNRPG was explored in LUAD. Furthermore, SNRPD1 expression could also be relevant to TP53 mutation, tumor grades, and cancer stages. Multivariate Cox regression analysis indicated that SNRPG was an independent prognostic factor for poorer OS of LUAD patients, suggesting that SNRPG played a carcinogenic role in LUAD.
Notably, in this paper, we assessed the relation between genes for Sm proteins and 22 immune infiltration cells in LUAD. genes for Sm proteins gene expressions were positively related to the infiltration of Th2 cells and negatively related to the infiltration of Th1 cells, mast cells, and NK cells. Furthermore, prior studies also suggested that spliceosome was tightly associated with the immune microenvironment. Nevertheless, further studies are still needed to develop immunosuppressants of individual genes for Sm proteins members and apply them to the diagnostic of LUAD.
In this work, there are a few limitations that need to be recognized. Firstly, the data used to assess the prognostic worth of genes for Sm proteins in LUAD patients were mainly from the TGCA database and GEO database. Although the TGCA and GEO sequencing data were experimentally confirmed, additional large sample studies on LUAD patients from other databases are necessary to validate our results. Secondly, Cox analysis indicated that the expressions of SNRPD1//E/F/G were correlation to shorter OS in LUAD. Consequently, SNRPD1/E/F/G were considered as independent prognostic factors and potential prognostic biomarkers for LUAD. Finally, further experiments in cells and animal models are needed to elucidate the underlying mechanisms of how genes for Sm proteins play roles in LUAD.
Taken together, our work illustrates that SNRPD1/E/F/G could independently predict the prognostic outcome of LUAD and was correlated with immune infiltration. Our findings laid the foundation for further exploration on the promising therapy target's role of Sm proteins in LUAD.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
GL, FL, and MC contributed to the conception of the study; GL and YL performed cell culture and RT-qPCR experiment; GL, FL, and MC contributed significantly to analysis and manuscript preparation; GL, FL, and MC performed the data analyses and wrote the manuscript; GL, FL, MC, YD, and PH helped perform the analysis with constructive discussions, YD and PH revised the manuscript.