Helicobacter pylori Infection–Related Long Non-Coding RNA Signatures Predict the Prognostic Status for Gastric Cancer Patients

Background: Helicobacter pylori (H. pylori) is a type I biological carcinogen that may account for about 75% of gastric cancer incidence worldwide. H. pylori infection can induce and activate cancer-promoting signaling pathways and affect the occurrence and outcome of gastric cancer by controlling the regulatory functions of long non-coding RNAs (lncRNAs). However, the prognostic value of lncRNAs in gastric cancer patients infected with H. pylori is not understood.
Method: We screened differentially expressed lncRNAs in the TCGA database using the DESeq2 method and built H. pylori infection-related lncRNA regulatory patterns. We then constructed an H. pylori infection-based lncRNA prognostic signature for gastric cancer patients with H. pylori infection via univariate and multivariate Cox regression analyses. We evaluated the predictive performance of this model by receiver operating characteristic (ROC) curve analysis.
Results: We identified 115 H. pylori infection-related genes that were differentially expressed between H. pylori-infected and uninfected gastric cancer tissues. Functional enrichment analysis implies that H. pylori infection might interfere with immune-related pathways in gastric cancer tissues. We then built H. pylori infection-related dysregulated lncRNA regulatory networks. We also identified 13 differentially expressed lncRNAs that were associated with prognosis in gastric cancer patients with H. pylori infection. Kaplan-Meier analysis demonstrated that the high-risk lncRNA signature was associated with poor prognosis. The AUC of the lncRNA signature was 0.712 for 1-year survival, and the prognostic prediction model was superior to traditional clinical characteristics.
Conclusion: We constructed an H. pylori-related lncRNA risk signature and nomogram associated with the prognosis of H. pylori-infected gastric cancer patients, and the signature and nomogram can predict the prognosis of these patients.


INTRODUCTION
Gastric cancer (GC) is one of the most common cancers of the gastrointestinal system. According to the world cancer report of the World Health Organization (WHO), there were more than 1 million new cases of gastric cancer in 2018, together with 783,000 attributed deaths, making it the third leading cause of cancer-related mortality worldwide (1). Helicobacter pylori (H. pylori) is a type I biological carcinogen that may account for nearly 75% of gastric cancer incidence worldwide (2). H. pylori infection can induce or regulate oncogenic metabolic pathways and thereby affect the occurrence and outcome of gastric cancer (3). H. pylori infection is therefore considered one of the most serious risk factors and is expected to affect the prognosis of gastric cancer patients. Prognostic indicators that account for the effects of H. pylori infection in GC patients are urgently needed.
Long non-coding RNAs (lncRNAs) are a class of linear functional RNAs longer than 200 nucleotides that are widely known to be involved in physiological and pathological regulation, including cell cycle regulation, epigenetic regulation, and cell differentiation (4). lncRNAs have recently been identified as gastric cancer regulators that control the activation or inhibition of cancer-related metabolic pathways triggered by H. pylori infection (5). However, the prognostic value of lncRNAs in gastric cancer patients with H. pylori infection is not understood.
Hence, in this study, we first collected H. pylori infection-related genes and determined their expression status using RNA-sequencing data from The Cancer Genome Atlas (TCGA) database. We then constructed the aberrant lncRNA regulatory networks caused by H. pylori infection and extracted the "Epithelial cell signaling in Helicobacter pylori infection" lncRNA regulatory pattern. Based on the lncRNA profiles and clinical features, we built a multi-lncRNA prognostic signature for GC patients with H. pylori infection that is correlated with poor prognosis. This prognostic prediction model is also superior to traditional clinical characteristics.

MATERIALS AND METHODS

Data Collection
RNA-sequencing data, including lncRNA, miRNA, and mRNA profiles, for 163 patients with H. pylori infection information were extracted from The Cancer Genome Atlas (TCGA) database. Gastric cancer tissues without H. pylori infection served as the control group. The clinical information of these patients was also obtained (Table 1). H. pylori infection-related genes were obtained from Gene Set Enrichment Analysis (https://www.gsea-msigdb.org/gsea/index.jsp) (Figure S1). Immunohistochemical images from the Human Protein Atlas (HPA) (https://www.proteinatlas.org) were used to assess the protein expression levels of five differentially expressed genes.

Construction of H. pylori Infection-Related lncRNAs Regulatory Patterns
Using R version 4.0.3, we applied the "DESeq2" package to screen for differentially expressed lncRNAs, miRNAs, and mRNAs. We then used The Encyclopedia of RNA Interactomes (ENCORI) database (6) to construct H. pylori infection-related lncRNA regulatory networks. We also used the STRING database to construct protein-protein interaction (PPI) networks, which were visualized with Cytoscape version 3.8.2.

Functional Enrichment Analysis
Using R version 4.0.3, we performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis with the "clusterProfiler" package. We then used the "pheatmap", "enrichplot", "pathview", and "ggplot2" packages to visualize the results.

Screening H. pylori Infection-Related lncRNA Signatures
Lasso-penalized Cox regression, univariate Cox regression, and multivariate Cox regression analyses were used to identify the H. pylori infection-related lncRNA signatures. The multivariate Cox regression analysis was based on the results of the univariate regression analysis. A risk score was calculated for each gastric cancer patient. Based on the risk scores, patients were classified into a high-risk group (score above the median) and a low-risk group (score at or below the median).
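The risk-score step can be illustrated with a minimal sketch. It assumes the common convention that a patient's risk score is the sum of each signature lncRNA's expression value weighted by its multivariate Cox coefficient; the text does not spell out the formula. The lncRNA names, coefficients, and expression values below are hypothetical, and the sketch is written in Python for illustration, whereas the study itself used R.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Return one risk score per patient.

    Assumed formula: score = sum over signature lncRNAs of
    (Cox coefficient * expression value).
    """
    return {
        patient: sum(coefficients[g] * profile[g] for g in coefficients)
        for patient, profile in expression.items()
    }


def split_by_median(scores):
    """Assign 'high' to scores above the median and 'low' otherwise."""
    cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return cutoff, groups


# Hypothetical coefficients and expression values (not from the study).
coefficients = {"lncRNA_A": 0.5, "lncRNA_B": -0.25}
expression = {
    "P1": {"lncRNA_A": 2.0, "lncRNA_B": 4.0},
    "P2": {"lncRNA_A": 4.0, "lncRNA_B": 0.0},
    "P3": {"lncRNA_A": 1.0, "lncRNA_B": 2.0},
    "P4": {"lncRNA_A": 6.0, "lncRNA_B": 4.0},
}

scores = risk_scores(expression, coefficients)
cutoff, groups = split_by_median(scores)
print(cutoff, groups)
```

Patients whose score equals the median fall into the low-risk group, matching the "at or below the median" rule stated in the text.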

The Nomogram
Using R version 4.0.3, we constructed a hybrid nomogram with the "rms" and "regplot" packages. The nomogram integrated the prognostic signature to predict 1-, 2-, and 3-year overall survival (OS) of gastric cancer patients.
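As a rough illustration of what a Cox-based nomogram encodes, the sketch below converts a patient's linear predictor into a predicted survival probability with the standard Cox relation S(t | x) = S0(t)^exp(lp). The baseline survival values and linear predictors are hypothetical, and this is not the output of the "rms" or "regplot" packages; it only shows the underlying arithmetic.

```python
import math


def predicted_survival(baseline_survival, linear_predictor):
    """Cox model survival probability: S(t | x) = S0(t) ** exp(lp)."""
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in [0, 1]")
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical baseline OS at 1, 2, and 3 years (not from the study).
baseline = {1: 0.90, 2: 0.75, 3: 0.60}

# Hypothetical linear predictors for a low-risk and a high-risk patient.
for label, lp in (("low-risk", -0.5), ("high-risk", 0.5)):
    probs = {year: round(predicted_survival(s0, lp), 3) for year, s0 in baseline.items()}
    print(label, probs)
```

A larger linear predictor raises the exponent and therefore lowers the predicted survival at every time point, which is the ordering a nomogram's "total points" axis expresses graphically.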

Statistical Analysis
Differences between two groups were evaluated with Student's t-test or the Wilcoxon test. Differences with a P value less than 0.05 were considered significant. The data were analyzed using R version 4.0.3.

RESULTS

Functional Analysis of H. pylori Infection-Related Differentially Expressed Genes in Gastric Cancer
The workflow of this study is shown in Figure 1. First, we collected 402 H. pylori infection-related genes from Gene Set Enrichment Analysis (https://www.gsea-msigdb.org/gsea/index.jsp) (Figure S1) and identified 115 genes that were differentially expressed in H. pylori-infected gastric cancer tissues (77 up-regulated and 38 down-regulated; Figure S2). We selected five significantly differentially expressed genes and looked up their protein expression levels in the Human Protein Atlas (HPA). Immunohistochemical images from HPA indicated high levels of GBP5 and ATP6V1G2 proteins in gastric cancer tissues, whereas neither protein was detected in normal stomach tissues (Figures S2A-D). CXCL8 showed the opposite pattern (Figures S2E, F). In addition, HPA indicated low levels of DEPDC1 protein in normal stomach tissues and moderate levels in gastric cancer tissues (Figures S2G, H), whereas MET showed the opposite pattern (Figures S2I, J). GO Biological Process annotation showed that these genes participated in "GO:0071219 Cellular response to molecule of bacterial origin," "GO:0002685 Regulation of leukocyte migration," and "GO:0007159 Leukocyte cell-cell adhesion" (Figure 2A and Table S3). KEGG enrichment analysis then showed that these aberrantly expressed genes were significantly involved in "hsa05120: Epithelial cell signaling in H. pylori infection" and "hsa04061: Viral protein interaction with cytokine and cytokine receptor" (Figure 2B and Table S4). Furthermore, H. pylori infection-related oncogenic signaling pathways, including "hsa04064: NF-kappa B signaling pathway" and "hsa04010: MAPK signaling pathway," were also significantly enriched (Figures 2C, D and S1). These results imply that H. pylori infection can induce and activate cancer-promoting signaling pathways and affect the occurrence and outcome of gastric cancer.

The H. pylori Infection-Related lncRNAs Regulatory Patterns in Gastric Cancer
We then identified 800 differentially expressed lncRNAs in H. pylori-infected gastric cancer tissues (443 up-regulated and 357 down-regulated; Table S5). Using The Encyclopedia of RNA Interactomes (ENCORI) database (6), we constructed the H. pylori infection-related lncRNA regulatory network (Figure 3A and Table S6), based on the 115 aberrantly expressed H. pylori infection-related genes identified above. This network contains 370 lncRNAs, 92 miRNAs, and 100 genes. Using STRING database version 11.0, we constructed the protein-protein interaction (PPI) relationships for the 100 target genes (Figure 3B and Table S7). Next, based on the KEGG enrichment items, we mapped the genes on the hsa04064 KEGG graph (Figure 3C) and constructed the "Epithelial cell signaling in H. pylori infection lncRNA regulatory pattern" (Figure 3D and Table S8). We also identified the PPI relationships for the genes involved in this lncRNA regulatory pattern (Figure 3E and Table S9).

The H. pylori Infection-Based lncRNAs Prognostic Signature
To analyze the correlation between H. pylori infection-related lncRNAs and prognosis, we used univariate Cox analysis and identified 43 significant H. pylori infection-related lncRNAs (Table 2), which were then used in multivariate Cox analysis (Table 3). Risk scores were also calculated to construct a prognostic signature for H. pylori-infected gastric cancer (Table S10).

Survival Analysis Based on High-Risk lncRNA Signatures
Kaplan-Meier analysis showed that the high-risk lncRNA signature was associated with poor survival (P value < 0.0001, Figure 4A). The patient risk and survival status plot demonstrated that risk scores were inversely related to patient survival (Figure 4B). ROC curve analysis showed that the AUC values of the lncRNA signature for 1-, 2-, and 3-year survival were 0.712, 0.699, and 0.656, respectively (Figure 4C). A heatmap of the risk scores is shown in Figure 4D. (Table 4). The correlations between the H. pylori infection-related lncRNA prognostic signature and clinical factors are also shown as a heatmap (Figure 5). Finally, using a hybrid nomogram that incorporated the clinical characteristics, we verified that the H. pylori infection-related lncRNA prognostic signature was accurate and relatively stable (Figure 6). Hence, this lncRNA signature may be used for prognosis evaluation in H. pylori-infected gastric cancer patients.
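To make the AUC values reported for Figure 4C easier to interpret, the sketch below computes a plain AUC as the probability that a randomly chosen patient who died by a given time point has a higher risk score than a randomly chosen patient who survived past it. This is a simplification: it ignores censoring, whereas time-dependent ROC methods for survival data account for it, and the text does not state which implementation was used. All scores and outcomes below are hypothetical.

```python
def auc(scores, events):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    scores: risk scores, one per patient.
    events: 1 if the patient died by the time point, 0 otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, e in zip(scores, events) if e == 1]
    negatives = [s for s, e in zip(scores, events) if e == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and 1-year outcomes (not from the study).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_events = [0, 0, 1, 1]
print(auc(example_scores, example_events))  # 3 of 4 pairs ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.7 indicate moderate discrimination.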

DISCUSSION
Infection by H. pylori is the major risk factor for gastric adenocarcinoma, and the proportion of gastric cancer cases attributable to H. pylori has been estimated at about 70% (7). It is therefore particularly important to find accurate biomarkers for predicting the prognosis of H. pylori-infected gastric cancer patients. In this study, we identified a novel 13-lncRNA risk signature that was highly associated with the OS of H. pylori-infected gastric cancer patients in the TCGA. Overall, we identified 115 H. pylori-related differentially expressed genes in GC patients. GO annotation revealed that these genes participated in the cellular response to molecule of bacterial origin, regulation of leukocyte migration, and leukocyte cell-cell adhesion. KEGG analyses revealed that the genes mainly participated in epithelial cell signaling in H. pylori infection, viral protein interaction with cytokine and cytokine receptor, NF-kappa B signaling, and the MAPK signaling pathway. These results imply that H. pylori infection can induce and activate cancer-promoting signaling pathways and affect the occurrence and outcome of gastric cancer. In addition, we identified 13 differentially expressed lncRNAs associated with prognosis in gastric cancer patients with H. pylori infection. Among them, Zhang et al. indicated that LINC00304 plays a tumor-promoting role and promotes PCa cell proliferation and cell cycle progression by upregulating CCNA1 expression (8). Cheng et al. showed that LINC01526 was part of a noninvasive prognostic signature for preoperative prediction of disease-free survival in GC patients (9). However, the molecular mechanisms and effects of the other lncRNAs in our 13-lncRNA signature have not been explored and remain uncertain.
A growing number of studies show that lncRNAs play important roles in tumorigenesis and metastasis by regulating the expression of protein-coding genes through transcriptional, posttranscriptional, and posttranslational regulation (10)(11)(12)(13). Several reports also indicate that lncRNAs may show oncogenic and tumor-suppressive activities in the initiation and progression of GC (14). Yang et al. reported that LINC00152 and H19 were significantly increased in the serum and cancer tissues of patients with GC and may be possible biomarkers for the diagnosis and prognosis of GC (15). Another study showed that non-overlapping signatures of a few lncRNAs were abnormally expressed in gastric epithelial cells infected by H. pylori, and that these lncRNAs may function in the immune response against H. pylori, which in turn may lead to H. pylori-associated gastric cancer (16). Several studies have indicated that H. pylori infection dysregulates lncRNA expression profiles, including Lnc-SGK1, THAP9-AS1, NR-026827, and AF147447 (17)(18)(19)(20).
Here, the differentially expressed H. pylori-associated lncRNAs were used to stratify patients into high- and low-risk score groups to explore their potential roles in H. pylori-infected gastric cancer patients. Our study indicated that patients with H. pylori infection in the high-risk group had shorter survival than those in the low-risk group, and the signature also displayed high prediction sensitivity and specificity. We constructed a nomogram by combining the 13-lncRNA signature and clinicopathological factors in 162 patients from the TCGA, and the nomogram is an effective predictive tool for patients with H. pylori infection.
Many studies have explored prognostic factors of gastric cancer at the gene and protein levels. The expression of HOXC9 mRNA and HOXC9 protein in gastric cancer tissue was significantly higher than that in non-cancerous tissue (21). The protein and mRNA expression of CD40 and CAP2 is closely related to prognosis, distant metastasis, and stage in patients with gastric cancer (22,23). The overall survival of patients with up-regulated SLC22A16 expression was worse than that of patients without up-regulated SLC22A16 expression (24). CXCR7 may be a prognostic marker for gastric cancer with peritoneal metastasis; however, the number of cases in that study was small, and the prognostic significance of CXCR7 was not confirmed (25). Wang et al. indicated that high expression of SCD1 might predict poor prognosis in gastric cancer patients, but the AUC values for 1-, 3-, and 5-year survival were 0.557, 0.569, and 0.595, respectively (26), which are lower than those in our study. In addition, the AUC of another predictive biomarker (linc-ROR) was 0.6495 (27), which is also lower than ours. We therefore propose that our findings improve the accuracy of prognosis prediction for gastric cancer patients.
H. pylori infection is considered a precancerous condition of gastric cancer and is closely related to the occurrence and development of gastric cancer (28). Eradication of H. pylori can reduce the incidence of gastric cancer (29,30), but the prevalence of H. pylori is high, especially in economically underdeveloped areas (31). Prognostic factors have been reported for gastric cancer patients infected with H. pylori; for example, low expression of miR-490-3p is associated with poor prognosis in these patients. However, the molecular functions of miR-490-3p are very diverse, and miR-490-3p can also promote the development of some cancers. The function of miR-490-3p is closely related to the tumor microenvironment (32). The expression levels of anti-H. pylori antibody, CA724, CA19-9, and CEA in H. pylori-infected gastric cancer patients were positively correlated with tumor stage, tumor size, and lymph node status (33). However, Tas and Chae pointed out that the levels of CA724, CA19-9, CEA, and other markers had no significance in evaluating the prognosis of gastric cancer patients (34,35). In addition, that study was retrospective, with a small sample size drawn from a single hospital. Therefore, the accuracy of anti-H. pylori antibody, CA724, CA19-9, and CEA in predicting the prognosis of H. pylori-infected gastric cancer patients remains open to discussion. DNA methylation can induce gastric cancer in H. pylori-infected patients (36). However, that study did not establish a prediction model and could not accurately predict patient outcomes.
Our research has several strengths. First, to our knowledge, the effect of lncRNAs on the prognosis of H. pylori-infected gastric cancer patients has not been reported previously. Second, we established a lncRNA-based model of prognosis in H. pylori-infected gastric cancer patients and constructed a nomogram, which may support more accurate prediction of prognosis in these patients and help improve survival. The signature and nomogram are also convenient for doctors to apply in clinical practice. There are also several limitations. First, the number of patients included in the study is relatively small. Second, the 13-lncRNA signature and the nomogram were not validated in internal or external cohorts and were not compared with other models. Third, experimental research on the molecular functions of these lncRNAs is still lacking; we will collect specimens from H. pylori (+) and H. pylori (−) GC patients and carry out experimental research.
In conclusion, we constructed an H. pylori-related lncRNA risk signature and nomogram associated with the prognosis of H. pylori-infected gastric cancer patients in the TCGA. The results indicate that the signature and nomogram can predict the prognosis of these patients. Future molecular research is needed to clarify the mechanisms of the related lncRNAs.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
ZX, NZ, and HX contributed to the design and critical revision of the manuscript. The data analysis was conducted by YW, YZ, YS, and WZ. The original draft was written and edited by LZ, ML, LS, and NX. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to acknowledge the reviewers for their helpful comments on this study.

SUPPLEMENTARY MATERIAL
The