An integrated analysis of prognostic mRNA signature in early- and progressive-stage gastric adenocarcinoma

The pathogenesis and vital factors of early and progressive stages of stomach adenocarcinoma (STAD) have not been fully elucidated. In order to discover novel and potential targets to guide effective treatment strategies, a comprehensive bioinformatics study was performed, and the representative results were then validated by quantitative polymerase chain reaction (qPCR) and immunohistochemical (IMC) staining in clinical samples. A total of 4,627, 4,715, and 3,465 differentially expressed genes (DEGs) from overall-, early-, and progressive-stage STAD were identified, respectively. Prognostic models of 5-year OS were established for overall-, early-, and progressive-stage STAD, and ROC curves demonstrated AUC values for each model were 0.73, 0.87, and 0.92, respectively. Function analysis revealed that mRNAs of early-stage STAD were enriched in chemical stimulus-related pathways, whereas remarkable enrichment of mRNAs in progressive-stage STAD mainly lay in immune-related pathways. Both qPCR and IHC data confirmed the up-regulation of IGFBP1 in the early-stage and CHAF1A in progressive-stage STAD compared with their matched normal tissues, indicating that these two representative targets could be used to predict the prognostic status of the patients in these two distinct STAD stages, respectively. In addition, seven mRNAs (F2, GRID2, TF, APOB, KIF18B, INCENP, and GCG) could be potential novel biomarkers for STAD at different stages from this study. These results contributed to identifying STAD patients at high-risk, thus guiding targeted treatment with efficacy in these patients.


Introduction
Gastric cancer (GC) is one of the most common malignancies worldwide. Although many advances of systemic treatment in GC had been made over the past decades, it remained a concern that the majority of patients showed strong resistance to many treatment strategies (Rihawi et al., 2021), which resulted in a great health burden.
In GC, stomach adenocarcinoma (STAD) was the most common histological type (about 95%), and clinical guidelines had addressed differences in STAD therapeutic strategies and outcomes within different clinicopathological stages (Ajani et al., 2017). Optimally, patients with earlystage STAD undergo limited resection through endoscopies, while patients with advanced STAD require surgeries and multidisciplinary adjuvant treatments (Ajani et al., 2017). The 5-year survival rate for early-stage STAD (according to TNM malignancy classification) is 95%; however, the median survival time for patients with advanced-stage STAD was only 9-10 months (Ajani et al., 2016). Therefore, spotting high-risk STAD patients and choosing the appropriate treatment at the early time was crucial for prolonging survival time in these patients. Emerging evidence had revealed that biomarkers contributed to molecular classification, predicting prognosis, and driving precision therapy approaches in STAD population (Joshi and Badgwell, 2021). For example, Jiang et al. found that ITGB1-DT was apparently up-regulated in STAD tissues and was connected with the T stage, therapeutic effect, and poor prognosis of STAD patients, while suppression of ITGB1-DT could inhibit cell proliferation, invasion, and migration of STAD cells (Jiang et al., 2022). Furthermore, bioinformatics analysis can be used to screen key immunerelated genes (IRGs) and pathways significantly linked to STAD therapy. For example, Xia et al. constructed an immune-related risk signature model consisting of BMP8A, MMP12, NRG4, S100A9, and TUBB3, which were associated with prognosis in patients and could be the potential biomarkers for immunotherapy in STAD (Xia et al., 2022). Nevertheless, the pathogenesis and vital factors of early-and progressive-stage STAD had not been fully highlighted; it is of importance to identify novel and promising targets and Cox model, elucidate the mechanism of STAD, and provide a candidate diagnosis option in patients with distinct STAD stages. mRNA is single-stranded ribonucleic acid that carries genetic information to guide protein synthesis, and it has a central role in the pathogenesis of various cancers, including STAD. Numerous researchers underlined that mRNA had diagnostic and prognostic values in clinical practice. For example, Huang et al. demonstrated that the expression level of sirtuin-4 (SIRT4) was reduced in STAD tissues compared with normal gastric tissues and was also correlated with pathological differentiation and tumorinfiltrating depth of STAD (Huang et al., 2015). Lu et al. demonstrated that metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) was up-regulated in MGC-803 cells, and the increased MALAT1 could promote the metastasis of cancer cells, while the decreased MALAT1 could suppress the progression and proliferation of STAD (Lu et al., 2019). BicC family RNA-binding protein 1 (BICC1), which codes an RNA-binding protein, proved to be significantly correlated with grade, TNM stage, invasion depth, and even immune infiltrates in STAD . Taken together, these results indicated the key roles of mRNAs as the targets helping tumor diagnosis and targeted treatment.
The extensive applications of gene chips and highthroughput sequencing technologies in cancer research had brought omics data explosion, and STAD is no exception. Integrated bioinformatics analysis of publicly available data of STAD improves the insight into the underlying molecular mechanism of tumorigenesis, and it also contributes to identifying potential tumor biomarkers and drug targets for STAD. Depending on these analyses in mRNAs, we distinguished STAD patients from distinct stages and accurately predicted their clinical outcomes by constructing prognostic models. Function annotations and pathway enrichment of mRNA signatures revealed that different STAD stages were dominated by distinct key mechanisms. Representative biomarkers in early and progressive stages were measured in clinical samples by qPCR and IHC detections. The overall bioinformatics analysis procedure is summarized in Figure 1. This study highlighted burgeoning evidence supporting some mRNAs as biomarkers for the diagnosis of patients in different STAD stages, which could form the basis of precision medicine strategies in the future.
2 Materials and methods 2.1 Data preparation and DEGs identification mRNAs expression and clinical data of 407 STAD total samples were downloaded from The Cancer Genome Atlas (TCGA) database (http://portal.gdc.cancer.gov/), including 32 paracancer and 375 tumor samples. mRNAs expression and clinical data of early-stage (stages I and II) and progressive-stage (stages III and IV) STAD were classified as two independent datasets. The early-stage STAD samples included 21 paracancers and 164 tumor samples, while the progressive-stage STAD samples included 10 paracancers and 188 tumor samples. These two datasets were obtained from the 407 STAD total samples after the deletion of Frontiers in Molecular Biosciences frontiersin.org 24 samples without clinical staging information. All these data of STAD samples were downloaded in March 2021. Based on the three sets of STAD sample groups: overall-, early-, and progressive-stage STAD samples; differentially expressed genes (DEGs) were identified by analyzing the expression profile with edgeR R language package (Version 4.0.2) according to the cutoff criteria of P adj < 0.05, |log2FC| > 1. Adjusted p-value took into account the false discovery rate (FDR), and the volcano maps of DEGs in these three datasets were generated.

Univariate Cox regression analysis
The survival package of R software was adopted to conduct the univariate Cox proportional hazard regression assessment of DEGs in the overall-, early-, progressive-stage STAD groups. With the criteria p < 0.05, overall survival-related mRNAs (mRNAs-OS) in each stage were obtained, which was related to the survival and prognosis of STAD patients.

PPI network construction and key mRNAs identification
To further identify the key mRNAs-OS in all three stages of STAD, the Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org) was applied to analyze the mRNAs-OS and obtain protein interaction data. Proteins with a minimum required interaction score of 0.400 or above were selected to create a protein-protein interaction (PPI) network, in which the nodes with network interruption were hidden. The PPI network and its combined scores were then imported into Cytoscape software (Version 3.6.1, https:// cytoscape.org/), and potential key mRNAs were identified using CytoHubba, a plug-in in Cytoscape software. According to the node degree, the top 40 candidate mRNAs selected in each stage were displayed for further analysis.

Establishment of three Cox proportional hazards regression models
The package "surminer" of R was applied to perform multivariate Cox proportional hazard regression analysis on those identified top mRNAs-OS, and subsequently, a prognostic model consisting of mRNAs-OS related to the patients' prognosis (mRNAs-PRO) was constructed for the overall-, early-, progressive-stage STAD. Among them, mRNA-PRO with p < 0.05 was regarded as an independent prognostic factor of STAD. Based on the expression of mRNAs-PRO, the risk score of individual patients was computed as follows: risk score = Exp (mRNA 1 ) × β 1 + Exp (mRNA 2 ) × β 2 + Exp (mRNA 3 ) × β 3 +... + Exp (mRNA n ) × β n . According to the median risk score, the STAD patients were clustered into highrisk and low-risk groups, and 5-year survival rates of the highand low-risk patients were calculated for each prognostic model. Then, a risk score curve was drawn to distinguish the risk score differences between two groups of patients. A survival status map was drawn to reflect the survival status of each patient. A heatmap was drawn to exhibit the differences of the expression levels of the mRNAs-PRO in the high-and lowrisk groups. A survival curve was drawn to display the 5-year survival rate in the high-and low-risk groups. An ROC curve showed by the area under the curve (AUC) of the model was drawn to evaluate its accuracy and reliability of predicting prognosis in each stage.

Functional enrichment and pathway analysis
To understand the underlying biological significance of mRNAs-OS in overall-stage STAD, GO function annotations including cellular component (CC), biological process (BP), and molecular function (MF) based on the FunRich (http://www. funrich.org/) database, and KEGG pathway enrichment based on KOBAS (http://kobas.cbi.pku.edu.cn/kobas3/) database were then analyzed. A p-value < 0.05 was set as the threshold to determine the crucial functions or pathways closely related to STAD in the overall stage. Furthermore, gene set enrichment analysis (GSEA) was carried out on the mRNAs of early-and progressive-stages STAD in order to explore the critical functions and pathways of the mRNA signatures in these two stages. The top three terms of BP, CC, MF, and KEGG pathways were presented. The threshold in this step was set based on net enrichment score (NES) and p-value. Gene sets with |NES| > 1 and p < 0.05 were considered to be statistically significantly enriched.

Diagnostic capability evaluation of prognostic models
In order to compare predictive accuracy of the age, gender, stage, and risk score for individual prognosis in overall-, early-, and progressive-stage STAD, the univariate Cox proportional hazards regression analysis was performed with the criterion of Ps < 0.05. Moreover, multivariate Cox proportional hazards regression analysis was applied to identify whether the age, gender, stage, and risk score could be independent prognostic factors in STAD patients. In addition, in order to investigate the prognostic values of mRNAs-PRO in different stratifications of other clinical prognostic variables, the overall-, early-, and progressive-stage STAD patients were clustered into different subgroups of age (≥65 or <65), gender, T stage, M stage, and N stage. The Kaplan-Meier survival curves were used to evaluate the prognostic capacity difference of three prognostic Cox models in STAD patients under different clinical variables.

Quantitative real-time polymerase chain reaction and immunohistochemical staining
A total of 30 pairs of cDNA tissue chips (including 13 earlystage pairs and 17 progressive-stage pairs), 84 pairs of tissue microarrays (including 32 early-stage pairs and 52 progressivestage pairs), and the related clinicopathological information of these matched STAD and normal samples were obtained from Shanghai OUTDO Biotech Co., Ltd. (Shanghai). All patients were pathologically diagnosed as STAD according to American Joint Committee on Cancer (AJCC) criteria. The samples were obtained following written consent in accordance with an established protocol approved by Institutional Review Board of Biobank in Shanghai Outdo Biotech Co., Ltd.
QPCR was used to detect the expression levels of insulin-like growth factor binding protein-1 (IGFBP1) and chromatin assembly factor 1 subunit A (CHAF1A) in 30 pairs of cancer and normal samples (detailed clinical data are seen in Supplementary Table S1), and GAPDH was used as a reference gene. Total RNA was extracted from tissues and cells by the TRIzol reagent (Sigma, America). The cDNA was obtained by reverse transcription of RNA using PrimeScript TM RT Master Mix (perfect real time) (Takara, Japan) according to the protocol of manufacturer. The expressions of IGFBP1 and CHAF1A were determined by the protocol of the ChamQ Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China), according to the manufacturer's protocol. The primers used in this study were as follows: IGFBP1, forward: 5′-GCATTTCTG CTCTTCCAAAG-3′, reverse: 5′-GCAACATCACCACAGGTA G-3'; and CHAF1A, forward: 5′-AAAGGAGCAGGACAGTTG GA-3′, reverse: 5′-CTGGAAGGGACTTGATTTGC-3'.

Frontiers in Molecular Biosciences
frontiersin.org IHC was conducted to detect the protein levels of IGFBP1 and CHAF1A in 84 pairs of cancer and normal samples (detailed clinical data are seen in Supplementary Table S2), according to standard procedures by two independent pathologists blinded to the study. Based on proportion and staining intensity of positive stained cells, expression levels of IGFBP1 and CHAF1A were accessed semiquantitatively (Proteintech, China). Proportion was evaluated using semi-quantitative criterion: 0, (no staining); 1, minimal (< 10%); 2, moderate (10-50%); and 3, diffuse (> 50%) staining cells. Staining intensity was also scored as 0 (negative); +1 (weak); +2 (moderate); and +3 (strong). Taken together, the final expression score of each case, considering both proportion and staining intensity, was given as 0+ (0), negative; 1+ (1 or 2), weakly positive; 2+ (3 or 4), moderately positive; and 3+ (5 or 6), strongly positive. The statistical analysis was performed using a software package (SPSS, version 19.0, Chicago, IL, United States). Clinical pathological features and expression data of representative targets were analyzed using Pearson's chi-square and likelihood ratio tests. A level of p < 0.05 was considered statistically significant.

Differential expression analysis
A total of 4,627 DEGs were identified from the overall stage of STAD, composed of 2,445 up-regulated mRNAs and 2,182 down-regulated mRNAs ( Figure 2A; Supplementary  Table S3). In addition, 4,715 DEGs were distinguished from early-stage STAD, in which 2,542 DEGs were remarkably upregulated and 2,173 DEGs were notably down-regulated ( Figure Table S3).
Intersections between overall-and early-stage, overall-and progressive-stage, and early-and progressive-stage STAD provided 4,059, 2,850, and 2,540 overlapping signatures, respectively. Overlapping of target DEGs of all three stages in STAD obtained 2,526 consensus mRNAs ( Figure 3A, Supplementary Table S4).
Interestingly, six mRNAs, including SCGB3A1, SRARP, MUC5B, GABRB1, CNMD, and KRT27, were all up-regulated in early-stage STAD but down-regulated in progressive-stage STAD, while these targets were not significantly different when brought into overall STAD samples (Supplementary  Table S4).

Cox proportional hazards regression model of DEGs
Univariate Cox regression analysis identified 504, 430, and 193 mRNAs-OS for the overall-, early-, and progressive-stage STADs (Supplementary Table S5). Intersections analyzes revealed 168, 104, and 15 overlapping mRNAs-OS between overall-and early-stage STAD, overall-and progressive-stage STAD, and early-and progressive-stage STAD, respectively ( Figure 3B, Supplementary Table S6). Moreover, after overlapping mRNAs-OS among all three stages of STAD, a total of 14 signature genes were obtained ( Figure 3B, Supplementary Table S6). In addition, CytoHubba plug-in analysis selected 40 top mRNAs-OS from the overall-, early-, and progressive-stage STAD (Supplementary Table S7). Further multivariate Cox analysis based on those top mRNAs-OS constructed three prognostic models with 7, 9, and 14 mRNAs-PRO for these three stage sets (Figure 4, Supplementary Table S8). The survival risk score based on mRNAs-PRO for each stage was calculated by the model formula and stratified patients into low-and high-risk groups based on their median risk score. As presented in Figures 4A,D,G, expression heatmaps, risk score curve, and survival status map were plotted between the low-and high-risk groups of 7-, 9-, and 14-mRNA-based prognostic models for the overall-, early-, and progressive-stage STAD, respectively. The Kaplan-Meier survival curve of the high-risk group and the low-risk group for overall-, early-, and progressive-stage STAD analysis is presented in Figures 4B,E,H. The AUC value of the ROC curve was 0.73, 0.87, and 0.92 for overall, early-, and progressive-stage STAD, respectively, indicating that these models could form reliably to accurately predict the prognosis of STAD patients (Figures 4C,F,I).

Functional enrichment analysis
GO functional annotation and KEGG pathway analyses were carried out by the FunRich and KOBAS databases in Venn diagrams of overlapping DEGs and mRNAs-OS from overall-, early-, and progressive-stage STAD. Venn diagrams are shown for overlapping DEGs (A) or mRNAs-OS (B). Terms over the circles represent the groups of DEGs or mRNAs-OS at different stages of STAD.

Frontiers in Molecular Biosciences
frontiersin.org 504 mRNAs-OS of the overall-stage STAD. mRNAs-OS of overall stage were enriched in 47 GO terms (p < 0.05), composed of seven BP, 12 MF, and 28 CC items. As exhibited in Figure 5A, in BP, the smallest p-value lay in the cell communication item (p = 3.27E-3), while annotation with the largest count number annotation was signal transduction (count = 122). The annotation with the smallest p-value and largest count number in the MF category was the extracellular matrix structural constituent (p = 2.54E-7, count = 18). As for CC, mRNAs-OS was remarkably enriched in functions as extracellular (p = 5.68E-10), and the largest count number was laid in the plasma membrane (count = 127). KEGG pathway analysis demonstrated that enriched mRNAs-OS were notably involved in 68 pathways, and the top eight pathways with smallest p-values are shown in Figure 5B, including neuroactive ligand-receptor interaction, complement and coagulation cascades, ABC transporters, ascorbate and aldarate metabolisms, malaria, cAMP signaling pathway, phospholipase D signaling pathway, and steroid hormone biosynthesis.
In order to explore the functions and pathways of mRNAs in different stages of STAD, gene set enrichment analysis (GSEA) was conducted in early-and progressive-stage STAD, respectively. Three most significantly enriched pathways were exhibited in Figure 6. As for BP, enrichment of mRNAs was remarkably laid in detection of chemical stimulus, detection of stimulus involved in sensory perception, and sensory perception of chemical stimulus in early-stage STAD ( Figure 6A), while those mRNAs were significantly enriched in positive regulation were notably enriched pathways in progressive-stage STAD including external side of plasma membrane, immunological synapse, and presynapse ( Figure 6D). The significantly enriched pathways in MF included olfactory receptor activity, odorant binding, and metal cluster binding in early-stage STAD Frontiers in Molecular Biosciences frontiersin.org 08 ( Figure 6E), while pathways of progressive-stage STAD enriched in immune receptor activity, cytokine receptor activity, and structural constituent of muscle ( Figure 6F). In addition, KEGG pathway analysis of early-stage STAD revealed that mRNAs were enriched significantly in olfactory transduction, maturity onset diabetes of the young, and regulation of autophagy ( Figure 7A), while mRNAs were enriched in pathways including cell adhesion molecules cams, primary

Prognostic model validation
The predictive association between the age, gender, stage and risk score, and individual prognosis was evaluated by univariate and multivariate Cox regression analyses. Figure 8 shows both univariate and multivariate Cox regression analyses of overall survival time in overall-, early-, and progressive-stage STAD patients. As exhibited in the forest plots, while stage were significantly associated with the overall survival time in overall-and early-stage STAD patients (Ps ≤ 0.044) ( Figures 8A-D), the risk score was the only independent factor for all the three stages STAD patients by both univariate and multivariate Cox regression models (Ps ≤ 0.002) ( Figures 8A-F). These demonstrated the rationality of the stage stratification in the overall group and the risk-based prognostic models built for the overall-, early-, and progressive-stage STAD patients in this study. Moreover, the reliability and validity of prognostic models in classifying the high-and low-risk groups were further confirmed under various clinical circumstances by the Kaplan-Meier survival curves, and the overall survival (OS) time in the high-risk group was significantly lower than that of the low-risk group in overall-(Supplementary Figure S1), early-(Supplementary Figure S2), and progressive-(Supplementary Figure S3) stage STADs in all clinical situations, including age (≥ 65 or < 65), gender, T stage, M stage, and N stage (all p < 0.05). Thus, the prognostic models of overall-, early-, and progressive-stage STADs could be well used in a variety of clinical practices, indicating these three prognostic models were reliable and stable under different clinical circumstances.

QPCR and IHC of IGFBP1 and CHAF1A in STAD samples
Relative mRNA levels of IGFBP1 and CHAF1A in STAD and matched paracancerous tissues were evaluated via qPCR in overall-, early-, and progressive-stage STAD ( Figure 9A), and the results showed the expression of IGFBP1 was significantly elevated in early-stage STAD in comparison with matched normal tissues (p = 0.018, Figure 9A-top), which was in line with bioinformatics findings. CHAF1A was markedly increased in overall-and progressive-stage STAD in comparison with matched normal tissues (both p < 0.001, Figure 9A-bottom), which was also consistent with expected analyses.
In addition, Figure 9B showed the representative IHC staining on the protein levels of IGFBP1 and CHAF1A, and Figure 9C demonstrated the cytoplasmic immunoreactivity of these two representative targets in the paired cancer and paraneoplastic normal tissues of STAD patients. For IGFBP1, cancer samples demonstrated stronger staining than matched normal tissues in early-stage STAD (p = 0.022, Figure 9C-top). For CHAF1A, cancer samples showed much stronger staining than matched normal tissues in the overall-, early-, and especially the progressive-stage STAD (all p < 0.001, Figure 9C-bottom). The IHC results on the protein levels of these two representative targets were basically lined well with the qPCR results on their mRNA expressions; both were in accordance with the calculation results. Thus, IGFBP1 could serve as a biomarker for early-stage STAD, while CHAF1A could act as a biomarker for progressivestage STAD.

Discussion
According to the Global Cancer Statistics 2020, the incidence and mortality of GC ranks the fifth and fourth in all Frontiers in Molecular Biosciences frontiersin.org malignancies, respectively (Sung et al., 2021). It causes a considerable global health burden and requires urgent attention (Yue et al., 2021). STAD accounts for 90% of gastric cancer and is the most common histological type of this malignance (Joshi and Badgwell, 2021). As the onset of STAD is insidious, most patients had already developed an advanced stage of cancer when they were confirmed diagnosis (Joshi and Badgwell, 2021). Current interventions have highlighted advances in therapeutics of STAD, including surgery, systemic chemotherapy, radiotherapy, immunotherapy, and targeted therapy; however, the efficacy does not meet the need for advanced gastric cancer (Jim et al., 2017), and the 5-year survival rate of distant-stage gastric cancer is less than 10% (Li et al., 2022). Under the passive situations nowadays, discovery of novel diagnostic biomarkers and biomarkerdriven therapy for STAD is urgently needed.
With the development of high-throughput sequencing technologies, the mRNA molecules have been emerging as a new class of cancer biomarkers. Recent studies have explored associations between mRNAs and cancer features, and is a protective factor for STAD (HR < 1), and the green or red rectangle to the right of the dotted line indicates that the clinical factor is a risk factor for STAD (HR > 1).

Frontiers in Molecular Biosciences
frontiersin.org Representative images of IHC displaying negative (0), low (1+), moderate (2+), and strong (3+) staining for IGFBP1 and CHAF1A proteins from 84 STAD and paired paraneoplastic normal tissues (early-stage: 32 pairs and progressive-stage: 52 pairs). Various staining intensities of matched cancer and normal tissues were demonstrated, respectively, for IGFBP1 and CHAF1A. (C) IHC scores of overall, early-stage, and progressive-stage STAD, and paired normal tissues are presented in histograms with frequency. The statistical analysis was performed using Pearson's χ2 test and likelihood ratio test. *p < 0.05 and ***p < 0.001.
Frontiers in Molecular Biosciences frontiersin.org moreover, various mRNA signatures have been reported in STAD for prognostic analysis Wu et al., 2020;Zhou et al., 2020). However, some limitations of these studies should be noticed: most of them were reports of key biomarkers in overall STAD, while neglecting STAD stages Fu et al., 2020;Qiu et al., 2020). Nevertheless, STAD is a cancer characterized with high heterogeneity, and the differences of pathological features in distinct STAD stages were evident . Therefore, it is inappropriate to use the same mRNAs as biomarkers to classify patients with STAD at all the different stages in the clinical work. This study identified 4,715 DEGs of early-stage STAD and 3,465 DEGs of progressive-stage STAD, and a total of 2,540 DEGs were overlapping, indicating that many remaining DEGs were evidently different in early-and progressive-stages STAD, and these DEGs might be vital to characterize the two stages. Interestingly, six mRNAs (SCGB3A1, SRARP, MUC5B, GABRB1, CNMD, and KRT27) were up-regulated in early-stage STAD, while they were down-regulated in progressive-stage STAD. This suggested these mRNAs might play a role in different stages of cancers by participating in different mechanisms; meanwhile, these signatures might act as convenient biomarkers to distinguish STAD stages.
Multivariate Cox regression analysis of the age, gender, stage, and risk score demonstrated that the tumor stage was an independent prognostic factor, and the mRNAs-PRO signatures for distinct STAD stages were potential predictors for prognosis.
The prognostic models of overall-, early-, and progressivestage STAD were, respectively, composed of 7 mRNAs, 9 mRNAs, and 14 mRNAs. There was no consensus mRNAs-PRO in these three prognostic models. The AUC values of ROC curves of the prognostic models for overall, early, and progressive groups were 0.73, 0.87, and 0.92, respectively. The value of the overall stage was the smallest, suggesting a sorting stage might slightly improve the AUC value and the predictive reliability of the prognostic models. We also constructed a prognostic model by analyzing the data of all GC patients with the same procedure as the other three sets of data, and the value of the AUC was only 0.70, which was worst (data were not shown). This re-emphasized that the prognostic model established by stratified data according to influential clinical variables was more effective. As the amount of experimental data increased, novel molecular biomarkers could be identified to facilitate more accurate predictions after stratification. This might indicate that the prognostic model constructed for the overall-stage STAD was problematic due to the stage mixing.
Table 1 compared the abnormal expression of the prognostic mRNAs-PRO predicted in this study with those reported in previous studies. In the prognostic model of overall-stage STAD, all seven mRNAs (EGF, SERPINE1, ESCO2, ERCC6L, UHRF1, SPARC, and F2) were calculated to be up-regulated in this study. After a systematic and careful literature research, we found that the former six mRNAs were all experimentally validated to be overexpressed in either STAD tissues or GC tissues, and the aberrant expression of the remaining F2 in this cancer has not been confirmed by any previous experiments or bioinformatics analysis.
Among the nine mRNAs in the prognostic model of earlystage STAD, four mRNAs were up-regulated (WT1, IGFBP1, APOE, and TF) and five mRNAs were down-regulated (JUN, CA10, GRIA2, GRID2, and APOB). After comparing the outcomes in the related literature with the findings in our results, it was shown that the high expression level of WT1, IGFBP1, and APOE, and the low expression level of JUN was verified by previous clinical cases or experiments on GC, and the down-regulations of CA10 and GRIA2 were demonstrated in other tumor origins, where the results were all matched with our calculations. Although the remaining three mRNAs have not been reported by any cancer experiments before, GRID2 was calculated to be significantly down-regulated in endometrial carcinoma by bioinformatics studies, and TF and APOB have not been mentioned by any cancer studies before and were novel candidate prognostic signatures in early-stage STAD reported for the first time by this study. Notably, Kalantari et al. demonstrated that Lgr5High/DCLK1 high phenotype was associated with early-stage gastric carcinoma specimens (Kalantari et al., 2017). However, in this study, we found that Lgr5 was significantly upregulated, but DCLK1 was significantly downregulated in early-stage STAD (Supplementary Table S3); moreover, DCLK1 was one of the mRNAs-OS (Supplementary Table S5), though it was not in the 9-mRNA prognostic model of this premature stage. The contradiction between these two studies needs to be further explored.
Among the 14 mRNAs in the prognostic model of progressive-stage STAD, 13 mRNAs (CHAF1A, VCAN, CDH11, PDGFRB, BMP1, ASPN, BGN, GINS4, COL6A3, ECT2, IBSP, KIF18B, and INCENP) were upregulated, while one mRNA (GCG) was downregulated. As shown by the summaries of earlier studies in Table 1, we noted that the overexpression of eleven mRNAs (CHAF1A, VCAN, CDH11, PDGFRB, BMP1, ASPN, BGN, GINS4, COL6A3, ECT2, and IBSP) were all verified in GC cell lines; the increased expression level of KIF18B had been confirmed in hepatocellular carcinoma cells, and INCENP and GCG have not been mentioned by any cancer studies before and were novel candidate prognostic signatures in progressive-stage STAD reported for the first time by this study.
Emerging predictive tools for GC had been established based on prognostic genes (Chen J. et al., 2021b), lymph node histopathology (Wang X. et al., 2021b), immune microenvironment or/and tumor microenvironment-Frontiers in Molecular Biosciences frontiersin.org TF The abnormal expression of TF has not been reported in any cancer experiments. However, our study showed that TF was not differentially expressed in overall-stage STAD, but was significantly up-regulated in the early-stage STAD, implying that TF might be a novel biomarker of early-stage STAD NA GRID2 Abnormal expression of GRID2 has not been reported in any cancer experiments. However, Chen et al. suggested that GRID2 was significantly downregulated in endometrial carcinoma, and its expression was significantly associated with the proliferation and invasion of cancer cells through bioinformatics analysis. In this study, it was shown that GRID2 was significantly downregulated in the overall-and early-stage STAD, indicating that GRID2 might be a novel biomarker of these two stages of STAD PMID: 33577492 Chen et al.
APOB Abnormal expression of APOB has not been reported in any cancer experiments. However, our study showed that APOB was significantly downregulated in the overall-and earlystage STAD, indicating that APOB might be a novel biomarker of these two stages STAD NA (Continued on following page) Frontiers in Molecular Biosciences frontiersin.org  (Cai et al., 2020;Wan et al., 2020), deep learning radiomic nomogram , lncRNAs (Ma et al., 2020), and other scoring systems . Although these prognostic models mentioned earlier could predict the outcomes of certain GC patients in diverse clinical practices, they all have shortcomings: they did not be classified as histopathology types or stages, and AUC values of ROC curves for early-and progressive-stage STAD in this study were superior. This demonstrated better survival prediction, probably owing to the more refined and representative data after stratification. Moreover, three models were further confirmed reliably in various clinical circumstance, including different ages (≤65 years or > 65 years), genders (males or females), T stages (T1, 2 or T3, 4), N stages (N0 or N1, 2), and M stage (M0) (Supplementary Figures S1-3), indicating the applicability of prognostic models established by stratified data according to vital clinical variables. As experimental data were enhanced after stratification, novel molecular biomarkers would be identified to facilitate more accurate predictions. Enrichment analysis of the overall STAD stage was based on mRNAs-OS, whereas the GSEA of stratification stages was based on all mRNAs. As for GO analysis, most of mRNAs-OS were enriched in cell communication, cell growth and/or maintenance, signal transduction, receptor activity, cell adhesion molecule activity, and auxiliary transport protein activity ( Figure 5A), which were basic, common, and essential for survival of cancer cells. Both GO and KEGG analyses in stratified STAD stages revealed notable heterogeneity in earlyand progressive-stage STAD. In early-stage STAD, GSEArevealed chemical stimulus, including detection, and sensory were markedly enriched ( Figure 6A), but few reports focused on these respects. MRNAs of progressive-stage STAD from GSEA were mainly enriched in immune-related functions, including regulation of lymphocyte activation, regulation of T-cell activation, immunological synapse, cytokine receptor activity, and immune receptor activity ( Figures 6B,D,F). Immune-related signatures took part in formation of the tumor microenvironment and could predict prognosis in GC (Dai et al., 2021). Wang et al. performed pathway enrichment analysis based on single-cell sequencing in GC, and 12 pathways were immune-related, such as defensins, IL-7 signaling, and IL6/JAK/STAT3 signaling; meanwhile, these pathways were all associated with longer survival, suggesting certain immune-related biological processes contribute to distinct molecular consequences and patient survival (Wang R. et al., 2021a). As for KEGG analyses in earlystage STAD, regulation of autophagy is of note ( Figure 7A). Autophagy is a double-edged sword in GC. On the one hand, it inhibits tumor initiation at early-stage by clearing damaged mitochondria, peroxisome, and also meets the high metabolic needs from enhanced proliferating tumors (Kongara and Karantza, 2012). On the other hand, autophagy protects some tumor cells against nutrient and oxygen deprivation, and harsh tumor microenvironments; meanwhile, autophagy is shown to facilitate resistance to cisplatin in GC cells, and thus, it is a cause of tumor metastasis, recurrence, and chemoresistance (Maes et al., 2013;Katheder et al., 2017). As for KEEG analysis in progressive-stage STAD, cell adhesion molecules (CAMs) were of note ( Figure 7B). Cell adhesion molecules have multifaceted roles, including signaling molecules, and key constituents of the cell migration machinery, which are involved in virtually each step of tumor progression from primary cancer development to metastasis (Hamidi and Ivaska, 2018). Altered expression of cell adhesion molecules is frequently detected in tumors, and meanwhile, these molecules contribute to supporting the oncogenic growth factor receptor (GFR) signaling, and GFR-dependent cell invasion and migration, which are main features of progressive tumors (Hamidi and Ivaska, 2018).
Further qPCR and IHC detections were performed to estimate mRNA and protein levels of two representative targets in clinical samples. IGFBP-1, which belongs to the insulin-like growth factor system, plays an essential role in the pathophysiology of various tumors (Lin et al., 2021). The role of IGFBP1 had been explored by previous experiments. Yuya Sato et al. reported IGFBP1 could predict hematogenous metastasis in patients with gastric cancer (Sato et al., 2019), while another research demonstrated that the role of IL35 in GC angiogenesis was altering TIMP1, PAI1, and IGFBP1 (Li X. et al., 2020b). In our study, the experimental results of IGFBP1 in both transcriptional and protein levels were in accordance with the bioinformatics outcome, which illustrated it could predict early-stage STAD. CHAF1A, as a known histone chaperone, is also upregulated in tumors of many origins, including GC. It has been reported CHAF1A could upregulate the c-MYC and CCND1 expressions and further promote gastric carcinogenesis (Zheng et al., 2018). In this study, both qPCR and IHC results of CHAF1A were in line with bioinformatics analysis, which illustrated that it could predict progressive-stage STAD. Thus, these two targets could serve as potential signatures for identification of STAD at different risks.
This study stratified STAD into early and progressive stages, which is closely related to distinct treatment and prognosis, and by combining TCGA data, conclusions drawn are more general and representative. In addition, through research validation in clinical cases, we believed our results were stable and reliable. Meanwhile, it is worth noting that limited to TCGA data, the number of paraneoplastic tissues was small, and cancer and paraneoplastic normal cases were evidently not one-to-one matching. Therefore, selection bias is hard to avoid in bioinformatics results when based on these data. Also, since there were only 30 cases in the qPCR validation Frontiers in Molecular Biosciences frontiersin.org cohort, and most of which were progressive-stage STAD, and for IHC, though validated by more than 80 paired STAD and paraneoplastic normal tissues, it was a semi-quantitative method, so subjectivity cannot be ruled out when evaluation of targets. Therefore, further investigations by large and matched cohorts were strongly suggested to be performed in the future.
In conclusion, our study combined bioinformatics analysis, clinical parameters, qPCR, and IHC detections in clinical samples to draw the conclusion that mRNA-based models by stratification could predict outcomes of patients at different risks, and selected signatures could serve as novel biomarkers for STAD patients at varied stages.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Ethics statement
The studies involving human participants were reviewed and approved by the Clinical Research Ethics Committee, Outdo Biotech (Shanghai, China). The patients/participants provided their written informed consent to participate in this study.

Author contributions
ZH designed this study. XH, KZ, and NX performed the bioinformatics analysis. JZ carried out the qPCR and IHC analyses. XH, JZ, NX, JW, and ST drafted the manuscript. ZH made the critical revision of the manuscript for important intellectual content and provided the funding support. JZ and YL provided the administrative, technical, and material support.