Systematic Review of Prognostic Gene Signature in Gastric Cancer Patients

Gastric cancer (GC) is the second leading cause of cancer mortality and remains the fourth most common cancer worldwide. Effective and feasible methods for predicting the outcomes of GC patients are still lacking. While genetic profiling might be suitable in some respects, gene expression signatures have been shown to be a robust tool. Here, by performing a comprehensive search in PubMed, we provide an up-to-date summary of 39 prognostic gene signatures for GC patients and describe the procedures used for the selection, calculation, and construction of gene signatures. We also review current web tools, including PROGgene and SurvExpress, that can be used to analyze the prognostic value of multiple genes for GC. This review will aid a comprehensive understanding of the current prognostic gene signatures for accurately predicting the outcome of GC patients, and may guide future clinical management once the reliability of these signatures is validated in clinics.


INTRODUCTION
Gastric cancer (GC) is one of the leading causes of cancer-related death in many countries. In China, GC is the second leading cause of cancer-related death, ranking second among the most frequent malignancies in men and third in women (Chen et al., 2016). Owing to poor dietary habits, Helicobacter pylori (Hp) infection, and insufficient early endoscopic screening, GC incidence and mortality rates remain high in China (Li and Kaminishi, 2009). It is estimated that 221,478 GC patients die in China every year, roughly half of the world's gastric cancer deaths in 2012 (Strong et al., 2015). Outcomes of GC patients remain poor even with advances in surgery, chemotherapy, and radiotherapy (Peng et al., 2018); the overall 5-year GC survival rate is below 30% (Miller et al., 2016). Moreover, effective methods for predicting outcomes, so that GC patients can receive timely and appropriate intervention, are still lacking (Yin et al., 2013). Thus, there is high clinical demand for valid biomarkers to assess patient prognosis in advance and tailor patient management (Yu et al., 2019).
Several single molecules have been reported to be related to GC patients' outcomes and peritoneal relapse (Taniguchi et al., 1998; Ishii et al., 2000). For example, overexpression of the HER2 gene was found to be associated with lymph node metastasis of GC (Hecht et al., 2016). P53 was shown to be an unfavorable biomarker of GC. However, a single biomarker is less robust for prognosis than multiple biomarker-based models (He and Zuo, 2019). Many studies have demonstrated that signatures with an optimal combination of several candidate biomarkers can notably improve predictive accuracy. Hence, gene signatures comprising multiple genes have been developed to strengthen the ability to predict the prognosis of cancer patients. Several gene signatures have been constructed to accurately predict GC prognosis (Cho et al., 2011; Wang et al., 2016, 2017a; Hou et al., 2017; Lee et al., 2017). In 2005, Chen et al. developed a prognostic model using three genes based on gene expression profiles of primary GC tumor samples and adjacent mucosa (Chen et al., 2005). In 2007, Marchet et al. constructed a model with three genes to predict lymph node involvement in a cohort of 32 primary gastric carcinoma patients (Marchet et al., 2007). The goal of our current study was therefore to perform a comprehensive systematic review of published GC prognostic signatures derived from genome-wide studies. We identified 39 GC prognostic signatures and summarized three general strategies for signature selection, calculation, and construction. Furthermore, we introduce and discuss web tools for assessing the prognostic value of gene signatures in GC.

Selection Criteria of Studies
To identify published gastric cancer prognostic signatures, we first searched the NCBI PubMed database using both MeSH terms and entry terms for "prognostic AND gene expression signature AND gastric cancer." We also checked recent reports and reviews on this topic. We selected signatures that were derived from mRNA expression profiling studies and were shown to be related to patients' survival outcomes by independent validation (Figure 1A).
The search yielded a total of 259 studies after duplicates were removed. We then excluded studies according to the following criteria: (i) review or meta-analysis; (ii) not a human study or not focused on GC; (iii) no gene expression signature mentioned or patient outcome not addressed; (iv) no survival data. Finally, 39 signatures were collected from 39 studies published between 2005 and 2020 (Figure 1B).

Statistical Collection
Two distinct survival association metrics were used to evaluate each prognostic model: (i) hazard ratios (HRs) estimated by the Cox proportional-hazards regression model; (ii) time-dependent receiver operating characteristic (ROC) curves. This information is listed in Table 1. We also provide the names and number of genes in each signature in Table 1 to facilitate the selection of signatures for potential clinical application.
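As a brief illustration of the first metric, a Cox regression coefficient β corresponds to a hazard ratio HR = exp(β), with an approximate 95% confidence interval of exp(β ± 1.96 × SE). The Python sketch below shows this conversion. The function name and the example numbers are hypothetical and are not taken from Table 1.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error into
    (HR, lower bound, upper bound) of an approximate 95% interval."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient and standard error for a signature risk score.
hr, lower, upper = hazard_ratio(beta=0.7, se=0.2)
print(f"HR = {hr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 is the usual indication that the signature is significantly associated with survival at the 5% level.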

The Selection and Calculation of Multi-Gene Signature
By analyzing these 39 articles, we summarized three common strategies for constructing gene signatures (Figures 3-5). The major difference among them is the source of candidate genes. Strategy I identifies differentially expressed genes (DEGs) and selects candidate genes through univariate and multivariate Cox regression analysis. Strategy II focuses on the genes of a typical pathway, such as Hedgehog signaling pathway-associated genes, and then filters them. Strategy III obtains candidate genes from a molecular family and constructs signatures based on their subtypes.

Gene Signature Based on DEGs With High Prognostic Values
In Strategy I, authors established gene signatures based on DEGs with high prognostic value. Based on DEGs derived from The Cancer Genome Atlas (TCGA), Dai et al. constructed a 13-mRNA signature to predict the prognosis of GC patients (Dai et al., 2019). Liu et al. identified nine hub genes and constructed a 9-gene signature from gene expression profiling datasets by integrated bioinformatics analysis (Liu et al., 2018). Motoori et al. selected only genes specifically expressed in gastric tissues from clinical samples and established a 29-gene signature (Motoori et al., 2005). The main steps of the Strategy I workflow were as follows: (i) identification of DEGs associated with GC survival; (ii) selection of key candidate mRNAs for validation; (iii) construction of a risk score model; (iv) validation of the risk score model. The overall process is presented in Figure 3.
The study by Liu et al. illustrates this workflow (Liu et al., 2018). They first screened differentially expressed genes (DEGs) using microarray and RNA sequencing data and conducted integrated analyses, including functional enrichment, to identify potential key genes involved in the pathogenesis and prognosis of GC. Key genes that correlated significantly with patients' survival in univariate and multivariate Cox analysis were then regarded as candidate prognostic genes. Finally, a prognostic gene signature was developed as a linear combination of the expression value of each gene (X) multiplied by its regression coefficient (β) obtained from the multivariate Cox proportional hazards regression model. The formula is as follows: Risk score = β1X1 + β2X2 + β3X3 + ... + βnXn (Liu et al., 2018). To validate the prognostic power of the signature, all patients were divided into low- or high-risk groups according to the median prognostic score. A time-dependent ROC curve analysis was then performed to assess the predictive ability of the gene signature for clinical outcomes.
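The risk score formula and the median split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in any of the reviewed studies. The gene names, coefficients, and expression values are hypothetical placeholders, and assigning scores equal to the median to the low-risk group is an assumption.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Risk score = sum of beta_i * X_i over the genes in the signature."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.
    Scores equal to the median are labeled 'low' (an assumption)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical Cox coefficients (beta) for a 3-gene signature.
COEFFS = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}

# Hypothetical expression values (X) per patient.
PATIENTS = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 3.0},
    "P3": {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 0.5},
    "P4": {"GENE_A": 4.0, "GENE_B": 4.0, "GENE_C": 2.0},
}

scores = {pid: risk_score(x, COEFFS) for pid, x in PATIENTS.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

In practice the coefficients would come from a multivariate Cox model fitted on a training cohort, and the resulting groups would be compared in an independent validation cohort.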

Signature Constructed With Prognosis Related Pathway
In this strategy, authors established gene signatures from a specific group of genes belonging to a typical cellular pathway. The main steps of the Strategy II workflow were as follows: (i) identification of DEGs in GC; (ii) identification of genes related to a typical pathway in GC; (iii) integration of the genes selected in the previous step to construct a gene signature; (iv) application of the Cox proportional hazards model to test their association with overall survival; (v) validation of the risk score model. The overall process is presented in Figure 4. For example, Wu et al. identified and validated a Hedgehog (Hh) pathway-based 3-gene prognostic signature for gastric cancer. They first analyzed the prognostic value of 9 canonical Hh signaling pathway-associated genes in GC patients. Three members, IHH, PTCH1, and SMO, were found to have significant prognostic value at their cutoff values. Based on the established cutoff values, patients were divided into high- and low-risk subgroups, and univariate Cox proportional-hazards regression analysis was carried out to calculate the coefficient for each of the three Hh-associated biomarkers. Subsequently, the prognostic risk for each cancer case was scored by summing the coefficient-weighted expression of the IHH-PTCH1-SMO signature as follows: 3-gene signature score = (0.553 × IHH value) + (0.457 × PTCH1 value) + (0.411 × SMO value). To validate the prognostic Hh gene signature, another GEO dataset was used as the validation set. In addition, they performed independent validation of the prognostic significance by immunohistochemistry (IHC).
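The 3-gene score above is straightforward to compute once the three values are available. The Python sketch below uses the coefficients quoted in the text. The function name and the input values are hypothetical, and the text does not specify whether each "value" is a continuous expression level or a cutoff-based high/low indicator, so the function simply accepts any numeric input.

```python
# Coefficients as quoted in the text for the IHH-PTCH1-SMO signature.
HH_COEFFS = {"IHH": 0.553, "PTCH1": 0.457, "SMO": 0.411}


def hh_signature_score(ihh, ptch1, smo):
    """Sum of coefficient-weighted values for the three Hh pathway genes."""
    return (
        HH_COEFFS["IHH"] * ihh
        + HH_COEFFS["PTCH1"] * ptch1
        + HH_COEFFS["SMO"] * smo
    )


# Hypothetical inputs for one patient (placeholder numbers only).
example_score = hh_signature_score(ihh=1.0, ptch1=0.0, smo=1.0)
print(round(example_score, 3))
```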

Signature Construction With a Specific Gene Family
In Strategy III, authors assessed the prognostic value of a specific gene family using updated public resources and integrative bioinformatics analysis. Yu et al. investigated the biological and prognostic value of the NDRG family in GC (Yu et al., 2019). Chang et al. examined the prognostic significance of oxygen-sensing genes from the 2-oxoglutarate-dependent oxygenase family. The main steps of Strategy III were as follows: (i) targeting a specific gene family; (ii) analyzing the prognostic value of the gene family across different clinicopathological features; (iii) constructing a prognostic model; (iv) validating the prognostic model. The overall process is presented in Figure 5.
For example, the N-myc downstream-regulated gene (NDRG) family, NDRG1-4, has been implicated in a wide spectrum of biological functions in multiple cancers. From this perspective, Yu et al. first investigated the mRNA expression of the NDRG family in The Cancer Genome Atlas (TCGA). For each individual in the TCGA GC data, a prognostic risk score was computed from the risk score equation, and high- and low-risk groups were determined by applying an optimal cutoff to this score. Finally, the low-risk group displayed a significantly more favorable survival outcome than the high-risk group (Yu et al., 2019).

Web Tools for the Prognosis Assessment of Gene Signature in Gastric Cancer
With advances in high-throughput techniques, a large volume of omics data has been generated by next-generation sequencing and gene microarray platforms. These data have been deposited in public databases and can be leveraged to identify prognostic markers in different cancer types. Bioinformaticians have developed a number of online web tools for prognosis analysis, such as OSkirc, OSlms, OSacc, OSblca, OSuvm, PROGgene (Goswami and Nakshatri, 2013), SurvExpress (Aguirre-Gamboa et al., 2013), and KM plotter (Györffy et al., 2010). However, only two of these web servers, PROGgene and SurvExpress, can be used to analyze the prognostic value of multiple genes as a signature.
In 2013, Goswami et al. implemented PROGgene, a survival analysis web tool based on gene expression for multiple cancer types. In 2014, they presented the second version, PROGgeneV2, which has several enhancements over the previous version (Goswami and Nakshatri, 2014). In PROGgeneV2, users can create Kaplan-Meier (KM) plots for published/curated gene signatures. PROGgeneV2 added nearly ten thousand published/curated gene signatures from the Molecular Signatures Database to its repository. Users can search directly by gene signature keywords, and the application retrieves the genes included in the signature from the Molecular Signatures Database. Finally, a combined plot using the mean expression of all genes in the signature is presented for the entire signature (Goswami and Nakshatri, 2014).
In 2013, Aguirre-Gamboa et al. established SurvExpress, a web tool that performs prognostic analysis of biomarkers and risk assessment across cancer types (Aguirre-Gamboa et al., 2013). SurvExpress can assess single- or multi-gene biomarkers in cancer. The prognostic index (PI, also called the risk score) is commonly used to stratify risk groups. SurvExpress offers two methods for generating risk groups. The first splits the ordered PI (higher values for higher risk) at the median. The second uses an optimization algorithm on the ordered PI: a log-rank test is performed at each value of the ordered PI to compare the two resulting groups, and the algorithm chooses the split point at which the p-value is minimal.
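The second, optimization-based grouping method can be illustrated with a small self-contained Python sketch. It is not SurvExpress code. It assumes a standard two-group log-rank test with a chi-square approximation (1 degree of freedom), and the `min_group` parameter, function names, and toy data are all hypothetical.

```python
import math


def logrank_p(times, events, in_high):
    """Two-group log-rank test p-value (chi-square approximation, 1 df)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # patients at risk at time t
        n_high = sum(1 for x, h in zip(times, in_high) if x >= t and h)
        d = sum(1 for x, e in zip(times, events) if x == t and e)  # events at t
        d_high = sum(
            1 for x, e, h in zip(times, events, in_high) if x == t and e and h
        )
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df


def best_split(pi, times, events, min_group=2):
    """Try every split of the ordered PI and keep the one with the smallest
    log-rank p-value. Returns (size of the low-risk group, p-value).
    min_group (an assumption) keeps both groups from becoming too small."""
    order = sorted(range(len(pi)), key=lambda i: pi[i])
    best = None
    for k in range(min_group, len(pi) - min_group + 1):
        high = set(order[k:])
        in_high = [i in high for i in range(len(pi))]
        p = logrank_p(times, events, in_high)
        if best is None or p < best[1]:
            best = (k, p)
    return best


# Toy data: prognostic index, follow-up time, and event flag (1 = death).
PI = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
TIMES = [60, 55, 50, 58, 52, 5, 4, 3]
EVENTS = [1, 1, 1, 1, 1, 1, 1, 1]

best_k, best_p = best_split(PI, TIMES, EVENTS)
print(best_k, best_p)
```

Because the cutoff is chosen to minimize the p-value, the resulting p-value is optimistic, which is one reason a signature's groups should be confirmed in an independent dataset.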

DISCUSSION
In this study, we provided a systematic review of prognostic gene signatures in gastric cancer patients. The purpose of this study is to provide a gene list for further prospective clinical application, not to restate the procedures and results of the original publications. Compared with previous studies, our study accomplished the following: (i) performing a systematic literature search for GC prognostic signatures and yielding a comprehensive signature collection; (ii) extracting three common strategies for constructing novel gene signatures; (iii) reviewing current web tools, including PROGgene and SurvExpress, that can be used to analyze the prognostic value of multiple genes for GC. It is also necessary to consider and integrate other types of gene signatures for GC prognosis, such as miRNA and lncRNA signatures (Cheng, 2018; Chen et al., 2019). Although we have summarized three general strategies for constructing gene signatures, other approaches should also be mentioned. For example, Bauer et al. used different gene expression patterns of GSK3B, CTNNB1, and NOTCH2 as a risk score instead of using an equation to quantify the risk score. The pattern of high GSK3B, high CTNNB1, and low NOTCH2 expression was linked with better outcomes (Bauer et al., 2012).
Chang et al. identified two signatures, each consisting of 5 genes, signature 1 (KDM8, KDM6B, P4HTM, ALKBH4, ALKBH7) and signature 2 (KDM3A, P4HA1, ASPH, PLOD1, PLOD2), which can be used to predict overall survival (OS) in patients with ten types of cancer. Yuzhalin et al. analyzed extracellular matrix (ECM) genes significantly upregulated across a large cohort of patients with ovarian, lung, gastric, and colon cancers and defined a nine-gene signature that was associated with poor prognosis in these patients (Yuzhalin et al., 2018). Interestingly, both studies fall under Strategy III. This might imply that gene signatures established by Strategy III also perform well in evaluating the prognosis of other types of cancer.
In addition, we searched the genes of the 39 signatures to find those that overlapped most. Three genes (COL1A1, COL1A2, EGFR) were each used three times to construct signatures in these articles, which suggests that they may be more powerful in GC prognosis and deserve attention in future GC prognostic gene signature construction.
Notably, several drawbacks of these 39 gene signatures need to be discussed. For example, the limited number of clinical samples might affect the reproducibility of a prognostic signature; including more independent datasets from the same or different races/ethnicities for cross-validation would improve the reliability of the signature. This matters because GC incidence and survival outcomes differ significantly between Western and Asian countries (Macdonald, 2011), and Zhu et al. found that their signature, built on Chinese patients, was hard to validate in patients from other areas (Zhu et al., 2018). Signature research is also generally limited by its retrospective design. The absence of prospective studies leads to low credibility and acceptance of signatures in clinics. In short, combining risk scores with prospective randomized trials is highly necessary, in the hope that the true relevance of the risk score can be validated in future studies.
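A gene overlap search of this kind can be sketched as a simple frequency count across signatures. The Python example below is only an illustration of the idea. The signature names and gene lists are hypothetical placeholders, not the contents of Table 1, and the `min_count` threshold is an assumption.

```python
from collections import Counter


def gene_frequencies(signatures):
    """Count in how many signatures each gene appears (once per signature)."""
    counts = Counter()
    for genes in signatures.values():
        counts.update(set(genes))
    return counts


def most_shared(signatures, min_count=2):
    """Genes appearing in at least min_count signatures, most frequent first."""
    counts = gene_frequencies(signatures)
    shared = [g for g, c in counts.items() if c >= min_count]
    return sorted(shared, key=lambda g: (-counts[g], g))


# Hypothetical signatures with placeholder gene names.
SIGNATURES = {
    "signature_1": ["GENE_A", "GENE_X"],
    "signature_2": ["GENE_A", "GENE_B"],
    "signature_3": ["GENE_A", "GENE_B", "GENE_Y"],
}

print(most_shared(SIGNATURES))
```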

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
XG and LX: study concept and design. LX, LC, FW, LZ, and QW: acquisition of data. QW, LX, LC, and FW: analysis and interpretation of data. LX, LC, FW, and XG: draft of the manuscript and critical revision of the manuscript for intellectual content. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2020.00805/full#supplementary-material
Figure S1 | Forest plot of the risk ratio for rates of survival on 39 gene signatures in GC.