mRBioM: An Algorithm for the Identification of Potential mRNA Biomarkers From Complete Transcriptomic Profiles of Gastric Adenocarcinoma

Purpose In this work, an algorithm named mRBioM was developed for the identification of potential mRNA biomarkers (PmBs) from complete transcriptomic RNA profiles of gastric adenocarcinoma (GA). Methods mRBioM initially extracts differentially expressed (DE) RNAs (mRNAs, miRNAs, and lncRNAs). Next, mRBioM calculates the total information amount of each DE mRNA based on the coexpression network, including three types of RNAs and the protein-protein interaction network encoded by DE mRNAs. Finally, PmBs were identified according to the variation trend of total information amount of all DE mRNAs. Four PmB-based classifiers without learning and with learning were designed to discriminate the sample types to confirm the reliability of PmBs identified by mRBioM. PmB-based survival analysis was performed. Finally, three other cancer datasets were used to confirm the generalization ability of mRBioM. Results mRBioM identified 55 PmBs (41 upregulated and 14 downregulated) related to GA. The list included thirteen PmBs that have been verified as biomarkers or potential therapeutic targets of gastric cancer, and some PmBs were newly identified. Most PmBs were primarily enriched in the pathways closely related to the occurrence and development of gastric cancer. Cancer-related factors without learning achieved sensitivity, specificity, and accuracy of 0.90, 1, and 0.90, respectively, in the classification of the GA and control samples. Average accuracy, sensitivity, and specificity of the three classifiers with machine learning ranged within 0.94–0.98, 0.94–0.97, and 0.97–1, respectively. The prognostic risk score model constructed by 4 PmBs was able to correctly and significantly (∗∗∗p < 0.001) classify 269 GA patients into the high-risk (n = 134) and low-risk (n = 135) groups. GA equivalent classification performance was achieved using the complete transcriptomic RNA profiles of colon adenocarcinoma, lung adenocarcinoma, and hepatocellular carcinoma using PmBs identified by mRBioM. Conclusions GA-related PmBs have high specificity and sensitivity and strong prognostic risk prediction. MRBioM has also good generalization. These PmBs may have good application prospects for early diagnosis of GA and may help to elucidate the mechanism governing the occurrence and development of GA. Additionally, mRBioM is expected to be applied for the identification of other cancer-related biomarkers.

Purpose: In this work, an algorithm named mRBioM was developed for the identification of potential mRNA biomarkers (PmBs) from complete transcriptomic RNA profiles of gastric adenocarcinoma (GA).
Methods: mRBioM initially extracts differentially expressed (DE) RNAs (mRNAs, miRNAs, and lncRNAs). Next, mRBioM calculates the total information amount of each DE mRNA based on the coexpression network, including three types of RNAs and the protein-protein interaction network encoded by DE mRNAs. Finally, PmBs were identified according to the variation trend of total information amount of all DE mRNAs. Four PmB-based classifiers without learning and with learning were designed to discriminate the sample types to confirm the reliability of PmBs identified by mRBioM. PmB-based survival analysis was performed. Finally, three other cancer datasets were used to confirm the generalization ability of mRBioM.
Results: mRBioM identified 55 PmBs (41 upregulated and 14 downregulated) related to GA. The list included thirteen PmBs that have been verified as biomarkers or potential therapeutic targets of gastric cancer, and some PmBs were newly identified. Most PmBs were primarily enriched in the pathways closely related to the occurrence and development of gastric cancer. Cancer-related factors without learning achieved sensitivity, specificity, and accuracy of 0.90, 1, and 0.90, respectively, in the classification of the GA and control samples. Average accuracy, sensitivity, and specificity of the three classifiers with machine learning ranged within 0.94-0.98, 0.94-0.97, and 0.97-1, respectively. The prognostic risk score model constructed by 4 PmBs was able to correctly and significantly ( * * * p < 0.001) classify 269 GA patients into the high-risk (n = 134) and low-risk (n = 135) groups. GA equivalent classification performance was achieved using the complete transcriptomic RNA profiles of colon adenocarcinoma, lung adenocarcinoma, and hepatocellular carcinoma using PmBs identified by mRBioM.

INTRODUCTION
Gastric cancer is a global health problem, with more than 1 million patients being diagnosed worldwide each year. Gastric cancer remains the third leading cause of cancer-related death, despite a worldwide decline in morbidity and mortality over the past 5 years (Bray et al., 2018;Thrift and El-Serag, 2020). Gastric adenocarcinoma (GA) is a type of gastric cancer caused by malignant transformation of gastric gland cells. Incidence of GA accounts for approximately 95% of gastric malignancies (Lawrence, 2004), and GA pathogenesis has not been fully elucidated. Five-year survival rate of early gastric cancer can reach >90% (Tan, 2019), and 5-year survival rate of patients with advanced gastric cancer is only 20-40% (Siegel et al., 2016;. Therefore, an improvement in early diagnosis and treatment of GA can decrease GA incidence and mortality. Several studies have suggested that molecular biomarkers are important for early diagnosis, treatment, and evaluation of prognosis of cancer (Parker et al., 2009;Collins and Varmus, 2015;Pellegrini et al., 2015). According to the central dogma of biology, RNA carries genetic and regulatory information that reflects the state of the cells. RNA biomarkers have considerably higher sensitivity and specificity for the detection of cancer samples compared with those of protein biomarkers and can more dynamically reflect cellular states and regulatory processes to provide additional cellular information compared with that provided by DNA biomarkers (Xi et al., 2017). Furthermore, miRNAs can regulate gene expression by binding to mRNAs or related proteins (Bartel, 2009). LncRNAs can competitively bind miRNAs as competing endogenous RNAs (ceRNAs) to regulate gene expression and cellular functions (Xia et al., 2014;. Therefore, mRNAs occupy a key position in the complex regulatory processes involving three types of biomolecules. Abnormal expression of mRNAs in the key positions of the regulatory network can easily bias the overall stability of the network. mRNAs may cause abnormal activation of one or more signaling pathways, which also leads to abnormal expression or function of the biomolecules in these signaling pathways to promote physiological and tissue disorders, such as cancer Duan et al., 2020;Hu et al., 2020;Wei et al., 2020). mRNAs that occupy the key positions are more likely to be biomarkers. Many mRNA biomarkers associated with occurrence and development of GA were identified using experimental and computational methods. Representative studies can be summarized as follows. Yoon et al. (2019) confirmed that the activation of KRAS in GA cells stimulates epithelial-tomesenchymal transition to form cancer stem-like cells, thereby promoting metastasis.  found that overexpression of DGKi in GA indicates poor prognosis, and the MAPK signaling pathway may be one of the key pathways that regulate occurrence and development of GA by DGKi. Necula et al. (2020) showed that overexpression of COL10A1 in GA patients is associated with poor survival and that COL10A1 can be used as a potential biomarker for early detection of GA. Wang (2017) identified 446 differentially expressed (DE) mRNAs in the gene expression profile related to gastric cancer, used these DE mRNAs to construct a protein-protein interaction network, and finally identified five key mRNAs in the protein-protein interaction network (COL5A2, TOP2A, KIF20A, FN1, and PRC1). However, existing GA-related mRNA biomarkers are not sufficient to provide accurate GA diagnosis in the clinic and thoroughly elucidate GA pathogenesis. Identification of GA-related mRNA markers with high sensitivity and specificity is of great significance for early diagnosis, targeted therapy, and analysis of prognosis of GA. Therefore, this study first proposes an algorithm to identify potential mRNA biomarkers (PmBs) related to GA based on complete transcriptomic RNA (including mRNA, lncRNA, and miRNA) profiles of GA. The proposed algorithm evaluates the potential of an mRNA with abnormal expression as GA biomarker in the regulation of transcriptional coexpression and at the proteinprotein interaction level. The integrated analysis of multiple omics data objectively avoids the problems of signal noise and high inaccuracy caused by single omics analysis. Then, the sample classification power and prognostic relevance of PmBs were analyzed to assess their reliability and value for assistance with clinical diagnosis. The novelty of this paper are as follows: 1. An novel algorithm named mRBioM for the identification of potential mRNA biomarkers from complete transcriptomic profiles of GA was developed. 2. A cancer-related factor was proposed to distinguish whether a single sample is cancer or normal, which may have good application prospects in the personalized diagnosis of cancers. 3. The mRBioM-based prognostic risk score model was constructed to assess the overall survival rate of cancer patients.

Data Collection
The complete transcriptome TCGA-STAD dataset of RNAs (including mRNA, lncRNA, and microRNA) of GA patients published by various countries was obtained from the Genomic Data Commons of National Cancer Institute in July, 2019. The pathological tissue types of the source data were limited to GA. The dataset included 279 GA patients and the corresponding clinical information ( TCGA-STAD was organized into five subsets for various studies: dataset 1 for GA-related PmB identification, datasets 2-4 for evaluation of PmB classification, and dataset 5 for survival analysis, as shown in Figure 1A. Three other cancer-related RNA transcriptomic profiles were downloaded from the Genomic Data Commons database in May of 2020 and were used to verify the generalization ability of mRBioM: TCGA-COAD, including 478 cases of colon cancer and 41 cases of normal tissues; TCGA-LUAD, including 533 cases of lung adenocarcinoma and 59 cases of normal tissues; and TCGA-LIHC, including 371 cases of liver cancer and 50 cases of normal tissues. The characteristics of the three datasets are shown Figure 1B.

mRBioM Algorithm
The amount of information for a molecule can determine whether this molecule is in a key position in the regulatory network (Teschendorff et al., 2014). Thus, mRBioM identified PmBs by evaluating the amount of information for each DE mRNA based on the transcriptional coexpression relationships between DE mRNAs, miRNAs, and lncRNAs and in the PPI network. The steps of the mRBioM algorithm are described below.

DE RNA Analysis
The limma package of R (Ritchie et al., 2015) was used to identify DE RNAs from dataset 1 containing 20 GA and 20 paracancer samples (a total of 40 samples) from TCGA-STAD. Dataset 1 was preprocessed by cleaning and standardization; next, the logarithm of the expression fold change (FC) of each RNA in GA vs. adjacent normal samples was calculated. The log 2 FC value and corresponding corrected p-value (represented by Padj) of each RNA were used to determine whether an RNA was differentially expressed. The screening conditions for DE RNAs in this study were Padj < 0. 05 or 0.01 and | log 2 FC | 1.

Calculation of the Coexpression Correlation Coefficient Matrix for RNAs
Suppose that we identified N, J, and K DE mRNAs, DE miRNAs, and DE lncRNAs, respectively. The expression vector of each DE RNA in all samples was extracted from dataset 1. Pearson correlation coefficients M xy and L xz between DE mRNA x(x = 1,·, N) and DE miRNA y(y = 1,·, J) and between DE mRNA x and DE lncRNA z(z = 1,·, K), respectively, were calculated according to Eqs. (1) and (2).
where x i , y i , and z i andx,ȳ, andz are the i-th element and the average value of all elements in the expression vectors of DE mRNA x, DE miRNA y, and DE lncRNA z, respectively. Pearson correlation coefficients between all DE mRNAs and DE miRNAs and between all DE mRNAs and DE lncRNAs constitute two correlation coefficient matrixes, which are represented by M (N × J) and L (N × K), respectively.

Calculation of the Amount of Information for DE mRNA in the Coexpression Network
The connection of each molecule in the regulatory network is influenced by many factors, such as environment and diet, and has a degree of uncertainty that accounts for the amount of information for each molecule (Teschendorff et al., 2014).
In this study, we propose to use the information rate of a DE mRNA in the transcriptional coexpression networks to measure the uncertainty of its connection and then use Shannon's information entropy theory to estimate the amount of coexpression information for a DE mRNA. The information rate for DE mRNA x in the coexpression network between DE mRNA and DE miRNA was defined as the ratio of a significant pearson correlation coefficient (p < 0.05) in where M x nd L' x are the vectors composed of the pearson correlation coefficients with statistical p values less than 0.05 in the x-th row of M and L, respectively; M' xy (y = 1, 2,·, J') and L' xz (z = 1, 2, ·, K') are the pearson correlation coefficients with statistical p-values less than 0.05 in the x-th row of M and L, respectively; and J' and K' are the corresponding numbers.
According to Shannon's information entropy theory, the amount of coexpression information for DE mRNA x (expressed as S RNAx ) is estimated by Eq. (5).
where p xy is the y-th information rate in p x , y = 1, 2, ·,J'; q xz is the z-th information rate in q x , z = 1, 2,·, K'.

Estimation of the Amount of Information for DE mRNA in the Protein-Protein Interaction Network
We constructed a protein-protein interaction network based on the protein interaction information of all DE mRNAs acquired from the online STRING database 1 . Higher protein-protein connectivity score in the protein-protein interaction network corresponds to greater amount of interaction information between two proteins (Szklarczyk et al., 2019). Therefore, we used cs to measure the amount of protein interaction information (represented by S PPIx ) that corresponds to DE mRNA x according to Eq. (6).
where cs xj = 1) represents the connection score between a protein encoded by DE mRNA x and a protein encoded by another DE mRNA j (j N, j =x).

Identification of PmBs Associated With GA
The sum of S RNAx and S PPIx normalized by maximum was used as the total information amount of DE mRNA x (denoted by S x ) according to Eq. (7).
All DE mRNAs were sorted according to S x (x = 1, 2 ·, N), and PmBs were identified based on the change trend of S x (x = 1, 2·,N). The number of identified PmBs was recorded as Q. Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of PmBs were performed by the clusterProfiler R package to investigate the functions of PmBs (Yu et al., 2012).

Evaluation of Sample Classification Power of PmBs
We designed four classifiers based on PmBs to discriminate the positive GA and negative control samples to illustrate the value of PmBs identified by mRBioM in auxiliary clinical diagnosis. The performance of the four classifiers was evaluated by sensitivity, specificity, and accuracy.

Cancer-Related Factor
The cancer-related factor of a sample was determined by the expression values of PmBs in the samples and was used to discriminate the sample types. The cancer-related factor value of a sample was defined as the ratio of the average logarithm values of the expression of upregulated and downregulated PmBs in the sample according to Eq. (8).
where CF indicates the value of cancer-related factor. n up nd Ex up (i) are the number of upregulated PmBs and the expression value of the i-th upregulated PmB in a sample, respectively. Similarly, n dn nd Ex dn (j) are the number of downregulated PmBs and the expression value of the j-th downregulated PmB.
1 http://string-db.org/ We randomly selected n (in this instance, n = 10) GA and adjacent normal samples from the mRNA expression profile of dataset 1 to identify the best CF threshold for discrimination of the positive and negative samples, and only the expression value of Q PmBs from each sample was used to form dataset 2 (C = 10, N = 10). Next, the expression profiles of 2n samples in dataset 2 were converted into a new expression profile containing n samples. The expression value vector S m (dimension is 1 × n) of the m-th (m = 1, 2,·,Q) PmB in the synthetic expression profile was calculated according to Eq. (9).
where S tm (dimension is 1 × n) and S nm (dimension is 1 × n) are the expression value vectors of the GA and control samples in dataset 2 of m-th (m = 1, 2,·,Q) PmB, respectively. S tm (i) and S nm (i) are the i-th expression value elements (I = 1, 2,·,n) of S tm (dimension is 1 × n) and S nm (dimension is 1 × n), respectively. Next, Eq. (8) was used to calculate the CF of the i-th sample in the generated expression profile (denoted as CF i , i = 1, 2,·,n), and the geometric mean value of the CF values of n samples (Eq. 10) was used as the threshold of CF (denoted as CF th ).
Finally, the samples of dataset 2 were excluded from TCGA-STAD, and the remaining samples only with the expression values of Q PmB were used to form dataset 3 (C = 267, N = 12), which was used to test the ability of cancer-related factor to recognize the GA samples. If the CF of a sample was greater than CF th , the sample was identified as GA (positive); otherwise, the sample was identified as control (negative).

Classifiers With Machine Learning
Three classifiers with machine learning based on random forest (RF) , support vector machine (SVM) Zhang and Liu, 2017), and naive Bayes (NB) (Dou et al., 2015) were constructed using the normalized expression values of PmBs as the classification feature implemented by randomForest R package (Liaw and Wiener, 2002), the svm function of the e1071 R package (Meyer et al., 2019), and the NaiveBayes function of the klaR R package (Weihs et al., 2005), respectively. Of course, there are other improved Bayesian models that can replace NB classification algorithms (Nagarajan et al., 2013;Thapa et al., 2018). Since the unbalanced sample size between the GA and control groups will affect the classification effect of the three classifiers, we used the downsampling method to randomly extract 28 samples from 277 GA samples and retain all 22 adjacent normal samples in TCGA-STAD, which formed validation dataset 4 (C = 28, N = 22). Finally, the performance of the three classifiers with machine learning was confirmed on dataset 4 by using the fivefold crossvalidation method.

PmBs-Based Survival Analysis
We excluded 10 patients with missing survival time or less than 30 day survival from the cohort of 279 patients in TCGA-STAD to exclude patients who died from other factors and finally used the transcription profiles of 269 GA patients with 55 PmBs to form dataset 5 for survival analysis. The average survival time of GA patients in dataset 5 was 21.575 ± 17.506 months, and 105 GA patients died at the end of follow-up, accounting for 39% of the total cohort. Clinical information about patients (Supplementary Table 1) and dataset 5 (C = 269) were integrated, and a univariate Cox regression model of the survival R package (Peterschmitt et al., 2018) was used to identify survival-related PmBs that have a significant impact on survival time (p < 0.05); then, a multivariate Cox regression model was used to determine T survival-related PmBs to construct a prognostic risk model (Lossos et al., 2004) used to calculate the survival-based risk score of a patient (Eq. 11).
where Exp PmB (t) is the expression value of t-th survival-related PmB in the patient sample, and W PmB (t) is the corresponding multivariate Cox regression coefficient of t-th survival-related PmB, t = 1, 2,·, T. Then, the median of the risk scores of all patients in dataset 5 was used as the cutoff value to divide the patients into the high-and low-risk groups. Finally, Kaplan-Meier analysis was used to assess the overall survival rate of patients in the high-and low-risk groups, and the log-rank test was used to determine whether there is a significant difference in the overall survival rate of patients in the high-risk vs. low-risk groups. In addition, we used the survivalROC package (Kamarudin et al., 2017) of R to perform ROC curve analysis to evaluate the sensitivity and specificity of the prognostic risk model.

DE mRNAs and PmBs in GA
A total of 170 DE mRNAs | log 2 FC(| 1, Padj < 0.01), 623 DE lncRNAs | log 2 FC(| 1, Padj < 0.05), and 52 DE miRNAs | log 2 FC(| > 1, Padj < 0.01) were obtained. Figure 2A shows the volcano plots of significantly DE RNAs, the details of all DE mRNAs are shown in Supplementary Table 2. And the results of the protein-protein interacti network analysis are shown in the attached file "string_protein_interactions_170.tsv." The total information amount for each DE mRNA was calculated by mRBioM, and the curve constructed by total information amount of all DE mRNAs from large to small is shown in Figure 2B. There is a significant decrease of curve after the orange area and finally the curve tends to be stable. Therefore, a total of 55 DE mRNAs with total information amount corresponding to the orange region were identified as PmBs for further study ( Table 2). A literature search confirmed that 13 PmBs were related to GA (23.64%), and 27 PmBs were related to other cancers (49.09%) ( Table 2). The expression distribution of 55 PmBs is shown in Figure 2C, corresponding to 41 upregulated PmBs (lower right corner vs. lower left corner) and 14 downregulated PmBs (upper right corner vs. upper left corner).

Functional Enrichment Analysis of PmBs in GA
GO and KEGG functional enrichment analyses were performed by clusterProfiler of R using 55 PmBs to investigate the potential functions of these biomarkers. As shown in Figure 3A, the GO terms indicated that these 55 PmBs were mainly concentrated in chromatin binding (p < 0.05). The results of KEGG analysis with p < 0.05 suggested that these 55 PmBs were mainly related to pathways closely associated with occurrence and development of cancer, such as mitophagy-animal, ribosome biogenesis in eukaryotes, MAPK signaling pathway, cAMP signaling pathway, central carbon metabolism, microRNAs in cancer, and renal cell carcinoma ( Figure 3B).

Sample Classification Power of Cancer-Related Factor
CF th of dataset 2 was 0.9725, and the remaining samples in TCGA-STAD formed dataset 3 to test the sample classification power of cancer-related factor ( Table 3). Table 3 shows that accuracy, sensitivity, and specificity achieved by cancer-related factor were 0.90, 0.89, and 1, respectively. The ROC curve of cancer-related factor is shown in Figure 4A, and the area under the ROC curve (AUC) reached 0.9494. The cancer-related factor constructed by PmBs has high specificity and sensitivity and low computational complexity and does not require training; thus, it has great potential application in auxiliary clinical diagnosis.

Sample Classification Power of Classifiers With Machine Learning
The results of fivefold cross-validation of RF-based, SVM-based, and NB-based classifiers using dataset 4 are shown in Table 4. Average accuracy, sensitivity, and specificity of the RF-based, SVM-based, and NB-based classifiers were 0.94, 0.98, and 0.96, 0.94, 0.97, and 0.94, and 1, 1, and 0.97, respectively. The average ROC curves of the three classifiers are shown in Figure 4B, and all three AUCs were above 0.99. This finding provides further proof that PmBs can be potential markers related to GA.
The median of the risk scores of all GA samples −(34 in this case) was used as the cutoff value, and 269 patients were divided into the high-risk (>−34, n = 134) and low-risk groups (<−8.34, n = 135). Kaplan-Meier survival analysis of patients in the highand low-risk groups showed that the difference between the two groups was significant (p < 0.0001). As shown in Figure 5A, the average survival time of patients inhe high-risk group was shorter, and the number of deaths was higher than those of patients in the low-risk group. In addition, the results of ROC analysis showed that the AUC value of the prognostic risk model constructed using 4 PmBs was 0.7742 ( Figure 5B), suggesting good specificity and sensitivity.

Generalization Ability
Generalization ability of mRBioM was assessed on three other complete transcriptomic datasets, including TCGA-COAD (colonic adenocarcinoma), TCGA-LUAD (lung adenocarcinoma), and TCGA-LIHC (hepatocellular carcinoma), downloaded from the TCGA database, and the results are shown in Table 5. Average accuracy and sensitivity of CF were between 0.92 and 0.99, and average specificity was 1. Average accuracy and sensitivity of the RF-based, SVM-based, and NB-based classifiers were between 0.94 and 0.99, and average specificity was above 0.95. Therefore, the classifiers constructed with PmBs have good sample classification power in 3 other cancer datasets, indicating that the mRBioM algorithm has good generalization ability and can effectively identify potential cancer-related mRNA markers in other cancers.

DISCUSSION
This study proposed the mRBioM algorithm to identify potential mRNA biomarkers from the complete transcriptomic RNA profiles of GA. Unlike existing algorithms, mRBioM evaluates the potential of each DE mRNA as a biomarker by combining the corresponding amount of information at the transcription and protein levels based on the information entropy theory. Fifty-five DE mRNAs were identified as PmBs associated with GA. These 55 PmBs were used to construct four sample classifiers, including cancer-related factor, RF-based, SVMbased, and NB-based classifiers, to illustrate the reliability of PmBs identified by mRBioM. Good sensitivity, specificity, and accuracy of classification were achieved by the four classifiers. Four of fifty-five PmBs had good ability for prognostic evaluation of the overall survival of GA patients. TCGA-COAD, TCGA-LUAD, and TCGA-LIHC datasets confirmed the generalization ability of mRBioM. The classifiers constructed by the identified PmBs suggested good performance in a variety of classification algorithms and cancer-related datasets, which is expected to be used in more researches on cancer-related biomarker identification.
Thirteen of 55 PmBs ( Table 2) were confirmed by the data of the literature to play certain roles in occurrence and development of GA and were biomarkers or potential therapeutic targets of GA. For example, GPRC5A and SOX9 have been shown to be related to occurrence and development of GA , and their expression levels changed more than fourfold in the GA vs. adjacent control samples according to the result of DE RNA analysis (log 2 FC > 2). FCMET has been confirmed as a resistance factor in GA (Ebert et al., 2019), and Wang R. G. et al. (2020) demonstrated that FKBP10 may be a crucial player mediating cell proliferation, invasion, and migration by regulating the PI3K signaling pathway in GA. Twenty-seven of 55 PmBs ( Table 2) were shown to be associated with other cancers according to the data of the literature. Thus, mRBioM identified some new GA-related mRNAs. We attempted to extract additional DE mRNAs as PmBs related to GA. However, adding PmBs did not improve the classification powers of the four classifiers, and these extra PmBs were not associated with prognosis. Thus, our strategy for PmBs screening according to the change trend of the total information amount for all PmBs was effective.
Notably, the value ranges of the cancer-related factor calculated in most cancer and adjacent normal samples of four cancer-related datasets were 0.9-1.4 and 0.7-0.9, respectively. Additionally, the thresholds of cancer-related factors (CF th ) in all four datasets were approximately 1. The values of cancerrelated factors and their corresponding thresholds showed good consistency and robustness in all four datasets. Although the classification performance of cancer-related factor is slightly worse than that of three classifiers with machine learning, the approach does not require training and has considerably lower computational complexity than that of three classifiers with machine learning. Importantly, the method requires only a small number of cancer and adjacent samples to determine the threshold and evaluates whether a single sample corresponds to cancer. Thus, the cancer-related factor may have good application prospects in the personalized diagnosis of cancers. LMNB2, BGN, MFSD12, and SOX4 in 55 GA-related PmBs were identified and combined into a prognostic risk scoring model. There is no experimental evidence that LMNB2, BGN, and MFSD12 in this combination are associated with GA, and these are new PmBs identified in this study. LMNB2 belongs to the lamin family and is closely related to occurrence, development, and prognosis of liver cancer (Kong et al., 2020;Li X. N. et al., 2020). BGN is an important member of the leucinerich small proteoglycan family and an important component of the extracellular matrix. Clinical studies have shown that upregulation of BGN is related to poor prognosis of patients with various types of cancer syndromes (Zhao S. F. et al., 2020). MFSD12, also known as PP3501, is a nuclear protein . Bioinformatic analysis revealed that upregulated expression of MFSD12 is a key promoter of cell proliferation, potential prognostic biomarker, and therapeutic target for melanoma (Wei et al., 2019). SOX4 is a key transcription factor involved in occurrence and development of many cancers Wang et al., 2018;Ding et al., 2019) and was shown to be related to the proliferation, migration, and invasion of GA cells and prognosis of GA patients (Fang et al., 2012;Dong et al., 2018;Shao et al., 2020). Therefore, the model has good sensitivity and specificity (AUC = 0.7742), and the risk score calculated by the model can effectively predict the risk of GA patients (p < 0.0001, hazard ratio = 2.845, 95% CI: 2.033-3.981).
In conclusion, our study proposes an mRBioM algorithm to identify PmBs from the complete transcriptomic RNA profiles of GA by integrating and analyzing the information at transcriptome and proteome levels. mRBioM identified 55 PmBs related to the occurrence, development and prognosis of GA, which may provide potential biomarkers for early diagnosis, treatment, and prognosis of GA. mRBioM can also be applied in other cancers for cancer-related biomarker identification. But this study also has several limitations. mRBioM is a computational method, and reliability of GA-related PmBs identified by mRBioM was confirmed only by computational methods; thus, further experimental studies are needed to verify the clinical value of identified GA-related PmBs.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
CD and NR designed the study and wrote the manuscript. CD, FG, and XL conducted the computer experiments. NR, WD, GW, and JZ analyzed the results and revised and offered advice about the manuscript. All authors participated in the critical review, revision of this manuscript, contributed to the article, and approved the submitted version.