Salivary Microbiota for Gastric Cancer Prediction: An Exploratory Study

To characterize the salivary microbiota in patients at different progressive histological stages of gastric carcinogenesis and identify microbial markers for detecting gastric cancer, two hundred and ninety-three patients were grouped into superficial gastritis (SG; n = 101), atrophic gastritis (AG; n = 93), and gastric cancer (GC; n = 99) according to their histology. 16S rRNA gene sequencing was used to assess the salivary microbiota profile. A random forest model was constructed to classify gastric histological types based on the salivary microbiota compositions. A distinct salivary microbiota was observed in patients with GC compared with SG and AG, featuring an enrichment of putative proinflammatory taxa including Corynebacterium and Streptococcus. Among the oral bacteria that were significantly decreased in GC patients (Haemophilus, Neisseria, Parvimonas, Peptostreptococcus, Porphyromonas, and Prevotella), Haemophilus and Neisseria are known to reduce nitrite, so their decline may consequently result in an accumulation of carcinogenic N-nitroso compounds. We found that GC can be distinguished accurately from AG and SG (AUC = 0.91) by the random forest model based on the salivary microbiota profiles, and taxa belonging to unclassified Streptophyta and Streptococcus have potential as diagnostic biomarkers for GC. Remarkable changes in the salivary microbiota functions were also detected across the three histological types, and the upregulation of isoleucine and valine biosynthesis is in line with the higher levels of these amino acids in gastric tumor tissues reported by other independent studies. In conclusion, bacteria in the oral cavity may contribute to gastric cancer and become new diagnostic biomarkers for GC, but further evaluation against independent clinical cohorts is required.
The potential mechanisms by which the salivary microbiota participates in the pathogenesis of GC may include an accumulation of proinflammatory bacteria and a decline in those reducing carcinogenic N-nitroso compounds.


INTRODUCTION
Gastric cancer (GC) constitutes the third highest cause of cancer mortality worldwide (Bray et al., 2018), and the 5-year survival rates are 27.4 and 32% in China and the USA, respectively. The occurrence and development of gastric carcinogenesis is a complex pathogenic process involving multiple factors, multistage changes and polygenic alterations (Massarrat and Stolte, 2014;Goral, 2016). A late-stage presentation is common in most GC cases, because symptoms in early stages of the disease are usually vague and non-specific. As early detection leads to better outcomes, there is a critical need for new avenues of prevention, risk stratification, and early detection for GC.
Microbes in the upper digestive tract have been shown to facilitate carcinogenesis by contributing to inflammatory processes via activation of the Toll-like receptor pathway (Kauppila and Selander, 2014), or to protect against carcinogenesis by providing barriers to pathogen invasion (Yang et al., 2014). Chronic infection with Helicobacter pylori is a well-established risk factor for gastric carcinogenesis. Several lines of evidence have demonstrated that Correa's cascade of gastric carcinogenesis initiated by H. pylori involves multiple virulence factors, host genetic make-up, and nutritional factors (Warren and Marshall, 1983;Polk and Peek, 2010;Engstrand and Lindberg, 2013;Plummer et al., 2015). Nevertheless, only about 3% of those infected with H. pylori will eventually develop gastric cancer, and the eradication of H. pylori does not completely prevent the occurrence of GC (Peek and Crabtree, 2006). These observations suggest that non-H. pylori microorganisms colonizing the stomach may represent an additional modifier of gastric cancer risk (Sung et al., 2020). The enrichment of some bacteria in the gastric mucosa has been associated with the progression of gastric cancer, including Peptostreptococcus stomatis, Streptococcus anginosus, Parvimonas micra, Slackia exigua, and Dialister pneumosintes (Coker et al., 2018). Our recent study suggested that a reduction of nitrite-oxidizing Nitrospirae taxa in the gastric mucosa may contribute to gastric neoplastic progression via nitrate accumulation (Wang et al., 2020).
Most of the microbes in the stomach are believed to originate from the external environment. The oral cavity contains a large number of microorganisms, including bacteria, viruses, fungi, mycoplasma, and chlamydia (Aas et al., 2005;Wade, 2013;He et al., 2015). The oral microbiota can enter the downstream digestive tract from the oral cavity through saliva and can also migrate to various parts of the body to cause infections and local inflammatory reactions at the corresponding sites. Oral microbes are closely correlated with several systemic diseases such as oral tumors, type 2 diabetes, cardiovascular disease, urinary system diseases, and rheumatoid arthritis (Seymour, 2010;Ahn et al., 2012;Salazar et al., 2012;Whitmore and Lamont, 2014;Gao et al., 2018). Recently, the oral microbiota has been suggested to play a role in the etiology of esophageal cancer, colorectal cancer (CRC), and pancreatic cancer (Michaud and Izard, 2014;Peters et al., 2017;Flemer et al., 2018). Interestingly, a higher incidence of GC was found among people with worse oral hygiene (Watabe et al.), indicating a potential link between the oral microbiota and the occurrence/development of gastric cancer. In this study, we characterized the microbial compositional and ecological changes in the salivary microbiota of patients with GC and non-malignant gastric lesions including superficial gastritis (SG) and atrophic gastritis (AG). We demonstrated the possibility of using salivary microbes as biomarkers for GC detection, and explored the potential mechanisms of the oral microbiota in the pathogenesis of GC.

Participants
Two hundred and ninety-three patients who received an endoscopic examination in the Chinese PLA General Hospital and Civil Aviation General Hospital were enrolled in this study. The study cohort was recruited from October 2017 to October 2019. The inclusion criteria were: (1) adult male or female; (2) Han nationality from northern China; (3) able and willing to provide signed and dated informed consent; (4) able and willing to provide salivary samples. The exclusion criteria were: (1) taking antibiotics, proton pump inhibitors (PPIs), probiotics, prebiotics, chemotherapeutic drugs, and any other drugs affecting oral microbiota within the last month; (2) diagnosed with acute or chronic pulmonary, cardiovascular, hepatic, or renal disorders; (3) positive test for human immunodeficiency virus, hepatitis B or C virus; (4) a history of major surgery; and (5) women who were pregnant or lactating.
Data collection was conducted for all subjects, including demographics, medical history, drugs, and hematology tests.

Endoscopic and Histologic Examination
Patients' diagnostic evaluation was based on endoscopic and histological examination. SG was confirmed according to the infiltrating depth and density of chronic inflammatory cells in the mucosa, without reduction of the proper gastric glands at each biopsy site. AG was defined when the gastric mucosa in the antrum and the body was atrophied and thinned, so that the submucosal vessels could be well visualized under gastroscopy, and the proper gastric glands were reduced at each biopsy site. Intestinal metaplasia (IM) was defined as the replacement of gastric mucosal epithelial cells by intestinal epithelial cells at each biopsy site. GC was confirmed by histological examination and, according to the WHO gastric adenocarcinoma grading criteria, was divided into well differentiated, moderately differentiated, and poorly differentiated (Jean-Francois, 2011).

Sample Collection
Salivary sample collection and preparation were carried out in accordance with a previously published consensus (Shi et al., 2019). All subjects were fasting and had not brushed their teeth in the morning. Thirty minutes before sampling, subjects were asked to rinse the mouth with water, and then 1 ml of saliva was collected in a sterilized tube containing 1.0 ml RNAlater (Life Technologies, USA), transferred to the laboratory, and stored at room temperature until DNA extraction.

DNA Extraction and 16S rRNA Gene Amplicons
To evaluate the salivary bacterial diversity, high-throughput sequencing of the 16S rRNA gene was performed. Bacterial genomic DNA from the saliva was isolated using the QIAamp DNA Mini Kit (QIAGEN, Valencia, CA, USA) combined with the bead-beating method. The DNA concentration of each sample was adjusted to 50 ng/ml, and the samples were stored at −80°C until sequencing. The hypervariable V3-V4 region of the 16S rRNA gene was amplified using the universal primers (515F, GTGCCAGCMGCCGCGGTAA and 806R, GGACTACHVGGGTWTCTAAT) with a 6-bp barcode. All PCR reactions (including denaturation, annealing, and elongation) were carried out with Phusion® High-Fidelity PCR Master Mix (New England Biolabs). The single amplifications were performed in 25 µl reactions with 50 ng template DNA. Normalized equimolar concentrations of PCR products were pooled and sequenced using the Illumina MiSeq PE300 platform (Illumina, San Diego, CA, United States) at Shenzhen Decipher Biotechnology Laboratory.
We employed the QIIME 2 (Bolyen et al., 2019) dada2 denoise-paired method to denoise, dereplicate, and filter chimeras from the sequence data. For taxonomic classification, we trained a Naive Bayes classifier on the 16S rRNA V3-V4 regions with q2-feature-classifier method (Supplementary File S1). The metagenome functions of the salivary microbiota were predicted through PICRUSt2 on the basis of 16S rRNA gene sequencing profiles (Douglas et al., 2020).

Statistical and Bioinformatic Analyses
The baseline continuous data were presented by mean ± standard deviation (SD) and analyzed by independent t test or non-parametric rank test. The categorical data were described in percentages and compared by χ² test or Fisher's exact test. All tests for significance were two-sided, and P < 0.05 was considered significant.
Calypso (version 8.84) was used to conduct statistical analysis of the microbiota compositional data. The read counts were normalized with total sum normalization, and taxa having less than 0.02% relative abundance across all samples were excluded from the following analysis. The amplicon sequence variant (ASV) counts were normalized with total-sum scaling (TSS) followed by cumulative-sum scaling (CSS). The alpha diversity of the salivary microbiota was measured by Shannon's index and the Chao1 index. The relative abundances of taxa were log2 transformed to account for non-normality. Principal coordinate analysis (PCoA) based on unweighted and weighted UniFrac distance matrices was employed to stratify samples and identify group-level clusters, and the corresponding statistical significance was assessed using permutational multivariate analysis of variance (PERMANOVA). Anosim was applied to compare the intra-group distances with between-group distances. The Kruskal-Wallis test was used to detect significant differences in the alpha diversity, abundances of taxa, and metabolic pathways across the histological stages, followed by the Wilcoxon rank-sum test to confirm the significant differences between each pair of groups. The Benjamini-Hochberg (BH) procedure was applied to control the false discovery rate. The Linear discriminant analysis Effect Size (LEfSe) method (Segata et al., 2011) was applied to identify the features (ASVs or functions) most likely to explain differences in the salivary microbiota between histological types. Spearman correlation networks were constructed based on the top 30 most abundant genera, and edges representing correlations with Holm-corrected P < 0.05 were shown.
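The normalization, filtering, and alpha diversity steps described above can be illustrated with a minimal Python sketch. It is not the Calypso implementation: the function names and example counts are hypothetical, the 0.02% cutoff is assumed to apply to the mean relative abundance across samples, Shannon's index is assumed to use the natural logarithm, and Chao1 is assumed to be the bias-corrected form.

```python
import math


def total_sum_scale(counts):
    """Total-sum scaling (TSS): convert one sample's raw counts to relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def filter_low_abundance(rel_samples, threshold=0.0002):
    """Drop taxa whose mean relative abundance across samples is below threshold.

    Assumption: 0.02% (0.0002) is interpreted as a mean over all samples.
    """
    taxa = set().union(*(s.keys() for s in rel_samples))
    n = len(rel_samples)
    keep = {t for t in taxa
            if sum(s.get(t, 0.0) for s in rel_samples) / n >= threshold}
    return [{t: v for t, v in s.items() if t in keep} for s in rel_samples]


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def chao1_index(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for n in counts.values() if n > 0)
    singletons = sum(1 for n in counts.values() if n == 1)
    doubletons = sum(1 for n in counts.values() if n == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical counts for one saliva sample (illustration only, not study data).
example_counts = {"Prevotella": 50, "Neisseria": 30, "Streptococcus": 18,
                  "Rothia": 1, "Bulleidia": 1}
example_relative = total_sum_scale(example_counts)
```

Richness estimators such as Chao1 depend on singleton and doubleton counts, so they are computed on the raw counts rather than on the scaled abundances.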
The random forest (RF) model was built with the caret R package. Five-fold cross-validation and the area under the receiver operating characteristic (ROC) curve (AUC), computed with the pROC R package, were used to evaluate the prediction performance of the model. The RF disease classifier using oral bacterial abundances at the genus level was constructed with 60% of the samples, randomly selected, as the training set and tuneLength = 4. The R code and taxa abundance table used for constructing the random forest model are provided in Supplementary Code 1.
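As a language-neutral illustration of the evaluation scheme above, the following Python sketch shows a 60/40 split and a rank-based AUC. It is not the authors' R code (which is in Supplementary Code 1): the function names are hypothetical, the split is assumed to be stratified by class, and the AUC is computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
import random


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of each class.

    Assumption: the 60% training set is drawn separately within each class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class sizes from the cohort: 99 GC (label 1) versus 194 SG + AG (label 0).
cohort_labels = [1] * 99 + [0] * 194
train_idx, test_idx = stratified_split(cohort_labels)
```

With these class sizes the sketch assigns 175 samples to training and 118 to testing; a classifier's predicted GC probabilities on the test samples would then be passed to `roc_auc`.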

Demographic Characteristics of the Patient Cohort
After a standardized endoscopic procedure and histopathological evaluation, a total of 101 SG, 93 AG (21 without IM, 72 with IM), and 99 GC subjects were enrolled. The gender and age were matched among the four groups (P = 0.9152 and P = 0.3582, respectively). There were also no significant differences in body mass index (BMI), socioeconomic, medical history (including periodontosis), or lifestyle characteristics (smoking and drinking status) among the four groups (Table 1).

Salivary Microbiota Changes Are Associated With Gastric Neoplastic Progression
This study assessed the salivary microbiota by sequence analysis of the 16S ribosomal RNA gene. A total of 14,989,371 raw reads were obtained after quality filtering, with an average of 51,158 for each sample. The refined reads were clustered into 1,275 ASVs. The salivary microbiota alpha diversity was significantly lower in GC than in SG and AG (Figures 1A, B). Beta diversity analysis with PCoA showed that the cluster of GC samples could be separated from SG and AG (Figures 1C, D). Regarding AG, the alpha and beta diversities of the salivary microbiota in patients with and without intestinal metaplasia were not distinguishable (Figure S1). For GC patients with different histological grades (well differentiated, moderately differentiated, and poorly differentiated), there was no significant difference in the biodiversity of the salivary microbiota (Figure S2). Compositionally, the most abundant phyla in the salivary microbiota were Bacteroidetes, Proteobacteria, Firmicutes, Fusobacteria, and Actinobacteria, which accounted for more than 94% of the bacterial community for each histological stage of GC (Figure S3A). Patients with GC had a higher relative abundance of Cyanobacteria (Table S1). At the genus level, Prevotella, Neisseria, Veillonella, Haemophilus, Porphyromonas, Streptococcus, Fusobacterium, and Rothia constituted more than 70% of the salivary microbiota for each histological stage of GC (Figure S3B). The levels of Anaerovorax, Bulleidia, unclassified F16, and Peptostreptococcus gradually decreased from SG through AG to GC (Figure 2; Table S1), indicating a negative association of these bacteria with GC development. The genera Streptococcus and unclassified Streptophyta were significantly more abundant in GC, whereas Fusobacterium, Haemophilus, Neisseria, Parvimonas, Peptostreptococcus, Porphyromonas, and Prevotella were less abundant in GC. In addition, the genus Bacteroides was found to be particularly abundant in the patients with AG.
It has been shown that the composition and function of the oral microbiota are affected by lifestyle factors such as alcohol and tobacco use and health characteristics such as periodontitis, tooth status, and HP infection (Bornigen et al., 2017;Zhao et al., 2019). We used MaAsLin2 to assess the multivariable associations between metadata and the salivary microbiota. The analysis indicated that only Faecalibacterium had a significant negative correlation with tobacco usage (Figure 3), while no taxa were found to be significantly associated with alcohol usage, periodontosis, or HP infection. Thirteen genera had significant correlations with GC, including five positive correlations (enriched in GC) and eight negative correlations (reduced in GC). Six genera were found to be negatively correlated with both AG and GC, and Bacteroidetes had a significant positive correlation with AG. Taken together, the results of the multivariate analysis are in accordance with the results of the univariate analysis.

Salivary Microbiota Is Predictive of Stages of Gastric Carcinogenesis
To identify the most relevant taxa responsible for the differences among the disease stages, we performed LEfSe analysis at the genus level (Figure 4A; Figure S4A). To further explore the potential of the salivary microbiota as diagnostic biomarkers for GC, we constructed a random forest model for identifying malignancy based on the salivary microbiota at the genus level. This model showed a high accuracy in distinguishing GC from non-malignant lesions, yielding an AUC of 0.91 (95% confidence interval 0.778-0.99) (Figure 4B). Moreover, the random forest modeling approach was also able to distinguish SG, AG, and GC subjects, resulting in AUCs of 0.84, 0.76, and 0.877 for SG, AG, and GC, respectively (Supplementary Figure S4B). Among the top 10 genera with the highest contributions to the model classification performance (Figure 4C; Figure S4C), unclassified Streptophyta and Streptococcus were also identified as GC-associated microbial features by LEfSe (Figure 4A; Figure S4A), strengthening their potential as biomarkers for GC diagnosis.
In addition, we performed a network analysis to visualize the commensal relationships among the salivary microbiota of the three histological types ( Figure S5). Interestingly, we found that the correlations between Prevotella and other taxa dominated the negative relationships for all the three histological types, with their number reduced in AG and GC.

Salivary Microbiome Functional Capacity in SG, AG, and GC
Analysis of PICRUSt2 revealed differentially abundant metabolic functions in the bacterial communities across the histological
FIGURE 2 | The salivary microbiota composition in patients with malignant and non-malignant gastric lesions. Relative abundances of salivary taxa of the three groups were compared at the genus level. Significance was determined by using the Kruskal-Wallis rank sum test with BH-adjusted P < 0.001, and Wilcoxon rank-sum tests for each pair of groups with *BH-adjusted P < 0.05, **BH-adjusted P < 0.01, and ***BH-adjusted P < 0.001.
FIGURE 3 | Heatmap summarizing the significant associations between oral bacteria and metadata. Color key: -log(q-value)*sign(coefficient). Cells that denote significant associations are colored (red or blue) and overlaid with a plus (+) or minus (−) sign that indicates the direction of association.
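The BH-adjusted P values cited in the Figure 2 caption and the color key in the Figure 3 caption can be illustrated with a short Python sketch. The function names are hypothetical, and the base of the logarithm in the color key is not stated in the caption, so base 10 is an assumption here.

```python
import math


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_log_q(q_value, coefficient):
    """Color-key score: -log(q) * sign(coefficient); log base 10 is assumed."""
    sign = (coefficient > 0) - (coefficient < 0)
    return -math.log10(q_value) * sign


# Hypothetical p-values for four taxa (illustration only, not study data).
example_q = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Under this convention a strongly significant negative association yields a large negative score and a strongly significant positive association a large positive one, which is how the sign and color intensity of a heatmap cell encode direction and strength.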

DISCUSSION
Previous studies have shown that periodontal disease and poor oral health status were associated with an increased incidence of malignant diseases (Dizdar et al., 2017;Michaud et al., 2018), pointing to a potentially oncogenic role of oral microorganisms in the development of cancer. Some species have been identified that correlate strongly with oral squamous cell carcinoma (OSCC), such as Capnocytophaga gingivalis, Prevotella melaninogenica, and Streptococcus mitis (Mager et al., 2005), which were suggested as diagnostic markers since they predicted 80% of cancer cases. Oral microbes are also detected in tumors distant from the oral cavity. For example, many studies have shown that the oral periodontal pathogens Fusobacterium nucleatum and Porphyromonas gingivalis are essential in the development of colorectal and pancreatic cancer, respectively (Rubinstein et al., 2013;Fan et al., 2018a). The oral microbiota was shown to reflect an inflammatory status of the stomach in patients with H. pylori infection (Zhao et al., 2019), and could detect GC with a high accuracy (Coker et al., 2018;Wu et al., 2018). In this study, we demonstrated that the salivary microbiota could identify GC among patients with non-malignant gastric diseases including SG and AG, yielding a high accuracy (AUC of 91%). With a cohort consisting of 37 GC patients and 13 healthy individuals, a previous study also showed a high accuracy (AUC of 97%) when using the oral microbiota to screen for gastric cancer (Sun et al., 2018), further supporting the diagnostic potential of oral bacteria for gastric malignancy. The microbiome is exclusive to the individual and influenced by lifestyle and by phenotypic and genotypic determinants. For example, alcohol consumption and tobacco usage have been shown to influence the oral microbiome composition (Wu et al., 2016;Fan et al., 2018b). Therefore, lifestyle factors should be considered as confounders when identifying diagnostic microbial markers from the oral microbiome.
Multivariate analysis revealed that Faecalibacterium was negatively associated with smoking, and no significant correlation was found between salivary bacteria and alcohol usage or HP infection. Thus, the potential biomarkers identified based on our data did not appear to be affected by the recorded lifestyle factors or HP infection. Our data showed that alpha diversity of the salivary microbiota was similar among patients with different gastric histological types, which was consistent with another study (Kageyama et al., 2019). One previous study found that the microbial diversity of saliva and dental plaque significantly increased in GC patients (Sun et al., 2018), whereas another suggested that the microbial diversity was significantly reduced in the tongue coating of GC patients (Cui et al., 2019). Taken together, the diversity of the oral microbiota does not seem to be strongly associated with the development of GC.
Data from our recent study (Wang et al., 2020) as well as others (Dicksved et al., 2009;Castano-Rodriguez et al., 2017;Chen et al., 2019) showed that commensals of the oral cavity including Fusobacterium, Peptostreptococcus, Prevotella, Streptococcus, and Veillonella had higher relative abundances in the gastric mucosa of GC patients. Although these genera are oral commensals, their translocation and expansion may be involved in the onset and development of multiple diseases including cancers. One possible mechanism by which the oral microbiota participates in carcinogenesis is the enrichment of pro-inflammatory oral bacterial species. For example, Streptococcus bovis has been shown to promote the development of colon cancer by enhancing inflammation (Abdulamir et al., 2011). We observed that the genus Streptococcus was enriched in the salivary microbiota of GC patients, which agrees with the findings of a recent study (Sun et al., 2018). Interestingly, an enrichment of Streptococcus spp. has also been reported in several types of cancer such as colorectal adenocarcinomas (Abdulamir et al., 2011). Taken together, these results indicate that some strains of Streptococcus may be involved in gastric carcinogenesis. In addition, the genus Corynebacterium was also enriched in the saliva of GC patients, in line with a report that this genus was more abundant in the tongue coating microbiota of GC patients than in that of healthy controls (Wu et al., 2018). Species of Corynebacterium are widely distributed in the microbiota of human skin; most of them are innocuous, while some, such as C. diphtheriae, are known to cause infection. In recent years, they have been increasingly reported as emerging opportunistic pathogens in immunocompromised patients with cancer, hematologic malignancy, and critical condition.
Thus, a higher level of Corynebacterium spp. in the oral cavity may reflect immune deficiency in cancer patients. Altogether, an enrichment of proinflammatory bacteria in the oral cavity is likely an important factor contributing to the development and progression of GC.
Several bacterial taxa were found to be reduced in the salivary microbiota of GC patients, including Bulleidia, Fusobacterium, Haemophilus, Lachnoanaerobaculum, Neisseria, Parvimonas, Peptostreptococcus, Porphyromonas, and Prevotella. Intriguingly, a decreased carriage of Bulleidia was also found in the oral cavity of patients with esophageal squamous cell carcinoma (Chen et al., 2015). Moreover, some taxa of these genera were found to be enriched in tumor and stool samples of colorectal cancer patients, such as Fusobacterium nucleatum, Parvimonas micra, Porphyromonas asaccharolytica, Peptostreptococcus stomatis, and Prevotella intermedia (Ternes et al., 2020). We also recently found a higher load of Fusobacterium in the gastric mucosa of GC patients compared to SG (Wang et al., 2020). In the present study, a low level of these genera was observed in the oral cavity of GC patients as compared with SG and/or AG. In fact, bacterial abundance is largely regulated by nutrient availability and antimicrobial signals specific to the environmental conditions. Thus, although these bacteria colonize and expand at the tumor site (such as the gut of patients with colorectal cancer), they may not overgrow at their original location, such as the oral cavity.
Network analysis revealed that Prevotella was negatively correlated with a variety of oral bacteria in the oral cavity at all three histological stages, and the number of its negative relationships decreased in the AG and GC groups. In the GC patients, the abundance of Prevotella in the oral cavity was lower than in the SG and AG groups, which is opposite to the findings of Sun et al. (2018). This discrepancy at the genus level may be resolved by increasing the phylogenetic resolution via metagenomic sequencing and identifying the specific species/strains related to gastric cancer.
We previously found that patients with intraepithelial neoplasia had higher relative abundances of Haemophilus parainfluenzae and Nitrospirae family in the gastric mucosa, which decreased in that of GC patients (Wang et al., 2020). Both Haemophilus and Nitrospirae are nitrate-reducing bacteria, which convert nitrate to nitrite, and also to nitric oxide (NO), which can be absorbed through the blood vessels in the oral cavity or through being swallowed into the gastrointestinal system. Accumulation of N-nitroso compounds in the gastrointestinal tract is likely to increase the risk of carcinogenesis (Forsythe and Cole, 1987;Bryan et al., 2012). Thus, the decreased abundances of Haemophilus in the salivary microbiota may contribute to the formation of gastric tumor.
Functional analysis based on the PICRUSt2-predicted pathways suggested that metabolic functions of salivary microbiota changed along with the disease progression in the stomach. In particular, pathways involved in isoleucine and valine biosynthesis were highly expressed by the salivary microbiota in GC patients compared to the non-malignant stages. Interestingly, an upregulation of amino acids including isoleucine and valine was also detected in human gastric tumor tissues (Jung et al., 2014;Wang et al., 2016). Higher levels of most amino acids and their primary derivatives in gastric tumor tissues were thought to be related to two main sources: the degradation of extracellular matrix by matrix metalloproteinases and the autophagic degradation of intracellular proteins (Hirayama et al., 2009). The production of amino acid from microbes in the oral cavity and gastrointestinal tract has not been quantified and deserves further investigation in terms of proliferation and survival of gastric cancer cells.
There were several limitations in this study. Firstly, we did not collect samples from healthy individuals as controls. Secondly, amplicon sequencing of the 16S rRNA gene has limited resolution in determining bacterial species or strains, and it is therefore difficult to assess the functions of specific bacteria involved in gastric cancer development and progression. In addition, the 16S rRNA gene V1-V3 region has been shown to provide superior taxonomic resolution for the bacterial microbiota of the human oral and respiratory tracts compared to the V3-V4 region (Zheng et al., 2015;Escapa et al., 2018). Thus, some other oral taxa with diagnostic potential for gastric cancer might not have been detected in the present data. Finally, independent clinical cohorts from multiple centers are required to evaluate the diagnostic value of the identified GC-associated salivary bacteria.

CONCLUSIONS
We demonstrated, with a large cohort, that the salivary microbiota can be used to predict GC as well as its nonmalignant stages. The contributions of the oral microbiota in the pathogenesis of GC include an accumulation of proinflammatory bacteria and a decline in those reducing carcinogenic N-nitroso compounds.

DATA AVAILABILITY STATEMENT
The data presented in the study are deposited in the European Nucleotide Archive (https://www.ebi.ac.uk/ena/submit/sra/#studies), accession number PRJEB42657.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the Chinese PLA General Hospital (No. S2016-057-02). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YY conceived the study and revised the manuscript. KH, LW, BY, ZW, XZ, LP, JY, and GS performed the subjects' enrolment and sample collection. XG and KH analyzed clinical and sequencing data. KH and XG wrote this manuscript. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
Supplementary Figure 1 | The salivary microbiota biodiversity in AG patients with and without intestinal metaplasia. The alpha diversity of the salivary microbiota was measured at the ASV level by using (A) Shannon index and (B) Chao1 index. Comparison between intra-group and inter-group community distances by (C) unweighted UniFrac distance matrix and (D) weighted UniFrac distance matrix revealed no significant difference between the microbiota compositions of AG patients with and without intestinal metaplasia. P value was calculated by Anosim comparing the intra-group distances with between-group distances.
Supplementary Figure 2 | The salivary microbiota biodiversity in GC patients with different histological grades. The alpha diversity of the salivary microbiota was measured at the ASV level by using (A) Shannon index and (B) Chao1 index. Comparison between intra-group and inter-group community distances by (C) unweighted UniFrac distance matrix and (D) weighted UniFrac distance matrix revealed no significant difference between the microbiota compositions among patients with well differentiated (W), moderately differentiated (M), and poorly differentiated (P) gastric tumors. P value was calculated by Anosim comparing the intra-group distances with between-group distances.
Kruskal-Wallis test P < 0.05 and log10 LDA score > 3.4. (B) ROC curve analysis to evaluate the discriminatory potential of salivary bacteria in identifying GC out of premalignant lesions. (C) The top 10 bacterial genera that are most important for discriminating between SG, AG, and GC. Each genus is ranked according to an importance score (mean decrease accuracy).
Supplementary Figure 5 | Network analyses reveal commensal relationships among the salivary bacteria. Spearman correlation network analyses showing the commensal relationships among the top 30 most abundant genera in the salivary microbiota of (A) superficial gastritis, (B) atrophic gastritis, and (C) gastric cancer. Taxa are represented as nodes, with taxa abundance as node size, and are colored according to their phylum. Edges represent significant correlations (Holm-corrected P < 0.05) among these taxa. Red and blue edges represent positive and negative correlations, respectively.
Supplementary Figure 6 | Functional changes in the salivary microbiota are associated with the progression of gastric carcinoma. PICRUSt2-predicted metabolic pathways that differ significantly in the salivary microbiota of patients at different progressive histological stages of gastric tumorigenesis. Significance was determined by using the Kruskal-Wallis rank sum test with BH-adjusted P < 0.05.
Supplementary Code 1 | R code and taxa abundance table used for constructing the random forest model and generating the AUC ROC curve.