Identifying and ranking causal microbial biomarkers for colorectal cancer at different cancer subsites and stages: a Mendelian randomization study

Introduction The gut microbiome is directly involved in colorectal carcinogenesis, but much of the epidemiological evidence for its effect on colorectal cancer (CRC) risk comes from observational studies, and it is unclear whether the identified microbial alterations are a cause or a consequence of CRC development. Methods Univariate Mendelian randomization (MR) analysis and multivariate MR analysis based on Bayesian model averaging were performed to comprehensively explore the microbial risk factors associated with CRC. The Network Module Structure Shift method was used to identify microbial biomarkers associated with CRC. Mediation analysis was used to explore the dietary habits-microbiota-CRC pathway. Results Taken together, the four methods showed that 9 bacteria had a robust causal relationship with the development of CRC. Among them, Streptococcus thermophilus reduced the risk of CRC; Eubacterium ventriosum and Streptococcus were protective against malignant tumors of the colon (CC); Erysipelotrichaceae was a protective factor for malignant tumors of the rectum (CR); and Bacteroides ovatus was a risk factor for benign tumors. Finally, the mediation analysis revealed 10 pathways by which diet-regulated bacteria affected the risk of CRC. For example, alcohol consumption increased the risk of CC by reducing the abundance of Eubacterium ventriosum (mediated proportion: 43.044%); the mediated proportions of the other pathways ranged from 7.026% to 34.22%. Discussion These findings will contribute to the understanding of the different carcinogenic mechanisms of intestinal flora in the colon and rectum and the risk of tumor transformation, thereby aiding CRC prevention, early screening, and the development of future strategies to reduce CRC risk.


Introduction
Colorectal cancer (CRC) is the third most common cancer (1). It is not a single entity: colon and rectal cancers each have their own characteristics in terms of genetics, anatomy, treatment methods, and metastasis patterns (2,3). Most CRC tumors are thought to arise from precancerous changes along the adenoma-cancer pathway (4). Screening and removal of colorectal adenomas in asymptomatic individuals can reduce CRC morbidity and mortality (5).
The relationship between intestinal microecology and the occurrence and development of CRC has attracted increasing attention. First, many population studies have found significant differences in the gut microbiome between people with CRC and healthy people (6)(7)(8). Further cohort studies showed that gut microbiota composition differs across stages of CRC and that interactions among intestinal flora become increasingly complex as the disease progresses (6,9,10). These results suggest that changes in bacteria play a driving role in the initiation and progression of CRC. Experimental evidence also supports the role of bacteria in CRC (11)(12)(13)(14). In addition, the gut microbiota can be rapidly altered by diet, and people who eat different diets have significantly different gut microbial compositions, which in turn are associated with different CRC risks. For example, fat consumption and red meat intake are related to the abundance of sulfide bacteria (15).
Most relevant studies were observational, making it difficult to draw causal conclusions. While animal experiments can verify the specific mechanism by which a small number of bacteria act in CRC, it is difficult to screen out bacteria with truly causal effects from tens of thousands of bacteria. Mendelian randomization (MR) uses single nucleotide polymorphisms (SNPs) as instrumental variables (IVs) to establish causal relationships between exposures and outcomes (16). The two-sample multivariate MR method based on Bayesian model averaging (MR-BMA) can detect true causal risk factors even when candidate risk factors are highly correlated (17). Because gut microbial traits are strongly correlated and high-throughput, MR-BMA is well suited to finding microbial risk factors associated with disease. In addition, a non-MR method, Network Module Structure Shift (NetMoss) (18), can identify microbial biomarkers associated with various diseases.
Currently, only a few bacteria have clearly demonstrated a causal relationship with CRC (11)(12)(13)(14). Therefore, univariate MR, MR-BMA, and NetMoss methods were used to identify causal bacteria for different cancer subsites (colorectal, colon, and rectum) and stages (benign tumors and malignant tumors) of CRC. Given that gut microbiota can be rapidly altered by diet, we performed a two-step MR analysis to investigate the causal pathway from dietary habits to CRC via gut bacteria.

Data sources
We collected GWAS summary statistics for the gut microbiome from a Dutch cohort, including 207 microbial taxa (5 phyla, 10 classes, 13 orders, 26 families, 48 genera, and 105 species) and 205 functional pathways (Table 1) (19). The GWAS summary statistics of CRC at different cancer subsites and stages (Table 1) were obtained from the FinnGen biobank (https://r4.finngen.fi/) (20). The colorectal cancer data include: colorectal cancer (CRC, N = 221814); malignant tumors of the colon (CC, N = 220595); malignant tumors of the rectum (CR, N = 219870); benign tumors of the colorectum (BCR, N = 228104); benign tumors of the colon (BC, N = 218792); and benign tumors of the rectum (BR, N = 220900) (Table 1). The different phenotypes of colorectal cancer are defined according to International Classification of Diseases (ICD) codes retrieved from the Finnish National Registry. GWAS summary statistics for 14 dietary habits and 6 non-CRC diseases were collected from the publicly available GWAS summary statistics database (https://gwas.mrcieu.ac.uk/) (19)(20)(21). The above data were mainly used for MR analysis. We also built a multipopulation cohort (Table S1) for further non-MR validation analyses. Details on how the cohort was constructed are provided in the Supplementary Material.

Instrumental variable selection
According to previous studies, SNPs selected at a relaxed significance threshold explain the largest share of variance in microbial traits (22, 23), so we set the threshold at P < 1×10⁻⁵ to select the IVs. Palindromic SNPs whose strand could not be inferred from allele frequencies (minor allele frequency (MAF) > 0.3) were excluded. SNPs correlated with R > 0.01 within a 10,000 kb window were considered to be in linkage disequilibrium (LD) and were excluded. F-statistics were computed to quantify the strength of the IVs. IVs with F-statistics < 10 were considered weak and were excluded. In addition, we searched each SNP in the PhenoScanner GWAS database to detect possible pleiotropy (24).
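The thresholding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the text does not state which F-statistic formula was used, so the per-SNP approximation F ≈ (beta/se)² is an assumption, and the `Snp` class, the `select_instruments` helper, and all SNP values are hypothetical. LD clumping and palindromic-SNP handling are not covered.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str     # hypothetical identifier
    beta: float   # SNP-exposure effect estimate
    se: float     # standard error of beta
    pval: float   # SNP-exposure association P-value


def f_statistic(snp: Snp) -> float:
    """Approximate per-SNP F-statistic as (beta / se)^2 (assumed formula)."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps, p_threshold=1e-5, f_min=10.0):
    """Keep SNPs below the P threshold whose F-statistic is at least f_min."""
    return [s for s in snps if s.pval < p_threshold and f_statistic(s) >= f_min]


# Illustrative values only, not taken from the study.
candidates = [
    Snp("rs_a", beta=0.200, se=0.04, pval=5.7e-7),  # F is about 25
    Snp("rs_b", beta=0.092, se=0.02, pval=4.2e-6),  # F is about 21
    Snp("rs_c", beta=0.100, se=0.04, pval=1.2e-2),  # F is about 6, excluded
]
selected = select_instruments(candidates)
print([s.rsid for s in selected])
```

Under this approximation, a SNP that passes P < 1×10⁻⁵ will usually also pass F ≥ 10, so the F filter mainly acts as a safeguard.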

Univariable MR analysis
Univariate MR was used to assess the causal relationship between gut microbiota and CRC at different cancer subsites and stages, as well as 6 non-CRC diseases. MR analyses were performed and reported in accordance with the STROBE-MR guidelines (25, 26). A list of the STROBE-MR guidelines can be found in the Supplementary File. The inverse variance weighted method with multiplicative random effects [IVW(M)] was the main method (27). For exposures for which only 1 IV could be identified, the Wald ratio was used to estimate the causal effect (28). We used P < 0.05 as the threshold for potential significance. We also derived false discovery rate (FDR)-corrected P-values with the Benjamini-Hochberg (BH) method and used P_fdr < 0.2 as the FDR-corrected significance threshold. We used a threshold of P < 5×10⁻⁸ to select IVs for other traits, and used the same method to explore the causal relationship between diet and other traits.
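To make the estimators named above concrete, the following Python sketch implements a Wald ratio, an IVW estimate with a multiplicative random-effects standard error, and the Benjamini-Hochberg adjustment. It is an illustration under stated assumptions rather than the software used in the study: the function names are hypothetical, the Wald ratio standard error uses a first-order approximation, and the overdispersion factor is not floored at 1 (some implementations do floor it).

```python
import math


def wald_ratio(bx, by, sey):
    """Single-SNP causal estimate by/bx with a first-order standard error."""
    return by / bx, abs(sey / bx)


def ivw_mre(bx, by, sey):
    """IVW estimate with a multiplicative random-effects standard error.

    bx: SNP-exposure effects, by: SNP-outcome effects,
    sey: standard errors of by. Requires at least two SNPs.
    """
    if len(bx) < 2:
        raise ValueError("IVW needs at least two SNPs; use the Wald ratio")
    weights = [(x / s) ** 2 for x, s in zip(bx, sey)]
    ratios = [y / x for x, y in zip(bx, by)]
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    # Cochran's Q divided by its degrees of freedom scales the variance.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    phi = q / (len(bx) - 1)
    return estimate, math.sqrt(phi / total_w)


def benjamini_hochberg(pvals):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative P-values only; flag traits with adjusted P below 0.2.
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.2 for p in p_adj]
```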
A series of additional analyses were conducted to assess the reliability of the results. The MR Steiger test was used to estimate the possible direction of causality between microbial traits and outcome. For microbial traits with a potential causal effect on outcomes, we applied colocalization analysis ("coloc") to check whether the variant influencing these traits was the same variant influencing the outcome (29). If PP.H4 > 0.8, the MR assumptions were considered violated. In addition, Cochran's Q statistic was used to assess the global heterogeneity of the selected SNPs. MR-Egger regression was used to capture horizontal pleiotropy (30). Finally, we used MR-PRESSO to detect and correct for potential outliers (31).
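Two of the sensitivity checks above, Cochran's Q and the MR-Egger intercept, can be written out directly. The sketch below is a simplified illustration with hypothetical function names: it returns point estimates only (no P-values or standard errors) and assumes inverse-variance weights based on the SNP-outcome standard errors.

```python
def cochran_q(bx, by, sey):
    """Cochran's Q of the per-SNP ratio estimates around the IVW estimate."""
    weights = [(x / s) ** 2 for x, s in zip(bx, sey)]
    ratios = [y / x for x, y in zip(bx, by)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))


def mr_egger(bx, by, sey):
    """Weighted regression of by on bx with an intercept.

    Returns (intercept, slope). A non-zero intercept points to directional
    horizontal pleiotropy; the slope is the pleiotropy-adjusted estimate.
    """
    # Orient every SNP so that its exposure effect is positive.
    x = [abs(v) for v in bx]
    y = [v if b >= 0 else -v for b, v in zip(bx, by)]
    w = [1.0 / s ** 2 for s in sey]
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative data generated as by = 0.02 + 0.5 * bx after orientation.
bx = [0.1, -0.2, 0.3, 0.4]
by = [0.07, -0.12, 0.17, 0.22]
sey = [0.01, 0.01, 0.02, 0.02]
intercept, slope = mr_egger(bx, by, sey)
```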

Potential biomarkers ranking
MR-BMA is a multivariate MR method that prioritizes causal risk factors from high-dimensional candidate risk factors in a Bayesian framework (17). Like conventional MVMR (32), using overlapping IVs across multiple exposures allows adequate handling of "measured pleiotropy" (17). We used MR-BMA to rank, in an agnostic manner, the causal importance of the microbial markers that had a potential causal relationship with outcomes in univariate MR analysis (P < 0.05). All independent genetic variants strongly associated with any biomarker (P < 1×10⁻⁵) were included in the analysis (CRC: 87; CC: 135; CR: 125; BCR: 103; BC: 101; BR: 158). Genetic correlations between biomarkers were then examined, and biomarkers with genetic correlations greater than 0.985 were removed. The marginal inclusion probability (MIP) (i.e., the sum of the posterior probabilities of the models in which the risk factor is present) and the model-averaged causal effect (MACE) (a conservative estimate of the direct causal effect of the risk factor on the outcome, averaged across these models) were calculated for each risk factor. For all BMA analyses, we set z to 10,000, prior probability to 0.1, and prior variance (s) to 0.5. A sensitivity analysis of this part is provided in the Supplementary Material. Full details of the MR-BMA method can be found elsewhere (17).
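The definitions of MIP and MACE given above reduce to simple weighted sums once each candidate model has a posterior probability. The Python sketch below shows only that final aggregation step; it does not perform the Bayesian model search itself, and the function name, the taxon labels, and all numbers are hypothetical.

```python
def summarize_models(models):
    """Aggregate per-model results into MIP and MACE for each risk factor.

    models: list of (factors, posterior, effects) tuples, where `factors` is
    a tuple of risk-factor names in the model, `posterior` is the model's
    posterior probability, and `effects` maps each factor to its causal
    estimate within that model.
    """
    total = sum(posterior for _, posterior, _ in models)
    mip, mace = {}, {}
    for factors, posterior, effects in models:
        weight = posterior / total  # renormalise over the models supplied
        for factor in factors:
            # MIP: summed posterior mass of models containing the factor.
            mip[factor] = mip.get(factor, 0.0) + weight
            # MACE: posterior-weighted effect, zero where the factor is absent.
            mace[factor] = mace.get(factor, 0.0) + weight * effects[factor]
    return mip, mace


# Illustrative models only, not results from the study.
models = [
    (("taxon_A",), 0.5, {"taxon_A": -0.20}),
    (("taxon_A", "taxon_B"), 0.3, {"taxon_A": -0.10, "taxon_B": 0.05}),
    (("taxon_B",), 0.2, {"taxon_B": 0.10}),
]
mip, mace = summarize_models(models)
ranking = sorted(mip, key=mip.get, reverse=True)
```

Because MACE counts a zero effect for every model that excludes the factor, it is shrunk toward the null relative to any single-model estimate, which is why the text calls it conservative.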

Microbial risk factors identification
The gut microbiota showed significant correlations, both phenotypically and genetically (Figure S1). MR-BMA provides a method that allows multiple microbial traits to be modeled together. This approach allows related microbial traits to be disentangled to identify which may be driving the "true" causal signal over others. MR-BMA handles "measured pleiotropy" as adequately as traditional multivariate MR (17), but compared with traditional MVMR (32), MR-BMA is especially suitable for high-throughput and highly correlated data. We used MR-BMA to identify exposures that were truly causally associated with outcomes from a high-dimensional set of related candidate risk factors, the 207 gut microbial taxa. The analysis method was largely consistent with the potential biomarkers ranking analysis, as detailed in the Supplementary Materials. The top ten bacteria ranked by MIP were interpreted as the strongest "true" causal candidates among all the bacteria included in the model. In the sensitivity analysis, the pp threshold for Cook's distance (Cd) used to identify strong influence points is shown in Table S2.

NetMoss analysis
The ASV data processing method for multi-population cohorts is described in the Supplementary Material. Subsequently, SparCC was used to analyze the correlation of ASV-level bacteria to obtain correlation coefficient tables for the different phenotypes (healthy, adenoma, and CRC) in each cohort. Using the relative abundance tables and correlation coefficient tables of different phenotypes in different cohorts, we then used the SparCC (33) network module-based NetMoss2 (18) method to reduce the batch effect while assessing the importance of bacteria between colorectal adenomas and healthy people or between CRC and healthy people.

Mediation analysis
Our study used a two-step MR method to assess the mediating role of gut microbial traits in the effect of dietary habits on CRC at different cancer subsites and stages, using data on 14 dietary habits associated with colorectal cancer and the 9 microbial markers considered to have a robust causal relationship with the outcome. First, a two-sample MR analysis was used to assess the total effect of dietary habits on the six disease phenotypes. The IVs for dietary habits were then used to estimate the causal effect of the exposure on potential mediators. Where there was evidence that a dietary habit affected the intestinal microbiota and also affected the risk of CRC development, we used the "coefficient product" method to estimate the indirect effect of the dietary habit on the outcome through gut microbial traits (34). That is, the causal effect of the dietary habit on each microbial trait was multiplied by the causal effect of that microbial trait on the outcome to obtain the mediating effect of each gut microbial trait. In addition, the indirect effect was divided by the total effect of the dietary habit on the outcome to obtain the proportion mediated by each indirect factor. Figure 1D outlines this approach. Finally, the delta method was used to obtain the standard error of the indirect effects (35). The study design flow is shown in Figure 1.
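The arithmetic of the two-step approach described above can be sketched as follows. The symbols are notation assumed for this illustration: b1 is the total effect of the dietary habit on the outcome, b2 the effect of the dietary habit on the microbial trait, and b3 the effect of the microbial trait on the outcome. The standard error uses the first-order delta method and treats the two estimates as independent; the function name and all numbers are hypothetical.

```python
import math


def mediation_effect(b1, b2, se2, b3, se3):
    """Product-of-coefficients mediation with a first-order delta-method SE.

    b1: total effect (exposure -> outcome)
    b2, se2: exposure -> mediator effect and its standard error
    b3, se3: mediator -> outcome effect and its standard error
    Returns (indirect effect, its standard error, proportion mediated).
    """
    indirect = b2 * b3
    # Var(b2 * b3) is approximated by b2^2 * se3^2 + b3^2 * se2^2.
    se_indirect = math.sqrt(b2 ** 2 * se3 ** 2 + b3 ** 2 * se2 ** 2)
    proportion = indirect / b1
    return indirect, se_indirect, proportion


# Illustrative values: a habit that lowers a protective taxon (b2 < 0),
# where the taxon itself lowers risk (b3 < 0), so the indirect effect is > 0.
indirect, se_indirect, proportion = mediation_effect(
    b1=0.5, b2=-0.4, se2=0.1, b3=-0.5, se3=0.2
)
print(f"indirect={indirect:.3f}, se={se_indirect:.3f}, mediated={proportion:.1%}")
```

The proportion mediated is only interpretable when the indirect and total effects share a sign and the total effect is not close to zero; otherwise the ratio can exceed 100% or turn negative.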

Results
By jointly evaluating the univariate MR, the potential biomarkers ranking analysis and microbial risk factors identification analysis based on MR-BMA, and the NetMoss method, we found 9 causal microbial markers for CRC at different cancer subsites and stages (Table 2). In addition, mediation analysis identified 10 pathways by which diet regulates gut bacteria to influence disease risk (Table 3).

Univariable MR
After excluding SNPs that did not meet the criteria for IVs, 3310 SNPs were strongly associated with 412 microbial traits (Table S3). To assess IV strength, the F-statistic of each SNP was calculated; the F-statistics ranged from 408 to 797, all greater than 10, indicating no weak instrument bias (Table S4). A PhenoScanner query showed that rs2450114 was closely related to BCR and BR (P < 1×10⁻⁵), so this SNP was removed before the MR analysis continued.
Using the IVW(M), Wald ratio, and MR-Egger methods, we found potential causal relationships between 40 microbial traits and CRC, 56 microbial traits and CC, 43 microbial traits and CR, 45 microbial traits and BCR, 40 microbial traits and BC, and 43 microbial traits and BR (P < 0.05). Of these, significant causal relationships were found between 12 microbial traits and CRC, 20 microbial traits and CC, 12 microbial traits and CR, 10 microbial traits and BCR, 9 microbial traits and BC, and 14 microbial traits and BR (IVW(M) MR, P_fdr < 0.2). And there are differences in gut [...] (Table S5). Although some potential biomarkers (e.g., Bifidobacterium adolescentis and Sutterellaceae) showed no association in the IVW(M) results, MR-Egger analysis indicated horizontal pleiotropy among the genetic variants of these biomarkers, and the results of MR-Egger and MR-PRESSO showed a potential causal relationship with the outcome. In addition, MR-Egger analysis showed no horizontal pleiotropy (P > 0.05) for the other microbial traits (Table S5). Furthermore, MR Steiger analysis showed a forward causal direction from exposure to outcome (all P < 7×10⁻⁵, Table S6), and colocalization analysis found that the variation in exposure and outcome was not attributable to the same underlying genetic variant (based on PP.H4.abf < 0.8, Table S7), suggesting that the causal regression returns unbiased estimates of the causal effect. We ran two-sample univariate MR of these microbial traits against six non-CRC diseases to explore whether these bacteria are specific biomarkers for colorectal cancer. The results showed that Sutterellaceae unclassified was also a significant protective microorganism for non-alcoholic fatty liver disease (OR = 0.811, 95% CI = 0.705~0.934, P = 0.004), and that Erysipelotrichaceae significantly reduced the risk of irritable bowel syndrome (OR = 0.897, 95% CI = 0.844~0.954, P = 0.0006).
Table notes: [...] outcomes; 8, effect of exposure on mediator; 9, effect of mediator on outcomes; 10, the mediated effect of the exposure on outcomes through the mediator, 10 = b2*b3; 11 = b/b1, the proportion of the mediation effect. nsnp, the number of IVs; 95%CI or X95.CI, OR (95% confidence interval); significance, whether the FDR-corrected P-value is less than 0.2 (< 0.2**, > 0.2*); p.adjust, the FDR-corrected P-value derived using the Benjamini-Hochberg (BH) method.

Potential biomarkers ranking
We used MR-BMA to rank the microbial biomarkers that were nominally significantly associated with the outcome in the MR according to their MIP > 0. [...] (Tables S8, S9). The MACE directions for these biomarkers were also consistent with our MR results. In the preliminary analyses of CC and CR, influence-point detection highlighted rs12736307 and rs76321722, respectively, which had a greater impact on the analysis (Figure S2). None of the genetic variants in the remaining results had a consistently large Q-statistic or Cook's distance (Figure S3). Additional details can be found in the Supplementary Material.

Microbial risk factors identification
In this section, we selected the top 10 microorganisms ranked by MIP as the "true" causal risk factors for the outcome. Among them, the results with MIP greater than 0.2 that were cross-validated by the other analyses are as follows: For malignancy (Table S10) [...] (Figure S4). None of the genetic variants in the remaining results had a consistently large Q-statistic or Cook's distance (Cd) (Figures S4, S5), and there were no outliers or influential points that needed to be removed. Other outcomes are presented in the Supplementary Material.

NetMoss analysis
Using the NetMoss method, we found that Bacteroides ovatus and Sutterellaceae unclassified were biomarkers that distinguish adenomas from healthy people (Figure 4A). Erysipelotrichaceae, Sutterellaceae, Streptococcus, Bacteroides ovatus, and Eubacterium siraeum were biomarkers that distinguish CRC from healthy people (Figure 4D). The MR-based methods above also suggested a causal relationship between these biomarkers and the development of CRC. Meanwhile, NetMoss identified some microorganisms that were not found in the MR results (Figures 4A-I). Detailed results can be found in the Supplementary Material.

Mediation analysis
Dietary habits are essential for the prevention and management of CRC, and the intestinal microbiota may mediate the influence of dietary habits on CRC at different cancer subsites and stages. Using the "coefficient product" method, we identified a total of 20 causal pathways in which dietary habits regulate gut bacteria and thus affect the occurrence and development of CRC (Figure 5 and Table 3); in 10 of these pathways, the direction of mediation is consistent with the direction of the dietary habit-outcome effect. These include the following. The effect of current drinking status on CC was partially mediated by Eubacterium ventriosum (indirect effects (b) = 4.748, P = 0.002, mediated proportion: 43.044%). Cereal type: biscuit cereals (e.g. Vita) might reduce the risk of CRC by reducing Erysipelotrichaceae abundance (b = -0.417, P = 0.941, mediated proportion: 13.973%). Excessive ferritin intake led to a decrease in the abundance of Streptococcus thermophilus, which in turn resulted in an increased risk of CRC (b = 0.09, P = 0.02, mediated proportion: 34.22%). Bacteroides ovatus mediated the effects of minerals and other dietary supplements: calcium (b = 0.327, P = 0.053, mediated proportion: 11.854%) and bread intake (b = 0.068, P = 0.041, mediated proportion: 12.639%). Finally, the indirect effect of never/rarely drinking milk on BCR through Bifidobacterium adolescentis was estimated; the mediating effect of Bifidobacterium adolescentis was 23.453, and the mediated proportion was 435.399%.

Discussion
CRC has a high mortality rate when detected at an advanced stage, so understanding the causes of CRC at different cancer subsites and stages and identifying its risk factors are important for early screening and prevention. In this study, a variety of methods were used to systematically investigate microbial biomarkers of CRC at different cancer subsites and stages in European populations. Across the four methods (univariate MR, potential biomarkers ranking, microbial risk factors identification, and NetMoss), 3 or 4 methods consistently found that 9 bacteria were closely related to the development of CRC. In the mediation analysis of diet-gut microbiota-disease, 10 diet-gut bacteria-CRC causal pathways were found.
Two MR studies have previously examined the causal relationship between gut flora and CRC (36, 37). Compared with these, the present study examined the causal relationship between gut microbes and CRC more comprehensively and performed multiple validation and extended analyses: 1) instead of simply exploring the causal relationship between gut flora and CRC overall, this study also examined the relationship between gut flora and CRC at different cancer subsites and stages; 2) univariate MR and MR-BMA analyses were used to identify and rank causal microbial markers of CRC at different cancer subsites and stages, and multi-population cohort data were used for validation; 3) after finding robust causal microbes, this study investigated dietary habits that influence these microbes, providing theoretical support for the dietary habits-gut flora-colorectal cancer pathway; 4) this study conducted corresponding analyses at the finer taxonomic level of species.
A large number of studies have reported the correlation between the gut microbiome and CRC, and intestinal microbiota is believed to be directly involved in CRC (6,9,10,38,39). A multicohort study found that Streptococcus thermophilus was dramatically reduced in stool samples from patients with CRC (10). Streptococcus thermophilus and a commercial probiotic, Lactobacillus rhamnosus GG, have similar in vitro probiotic properties as well as anticancer activity and folate production (40). Intriguingly, a recent experiment combining cell and mouse models further found that Streptococcus thermophilus was a novel preventive measure against CRC in mice (39). Streptococcus thermophilus could secrete β-galactosidase to inhibit cell proliferation, reduce colony formation, induce cell cycle arrest, promote apoptosis of cultured CRC cells, and delay the growth of CRC xenografts. Streptococcus thermophilus can also increase the gut abundance of known probiotics, including Bifidobacterium and Lactobacillus, through β-galactosidase (39). These conclusions are consistent with our finding that Streptococcus thermophilus is a potential probiotic for CRC, which further supports the robustness of our results.
This study also found that Eubacterium ventriosum and Streptococcus are protective microorganisms for CC, and that Erysipelotrichaceae, Coprococcus sp_ART55_1 and Eubacterium siraeum are protective factors for CR. Previous studies have found that the abundance of Streptococcus and Erysipelotrichaceae is significantly higher in adjacent tissues than in tumors (38,41), and that their abundance in the oral cavity is associated with a reduced risk of CRC (42). However, other studies show that Streptococcus and Erysipelotrichaceae are more common in patients with advanced colorectal adenomas or CRC (43, 44). Erysipelotrichaceae was significantly higher in the tumor group of 1,2-dimethylhydrazine-induced colon cancer animal models (45). Butyrate inhibits the development of CRC, and a significant decrease in butyrate-producing bacteria in the intestine, including Eubacterium (Eubacterium siraeum and Eubacterium ventriosum), is generally observed in CRC patients (46). The abundance of Eubacterium spp. was lower in the advanced colorectal adenoma group than in the healthy control group (44). At present, there are relatively few studies on the bacteria identified in this study in relation to colorectal cancer, and more studies and experiments are needed to verify these results and explore the specific mechanisms.
Understanding the biology of colorectal adenomas can lead to new strategies to screen for and reduce or slow the progression of these CRC precursors (4). Previous studies have found that Bacteroides are enriched in familial adenomatous polyposis mice compared with wild-type mice (47). Bacteroides are enriched in the intestinal type of patients with adenomas, and Bacteroides ovatus increases significantly when progressing from advanced adenomas to cancer (6). The accumulation of Bacteroides ovatus in peripheral blood drives the proliferation of CD27-MAIT cells that produce IL-17, a pro-inflammatory factor (48). However, some studies have also shown that Bacteroides ovatus is a probiotic. Bacteroides ovatus promotes IL-22 production and reduces trinitrobenzenesulfonic acid-driven colonic inflammation (49). Since Bacteroides ovatus has the capacity to reestablish the equilibrium between lymphocytes and macrophages, its absence could throw the body's natural immune system off balance, cause inflammation, and lead to the death of intestinal epithelial cells (50). The results of our study suggest that Bacteroides ovatus is a new risk biomarker for early CRC, and since this relationship has not been fully established in prior studies, more research is required to validate it and to advance early CRC prevention and treatment.
Figure caption: Diet-gut bacteria-colorectal cancer pathway diagram in mediation analysis.
Diet is one of the most important and modifiable variables influencing the gut microbiome. Our findings and those of earlier research indicate that regular alcohol use alters the gut microbiota, which raises the chance of colon cancer. First, research on both humans and animals has demonstrated that long-term ethanol consumption causes dysbiosis, which lowers the abundance of Firmicutes and Bacteroidetes while increasing the abundance of butyrate-producing taxa in Clostridiales (51). In chronic alcoholism, the number of anaerobic bacteria decreases and the number of Streptococcus increases (52, 53). Intriguingly, our research revealed that alcohol consumption decreases the abundance of the anaerobic bacterium Eubacterium ventriosum, which raises the risk of colon cancer, and that Coprococcus and Eubacterium ventriosum can work together to produce more butyric acid. This is in line with the findings of earlier studies and suggests a causal pathway in which alcohol consumption increases CRC risk by lowering the abundance of Eubacterium ventriosum; further mechanistic studies are required to confirm this pathway.
In addition, there is strong evidence that consuming more dairy and milk reduces the risk of CRC (51). Other ingredients in dairy products also have antitumor activity, including conjugated linoleic acid, lactose, butyrate, and lactic acid-producing bacteria. A recent intervention study in patients with irritable bowel syndrome, a pathology associated with inflammation and CRC, showed that consumption of fermented dairy products containing dairy starter cultures and Bifidobacterium animalis enhanced SCFA production and reduced the abundance of Bilophila wadsworthia. The influence of microorganisms and other compounds in dairy products on the composition and function of gut microbes has thus been shown (54). Our study came to a similar conclusion: never drinking milk increases the abundance of Bifidobacterium adolescentis, which leads to an increased probability of BCR.
Bacteroides is a genus of obligately anaerobic Gram-negative bacteria, and its composition and metabolic activity are largely regulated by diet. Bacteroides are associated with high fat and protein intake. Xylan-regulated human keratinocyte growth factor-2 is delivered to the inflamed colon via Bacteroides ovatus (55). Our results also identified three dietary factors that can increase the risk of BCR by regulating the abundance of Bacteroides ovatus: vitamin and mineral supplements (multivitamin +/- minerals), minerals and other dietary supplements (calcium), and bread intake.
This study has several advantages. 1) It provides a comprehensive assessment of the causal relationship between gut microbial traits and CRC at different cancer subsites and stages, and the consistent results of multiple methods demonstrate the robustness of our findings. Most of our results support the findings of previous studies, which served as positive controls for our approach. Our study also generated hypotheses about biomarkers for which there was little previous evidence of causal association, such as Eubacterium ventriosum and Bacteroides ovatus, among others. 2) It further elucidates how some dietary factors may influence CRC risk at different cancer subsites and stages by modulating the gut microbiota. 3) Because microbial traits show strong correlations both phenotypically and genetically, MVMR analysis may be more appropriate for assessing the causal relationship between microbes and phenotype. However, microbial trait data are high-throughput, and traditional MVMR methods are designed for a small number of risk factors and cannot be extended to high-throughput dimensions. Therefore, this study is the first of its kind to use the MR-BMA method for causal ranking of microbial markers, selecting the most likely risk factors from a large number of candidates, which in turn enhances the robustness of our findings. MR-BMA (a method capable of accounting for measured pleiotropy) largely confirms the univariate findings. Moreover, the MR-BMA approach proposes multivariate models of combinations of microorganisms that can be used to evaluate the role of microbial combinations in disease and applied to the early screening of benign tumors.
The present study also has some limitations. 1) GWAS of intestinal flora are still in their infancy in terms of sample size; the population samples of intestinal flora we used are not large enough, and the loci identified so far are still very limited. 2) The threshold for selecting gut microbial instrumental variables was set at P < 1×10⁻⁵, and although we calculated the F-statistic of each SNP to ensure instrument validity, we cannot exclude the possibility of false negative errors due to insufficient statistical power. The limited power of the IVs is also a significant drawback in the MR-BMA analysis. 3) Our understanding of the microbiomes of different cancer subsites (e.g. colon, rectum or colorectum) is still limited. Easily accessible fecal material may serve as a suitable proxy for the colorectal microbiota; however, inferring causality from fecal microbes may introduce errors. This is a limitation of current data, and we hope that future studies will fill this gap.
In conclusion, this comprehensive exploratory MR study identified 6 protective bacteria for malignancies, and 2 risk bacteria and 1 protective bacterium for benign tumors. The findings support the hypothesis that the gut microbiota is involved in the etiology of CRC and that its effects on CRC differ across cancer subsites and stages, suggesting that specific microorganisms are relevant to the prevention, treatment, and improvement of CRC. Among them, the protective effect of Streptococcus thermophilus on CRC has been verified by cellular and animal experiments. At the same time, the results of the mediation analysis provide both theoretical support and empirical evidence for modifying dietary habits to regulate gut bacteria and thus influence CRC at different cancer subsites and stages, suggesting that controlling gut flora may be a promising strategy for colorectal cancer prevention in specific dietary populations. In addition, these findings can provide ideas and directions for further mechanistic studies such as animal models or biomarker-based human trials to help guide the development and clinical translation of potential microbiota-based cancer prevention strategies.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions
LZ and GZ designed the research. HL and DS collected the data and analyzed it. HL and CJ performed the literature search and drafted the article. LZ and GZ supervised the study. All authors contributed to the article and approved the submitted version.

Funding
The study was supported by the National Natural Science Foundation of China (no. 82172320), TaiShan Industrial Experts Program (no. tscy20190612), TaiShan Scholars Program of Shandong Province (no. tshw20120206), and Shandong University Outstanding Young Scholars Program (to LZ).