Tumor microbiome analysis provides prognostic value for patients with stage III colorectal cancer

Introduction: Although patients with colorectal cancer (CRC) can receive optimal treatment, the risk of recurrence remains. This study aimed to evaluate whether the tumor microbiome can be a predictor of recurrence in patients with stage III CRC. Methods: Using 16S rRNA gene sequencing, we analyzed the microbiomes of tumor and adjacent tissues acquired during surgery in 65 patients with stage III CRC and evaluated the correlation of the tissue microbiome with CRC recurrence. Additionally, the tumor tissue microbiome data of 71 patients with stage III CRC from another center were used as a validation set. Results: The microbial diversity and abundance significantly differed between tumor and adjacent tissues. In particular, Streptococcus and Gemella were more abundant in tumor tissue samples than in adjacent tissue samples. The microbial diversity and abundance in tumor and adjacent tissues did not differ according to the presence of recurrence, except for one genus in the validation set. Logistic regression analysis revealed that a recurrence prediction model including tumor tissue microbiome data had a better prediction performance than clinical factors (area under the curve [AUC] 0.846 vs. 0.679, p = 0.009), regardless of sex (male patients: AUC 0.943 vs. 0.818, p = 0.043; female patients: AUC 0.885 vs. 0.590, p = 0.017). When this prediction model was applied to the validation set, it had a higher AUC value than clinical factors in female patients. Conclusion: Our results suggest that the tumor microbiome of patients with CRC may be a potential predictor of postoperative disease recurrence.


Introduction
Colorectal cancer (CRC) is the second leading cause of cancer deaths worldwide. Compared with the 2020 estimates, the global burden of CRC is predicted to increase by 63% in 2040 (1). Moreover, the incidence of early-onset CRC (before age 50 years) is increasing in high-income countries (2). For resectable nonmetastatic CRC, colectomy with en bloc removal of regional lymph nodes is the preferred treatment; however, several studies reported that approximately 25-30% of patients with stage III CRC experienced disease recurrence within the first 5 years after surgery (3-6). In addition, although adjuvant chemotherapy has demonstrated benefits in patients with stage III CRC, it can reduce the risk of recurrence by only approximately 30% (7,8). The mortality rates for CRC are consistently higher in men compared to women across different regions worldwide, with men having a mortality rate approximately 25% higher than women (9). Several retrospective studies have shown that female CRC patients typically have longer survival than males (10-12). However, some studies have failed to find any survival benefit for women (13). Several prognostic factors for CRC recurrence have been recognized, including a poorly differentiated histology, greater tumor depth, higher number of positive lymph nodes, lymphovascular invasion, perineural invasion, and tumor budding (14-17). In contrast, high microsatellite instability (MSI) and abundant tumor-infiltrating T-cells have been associated with a favorable prognosis in patients with CRC (18-20). Recently, the detection of circulating tumor DNA after surgery has been suggested as a predictor of a high risk of recurrence (21,22). Nevertheless, a more precise prediction of the risk of CRC recurrence after surgery is still required in clinical practice.
Emerging evidence has demonstrated the microbial composition and ecological changes in patients with CRC and the roles of several bacteria in colorectal carcinogenesis and treatment (23). The gut microbiome, which includes Faecalibacterium, Akkermansia, and Bifidobacterium species, is expected to play an important role in mediating the outcomes of chemotherapy and immunotherapy in patients with melanoma and lung cancer, as it affects immune system activation and tumor responses to treatment (24-26). In particular, the presence of abundant Fusobacterium nucleatum (F. nucleatum) DNA in tissues has been associated with worse clinical outcomes in patients with CRC (27). One study of patients with pancreatic cancer demonstrated that the diversity and composition of the tumor microbiome are important determinants of long-term survival (28). A recent study of patients with CRC showed that two pathogenic bacteria, F. nucleatum and Bacteroides fragilis (B. fragilis), were more abundant in patients without recurrence than in those with recurrence (29). However, the association between the tumor microbiome and clinical outcomes in patients with CRC remains unclear.
We designed this study to investigate the potential role of the tumor microbiome in predicting postoperative recurrence in patients with stage III CRC. To verify the results, we also analyzed the tumor microbiome data of patients with stage III CRC from another center.

Patients and sample collection
Two pairs of tumor tissues and adjacent normal-appearing mucosal tissues (hereinafter "adjacent tissues") from patients with CRC who underwent colorectal resection at Kosin University Gospel Hospital (Busan, Republic of Korea) were previously collected and stored immediately in a deep freezer (−80°C). From these samples, we selected and analyzed tumor and adjacent tissues from patients with stage III CRC who underwent adjuvant chemotherapy. Patients with pathological stage I or II CRC who had clinical stage III disease before surgery, those with < 3 months of adjuvant chemotherapy, and those with < 24 months of follow-up were excluded from the analysis. Further, tumor tissue samples from patients with stage III CRC who underwent surgery and adjuvant chemotherapy at Yonsei University Severance Hospital (Seoul, Republic of Korea) were used as a validation set. Detailed clinical data, such as age, sex, height, weight, ABO blood type, history of smoking and alcohol drinking, family history of CRC, comorbid diseases, tumor location, histology, lymphovascular invasion, perineural invasion, Kirsten rat sarcoma viral oncogene homolog (KRAS) mutation, MSI status, T stage, N stage, and laboratory findings (including carcinoembryonic antigen [CEA] level), were assessed. The study protocol was reviewed and approved by the institutional review board of Kosin University Gospel Hospital (approval no. KUGH 2021-01-028).

Adjuvant chemotherapy and definition of recurrence
Patients with CRC who underwent colorectal resection received either FOLFOX, CAPEOX, or FL as adjuvant chemotherapy for a duration of 6 months. The FOLFOX regimen includes intravenous administration of oxaliplatin 85 mg/m², leucovorin 400 mg/m², and a bolus of 5-fluorouracil 400 mg/m² on day 1. This is followed by a continuous infusion of 5-fluorouracil 1200 mg/m²/day for 2 days. The treatment cycle is repeated every 2 weeks. The CAPEOX regimen includes intravenous administration of oxaliplatin 130 mg/m² on day 1 and oral administration of capecitabine 1000 mg/m² twice a day for 14 days. The treatment cycle is repeated every 3 weeks. The FL regimen consists of intravenous administration of leucovorin 400 mg/m² and a bolus of 5-fluorouracil 400 mg/m² on day 1. This is followed by a continuous infusion of 5-fluorouracil 1200 mg/m²/day for 2 days.
The recurrence of CRC was diagnosed on endoscopic biopsy, surgical resection, and/or radiological imaging study. In this study, we defined recurrence as both locoregional and distant recurrence. Locoregional recurrence was defined as a recurrence at the site of original surgical resection or at the draining lymph nodes. Distant recurrence was defined as a recurrence of CRC that had spread to distant sites, including the liver, lung, peritoneum, ovaries, adrenal glands, bone, and brain.

DNA extraction and bacterial 16S rRNA sequencing
The samples collected at Kosin University Gospel Hospital were transported to Hecto Healthcare Co., Ltd. (Seoul, Korea) and immediately frozen at −80°C. Microbial DNA was extracted using the Maxwell® RSC PureFood GMO and Authentication Kit (Promega, Madison, WI, USA) according to the manufacturer's instructions. To determine DNA concentrations, we used an ultraviolet-visible spectrophotometer (NanoDrop 2000c; Thermo Fisher Scientific, Waltham, MA, USA). QuantiFluor® ONE dsDNA System (Promega) was used for quantification. The DNA samples were stored at −20°C until required for experiments. A sequencing library was prepared according to the Illumina 16S Metagenomic Sequencing Library Preparation Guide (Illumina, San Diego, CA, USA). The V3-V4 region of the bacterial 16S rRNA gene was amplified using primer sets F319 (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′) and R806 (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′). The amplified products were purified using Agencourt® AMPure XP beads (Beckman Coulter, Brea, CA, USA), and the quality of the library was confirmed using the Bioanalyzer 2100 system (Agilent, Santa Clara, CA, USA). The pooled libraries were sequenced with 300-bp paired-end reads on the MiSeq platform using the MiSeq version 3 Reagent Kit (Illumina). To prevent contamination, all experimental procedures were conducted inside a biosafety cabinet (BSC). DNA extraction was performed using sterile disposable Petri dishes and surgical blades to cut the sample into appropriate sizes while it was still frozen on dry ice. During library pooling, PhiX control was mixed in at a 30% ratio, and the filtered real sequences were used as raw data. The resulting data were then subjected to quality filtering, denoising, and sequencing error removal using QIIME 2 software before proceeding with further analysis.

Data analysis and statistical analysis
Raw sequencing data were processed using the Quantitative Insight into Microbial Ecology software package 2 (QIIME 2, version 2021.4; http://qiime2.org). Denoising was performed using the Deblur algorithm, and a taxonomy table was created using the SILVA database (version 138). The non-archaeal/bacterial sequences were removed according to the taxonomic classification results. FASTQ reads were filtered, trimmed, and merged in DADA2 to generate a table of amplicon sequence variants. Taxonomy was assigned to the amplicon sequence variants using a naive Bayes classifier and compared to the SILVA version 138.99 reference database. Alpha diversity was assessed using the Shannon index, Chao1 index, Simpson index, and observed operational taxonomic units, whereas beta diversity was evaluated using principal coordinate analysis based on the Bray-Curtis distance. These analyses were performed using QIIME 2 and R (version 4.1.3; R Foundation for Statistical Computing, Vienna, Austria). To compare the taxa, we selected only those with a mean relative abundance greater than or equal to 1%. Data visualization was performed using the ggplot2 package in R, and statistical analysis was conducted using the Wilcoxon signed rank test and PERMANOVA from the vegan package. Linear discriminant effect size analysis was performed using the online platform, Galaxy (https://huttenhower.sph.harvard.edu/galaxy).
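The diversity measures and the 1% abundance filter named above can be illustrated with a minimal Python sketch. This is not the QIIME 2 or R pipeline used in the study; the function names and the example counts are hypothetical, and the natural logarithm is assumed for the Shannon index (software packages differ in the base they use).

```python
import math


def relative_abundance(counts):
    """Convert raw counts for one sample into proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def shannon_index(counts):
    """Shannon index: -sum(p_i * ln p_i) over taxa with nonzero counts."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


def simpson_index(counts):
    """Simpson diversity in the form 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in relative_abundance(counts))


def observed_features(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def taxa_above_mean_abundance(samples, threshold=0.01):
    """Indices of taxa whose mean relative abundance across samples is >= threshold."""
    rel = [relative_abundance(s) for s in samples]
    n = len(rel)
    return [j for j in range(len(rel[0]))
            if sum(r[j] for r in rel) / n >= threshold]


# Hypothetical count vectors (one entry per taxon), for illustration only.
tumor = [50, 30, 20, 0]
adjacent = [25, 25, 25, 25]
print(shannon_index(adjacent), simpson_index(adjacent))
print(observed_features(tumor), bray_curtis(tumor, adjacent))
```

In this toy example, the evenly distributed sample has the higher alpha diversity, and the Bray-Curtis value summarizes how far apart the two samples are in composition.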
The patients' demographic and clinical data were compared using Student's t-test and Fisher's exact test. Continuous data with a normal distribution are expressed as mean ± standard deviation, and categorical data are presented as numbers (percentage). The Wilcoxon signed-rank test was used to compare microbial abundance between tumor and adjacent tissues, as well as according to the presence of recurrence. Logistic regression analysis was performed to evaluate factors predicting disease recurrence. The 'glm' function in R was used to fit a logistic regression model to our data, including predictors such as clinical variables and microbiome to predict the binary outcome variable of recurrence. The 'step' function was then used to perform backward selection and select the final model. Receiver operating characteristic (ROC) and area under the curve (AUC) analyses were performed to estimate the thresholds of variables. A random forest model was used to assess the mean decrease in the Gini coefficient. To control for the false discovery rate (FDR), statistical significance was determined using the Benjamini-Hochberg procedure with a threshold of FDR-adjusted p value < 0.05. All statistical analyses were performed using R.
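As an illustration of the Benjamini-Hochberg adjustment mentioned above, the following Python sketch computes FDR-adjusted p values. It is a minimal re-implementation for explanation only, not the R code used in the study; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from four taxon-level comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

A comparison is then called significant when its adjusted value falls below the 0.05 threshold stated in the text.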

Baseline characteristics and evaluation of clinical variables affecting recurrence
Patients with stage III CRC who underwent surgery followed by adjuvant chemotherapy at Kosin University Gospel Hospital (65 patients, discovery set) and Yonsei University Severance Hospital (71 patients, validation set) were enrolled in this study. The baseline characteristics of the patients are summarized in Table 1. The mean age in the discovery set was younger than that in the validation set (60.0 ± 9.3 vs. 64.7 ± 11.4 years, p = 0.010). Additionally, the discovery set had a higher prevalence of current smokers (27.7% vs. 9.9%, p = 0.027) and lymphovascular invasion (63.1% vs. 36.6%, p = 0.004) compared to the validation set. In the discovery set, 60 patients (92.3%) received FOLFOX and 5 patients (7.7%) received CAPEOX. In the validation set, 59 patients (83.1%) received FOLFOX, 8 patients (11.3%) received CAPEOX, and 4 patients (5.6%) received FL. All patients received treatment for at least 5 months.
We compared the clinical variables according to the presence of recurrence, and no differences were observed in any factor, including tumor location, histology, lymphovascular invasion, perineural invasion, KRAS mutation, MSI status, T stage, N stage, and laboratory findings (Table 2). We evaluated clinical factors as predictors of tumor recurrence; however, none of the factors were found to be significant (Figure 1).

Microbiome differences between adjacent and tumor tissues
On the basis of previous results (27, 28), we hypothesized that the tissue microbiome of patients with CRC could be a predictor of tumor recurrence after surgery. We focused on the individual differences in the microbiome and attempted to evaluate the possibility that the tissue microbiome can predict recurrence in patients with stage III CRC who underwent surgery and adjuvant chemotherapy. We compared the microbiome differences between adjacent and tumor tissues in patients in the discovery set. Alpha diversity did not differ between the two tissues, but beta diversity differed significantly, and the taxonomic composition showed differences at the phylum, genus, and species levels (Figure S1). Microbial abundance was remarkably different between adjacent and tumor tissues. At the phylum level, Fusobacteriota, Verrucomicrobiota, and Bacteroidota were more abundant in tumor tissue samples (Figure 2A). At the genus level, Streptococcus and Gemella were more abundant in tumor tissue samples (Figure 2B). In contrast, the phyla Firmicutes, Proteobacteria, and Actinobacteriota (Figure 2A), and the genera Parabacteroides, Faecalibacterium, and Parasutterella were more abundant in adjacent tissue samples (Figure 2B). Further, linear discriminant effect size analysis confirmed that the microbial abundance in adjacent tissues was distinct from that in tumor tissues (Figure 2C).

Microbiome differences according to the presence of recurrence
Figure S2 displays the Kaplan-Meier curves for overall survival and disease-free survival in the discovery set and validation set. We assessed differences in the tissue microbiome according to the presence of recurrence. As shown in Figure S3, the taxonomic composition of the tissue microbiome was not different at the phylum, genus, and species levels between patients with and without recurrence in both the discovery and validation sets. In the discovery set, alpha diversity, beta diversity, and microbial abundance at the phylum and genus levels in adjacent and tumor tissues were not significantly different according to the presence of recurrence (Figure 3). Similar results were obtained when the data were divided into male and female groups (Figure S4). In the validation set, alpha and beta diversities did not differ according to the presence of recurrence, and microbial abundance at the phylum and genus levels was also not different, except for the genus Prevotella (Figure 4). Similar results were obtained when the data were divided into male and female groups; however, Prevotella was more abundant in tumor tissue samples from male patients without recurrence (Figure S5).

Generation and validation of a prediction model for CRC recurrence
Although we found no significant differences in tissue microbial diversity and abundance between patients with and without recurrence, we attempted to generate a recurrence prediction model including microbiome data using logistic regression analysis in the discovery set. When the analysis was performed by combining clinical factors (age, CEA level, histology, lymphovascular invasion, perineural invasion, T stage, and N stage) and tumor microbiome data (selecting only the genera with a relative abundance greater than or equal to 1%), we found that CEA level, T stage, and perineural invasion (among clinical factors), as well as the tumor tissue microbiome (including Gemella, Parabacteroides, Parasutterella, and Prevotella) were significant. The estimation formula for the prediction model is provided in the Supplementary Data. We applied the prediction model in generating the ROC curve and compared it to clinical factors (combination of CEA level, T stage, and perineural invasion) without microbiome. The AUC value of this model was 0.846 (95% confidence interval [CI], 0.754-0.938) in the total patients, and a good AUC value was obtained in both male and female patients (Figure 5). When compared with the ROC curve of clinical factors without microbiome, the prediction model showed a significantly better AUC value than clinical factors in the total patients (0.846 vs. 0.679, p = 0.009) (Figure 5A), regardless of sex (0.943 vs. 0.818, p = 0.043 in male patients; 0.885 vs. 0.590, p = 0.017 in female patients) (Figures 5C, D). In the random forest model analysis, Gemella, Parabacteroides, and Prevotella had a mean decrease in the Gini coefficient of > 3.0 (Figure 5B).
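To make the AUC comparison above concrete, the following Python sketch shows how a fitted logistic model turns predictors into a recurrence probability and how an AUC can be computed from such probabilities by pairwise ranking. The coefficients, variable names, and data are hypothetical placeholders; the study's actual formula is given only in its Supplementary Data and is not reproduced here, and the test used to compare two AUCs is not covered by this sketch.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(labels, scores):
    """AUC as the probability that a recurrence case outranks a non-recurrence case.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical coefficients and one hypothetical patient (not the study's model).
coefs = {"cea": 0.2, "t_stage": 0.5, "gemella": 1.5}
patient = {"cea": 3.0, "t_stage": 3.0, "gemella": 0.02}
print(predicted_probability(-2.0, coefs, patient))

# Hypothetical outcomes (1 = recurrence) and model scores for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported above should be read.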
When the prediction model was applied to the validation set, it showed an AUC value of 0.740 (95% CI, 0.606-0.873), which was not better than the AUC value of clinical factors without microbiome in the analysis of the total patients (Figure 6A). However, the prediction model showed a better AUC value than clinical factors in female patients (0.858 vs. 0.624, p = 0.022) (Figure 6D), but not in male patients (Figure 6C). In the random forest model analysis of the validation set, Faecalibacterium, Prevotella, and Gemella had a mean decrease in the Gini coefficient of > 3.0 (Figure 6B).

FIGURE 2 Microbial abundance between adjacent and tumor tissues in patients in the discovery set. (A) Phylum level. (B) Genus level. (C) Linear discriminant analysis effect size. ns, non-significant; LDA, linear discriminant analysis. '*', p < 0.05; '**', p < 0.01; '***', p < 0.001; '****', p < 0.0001.

Discussion
In the present study, we assessed a model combining clinical factors and tumor tissue microbiome data for predicting recurrence in patients with stage III CRC. This model showed better AUC values than clinical factors. Our data suggest that analysis of the tumor tissue microbiome combined with clinical factors may help predict recurrence in patients with CRC.
Recent studies have identified Fusobacterium, Bacteroides, Peptostreptococcus, Gemella, and Parvimonas as genera that are potentially associated with CRC, and emerging evidence has demonstrated their oncogenic functions; however, inter-individual variations in tumor-associated mucosal microbiome remain a barrier to elucidating the role of the microbiome in colorectal tumorigenesis. Concerning intra-individual variations in microbial patterns, several studies have shown that the microbiome structure of cancerous tissues significantly differs from that of the intestinal lumen, and that the microbiome of CRC tissues remarkably differs from that of adjacent tissues (30-32). Consistent with previous studies, our study showed significant differences in the beta diversity and abundance of microbiome between tumor and adjacent tissues. In particular, Streptococcus and Gemella were more abundant in tumor tissue samples than in adjacent tissue samples. An analysis of paired samples of CRC-adjacent mucosa and colonic mucosa from healthy controls showed differences in microbial community configurations (33). These results suggest that the microbial communities in the colorectal mucosa show distinct alterations according to the stage of colorectal carcinogenesis.
The observed association between the gut microbiome and clinical outcomes has raised the possibility that bacteria can serve as prognostic markers. Several studies reported that increased abundance of F. nucleatum and B. fragilis was associated with poor clinical outcomes and late-stage CRC (34,35). In a recent study investigating the profiles of the gut mucosal microbiome in patients with CRC recurrence, a total of 17 bacteria were suggested as potential biomarkers for CRC recurrence and patient prognosis (36). In addition, the persistence of F. nucleatum after neoadjuvant chemoradiotherapy in patients with locally advanced rectal cancer was found to be correlated with high relapse rates (37). In the present study, we assessed microbial differences according to the presence of recurrence, and found no significant differences in microbial diversity and the abundance of each microbial group between patients with and without recurrence, except for one genus in the validation set. This lack of difference may be explained by the possibility that a network of numerous microbiomes, rather than the presence of a characteristic microbiome in tumor tissues, contributes to the development of recurrence.
We generated a prediction model for CRC recurrence by combining clinical factors and tumor tissue microbiome data. The model finally included several genera, such as Gemella, Parabacteroides, Parasutterella, and Prevotella. This prediction model had a good AUC value in patients with CRC regardless of sex and showed significantly better performance in predicting recurrence than the clinical factors. These results suggest that gut microbiome assessment has a potential role in predicting CRC recurrence; however, further studies with larger sample sizes are needed.
Although adjuvant chemotherapy has demonstrated benefits in patients with stage III CRC, it can reduce the risk of recurrence by only approximately 30% (7,8). According to the NCCN guidelines, for low-risk (T1-3, N1) stage III CRC patients, CAPEOX (3 months) or FOLFOX (3-6 months), as well as other options like capecitabine (6 months) or 5-FU (6 months), are recommended. On the other hand, for high-risk (T4, N1-2; any T, N2) stage III CRC patients, the recommended options include CAPEOX (3-6 months) or FOLFOX (6 months), as well as other options like capecitabine (6 months) or 5-FU (6 months) (38). Liquid biopsy is a promising alternative strategy for directly evaluating circulating tumor DNA (ctDNA) from the blood. It aims to detect evidence of minimal residual disease, which could potentially be the source of a later clinical recurrence. Recently, in a study of 455 stage II CRC patients, ctDNA-guided management led to a reduced rate of adjuvant chemotherapy usage, and ctDNA-positive patients who received adjuvant chemotherapy exhibited a three-year recurrence-free survival of 86.4% (39). Although further research is needed, the combined analysis of liquid biopsy and tumor microbiome has the potential to offer more promising insights into predicting patient prognosis and determining the need for additional chemotherapy after surgery in stage III CRC patients.
The strength of our study is that the results obtained by analyzing tumor and adjacent tissue samples from one center were validated by comparing them with tumor tissue data from another center. However, our study had several limitations. First, the tumor tissue samples from the two centers were collected at different times and stored in different locations, which may have introduced heterogeneity in the results. Second, we could not compare the microbiomes of adjacent tissues in the validation set because no adjacent tissue data were collected from the other center. Third, the prediction model generated using the discovery set did not show a better AUC value than the clinical factors for the total patients and male patients in the validation set. We believe that this was due to data heterogeneity and the small number of samples. Fourth, the study's sample size was small, which could reduce the reliability of our results. To overcome these limitations, further well-designed studies with larger sample sizes are needed.
In summary, we conducted a comprehensive investigation of the differences in microbial diversity and abundance between tumor and adjacent tissues, as well as their association with recurrence in CRC patients. Additionally, we developed a prediction model using tissue microbiome data to forecast postoperative recurrence. While the predictive performance of our model, measured by AUC values, did not surpass that of the clinical factors alone in the validation set, we did observe a relatively higher AUC value for the new model using microbiome data in female patients. However, we acknowledge the need for further research to explore potential sex-based differences in the microbiome profile's predictive capacity for CRC recurrence. Therefore, these findings should be generalized with caution, and we refrain from unequivocally concluding that the tumor microbiome can predict postoperative disease recurrence in all patients. Nevertheless, we believe that our study contributes to emphasizing the importance of the tissue microbiome in diagnosing and predicting the recurrence of CRC.
The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

FIGURE 3 Microbial diversity and abundance in adjacent and tumor tissues according to the presence of recurrence in the discovery set. (A) Alpha diversity. (B) Beta diversity. (C) Phylum level. (D) Genus level. OTUs, operational taxonomic units; PCoA, principal coordinate analysis; ns, non-significant.

FIGURE 5 Receiver operating characteristic (ROC) curve and random forest model analyses in the discovery set. (A) ROC curves of the prediction model and the clinical factors in the total patients. (B) Random forest model evaluating tissue microbiomes. (C) ROC curves of the prediction model and the clinical factors in male patients. (D) ROC curves of the prediction model and the clinical factors in female patients. † Includes clinical factors (CEA level, T stage, and perineural invasion) and tumor tissue microbiome (Gemella, Parabacteroides, Parasutterella, and Prevotella). ¶ Includes CEA level, T stage, and perineural invasion. AUC, area under the curve; RFM, random forest model.

TABLE 2 Comparison between patients with and without recurrence.