- 1 First Clinical Medical College, Shanxi Medical University, Taiyuan, China
- 2 Microbiological Laboratory of Ophthalmology, Shanxi Eye Hospital, Taiyuan, China
- 3 Fifth Clinical Medical College, Shanxi Medical University, Taiyuan, China
- 4 Assisted Reproduction Center, First Hospital of Shanxi Medical University, Taiyuan, China
- 5 Department of Hepatobiliary Surgery and Liver Transplantation Center, First Hospital of Shanxi Medical University, Taiyuan, China
- 6 Shanxi Provincial Key Laboratory for Digestive Diseases and Organ Transplantation, First Hospital of Shanxi Medical University, Taiyuan, China
- 7 Department of Biliopancreatic Surgery, First Hospital of Shanxi Medical University, Taiyuan, China
Background: Hepatocellular carcinoma (HCC) is a prevalent and lethal malignancy worldwide. Gut microbiota play crucial roles in liver disease progression and may offer noninvasive diagnostic value, yet microbial signatures specific to advanced HCC remain unclear.
Methods: Seventy-six participants, including early-stage HCC (HCC12), advanced HCC (HCC34), liver cirrhosis (LC), and healthy controls (CG), were prospectively enrolled. Fecal samples underwent 16S rRNA sequencing to characterize microbial diversity and community composition. Differential taxa were identified using Kruskal–Wallis tests, linear discriminant analysis effect size (LEfSe), and zero-inflated negative binomial regression (ZINB). Machine learning models were constructed using clinical features, representative microbiota, and their combination. External validation was performed using 74 published HCC cases.
Results: Advanced HCC exhibited reduced microbial richness and diversity, accompanied by substantial community structure alterations. Enterococcus, Enterococcaceae, Enterobacteriaceae, and Escherichia–Shigella were enriched in HCC34, whereas Ruminococcus and Blautia were depleted. These taxa correlated strongly with liver injury markers and HCC-specific biomarkers. The extreme gradient boosting model showed high diagnostic potential when using either clinical or microbial features alone, while the combined model achieved improved accuracy (AUC = 1.0 in the primary test set). External validation supported the good generalizability of the model (AUC = 1.0 in the external cohort). Feature importance analysis identified Enterococcus as the most influential discriminator of advanced HCC.
Conclusion: This study reveals distinct gut microbial signatures associated with advanced HCC and suggests that Enterococcus may serve as a potentially important microbial marker linked to disease severity. Integrating gut microbiota profiling with clinical features may offer a promising noninvasive strategy for the accurate identification of advanced HCC and provides hypothesis-generating insights for microbiome-based therapeutic interventions.
1 Introduction
According to the latest data from the International Agency for Research on Cancer (GLOBOCAN 2022), liver and intrahepatic bile duct malignancies rank sixth in global cancer incidence, with approximately 866,000 new cases each year and an age-standardized incidence rate of 8.6 per 100,000 individuals. Liver cancer accounts for an estimated 759,000 deaths annually, making it the third leading cause of cancer-related mortality worldwide (Bray et al., 2024). Hepatocellular carcinoma (HCC) is the most common type of liver cancer, representing nearly 80% of all cases (Rumgay et al., 2022). China has the highest global burden of HCC, with both incidence and mortality accounting for nearly half of the worldwide total. Due to its insidious clinical presentation and the lack of precise biomarkers, many patients are diagnosed at an advanced stage. Currently, various therapeutic options are available for HCC at different stages, including surgical resection, local ablation, locoregional interventions, and systemic therapy. Selecting an individualized treatment strategy depends critically on the accurate staging of HCC. Combined diagnostic approaches integrating alpha-fetoprotein (AFP), alpha-fetoprotein Lens culinaris agglutinin 3 (AFP-L3), and protein induced by vitamin K absence or antagonist-II (PIVKA-II) with imaging modalities are now widely used and have significantly improved diagnostic sensitivity and specificity (Johnson et al., 2014; Berhane et al., 2016; Best et al., 2016; Tayob et al., 2023). At the molecular level, circulating tumor cells (CTCs), cell-free DNA (cfDNA), and circulating tumor DNA (ctDNA) have shown substantial promise in early detection, diagnosis, prognosis prediction, disease monitoring, and therapeutic response assessment in HCC (Chan et al., 2024). However, current diagnostic methods still face limitations, including suboptimal specificity, missed detection of small lesions, high cost, low detection rates, and limited sensitivity.
Therefore, continued exploration of diverse biological markers for the precise diagnosis of HCC remains critically important.
With the expanding application of microbiome research, the relationship between gut microbiota and malignant tumors has gained increasing attention. Gut microbial dysbiosis can promote tumor initiation and progression through multiple mechanisms (Schwabe and Greten, 2020; El Tekle and Garrett, 2023). Overgrowth of pathogenic bacteria disrupts the intestinal mucosal barrier and triggers sustained inflammatory responses, which, in turn, drive aberrant cell proliferation and elevate cancer risk (Kim and Lee, 2021). Certain pathogenic taxa, such as Clostridium and Escherichia, produce carcinogenic metabolites, including nitrosamines and secondary bile acids, that directly damage epithelial DNA and induce gene mutations. The gut microbiota also play a crucial role in hepatocarcinogenesis through lipopolysaccharide (LPS) and its receptor toll-like receptor 4 (TLR4) signaling pathways (Dapito et al., 2012; Roje et al., 2024). In contrast, specific beneficial microbes and their metabolites exert antitumor effects (Redman et al., 2014). For instance, Bifidobacteria and Lactobacillus secrete short-chain fatty acids (SCFA) (e.g., acetate and propionate) that regulate intestinal pH, inhibit pathogen overgrowth, protect the mucosal barrier, and directly suppress tumor cell proliferation while inducing apoptosis (Lee and Hase, 2014). Additionally, gut microbes shape both innate and adaptive immunity by enhancing immune cell activation and improving antitumor responses (Zhou et al., 2021). As key modulators of cancer immunity, gut microbiota dynamically influence therapeutic responsiveness through bidirectional interactions with the host immune system (Yu and Schwabe, 2017).
Given the pivotal role of gut microbiota in tumor biology, increasing efforts have focused on leveraging microbial signatures as biomarkers for cancer diagnosis, disease progression, and prognosis prediction (Gok Yavuz et al., 2023). In HCC, the gradual transition from chronic hepatitis to cirrhosis and eventually to hepatocellular carcinoma is accompanied by progressive alterations in the gut microbiome, making microbial profiling clinically informative for distinguishing stages of liver disease (Liu and Yang, 2023). In this study, we characterized gut microbial features across early-stage HCC, advanced HCC, liver cirrhosis, and healthy individuals using 16S rRNA gene sequencing. By integrating microbial signatures with commonly used clinical serological markers, we developed machine learning models to identify advanced HCC and validated their performance using an external dataset. This work provides preliminary insights into the gut microbial characteristics of advanced HCC and highlights their potential value in clinical diagnosis.
2 Materials and methods
2.1 Study population and design
We prospectively and randomly enrolled 38 patients with HCC, including 18 with stage I–II (HCC12) and 20 with stage III–IV disease (HCC34), as well as 19 patients with liver cirrhosis (LC) and 19 healthy adults (CG), who presented to the First Hospital of Shanxi Medical University between September 2023 and July 2024 for diagnosis and/or clinical management. Fresh fecal samples from patients with liver cirrhosis and HCC were collected on the first day of admission, prior to any therapeutic intervention. Samples were rapidly frozen in liquid nitrogen for 15 min and subsequently stored at −80 °C. Clinical features, including medical history, laboratory findings, clinical presentation, and disease classification, were retrieved from electronic medical records and the laboratory information system. The diagnosis of liver cirrhosis was established based on hematological tests combined with imaging or histopathology. HCC diagnosis was confirmed using AFP, AFP-L3, and PIVKA-II in combination with at least two imaging modalities or histopathological examination. HCC staging followed the latest China liver cancer staging (CNLC) system (Zhou et al., 2025), which incorporates performance status scoring to comprehensively assess treatment tolerance. Staging information for enrolled HCC patients is summarized in Table 1.
All participants had no constipation, hematochezia, diarrhea, dyspepsia, or other gastrointestinal symptoms. None had taken antibiotics or acid suppressants within the preceding month. None of the patients with HCC had undergone surgical resection, local ablation, transarterial therapies, systemic therapy, or immunotherapy at the time of sample collection. No patient had a history of systemic malignancies other than HCC.
2.2 16S rRNA amplicon sequencing
Genomic DNA was extracted from fecal samples via the Magnetic Soil and Stool DNA Kit (TianGen, China; Catalog No. DP712) following the manufacturer’s instructions. DNA quality was assessed using 1% agarose gel electrophoresis, and qualified samples were diluted with nuclease-free sterile water to a final concentration of 1 ng/μL. The V3–V4 hypervariable regions of the 16S rRNA gene were amplified using primers 341F (CCTAYGGGRBGCASCAG) and 806R (GGACTACNNGGGTATCTAAT). Each PCR reaction contained 15 μL of Phusion® High-Fidelity PCR Master Mix (New England Biolabs), 0.2 μM of each primer, and 10 ng of genomic DNA. The thermal cycling protocol consisted of an initial denaturation at 98 °C for 1 min; 30 cycles of 98 °C for 10 s, 50 °C for 30 s, and 72 °C for 30 s; followed by a final extension at 72 °C for 5 min. PCR amplicons were purified using magnetic beads, and target fragments were recovered via the Universal DNA Purification Kit (TianGen, China; Catalog No. DP214). Library preparation was performed via the NEBNext® Ultra™ II FS DNA PCR-Free Library Prep Kit (New England Biolabs, United States; Catalog No. E7430L). Libraries were quantified via Qubit 2.0 fluorometry and qPCR prior to sequencing on the NovaSeq 6000 platform with a paired-end 250 bp strategy.
Raw sequencing data were processed by merging paired-end reads, performing stringent quality filtering, and removing chimeric sequences to obtain high-quality effective tags. The DADA2 module in quantitative insights into microbial ecology 2 (QIIME2, v2022.02) was used for denoising to generate amplicon sequence variants (ASVs) and a feature table. Taxonomic annotation was conducted via the classify-sklearn algorithm in QIIME2 against the SILVA 138.1 reference database.
2.3 Bioinformatic analysis
Multiple sequence alignment of all ASV representative sequences was conducted in QIIME2 to infer phylogenetic relationships. To minimize sequencing depth bias, all samples were rarefied to the minimum sequencing depth observed across the dataset. Alpha diversity indices were calculated via QIIME2 and visualized with R (v4.0.3). Beta diversity was assessed based on weighted and unweighted UniFrac distances. Beta diversity heatmaps and non-metric multidimensional scaling (NMDS) plots were generated via QIIME2, R (v4.0.3), and Perl (v5.26.2). Microbial functional prediction was performed via Tax4Fun (v0.3.1).
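The four alpha diversity indices reported later (Chao1, Shannon, Simpson, and Pielou's evenness) can be illustrated with a minimal sketch. This is not the QIIME2 implementation: the function name `alpha_diversity` and the input format (a plain list of ASV counts for one rarefied sample) are assumptions made for illustration, the Shannon index uses the natural logarithm (tools differ in the log base), Simpson is given in its 1 − Σp² form, and Chao1 uses the bias-corrected formula.

```python
import math

def alpha_diversity(counts):
    """Return Chao1, Shannon, Simpson, and Pielou indices for one sample.

    counts: list of non-negative integer ASV counts (hypothetical input).
    """
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    richness = len(observed)
    props = [c / total for c in observed]

    # Shannon entropy with the natural logarithm (an assumption; some tools use log2).
    shannon = -sum(p * math.log(p) for p in props)
    # Simpson index in the 1 - sum(p_i^2) form.
    simpson = 1.0 - sum(p * p for p in props)
    # Pielou's evenness: Shannon divided by its maximum, ln(richness).
    pielou = shannon / math.log(richness) if richness > 1 else 0.0

    # Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    # where F1 and F2 are the numbers of singleton and doubleton ASVs.
    f1 = sum(1 for c in observed if c == 1)
    f2 = sum(1 for c in observed if c == 2)
    chao1 = richness + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {"chao1": chao1, "shannon": shannon, "simpson": simpson, "pielou": pielou}
```

A perfectly even sample gives a Pielou value of 1, whereas a sample dominated by a single taxon drives Shannon, Simpson, and Pielou toward 0.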
2.4 Machine learning model construction
Based on the top 15 microbial families and genera identified by the Kruskal–Wallis test in the microbiome analysis, together with clinical features from 38 HCC patients collected at our center, we randomly divided the dataset into a training set and a test set at a 7:3 ratio. Using the scikit-learn package in Python (v3.5.0), we constructed three ensemble machine learning models: random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGB). Model performance was evaluated across multiple metrics, including accuracy, recall, F1-score, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC), to identify the optimal classifier. For the final selected model, we further assessed stability and generalizability using 5-fold cross-validation (CV) and 200 bootstrap resampling iterations. SHAP analysis was then applied to quantify the contribution of each feature to the model’s classification output and to identify key predictive features based on importance ranking. Finally, external validation was performed using 74 published HCC cases (Zhang et al., 2021), with additional evaluation using 10-fold CV and 200 bootstrap iterations. Because Enterococcus and Bacilli were not directly available in the external dataset, they were substituted with Enterococcaceae and Lachnospiraceae, respectively. This substitution does not imply strict taxonomic equivalence and was applied to enable approximate feature alignment across datasets. In addition, missing values for three features (PIVKA-II, Pseudomonas, and Moraxellaceae) were imputed using the mean values from the training set.
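The 7:3 split and the threshold-based metrics named above can be sketched in plain Python. This is an illustrative sketch rather than the study's code: the helper names `stratified_split` and `binary_metrics` are hypothetical, stratifying the split by class is an assumption (the text only states that the split was random), and labels are coded 1 for HCC34 and 0 for HCC12.

```python
import math
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately.

    Stratification is an assumption; the source only describes a random 7:3 split.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

def binary_metrics(y_true, y_pred):
    """Accuracy, recall, F1-score, and MCC from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four confusion-matrix cells, so it stays informative
    # even when the two classes are unevenly sized.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}
```

With 18 HCC12 and 20 HCC34 patients, a stratified 30% hold-out contains only 11 samples, which is one reason single-split metrics on a cohort of this size are unstable.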
2.5 Statistical analysis
Clinical data were analyzed via the t-test, Kolmogorov–Smirnov test, analysis of variance (ANOVA), Welch’s ANOVA, or the Kruskal–Wallis test, as appropriate. No clinical data were missing. Differences in gut microbial communities were evaluated via zero-inflated negative binomial regression (ZINB), Kruskal–Wallis rank-sum tests, and linear discriminant analysis effect size (LEfSe, v1.1.01). All analyses and visualizations were performed via QIIME2 (v2022.02), Perl (v5.26.2), Python, and R (v3.4.3). A p-value <0.05 was considered statistically significant, and multiple comparisons were adjusted via the false discovery rate (FDR) according to the Benjamini–Hochberg procedure.
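The Benjamini–Hochberg adjustment mentioned above can be written in a few lines. The sketch below is a generic implementation of the procedure, not the code used in the study; the function name `benjamini_hochberg` is chosen for illustration, and the input is assumed to be one raw p-value per tested taxon.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A taxon is then called significant when its adjusted value falls below the chosen FDR threshold (0.05 in this study).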
3 Results
3.1 Characteristics of the study population
A total of 76 participants were ultimately recruited for this study. The clinical features of all participants are summarized in Table 2. Males predominated in all groups, consistent with the known epidemiological features of cirrhosis and HCC. Because healthy participants were required to be free of any disease, their mean age (41.37 years) was significantly lower than that of the other groups. Alanine aminotransferase (ALT) levels in the HCC12 group were higher than those in the LC group, whereas aspartate aminotransferase (AST) levels did not differ among the HCC12, HCC34, and LC groups. Compared with the HCC12 group, the HCC34 group exhibited significantly lower albumin (ALB) and cholinesterase (ChE) levels, along with significantly higher total bilirubin (TBil) levels and Child–Pugh scores. Furthermore, the lg(PIVKA-II) values were significantly higher in the HCC34 group than in the HCC12 group.
3.2 Gut microbial diversity in HCC
Venn diagram analysis based on 16S rRNA amplicon sequencing revealed 624 and 727 unique ASVs in the HCC12 and HCC34 groups, respectively (Figure 1A). Gut microbial diversity appeared to play a role in distinguishing HCC patients. The α-diversity indices of both the HCC12 and HCC34 groups were significantly lower than those of healthy controls (Figures 1B–E). Specifically, analyses via the Chao1, Shannon, Simpson, and Pielou’s evenness indices indicated that both the diversity and abundance of the gut microbiota in HCC12 and HCC34 patients were reduced compared with the CG group. The Chao1 index further demonstrated that microbial richness in the HCC12 group was lower than that in the LC group. Pielou’s evenness index showed that species evenness in the LC group was decreased relative to the control group. Although the HCC34 group did not differ significantly from the LC group, all α-diversity indices were lower than those in the LC group.
Figure 1. Groupwise comparisons of gut microbial α-diversity and β-diversity. (A) Venn diagram. (B–E) Comparisons of Chao1 richness, Shannon index, Simpson index, and Pielou’s evenness among the four groups. (F) Heatmap of the UniFrac distance matrix across groups. The upper and lower values within each square represent the weighted and unweighted UniFrac dissimilarity coefficients between samples, respectively; smaller coefficients indicate lower differences in microbial diversity between the corresponding samples. (G) NMDS analysis based on unweighted UniFrac distances among groups. Each point represents a sample, and the distances between points reflect differences in community structure (stress <0.2 indicates a reliable NMDS solution). (H,I) Intergroup differences were assessed using unweighted and weighted UniFrac Kruskal–Wallis tests. (*p < 0.05, **p < 0.01, and ***p < 0.001).
The distance matrix heatmap showed that the HCC34 group exhibited the greatest dissimilarity with the CG group, whereas the HCC12 group showed the smallest dissimilarity with the LC group (Figure 1F). NMDS analysis indicated that samples from the CG group were tightly clustered, while the HCC12 and HCC34 groups were more dispersed, and the LC group overlapped with the other three groups (Figure 1G). Statistical analysis using unweighted and weighted UniFrac Kruskal–Wallis tests revealed that β-diversity differed significantly between the CG group and the other groups, as well as between the LC group and the other groups; however, no significant differences were observed between the HCC12 and HCC34 groups (Figures 1H,I).
3.3 Dominant gut microbial composition in advanced HCC patients
Analysis of the top 15 taxa at the order, family, and genus levels showed that the relative abundances of Enterobacterales, Enterobacteriaceae, and Escherichia-Shigella (the old NCBI hierarchical classification) progressively increased from the LC to HCC12 and HCC34 groups, reaching their highest levels in the HCC34 group (Figures 2A–C). Furthermore, the relative abundances of Lactobacillales, Enterococcaceae, and Enterococcus were also elevated in the HCC34 group (Figures 2A–C). This trend indicates a stepwise enrichment of specific gut microbiota as liver disease progresses from cirrhosis to HCC.
Figure 2. Species differences among groups. (A–C) Bar plots showing the top 15 most relatively abundant taxa at the order, family, and genus levels. (D,E) Differential families and genera (top 10) identified using zero-inflated negative binomial (ZINB) regression. The upper panels display clustering patterns and intergroup significance, and the heatmaps present Z-scores (standard scores). (F) Differential taxa among groups identified by LEfSe analysis (LDA score >4, p < 0.05). (G) Cladogram illustrating the phylogenetic relationships of the differential taxa. (H,I) Differential taxa at the family and genus levels (top 15) identified using the Kruskal–Wallis test. The p-values are shown on the right, with those highlighted in red indicating significance after FDR correction.
To identify taxa with significant differences between groups, we applied multiple statistical approaches. At the genus level, Escherichia-Shigella, Enterococcus, and Succinivibrio, and at the family level, Enterobacteriaceae, Enterococcaceae, and Succinivibrionaceae were dominant in the HCC34 group, with Enterococcus and Enterococcaceae showing particularly pronounced enrichment (Figures 2D,E). LEfSe analysis further identified four significantly enriched taxa across the groups (LDA score >4, p < 0.05). Notably, taxa within the same evolutionary lineage, Enterobacterales, Enterobacteriaceae, and Escherichia-Shigella, were enriched in HCC34, as were Enterococcaceae and Enterococcus (Figures 2F,G). The HCC12 group exhibited significant enrichment of Ruminococcus, whereas the LC group was dominated by Streptococcaceae, Streptococcus, and Veillonella (Figure 2F). Kruskal–Wallis rank-sum tests of family- and genus-level relative abundances yielded consistent results: Enterobacteriaceae, Enterococcaceae, Lactobacillaceae, Escherichia-Shigella, and Enterococcus were enriched in HCC34 (Figures 2H,I; Supplementary Figures S1A–E). Additionally, Ruminococcus was enriched in HCC12 and the control group but markedly reduced in HCC34, whereas Veillonella was elevated in both HCC34 and LC groups (Figure 2I; Supplementary Figures S1F,G).
We performed LEfSe analysis specifically within the HCC groups, which revealed that Enterococcus, Bacilli, Lactobacillales, and Enterococcaceae were significantly associated with distinguishing the HCC34 group (LDA score >4, p < 0.05) (Supplementary Figures S2A,B).
3.4 Associations between representative microbiota and clinical features
We examined the correlations between the top 15 family- and genus-level taxa identified by Kruskal–Wallis rank-sum tests and clinical features. At the family level (Figure 3A), Enterococcaceae, Enterobacteriaceae, Streptococcaceae, and Lactobacillaceae were positively correlated with ALT, AST, prothrombin time (PT), alkaline phosphatase (ALP), TBil, and Child–Pugh scores, with Enterococcaceae showing the strongest associations. In contrast, Eubacterium coprostanoligenes group, Rikenellaceae, and Oscillospiraceae were negatively correlated with ALT, AST, TBil, and Child–Pugh scores. These findings indicate that liver function impairment is accompanied by a relative decrease in Eubacterium coprostanoligenes group, Rikenellaceae, and Oscillospiraceae, and a relative increase in Enterococcaceae, Enterobacteriaceae, Streptococcaceae, and Lactobacillaceae. Conversely, ChE levels were significantly negatively correlated with Enterococcaceae and Streptococcaceae, and positively correlated with Ruminococcaceae, Moraxellaceae, Erysipelatoclostridiaceae, Eubacterium coprostanoligenes group, Rikenellaceae, Oscillospiraceae, and Pseudomonadaceae. Body mass index (BMI) exhibited a trend of negative association with liver cell injury, consistent with disease progression and malnutrition-related wasting.
Figure 3. Spearman correlation heatmaps between clinical features and the top 15 gut microbial taxa identified by the Kruskal–Wallis test. (A) Correlations between liver function–related clinical features and gut microbiota at the family level. (B) Correlations between liver function–related clinical features and gut microbiota at the genus level. (C) Correlations between HCC-associated clinical characteristics and gut microbiota at the family level. (D) Correlations between HCC-associated clinical characteristics and gut microbiota at the genus level (*p < 0.05 and **p < 0.01).
At the genus level (Figure 3B), Agathobacter, Alistipes, Ruminococcus, Blautia, Dialister, and Roseburia were negatively correlated with liver injury-related clinical parameters, including TBil, Child–Pugh scores, ALP, ALT, and AST. In contrast, Enterococcus, Veillonella, and Clostridioides showed positive correlations with these clinical features. These findings suggest that Enterococcus, Veillonella, and Clostridioides, together with their corresponding families (Enterococcaceae, Enterobacteriaceae, Streptococcaceae, and Lactobacillaceae), are associated with the severity of liver injury.
In addition, Erysipelatoclostridiaceae, Lachnospiraceae, and Blautia were negatively correlated with the HCC biomarkers PIVKA-II, AFP, and AFP-L3 (Figure 3C), which may be related to treatment responses in HCC. Conversely, Streptococcaceae and Streptococcus showed positive correlations with these biomarkers (Figure 3D), suggesting the potential diagnostic value of Streptococcus in HCC. Ruminococcaceae exhibited a notably negative association with PIVKA-II (Figure 3C), a biomarker with high specificity for diagnosing HCC, indicating that Ruminococcaceae may also possess diagnostic potential for hepatocellular carcinoma.
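The Spearman correlations underlying Figure 3 rank both variables and then compute a Pearson correlation on the ranks. The following sketch shows that calculation under stated assumptions: the helper names `average_ranks` and `spearman_rho` are hypothetical, tied values receive their average rank, and each input is a list with one value per patient (for example, a taxon's relative abundance and a clinical measurement).

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    # Rho is undefined when either variable is constant.
    return cov / (sx * sy) if sx and sy else float("nan")
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed, zero-heavy distributions typical of relative abundance data, although many tied zeros still reduce its resolution.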
3.5 Construction of a classification model for advanced HCC and identification of core microbial biomarkers
Among the 38 patients with HCC, we first constructed machine learning classification models for advanced HCC using clinical variables, including RF, GBDT, and XGB. The XGB model demonstrated the best performance (AUC = 0.889) (Figures 4A–H). We then developed models using the top 15 family- and genus-level microbial taxa identified by Kruskal–Wallis rank-sum tests. The XGB model based solely on microbial features showed strong discriminatory ability for identifying HCC34, with the highest performance (AUC = 0.926) (Figures 4I–P). Finally, integrating clinical features with key microbial taxa further improved the diagnostic performance for HCC34, achieving optimal discrimination (AUC = 1) (Figures 5A–D). To minimize overfitting, we applied both bootstrap resampling and k-fold CV for internal validation of each model. The results indicated that differential microbial taxa possess diagnostic value for HCC staging, and that combining clinical variables with microbial features yields the best performance (XGB with bootstrap: AUC = 0.943; XGB with k-fold CV: AUC = 0.766) (Figures 5E–H).
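The gap between the single-split AUC and the resampled estimates can be made concrete with a small bootstrap sketch. The text does not state whether the model was refit on each resample; the version below makes the simpler assumption that fixed predicted scores are resampled with replacement. The function names `auc_score` and `bootstrap_auc` and the percentile interval are illustrative choices, not the study's implementation.

```python
import random

def auc_score(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_iter=200, seed=0):
    """Return (mean AUC, lower bound, upper bound) over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        aucs.append(auc_score(sample_labels, [scores[i] for i in idx]))
    aucs.sort()
    mean = sum(aucs) / len(aucs)
    # Approximate 2.5th and 97.5th percentiles of the resampled AUCs.
    lower = aucs[int(0.025 * (len(aucs) - 1))]
    upper = aucs[int(0.975 * (len(aucs) - 1))]
    return mean, lower, upper
```

On a test set of roughly a dozen patients, the resulting interval is typically wide, which is consistent with the difference between the hold-out AUC of 1 and the lower cross-validated estimate reported above.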
Figure 4. Construction of machine learning classification models for advanced HCC based on clinical features and representative microbiota. (A–D) ROC curves, decision curve analysis (DCA), precision-recall (PR) curves, and calibration plots for RF, GBDT, and XGB models constructed using clinical features. (E,F) k-fold CV of the XGB model based on clinical features. (G,H) Bootstrap validation of the XGB model based on clinical features. (I–L) ROC curves, DCA, PR curves, and calibration plots for RF, GBDT, and XGB models constructed using representative microbiota identified by the Kruskal–Wallis test. (M,N) k-fold CV of the XGB model based on representative microbiota. (O,P) Bootstrap validation of the XGB model based on representative microbiota.
Figure 5. Construction of machine learning classification models for advanced HCC based on clinical features combined with representative microbiota. (A–D) ROC curves, DCA, PR curves, and calibration plots for RF, GBDT, and XGB models. (E,F) k-fold CV of the XGB model. (G,H) Bootstrap validation of the XGB model. (I) SHAP bar plot displaying the importance ranking of feature variables in discriminating advanced HCC. (J) SHAP beeswarm plot illustrating the distribution of SHAP values for each feature; each dot represents the SHAP value of a given feature in an individual sample, with color indicating the feature value.
Feature importance analysis revealed that Enterococcus, PIVKA-II, Child–Pugh scores, Blautia, ALT, ALP, TBil, Moraxellaceae, and Pseudomonas were key contributors to distinguishing HCC34 from HCC12 (Figures 5I,J). Except for ALT, the remaining eight features were positively associated with HCC34. Notably, PIVKA-II and Child–Pugh scores are integral components of the CNLC staging system (Zhou et al., 2025). Enterococcus, Moraxellaceae, and Pseudomonas were specifically enriched in the HCC34 group, consistent with the earlier differential abundance analyses. Among them, Enterococcus showed the strongest discriminatory value for differentiating HCC34 from HCC12.
Using the clinical characteristics and key microbial taxa reported by Zhang et al. (2021) for 74 patients with HCC, we reclassified their early and intermediate groups as HCC12 and their terminal group as HCC34 (Supplementary Table S1). External validation using our optimized XGB model demonstrated that the top nine features selected by SHAP (Figure 5I) showed excellent discriminatory performance for advanced HCC (AUC = 1) (Figure 6A). Independent validation with bootstrap resampling and k-fold CV also yielded robust results (bootstrap: AUC = 0.924; k-fold CV: AUC = 1) (Figures 6B,C). Moreover, the feature importance ranking derived from the XGB model, which highlighted Child–Pugh scores, ALP, ALT, Enterococcaceae, and Lachnospiraceae, was consistent with the key features identified in our cohort (Figure 6D). Evaluation metrics for all models are provided in Supplementary Table S2.
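The feature alignment used for external validation (taxon substitution followed by training-set mean imputation, as described in Section 2.4) can be sketched as follows. The function name `align_external_features`, the dictionary-based row format, and all numeric values in the usage example are hypothetical; only the substitution pairs and the imputation rule come from the text.

```python
def align_external_features(external_rows, feature_order, substitutions, train_means):
    """Build model-ready rows from an external cohort.

    external_rows: list of dicts mapping feature name -> value (hypothetical format).
    feature_order: features in the order the trained model expects.
    substitutions: model feature -> stand-in feature available in the external data.
    train_means:   model feature -> mean value in the training set.
    """
    aligned = []
    for row in external_rows:
        values = []
        for feature in feature_order:
            # Look up the stand-in taxon when the original one is unavailable.
            source = substitutions.get(feature, feature)
            value = row.get(source)
            if value is None:
                # Impute with the training-set mean, never the external mean,
                # so no information from the validation cohort leaks into the model.
                value = train_means[feature]
            values.append(value)
        aligned.append(values)
    return aligned

# Usage with made-up numbers: Enterococcus is replaced by Enterococcaceae,
# and the missing PIVKA-II value is filled with the training mean.
rows = align_external_features(
    external_rows=[{"Enterococcaceae": 0.12, "Child-Pugh": 6}],
    feature_order=["Enterococcus", "PIVKA-II", "Child-Pugh"],
    substitutions={"Enterococcus": "Enterococcaceae", "Bacilli": "Lachnospiraceae"},
    train_means={"Enterococcus": 0.05, "PIVKA-II": 3.1, "Child-Pugh": 6.4},
)
```

A feature imputed with a single constant for every external sample carries no discriminative information in that cohort, so the external AUC effectively rests on the remaining features.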
Figure 6. External validation of XGB models for advanced HCC. (A) ROC curves of the XGB model evaluated using the external dataset. (B) k-fold CV of the XGB model based on the external dataset. (C) Bootstrap validation of the XGB model based on the external dataset. (D) SHAP bar plot based on the external dataset, displaying the importance ranking of feature variables in discriminating advanced HCC. Features highlighted in blue represent key variables consistently identified as important in our study cohort.
3.6 Functional prediction of the gut microbiome in advanced HCC
Based on t-test analyses of microbiome functional predictions, the HCC34 group exhibited enrichment of peptidases, glutathione metabolism, and K03564 (thioredoxin-dependent peroxiredoxin) compared with the CG group (Supplementary Table S3). The enhanced peptidase activity may reflect the hypermetabolic and catabolic state characteristic of advanced HCC, in which increased proteolysis accelerates the breakdown of luminal proteins. The upregulation of glutathione metabolism and thioredoxin-dependent peroxiredoxin suggests pronounced oxidative stress within the gut microenvironment. Glutathione represents a major endogenous antioxidant, while K03564 is crucial for detoxifying peroxides. Their concurrent elevation indicates increased levels of reactive oxygen species (ROS), triggering peroxiredoxin-mediated peroxide removal and compensatory activation of glutathione metabolism. This oxidative stress–driven feedback loop may further exacerbate disease progression in advanced HCC.
4 Discussion
This study systematically characterized the gut microbiota of healthy adults, patients with liver cirrhosis, and HCC patients at different stages using 16S rRNA amplicon sequencing. From a microbiological perspective, we identified microbial taxa enriched in advanced HCC and, by integrating clinical features, developed a highly effective machine learning model. These findings provide novel insights and potential biomarkers for the precise identification of advanced HCC.
This study suggests, for the first time and from multiple analytical perspectives, the potential diagnostic relevance of Enterococcus in advanced HCC. Enterococcus was consistently identified as significant across various statistical analyses, including Kruskal–Wallis tests, LEfSe, and ZINB, in agreement with previous studies (Liu et al., 2019; Ponziani et al., 2019; Iida et al., 2021). Machine learning results further indicated that Enterococcus is one of the most critical features for distinguishing advanced HCC. Notably, a classification model based solely on gut microbial features achieved high discriminatory performance for advanced HCC, which was further improved when combined with clinical indicators. Enterococcus, together with PIVKA-II and Child–Pugh scores, emerged as key discriminative features for advanced HCC. Moreover, Enterococcus was positively correlated with liver injury markers such as ALT, TBil, and Child–Pugh scores, as well as with HCC-specific biomarkers including PIVKA-II and AFP-L3. These findings underscore the potential diagnostic value of Enterococcus in advanced HCC. Integration of Enterococcus with the existing GALAD model, which includes gender, age, AFP, PIVKA-II, and AFP-L3, may enhance diagnostic accuracy, particularly offering a simpler, noninvasive approach for detecting late-stage liver cancer (Huang et al., 2022).
Enterococcus is a common commensal bacterium in the human gut, but under conditions of gut dysbiosis, it can induce inflammatory responses by activating the Toll-like receptor 4/nuclear factor-κB (TLR4/NF-κB) pathway, thereby promoting the progression of chronic liver disease to HCC (Seki et al., 2007; Iida et al., 2021). From a biodiversity perspective, gut microbial diversity is significantly reduced in HCC patients, particularly in advanced stages, reflecting a gradual depletion of microbiota and progressive dysbiosis along the hepatitis–cirrhosis–HCC continuum (Trebicka et al., 2021). This observation aligns with the “gut–liver axis” concept, whereby impaired liver function, increased portal vein pressure, and intestinal barrier disruption collectively drive escalating microbial imbalance, thereby facilitating HCC initiation and progression (Tripathi et al., 2018; Hsu and Schnabl, 2023; Li et al., 2025). Dysbiosis is also accompanied by enrichment of pathogenic bacteria and altered metabolites, such as increased LPS, which can modulate immune responses and trigger inflammation (Dapito et al., 2012). Hepatic cells, including Kupffer cells, hepatic stellate cells (HSCs), and hepatocytes, express the pattern recognition receptor TLR4, which specifically recognizes gut-derived LPS. Binding of LPS to TLR4 activates downstream signaling pathways; in HSCs, TLR4 activation can disrupt hepatocyte apoptosis through NF-κB signaling, ultimately facilitating HCC development (Schwabe and Greten, 2020; Li et al., 2022). This mechanistic insight is consistent with our functional predictions, which revealed pronounced activation of oxidative stress responses in the gut microbiome of advanced HCC. Specifically, predicted peptidase activity, glutathione metabolism, and thioredoxin-dependent peroxiredoxin functions were significantly elevated in the HCC34 group, suggesting that the gut microbiota in advanced HCC may be associated with inflammatory regulation and adaptation to oxidative stress. These functional alterations further suggest that specific microbial taxa in advanced HCC may be associated with tumor progression and provide a theoretical basis for exploring microbiota-targeted metabolic interventions.
In contrast, bacteria producing SCFAs, such as Ruminococcus, Blautia, and Dialister, were enriched in the gut microbiota of early-stage HCC and control groups, while Alistipes was more abundant in healthy individuals. These microbes help maintain intestinal barrier integrity, regulate immune metabolism, and suppress inflammatory responses. Their reduction may indicate impaired gut defense mechanisms and represents a key feature of microbial dysbiosis in advanced tumor stages (Liu et al., 2019; Zhang et al., 2019; Bi et al., 2021; Medawar et al., 2021). The enrichment of Ruminococcus, Blautia, and Dialister in early HCC may be associated with a compensatory defensive response. As the disease progresses, the abundance of beneficial bacteria sharply declines in advanced HCC, whereas pathogenic bacteria become highly enriched, accompanied by the activation of oxidative stress responses. These changes may be associated with HCC progression. Therefore, accurate identification of advanced HCC and early correction of gut microbial dysbiosis could potentially be beneficial for disease management.
This study collected a real-world clinical cohort and combined microbiome analysis with machine learning models to focus on the gut microbiota of advanced HCC, providing preliminary evidence for its accurate diagnosis. However, several limitations should be acknowledged. First, although multiple internal and external validation strategies were applied, the relatively limited sample size may increase the risk of model overfitting. Therefore, the extremely high AUC values observed in this study should be interpreted with caution and regarded as proof-of-concept findings rather than clinically validated diagnostic assays. Prospective, large-scale, multicenter studies will be essential to further assess the robustness and translational potential of these microbiome-based models before their application in routine clinical practice. Second, the machine learning model for advanced HCC was externally validated using data from a single published study with a relatively limited sample size. Due to differences in taxonomic resolution, several microbial taxa were unavailable and were therefore substituted with higher-level taxonomic categories, and missing values for three features were imputed using the mean values from the training cohort. Although this approach allowed for exploratory external validation, it may have introduced classification bias and potentially inflated model performance. Consequently, the external validation results should be interpreted cautiously and warrant further confirmation in independent datasets with consistent taxonomic resolution. Third, 16S rRNA sequencing cannot resolve microbial taxa at the strain level or fully characterize metabolic functions, highlighting the need for complementary metagenomic and metabolomic analyses. Developing animal models and conducting in vitro experiments to elucidate the causal mechanisms linking core taxa (such as Enterococcus) to HCC will be essential for further investigation. In addition, precise strain-level quantification of Enterococcus using digital PCR may further enhance the diagnostic accuracy for advanced HCC.
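The feature-harmonization steps described in the second limitation (substituting a higher-level taxon where a feature is unavailable, then imputing remaining gaps with training-cohort means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the dictionary-based data layout, and the fallback map are hypothetical, and the single genus-to-family pairing in the usage example is illustrative only, since the substituted taxa are not listed here.

```python
def training_means(train_rows, features):
    """Mean of each feature over training samples, ignoring missing values."""
    means = {}
    for feature in features:
        values = [row[feature] for row in train_rows
                  if row.get(feature) is not None]
        if not values:
            raise ValueError(f"no training values for feature {feature!r}")
        means[feature] = sum(values) / len(values)
    return means


def harmonize_row(row, features, fallback, means):
    """Build a model-ready row from an external sample.

    For each feature: use the observed value if present; otherwise try the
    higher-level taxon named in `fallback`; otherwise use the training mean.
    Returns the filled row and a record of where each value came from.
    """
    filled, source = {}, {}
    for feature in features:
        value, origin = row.get(feature), "observed"
        if value is None and feature in fallback:
            value, origin = row.get(fallback[feature]), "higher_taxon"
        if value is None:
            value, origin = means[feature], "training_mean"
        filled[feature], source[feature] = value, origin
    return filled, source


if __name__ == "__main__":
    features = ["Enterococcus", "Blautia"]
    train = [{"Enterococcus": 0.2, "Blautia": 0.1},
             {"Enterococcus": 0.4, "Blautia": None}]
    means = training_means(train, features)
    # Illustrative external sample reported only at family level.
    external = {"Enterococcaceae": 0.5}
    fallback = {"Enterococcus": "Enterococcaceae"}  # hypothetical mapping
    print(harmonize_row(external, features, fallback, means))
```

Recording the origin of each value makes it straightforward to report how many entries were substituted or imputed and to rerun the validation with those features excluded as a sensitivity check.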
Overall, this study provides insights into the potential associative role of the gut microbiota in HCC progression and suggests that combining Enterococcus with clinical features may offer noninvasive diagnostic value for identifying advanced HCC. In addition, it proposes that correcting gut dysbiosis might serve as a potential adjunctive therapeutic strategy. These findings provide exploratory targets and preliminary insights for the noninvasive diagnosis and precision management of advanced HCC.
Data availability statement
The original contributions presented in the study are publicly available. These data can be found in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/) under accession number PRJNA1397951.
Ethics statement
The studies involving humans were approved by the Ethics Committee of the First Hospital of Shanxi Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
YW (1st author): Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Writing – original draft, Writing – review & editing. ZY: Data curation, Formal analysis, Writing – review & editing. CL: Data curation, Investigation, Writing – review & editing. YL: Investigation, Methodology, Writing – review & editing. ZB: Investigation, Methodology, Writing – review & editing. WM: Investigation, Methodology, Writing – review & editing. TZ: Funding acquisition, Methodology, Validation, Writing – review & editing. YW (8th author): Conceptualization, Supervision, Writing – review & editing. XL: Conceptualization, Funding acquisition, Supervision, Writing – review & editing. ZL: Conceptualization, Funding acquisition, Project administration, Writing – review & editing. JX: Conceptualization, Funding acquisition, Supervision, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study was supported by the Natural Science Foundation of Shanxi (Grant No. 202103021224408); the Shanxi Provincial Department of Science and Technology (Grant Nos. 202204010931008, 202302130501013, and 202203021221248); the First Hospital of Shanxi Medical University Introduction Talent Fund (Grant No. SYYYRC-2022006); the Shanxi Provincial Department of Education (Grant No. 2022L138); and the National Natural Science Foundation of China (Grant No. 82470693).
Acknowledgments
The authors are grateful to the research team members for their insightful discussions and contributions to this work. The authors also sincerely thank Novogene Co., Ltd. for their invaluable support in bioinformatics analysis, specifically through their Novomagic cloud analysis platform.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2026.1760859/full#supplementary-material
References
Berhane, S., Toyoda, H., Tada, T., Kumada, T., Kagebayashi, C., Satomura, S., et al. (2016). Role of the GALAD and BALAD-2 serologic models in diagnosis of hepatocellular carcinoma and prediction of survival in patients. Clin. Gastroenterol. Hepatol. 14, 875–886.e6. doi: 10.1016/j.cgh.2015.12.042
Best, J., Bilgi, H., Heider, D., Schotten, C., Manka, P., Bedreli, S., et al. (2016). The GALAD scoring algorithm based on AFP, AFP-L3, and DCP significantly improves detection of BCLC early stage hepatocellular carcinoma. Z. Gastroenterol. 54, 1296–1305. doi: 10.1055/s-0042-119529
Bi, C., Xiao, G., Liu, C., Yan, J., Chen, J., Si, W., et al. (2021). Molecular immune mechanism of intestinal microbiota and their metabolites in the occurrence and development of liver cancer. Front. Cell Dev. Biol. 9:702414. doi: 10.3389/fcell.2021.702414
Bray, F., Laversanne, M., Sung, H., Ferlay, J., Siegel, R. L., Soerjomataram, I., et al. (2024). Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263. doi: 10.3322/caac.21834
Chan, Y. T., Zhang, C., Wu, J., Lu, P., Xu, L., Yuan, H., et al. (2024). Biomarkers for diagnosis and therapeutic options in hepatocellular carcinoma. Mol. Cancer 23:189. doi: 10.1186/s12943-024-02101-z
Dapito, D. H., Mencin, A., Gwak, G. Y., Pradere, J. P., Jang, M. K., Mederacke, I., et al. (2012). Promotion of hepatocellular carcinoma by the intestinal microbiota and TLR4. Cancer Cell 21, 504–516. doi: 10.1016/j.ccr.2012.02.007
El Tekle, G., and Garrett, W. S. (2023). Bacteria in cancer initiation, promotion and progression. Nat. Rev. Cancer 23, 600–618. doi: 10.1038/s41568-023-00594-2
Gok Yavuz, B., Datar, S., Chamseddine, S., Mohamed, Y. I., LaPelusa, M., Lee, S. S., et al. (2023). The gut microbiome as a biomarker and therapeutic target in hepatocellular carcinoma. Cancers 15:4875. doi: 10.3390/cancers15194875
Hsu, C. L., and Schnabl, B. (2023). The gut-liver axis and gut microbiota in health and liver disease. Nat. Rev. Microbiol. 21, 719–733. doi: 10.1038/s41579-023-00904-3
Huang, C., Fang, M., Xiao, X., Wang, H., Gao, Z., Ji, J., et al. (2022). Validation of the GALAD model for early diagnosis and monitoring of hepatocellular carcinoma in Chinese multicenter study. Liver Int. 42, 210–223. doi: 10.1111/liv.15082
Iida, N., Mizukoshi, E., Yamashita, T., Yutani, M., Seishima, J., Wang, Z., et al. (2021). Chronic liver disease enables gut Enterococcus faecalis colonization to promote liver carcinogenesis. Nat. Cancer 2, 1039–1054. doi: 10.1038/s43018-021-00251-3
Johnson, P. J., Pirrie, S. J., Cox, T. F., Berhane, S., Teng, M., Palmer, D., et al. (2014). The detection of hepatocellular carcinoma using a prospectively developed and validated model based on serological biomarkers. Cancer Epidemiol. Biomarkers Prev. 23, 144–153. doi: 10.1158/1055-9965.EPI-13-0870
Kim, J., and Lee, H. K. (2021). Potential role of the gut microbiome in colorectal cancer progression. Front. Immunol. 12:807648. doi: 10.3389/fimmu.2021.807648
Lee, W. J., and Hase, K. (2014). Gut microbiota-generated metabolites in animal health and disease. Nat. Chem. Biol. 10, 416–424. doi: 10.1038/nchembio.1535
Li, C., Cai, C., Wang, C., Chen, X., Zhang, B., and Huang, Z. (2025). Gut microbiota-mediated gut-liver axis: a breakthrough point for understanding and treating liver cancer. Clin. Mol. Hepatol. 31, 350–381. doi: 10.3350/cmh.2024.0857
Li, S., Han, W., He, Q., Zhang, W., and Zhang, Y. (2022). Relationship between intestinal microflora and hepatocellular cancer based on gut-liver axis theory. Contrast Media Mol. Imaging 2022:6533628. doi: 10.1155/2022/6533628
Liu, Q., Li, F., Zhuang, Y., Xu, J., Wang, J., Mao, X., et al. (2019). Alteration in gut microbiota associated with hepatitis B and non-hepatitis virus related hepatocellular carcinoma. Gut Pathog. 11:1. doi: 10.1186/s13099-018-0281-6
Liu, S., and Yang, X. (2023). Intestinal flora plays a role in the progression of hepatitis-cirrhosis-liver cancer. Front. Cell. Infect. Microbiol. 13:1140126. doi: 10.3389/fcimb.2023.1140126
Medawar, E., Haange, S. B., Rolle-Kampczyk, U., Engelmann, B., Dietrich, A., Thieleking, R., et al. (2021). Gut microbiota link dietary fiber intake and short-chain fatty acid metabolism with eating behavior. Transl. Psychiatry 11:500. doi: 10.1038/s41398-021-01620-3
Ponziani, F. R., Bhoori, S., Castelli, C., Putignani, L., Rivoltini, L., Del Chierico, F., et al. (2019). Hepatocellular carcinoma is associated with gut microbiota profile and inflammation in nonalcoholic fatty liver disease. Hepatology 69, 107–120. doi: 10.1002/hep.30036
Redman, M. G., Ward, E. J., and Phillips, R. S. (2014). The efficacy and safety of probiotics in people with cancer: a systematic review. Ann. Oncol. 25, 1919–1929. doi: 10.1093/annonc/mdu106
Roje, B., Zhang, B., Mastrorilli, E., Kovacic, A., Susak, L., Ljubenkov, I., et al. (2024). Gut microbiota carcinogen metabolism causes distal tissue tumours. Nature 632, 1137–1144. doi: 10.1038/s41586-024-07754-w
Rumgay, H., Ferlay, J., de Martel, C., Georges, D., Ibrahim, A. S., Zheng, R., et al. (2022). Global, regional and national burden of primary liver cancer by subtype. Eur. J. Cancer 161, 108–118. doi: 10.1016/j.ejca.2021.11.023
Schwabe, R. F., and Greten, T. F. (2020). Gut microbiome in HCC – mechanisms, diagnosis and therapy. J. Hepatol. 72, 230–238. doi: 10.1016/j.jhep.2019.08.016
Seki, E., De Minicis, S., Osterreicher, C. H., Kluwe, J., Osawa, Y., Brenner, D. A., et al. (2007). TLR4 enhances TGF-beta signaling and hepatic fibrosis. Nat. Med. 13, 1324–1332. doi: 10.1038/nm1663
Tayob, N., Kanwal, F., Alsarraj, A., Hernaez, R., and El-Serag, H. B. (2023). The performance of AFP, AFP-3, DCP as biomarkers for detection of hepatocellular carcinoma (HCC): a phase 3 biomarker study in the United States. Clin. Gastroenterol. Hepatol. 21, 415–423.e4. doi: 10.1016/j.cgh.2022.01.047
Trebicka, J., Macnaughtan, J., Schnabl, B., Shawcross, D. L., and Bajaj, J. S. (2021). The microbiota in cirrhosis and its role in hepatic decompensation. J. Hepatol. 75, S67–S81. doi: 10.1016/j.jhep.2020.11.013
Tripathi, A., Debelius, J., Brenner, D. A., Karin, M., Loomba, R., Schnabl, B., et al. (2018). The gut-liver axis and the intersection with the microbiome. Nat. Rev. Gastroenterol. Hepatol. 15, 397–411. doi: 10.1038/s41575-018-0011-z
Yu, L. X., and Schwabe, R. F. (2017). The gut microbiome and liver cancer: mechanisms and clinical translation. Nat. Rev. Gastroenterol. Hepatol. 14, 527–539. doi: 10.1038/nrgastro.2017.72
Zhang, N., Gou, Y., Liang, S., Chen, N., Liu, Y., He, Q., et al. (2021). Dysbiosis of gut microbiota promotes hepatocellular carcinoma progression by regulating the immune response. J. Immunol. Res. 2021:4973589. doi: 10.1155/2021/4973589
Zhang, Z., Tang, H., Chen, P., Xie, H., and Tao, Y. (2019). Demystifying the manipulation of host immunity, metabolism, and extraintestinal tumors by the gut microbiome. Signal Transduct. Target. Ther. 4:41. doi: 10.1038/s41392-019-0074-5
Zhou, J., Sun, H., Wang, Z., Cong, W., Zeng, M., Zhou, W., et al. (2025). China liver cancer guidelines for the diagnosis and treatment of hepatocellular carcinoma (2024 edition). Liver Cancer 14, 779–835. doi: 10.1159/000546574
Keywords: advanced hepatocellular carcinoma, biomarkers, Enterococcus, gut microbiome, liver cirrhosis, machine learning, noninvasive diagnosis
Citation: Wang Y, Yang Z, Liu C, Liu Y, Bai Z, Miao W, Zhang T, Wang Y, Li X, Lai Z and Xu J (2026) Gut microbial signatures of advanced hepatocellular carcinoma and their potential diagnostic value. Front. Microbiol. 17:1760859. doi: 10.3389/fmicb.2026.1760859
Edited by:
Guijie Chen, Nanjing Agricultural University, China
Reviewed by:
Xiang Zhang, Shandong University, China
Chaobin Wang, Peking University People’s Hospital, China
Copyright © 2026 Wang, Yang, Liu, Liu, Bai, Miao, Zhang, Wang, Li, Lai and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhiyong Lai, NjA5Nzc0NzIyQHFxLmNvbQ==; Jun Xu, anVueHV0eXRnQDE2My5jb20=