Gut Bacterial Characteristics of Patients With Type 2 Diabetes Mellitus and the Application Potential

Type 2 diabetes mellitus (T2DM) is a complex disorder influenced by both genetic and environmental risk factors, and research has increasingly indicated a role for microbial dysbiosis in T2DM pathogenesis. However, studies comparing microbiome characteristics between T2DM patients and healthy controls have reported inconsistent results. To further characterize the intestinal flora of T2DM patients, we performed a systematic review and meta-analysis of stool microbial profiles to describe microbial dysbiosis in T2DM and to explore heterogeneity among 7 studies (600 T2DM cases, 543 controls, 1143 samples in total). Using a random effects model and a fixed effects model, we observed significant differences in beta diversity, but not alpha diversity, between individuals with T2DM and controls. We identified various operational taxonomic units (OTUs) and bacterial genera with significant odds ratios for T2DM. The T2DM signatures derived from a single study by stepwise feature selection could be applied in other studies. By training on multiple studies, we improved the detection accuracy and disease specificity for T2DM. We also discuss the relationship between T2DM-enriched or T2DM-depleted genera and probiotics and provide new ideas for diabetes prevention and improvement.


INTRODUCTION
According to the 2019 Ninth International Diabetes Federation Diabetes Atlas, there are approximately 463 million diabetic patients worldwide (1). The number of diabetic patients is expected to increase to 578.4 million by 2030 and to 700.2 million by 2045, representing a growing public health threat throughout the world (1). Epidemiologically, type 2 diabetes mellitus (T2DM), characterized by glucose intolerance, accounts for approximately 90% of all diabetic patients worldwide (2,3) and is a complex multifactorial metabolic disorder involving genetic factors (e.g., TCF7L2, KCNQ1) and environmental and lifestyle factors (e.g., intake of energy-dense refined food, sedentary behavior) (4)(5)(6)(7). Meanwhile, an imbalance between immune cells results in the production of excess chemokines and proinflammatory cytokines that promote systemic inflammation and lead to peripheral insulin resistance. Subsequently, this immunological dysfunction leaves diabetic patients at greater risk of many infectious diseases (diabetic foot, diabetic nephropathy, etc.) (8,9). Therefore, the study of pathological mechanisms is of great significance for the effective prevention and treatment of T2DM.
With the development of high-throughput sequencing technology, increasing evidence has shown that gut microbiota dysbiosis, as an important environmental factor, may lead to diabetes (10)(11)(12)(13)(14)(15). Microbial diversity indexes, including phylogenetic diversity and Chao1, were significantly decreased in T2DM (16). Studies have also revealed that the gut microbiome of T2DM patients is characterized by an enrichment of opportunistic pathogens (11) and sulfate-reducing bacteria (17,18) and a depletion of probiotics (19) and butyrate-producing bacteria (11,17,18,20). For example, butyrate-producing Roseburia has been shown to causally improve glucose tolerance (21,22). Wu et al. found that Bifidobacterium and Bacteroides were less represented in the diabetic group than in the nondiabetic group (19). A Chinese study suggested that Clostridium coccoides and Clostridium leptum were significantly less abundant, while the fecal count of Lactobacillus was significantly higher, in diabetic patients than in healthy controls (23), which is in line with previous literature indicating that Lactobacillus might contribute to chronic inflammation in diabetes development (10,24). Moreover, several studies have investigated whether modulating the gut microbiota improves T2DM. A randomized, double-blind, placebo-controlled study (25) showed that consumption of yogurt containing Bifidobacterium lactis BB-12 and Lactobacillus acidophilus LA-5 for 6 weeks significantly reduced the levels of blood glucose and glycated hemoglobin (HbA1c) and increased erythrocyte superoxide dismutase (SOD) activity, glutathione peroxidase (GPx) activity, and total antioxidant capacity. Similarly, blood glucose, insulin, the homeostasis model assessment for insulin resistance (HOMA-IR) index, and inflammation were significantly reduced by probiotic intervention in a randomized, double-blind, placebo-controlled study of 61 Saudi T2DM patients (26). Recently, Mocanu et al. found that fecal microbiota transplantation (FMT) combined with low-fermentable fiber interventions regulated the gut microbiota and improved HOMA2-IR and insulin sensitivity in patients with obesity and metabolic syndrome (27). Therefore, gut microbiota dysbiosis is associated with T2DM, and gut microbial modulation through precise supplementation with probiotics, or even FMT, is likely an effective strategy to improve T2DM.
Although many studies have monitored the gut microbiota and investigated its relationship with T2DM in different populations (28)(29)(30)(31)(32), inconsistent results have been reported regarding microbial differences between diabetic and healthy individuals. For example, Larsen et al. found that the proportions of the phylum Firmicutes and the class Clostridia were significantly reduced in T2DM patients compared to the control group (10), whereas one Pakistani study with 60 individuals revealed that bacteria from Firmicutes, along with those from Clostridia and Negativicutes, were predominant in obese T2DM patients (28). In short, the key issue with the reported gut microbiota differences between T2DM patients and healthy controls is the lack of apparent reproducibility across studies when identifying the microbiome characteristics of T2DM.
Here, we systematically reviewed, collected, and analyzed raw 16S rRNA gene sequencing data from 7 studies that investigated the intestinal microbiome of T2DM patients in relation to controls. We performed a meta-analysis of gut bacterial alpha diversity, beta diversity, and community composition, together with classification modeling and bacterial correlation analyses. We aimed to better understand the gut microbial differences between T2DM patients and controls across countries, develop a complementary approach for the risk assessment of T2DM, and reveal the potential of probiotic therapeutic measures for T2DM from the perspective of intestinal microecology.

Database Search and Study Selection
In adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (33), a systematic computerized literature search of PubMed, EMBASE, and Web of Science was conducted for records published up to May 2020. The search strategy was as follows: diabetes (T2DM) and fecal microbiota and human and 16S rRNA. Additionally, the reference lists of identified original articles and reviews were reviewed manually for potential studies that might have been missed during the search. After screening the titles and abstracts, 22 publications were retained for full-text review (Table S1). Studies were included if they met the following criteria: 1) the study was based on human fecal samples from T2DM patients and healthy subjects; 2) samples were sequenced by NGS for the 16S rRNA gene; and 3) raw sequencing data, barcodes, and metadata were publicly available or were provided by the authors upon email request by October 20, 2020. Finally, sequencing datasets and metadata from 7 studies were obtained for subsequent analyses (16,28-32), and the other 15 studies were excluded due to incomplete information on sequences, barcodes, or metadata (10,15,23,34-44) (Table 1). The baseline clinical characteristics of participants recruited in the 7 studies are summarized in Table S2. Five other datasets downloaded for model validation were generated from patients with the following conditions: colorectal cancer (CRC) (45), Parkinson's disease (46), inflammatory bowel disease (IBD) (47), non-alcoholic fatty liver disease (NAFLD) (48), and fat syndrome (49).

Microbiome Data Processing
The V4 or V3-V4 region of the 16S rRNA gene was the most frequently sequenced fragment among the included studies, using the Illumina (MiSeq or HiSeq) or Ion Torrent (PGM or S5) platform (Table 1). Despite the different sequencing platforms and hypervariable regions of the 16S rRNA gene, we applied a uniform analytical pipeline to minimize the impact of these differences. Briefly, raw reads were quality filtered by Usearch (50) with -fastq_maxee 0.5 or were assembled using FLASH (v1.2.11) with -x 0.2 and -M 200 for the V3-V4 region, -M 250 for the V3-V5 region, or -M 150 for the V4 region. Closed-reference OTU picking at 97% identity was performed with Usearch against the SILVA 132 database (51). For all taxonomic and diversity analyses, samples with a sequencing depth of fewer than 10000 sequences in the OTU table were excluded from downstream analyses. The OTU table was rarefied to the lowest sequencing depth within each study.
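The depth filter and per-study rarefaction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the dictionary layout of the OTU table, and the toy counts are all hypothetical, and subsampling without replacement is assumed as the rarefaction scheme.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement."""
    # Expand the counts into one label per read, then draw `depth` reads.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def filter_and_rarefy(otu_table, min_depth=10000, seed=0):
    """Drop samples below `min_depth` reads, then rarefy the rest to the
    lowest remaining depth within the study (hypothetical helper)."""
    rng = random.Random(seed)
    kept = {s: c for s, c in otu_table.items() if sum(c.values()) >= min_depth}
    if not kept:
        return {}
    depth = min(sum(c.values()) for c in kept.values())
    return {s: rarefy(c, depth, rng) for s, c in kept.items()}


# Toy study with three samples (illustrative counts only).
toy_table = {
    "s1": {"otu1": 6000, "otu2": 7000},   # 13000 reads, kept
    "s2": {"otu1": 12000, "otu3": 3000},  # 15000 reads, kept and subsampled
    "s3": {"otu2": 500},                  # below the depth threshold, dropped
}
rarefied = filter_and_rarefy(toy_table)
```

Because the rarefaction depth is taken per study, a sample that already sits at the minimum depth keeps its counts unchanged, while deeper samples are randomly thinned.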

Statistical Analysis
The alpha diversity indexes, namely bacterial richness (observed OTUs), Shannon index, and evenness (J), were calculated based on the OTU table of each study. Differences between T2DM patients and healthy controls were tested with the Wilcoxon test. Differences in community structure across samples (beta diversity) were visualized by principal coordinates analysis (PCoA) plots based on Bray-Curtis distance. Significance was determined using permutational multivariate analysis of variance (PERMANOVA) with 10^4 permutations in vegan (52). A meta-analysis of bacterial alpha diversity indexes and microbial taxa among the 7 studies was performed to determine their consistency, using both the random effects (RE) model and the fixed effects (FE) model in the metafor package (53). Generally, we calculated the odds ratios (ORs) of these metrics by assigning any value above the median of the metric within the study as positive.
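The median-split odds ratio and its pooling across studies can be sketched as follows. This is an illustration under stated assumptions rather than the metafor analysis itself: the OR is oriented as controls versus T2DM (so that OR > 1 means the metric is lower in T2DM, following the forest-plot convention of Figure 2), a 0.5 continuity correction is added to each cell, and the DerSimonian-Laird estimator stands in for the between-study variance because the estimator used in metafor is not stated here. All function names and toy values are hypothetical.

```python
import math
from statistics import median


def median_split_log_or(control_values, case_values):
    """Log odds ratio (controls vs. T2DM) of a metric exceeding the study median."""
    cut = median(list(control_values) + list(case_values))
    # 2x2 table with a 0.5 continuity correction so empty cells stay finite.
    ctrl_hi = sum(v > cut for v in control_values) + 0.5
    ctrl_lo = sum(v <= cut for v in control_values) + 0.5
    case_hi = sum(v > cut for v in case_values) + 0.5
    case_lo = sum(v <= cut for v in case_values) + 0.5
    log_or = math.log((ctrl_hi * case_lo) / (ctrl_lo * case_hi))
    variance = 1 / ctrl_hi + 1 / ctrl_lo + 1 / case_hi + 1 / case_lo
    return log_or, variance


def pool_log_odds_ratios(effects):
    """Inverse-variance pooling of (log_or, variance) pairs.

    Returns the fixed effects OR, the random effects OR, and tau^2
    (DerSimonian-Laird estimate, assumed here for simplicity)."""
    y = [e for e, _ in effects]
    w = [1 / v for _, v in effects]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for _, v in effects]
    rand = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return {"fixed_or": math.exp(fixed), "random_or": math.exp(rand), "tau2": tau2}


# Toy data: (control values, T2DM values) of one metric in two studies.
toy_studies = [
    ([5, 6, 7, 8], [1, 2, 3, 4]),
    ([3, 4, 5, 6, 9], [1, 2, 4, 5, 7]),
]
effects = [median_split_log_or(ctrl, case) for ctrl, case in toy_studies]
pooled = pool_log_odds_ratios(effects)
```

When the studies agree closely, the estimated between-study variance shrinks to zero and the random effects estimate coincides with the fixed effects estimate; large heterogeneity widens the gap between the two.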
Random forest (RF; number of trees, 500) models were trained for individual studies and for a dataset combining all studies, at the OTU and genus levels, to test whether a mixture of featured taxa can predict T2DM. We evaluated their performance using leave-one-out (LOO) cross-validation and scored the predictive power in a receiver operating characteristic (ROC) analysis. Meanwhile, to refine microbiome signatures for diabetes detection, we developed a two-step modeling workflow with rigorous external validation to avoid overfitting and overoptimistic reports of model accuracy. In the first step, we ranked the common OTUs and genera by their relative abundances. Next, as a precaution against overoptimistic evaluation, stepwise feature selection was employed to select predictive microbial features and eliminate uninformative features based on 10-fold cross-validation (depicted in Figure S1). The discriminatory power of OTUs and genera was calculated as the area under the ROC curve (AUC). Subsequently, we further explored the interactions between different genera and probiotics using Cytoscape (v3.5.1) (54). All statistical and correlation analyses were conducted in R (v3.5.3) (55). Figures were plotted mainly using ggplot2 (v3.0.0) (56) and gridExtra (57).
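As an illustration of how the AUC summarizes discriminatory power, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen T2DM sample scores higher than a randomly chosen control, with ties counted as one half. The function names and toy values are hypothetical, and ranking single features by a direction-agnostic AUC is an assumption made for the example, not the feature selection procedure of this study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(labels, feature_table):
    """Sort features by how far their AUC is from 0.5 in either direction,
    so that taxa depleted in T2DM rank as highly as enriched ones
    (an assumption of this sketch)."""
    aucs = {name: roc_auc(labels, values) for name, values in feature_table.items()}
    return sorted(aucs.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Toy example: 1 = T2DM, 0 = control; values are relative abundances.
toy_labels = [1, 1, 0, 0]
toy_features = {
    "genus_a": [0.9, 0.8, 0.3, 0.1],  # higher in every case than any control
    "genus_b": [0.9, 0.2, 0.3, 0.1],  # partially overlapping
    "genus_c": [0.1, 0.2, 0.8, 0.9],  # lower in every case than any control
}
ranking = rank_features_by_auc(toy_labels, toy_features)
```

An AUC of 0.5 corresponds to no discrimination, which is why the ranking measures the distance from 0.5 rather than the raw value.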

Characteristics of Included Studies
Following quality filtering, a total of 1143 samples (543 healthy controls and 600 T2DM patients) from 7 studies were retained for downstream analyses (Table 1). When all samples from the 7 individual studies were combined, the overall gut microbial community structures of T2DM patients were significantly different from those of healthy individuals (PERMANOVA, F=16.706, p<0.001). However, samples clustered primarily by individual study in the PCoA (Figure 1), probably due to the different populations (ethnicity) worldwide, as well as influential technical variables such as the DNA extraction methods, 16S rRNA gene regions, and sequencing platforms adopted by individual studies. This large variability in the gut microbiota across studies prompted us to perform a further meta-analysis.

Microbiome Profile Differences Between T2DM and Controls
The differences in alpha diversity metrics between T2DM patients and controls were analyzed first. None of the ORs of the alpha diversity metrics were significantly higher than 1.0 for T2DM in either the RE model or the FE model, with low heterogeneity (Figure 2A), indicating nonsignificant differences in microbial alpha diversity between T2DM patients and controls. Even within individual studies, significantly higher microbial richness in controls than in T2DM patients was observed in only 2 of 7 studies, while significantly higher Shannon diversity and evenness were observed in only one study (Supplementary Table 3A). However, when differences in the entire community between T2DM patients and controls were measured by PERMANOVA, significant differences in overall communities were obtained in 6 of 7 studies (Supplementary Table 3B). Likewise, by calculating the ORs based on the Bray-Curtis metric in each study, we found significant bacterial community differences between T2DM patients and controls in both the RE and FE models, with high heterogeneity (Figure 2B).
To further identify the taxa that differed significantly between healthy controls and T2DM patients, we calculated the ORs and relative abundances of all common taxa in each study (Figures 2C and S2A-S5B). Taxonomic abundances of bacterial phyla grouped by individual study showed consistent trends: increased relative abundances of Firmicutes (class Negativicutes, order Selenomonadales, or family Veillonellaceae) and Actinobacteria (class Actinobacteria) and decreased relative abundances of Bacteroidetes (class Bacteroidia or order Bacteroidales) in patients with T2DM, which coincided with the RE model in our pooled meta-analysis (Figures S2A-S5B). The relative abundances and OR values of other taxa at the phylum, class, order, and family levels are depicted in Figures S2A-S5B. At the genus level, a total of 24 genera were identified as significantly associated with T2DM (Figure 2C). Six genera had significant ORs higher than 1.0 for the absence of diabetes in the RE and FE models, including Barnesiella, Butyrivibrio, Coprobacter, Tyzzerella 3, and Paraprevotella. Eighteen genera had significant ORs lower than 1.0 for the presence of diabetes, three of which are thought to be harmful to humans: Desulfovibrio, Enterobacter, and Neisseria. In addition, some of these genera, such as Lactobacillus, Prevotella_6, and Eubacteria (58), are considered beneficial to humans. These results showed that there were dependable and significant community-wide changes in the bacterial community structures of diabetic patients.
Subsequently, we assessed how well a classifier trained on one study generalized to the other six studies. Cross-validation performance, as quantified by AUC, showed that predictors trained on a single study performed poorly on other studies [median AUC = 0.58, range (0.45, 0.78) for OTUs and median AUC = 0.59, range (0.44, 0.77) for genera] compared to the single study's own test set [median AUC = 0.94, range (0.77, 1.0) for OTUs and median AUC = 0.95, range (0.73, 1.0) for genera] (Figures 3B and S6B). We further assessed whether including data from all but one study in model training could improve prediction in the remaining holdout study (LOOS validation). The LOOS performance of OTU-level models ranged from 0.74 to 0.85, while the LOOS performance of genus-level models ranged from 0.75 to 0.87 (Figures 4A and S7). These results suggest that the inclusion of multiple studies in the training set of a classifier can substantially improve its predictive performance relative to models trained on data from a single study. Then, by ranking the importance of the features retained by feature screening of the OTUs shared across all studies (Figure 4B), we found that this model ranked OTUs belonging to Dorea, Clostridium_sensu_stricto_1, and Lactobacillus as the top 3 features in terms of mean decrease in accuracy (Table S4). Meanwhile, we assessed the prediction performance of our T2DM classifiers on studies of colorectal cancer (45), Parkinson's disease (46), inflammatory bowel disease (47), NAFLD (48), and fat patients (49) (Figure 4C). Interestingly, our OTU classification models performed significantly better than classifiers trained on other diseases, with an average predicted probability of 0.87 ± 0.01 on T2DM data sets (t-test, p<0.05, Figure 4C). In contrast, the average predicted probability on the other disease data sets ranged from 0.48 ± 0.02 to 0.68 ± 0.01. The difference test likewise showed that diabetes is significantly different from the other diseases (Figure 4C).
FIGURE 2 | Forest plot of (A) the alpha diversity metrics and (B) the Bray-Curtis distances between individuals with diabetes and controls, and (C) the genus metrics (the full name of L_AC2044_group is Lachnospiraceae_AC2044_group). The error bars depict the 95% confidence interval. Values less than 1.0 (left side of the dashed line) indicate that the metric is higher in T2DM than in controls. Values greater than 1.0 (right side of the dashed line) indicate that the metric is lower in T2DM than in controls. The difference between cases and controls is significant if the error bar does not cross the dashed line.
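The leave-one-study-out (LOOS) validation described in this section amounts to holding out every sample of one study at a time and training on the remaining studies. A minimal sketch of the splitting logic is given below; the function name and study labels are hypothetical, and the classifier fitting itself is deliberately left out.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study."""
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


# Toy cohort: one study label per sample (illustrative labels only).
toy_study_ids = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_study_out(toy_study_ids))

for held_out, train_idx, test_idx in folds:
    # A classifier would be fit on the rows in train_idx and scored
    # (for example by AUC) on the rows in test_idx.
    print(held_out, len(train_idx), len(test_idx))
```

Splitting by study rather than by sample keeps study-specific technical effects, such as the sequencing platform or the 16S rRNA region, out of the training data for the held-out study, which is what makes the resulting AUC a test of cross-study generalization.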

The Correlation of Featured Genera With Probiotics
In the context of diabetes mellitus, experimental and clinical studies have demonstrated that different species of bacteria reduce oxidative stress, showing antidiabetic effects (59). Thus, to further discuss the effects of probiotics on metabolic control in T2DM subjects, we studied the interactions between selected genera and probiotics. From the 149 differentially enriched genera and the top 30 genera corresponding to the most important OTUs retained by feature selection, 24 overlapping genera were selected for downstream analysis (Tables 2 and S5). In addition to distinguishing between individuals with and without T2DM, twenty-two of the 24 genera showed associations with a number of probiotics (Spearman correlation, p < 0.05, abs(r-value) > 0.1, Figure 5). For example, T2DM-depleted genera, including Clostridium_sensu_stricto_1, Blautia, and Dorea, were positively correlated with Bifidobacterium breve and Bifidobacterium adolescentis and were negatively correlated with L. acidophilus. Meanwhile, Clostridium_sensu_stricto_1, Blautia, and Lactobacillus were negatively correlated with Bifidobacterium bifidum. In addition, Lactobacillus, which was enriched in T2DM patients, was positively correlated with Lactobacillus delbrueckii, B. breve, and Lactobacillus salivarius. These correlations suggest that supplementation with specific probiotics may effectively regulate the gut microbiota and improve T2DM, implying that balancing the intestinal microecology could provide a new prevention and treatment method for T2DM patients. Indeed, studies have shown that there is a significant association between certain genera and diabetes.
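The correlation filter used above (Spearman correlation with p < 0.05 and an absolute r-value above 0.1) can be sketched as follows. The Spearman coefficient is computed as the Pearson correlation of average ranks; the p-value is obtained with a permutation test, which is an assumption of this sketch because the way p-values were computed is not described here. All function names and toy abundances are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: treated as no correlation in this sketch
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def spearman_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_associated(x, y, alpha=0.05, min_abs_r=0.1):
    """Apply the thresholds stated in the text to one genus-probiotic pair."""
    return abs(spearman(x, y)) > min_abs_r and spearman_permutation_p(x, y) < alpha


# Toy relative abundances of one genus and one probiotic across 8 samples.
toy_genus = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20]
toy_probiotic = [0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.31, 0.40]
toy_r = spearman(toy_genus, toy_probiotic)
toy_associated = is_associated(toy_genus, toy_probiotic)
```

Because Spearman correlation works on ranks, it captures any monotone relationship between a genus and a probiotic and is not distorted by the skewed distributions typical of relative abundance data.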

DISCUSSION
This study systematically evaluated the differences in intestinal flora between T2DM patients and healthy controls based on 1143 samples from China (16,31), Colombia (32), Pakistan (28), Africa (30), and Mexico (29). We observed significant differences in overall microbial communities (beta diversity) between T2DM patients and controls but no differences in alpha diversity indexes. Various OTUs and bacterial genera with significant odds ratios were identified for T2DM. Through abundance ranking and stepwise feature selection, the RF models based on single studies maintained their accuracy in other studies. By training on multiple studies, we further improved the accuracy and specificity of models for T2DM. Finally, the correlations between T2DM-associated genera and probiotics provide support for diabetes intervention or prevention by probiotics. Our sequence-based analysis showed non-significant differences in OTU richness, evenness, and Shannon diversity index between T2DM cases and controls. This is consistent with the results of most included studies, among which only two studies reported significant differences in Chao1 in patients with T2DM compared to controls (16,29) and one study reported significant differences in the observed and Shannon indicators (30). Meanwhile, when alpha diversity was compared based on OR values, the PRJNA472187 study showed that diabetic patients tended to have lower evenness and Shannon diversity than controls, while the other studies showed the opposite trend. The absence of consistency indicates no significant differences in alpha diversity between T2DM patients and healthy controls, possibly due to the relatively small sample sizes or to methodological variability in generating microbiome data. Significant differences in beta diversity metrics between T2DM cases and controls were reported by five studies (16,28-31). There were also significant differences between T2DM and HC samples in the meta-analysis, with higher heterogeneity for the RE model. Meanwhile, we observed distinct clustering of the 7 individual studies by Bray-Curtis metrics (p<0.001). This heterogeneity among the 7 studies of cases and controls may be due to the methodological, clinical, or study heterogeneity (geography, ethnicity, diet) of the included studies. Meanwhile, studies have reported a positive correlation between Lactobacillus and T2DM (19,20,44,60,61). In agreement with these results, Lactobacillus and Eubacteria were significantly enriched in diabetes in the meta-analysis. Eubacteria have been reported as producers of SCFAs, including propionate and butyrate. Sanna et al. found that butyrate produced by intestinal microorganisms can improve the body's insulin response and further promote immune modulation, while propionate abnormalities can increase the risk of T2DM (22). Moreover, the reported microbial profiles of patients with T2DM were similar across the 7 included studies. Chavez-Carbajal et al. reported that the phylum Actinobacteria was highly abundant in patients with T2DM, whereas the phylum Bacteroidetes was less abundant (29). Five studies and other reports in the literature described the predominance of Firmicutes in T2DM patients (15,29,62). In our pooled meta-analysis, we found a consistent trend toward increased relative abundances of the phyla Firmicutes (class Negativicutes or family Veillonellaceae) and Actinobacteria and decreased relative abundances of Bacteroidetes (class Bacteroidia or family Bacteroidaceae) in T2DM. If dysbiosis with increased Actinobacteria and decreased Bacteroidetes is associated with T2DM, then measures to balance these taxa by FMT or other methods may be beneficial and feasible to improve T2DM.
(B) The prediction probability relative to T2DM classification models trained on fecal samples from patients with other conditions. The "****" indicates that the predictive performance of the model between other diseases and diabetes is significantly different (t-test, p < 0.05). (C) The relative abundance, AUC value, and feature ranking of the selected 96 OTUs based on two-step feature selection.
We developed random forest classification models using microbiota data at the OTU and genus levels. Through extensive and statistically rigorous validation, our meta-analysis establishes that gut microbial signatures are highly predictive of diabetes. In particular, 16S rRNA classifiers trained on OTU and genus profiles from multiple studies maintained a median AUROC of 0.83 [range (0.77, 0.87)] and 0.84 [range (0.75, 0.87)], respectively, in six out of seven data sets, compared to classifiers trained on a single study [median AUC = 0.56, range (0.52, 0.62)]. This may be attributed to the fact that samples collected by a single center are not representative of other populations. Meanwhile, our RF analysis identified several OTUs related to Dorea, Clostridium_sensu_stricto_1, and Lactobacillus as the most important features for predicting diabetes. The relationship between T2DM-associated (enriched or depleted) genera and probiotics shows that Clostridium_sensu_stricto_1 and Blautia were positively correlated with B. breve, and that Lactobacillus, which was enriched in T2DM patients, was negatively correlated with B. bifidum. Previously, Clostridium_sensu_stricto_1 was reported to be negatively correlated with insulin (63), C-peptide, and triacylglycerol (64). Lactobacillus was significantly positively correlated with glucose and glycated hemoglobin (23). L. reuteri (65), used as a monoprobiotic, has been reported to improve T2DM-related symptoms in humans. Our results support prior work suggesting that adjustment of the intestinal microecology could provide new prevention and treatment strategies for T2DM. This study has several limitations. First, the methodology of the included studies varied. These studies were performed with different sequencing platforms and sequencing regions, and differences in research methods have certain effects on the observed intestinal microbiota. Second, the included reports had relatively small sample sizes, with only three of the seven studies recruiting more than 50 participants with T2DM. Some authors were reluctant to share data. Third, the studies included in our analysis used 16S rRNA sequencing to analyze the changes in bacterial groups, which underestimates the complexity of the gut microbiota. Despite these limitations, we systematically searched for all raw sequencing data and metadata and analyzed them in a suitable and uniform manner to minimize heterogeneity, which is important for detecting alterations in the gut microbiota of patients with diabetes.
In summary, our study analyzed diverse fecal 16S rRNA gene sequencing datasets in a uniform manner and revealed shifts in fecal bacterial diversity and taxa in T2DM. By selecting bacterial features and building an RF model, we raise the possibility of a fecal bacteria-based means of monitoring gut health and a complementary approach for the risk assessment of T2DM. Furthermore, by analyzing the interactions between T2DM-associated genera and probiotics, we provide evidence for the therapeutic potential of probiotics applied in T2DM to restore and maintain a healthy gut microbiota state.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
BZ, CX, KH, and MC conceived of the study. JH, ZC, AL, LY, QZ, QC, CY, ZW, DZ, and FC were responsible for collecting published data sets; YQ, MC, JH, and BZ were involved with the meta-analysis; YQ, MC, CX, ZC, and KH completed the statistical analysis; and YQ, MC, JH, CX, KH, and BZ wrote the paper. All authors contributed to the article and approved the submitted version.
FIGURE 5 | Associations between T2DM-enriched or T2DM-depleted genera and probiotics. The 24 differentially enriched genera (q-value <0.05, FDR-controlled Wilcoxon rank-sum test, Table 2) were analyzed for associations with probiotics. Twenty-two of the 24 genera with significant correlations [P-value <0.05, abs(r-value)>0.1] were selected for visualization. T2DM-enriched genera are represented by red circles, and control-enriched genera are represented by cyan circles. The size of each circle indicates connectivity. Green lines indicate positive associations, and red lines indicate negative associations.

FUNDING

ACKNOWLEDGMENTS
We thank the authors who shared their raw sequencing data and associated metadata, either at the time their studies were published or upon request by email.