- 1 National Medical Products Administration (NMPA) Key Laboratory for Safety Evaluation of Cosmetics, Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- 2 Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China
- 3 Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China
Background: The “gut–skin axis” has been proposed to play an important role in the development and symptoms of atopic dermatitis. Therefore, we have constructed an interpretable machine learning framework to quantitatively screen key gut flora.
Methods: The 16S rRNA dataset, after applying the centered log-ratio transformation, was analyzed using five different machine learning models: random forest, light gradient boosting machine, extreme gradient boosting, support vector machine with radial kernel, and logistic regression. Interpretable machine learning methods, such as SHAP values, were used to identify significant features associated with atopic dermatitis.
Results: Random forest performed better than the other “tree” models in the validation partitions. The SHAP global dependency plot indicated that Bifidobacterium ranked as the strongest predictive factor across all prediction horizons, although the SHAP values for some features were still higher in support vector machine and logistic regression models. The SHAP partial dependency plot for “tree” models showed that the best segmentation point for Bifidobacterium was further from the origin compared to other features in the respective models, quantitatively reflecting differences in gut microbiota.
Conclusion: Machine learning models combined with SHAP could be used to quantitatively screen key gut flora in atopic dermatitis patients, providing doctors with an intuitive understanding of 16S rRNA sequencing data to support precision medicine in care and recovery.
Introduction
Atopic dermatitis (AD), alternatively referred to as eczema or atopic eczema, is one of the most prevalent inflammatory skin disorders encountered in the pediatric population, with its incidence increasing globally over the past few decades, affecting approximately 20% of children (1, 2). This condition is characterized by severe pruritus, which often leads to skin injury, significant sleep disruption, and a negative impact on overall quality of life (2).
Although the precise etiology of AD remains elusive, emerging evidence suggests that it results from a complex interaction between dysfunction of the epidermal barrier integrity, immune dysregulation, and the influence of environmental and infectious triggers (3–5).
This synergistic interaction elicits T-cell-mediated immune responses within the skin, including a predominantly T-helper 2 (Th2) cell response, which leads to the release of chemokines, proinflammatory cytokines, IgE production, and systemic inflammatory responses, giving rise to pruritic inflammation of the skin.
Recent studies have highlighted the pivotal role of intestinal flora development in facilitating optimal intestinal function and immunological development (6). The concept of the “gut–skin axis”, which emphasizes the reciprocal influence between gut flora and the skin, has emerged as a significant factor in the pathogenesis and manifestation of AD. The mechanisms underlying the gut-skin axis are multifaceted. First, metabolic pathways involving gut microbiota metabolites, such as short-chain fatty acids (SCFAs), play a crucial role. These metabolites enhance epithelial barrier function and reduce permeability (7, 8), while other metabolites contribute to the formation of a protective mucus layer (9). Secondly, gut microbiota plays a crucial role in the activation of both innate and adaptive immune mechanisms, collectively safeguarding the host and maintaining intestinal homeostasis. This includes modulating the differentiation of naive T cells to prevent excessive production of IgE and IgG4, as well as influencing interactions with Toll-like receptors (TLRs) and T-helper cell activity (10–15). Lastly, the “gut–brain–skin axis” further extends the interplay, linking microbiota modulation to stress-induced systemic and inflammatory skin disorders (16). The key neuromodulators involved in this axis are norepinephrine, serotonin, acetylcholine, and tryptophan (17).
However, a current challenge in this field is the complexity of 16S rRNA sequencing data, which poses difficulties in identifying key flora and the impact of intestinal flora quantitatively on the initiation and progression of AD. Addressing this challenge is crucial for advancing our understanding and treatment of AD.
Over the past few years, there has been a significant increase in the utilization of machine learning (ML) techniques in biomedical diagnosis and the identification of critical features. These techniques provide powerful tools capable of discerning intricate patterns and correlations within extensive datasets (18). ML algorithms have shown considerable utility across a range of clinical applications, including predicting disease outbreaks and personalizing treatment strategies (19, 20). Lundberg et al. (21) introduced an algorithm called “Shapley additive explanations (SHAP)”, a post-hoc interpretable algorithm that uses additive attribution to compute SHAP values, thereby enhancing the interpretability of previously opaque ML algorithms (22).
The main contributions of our study can be summarized as follows. We assessed the prediction performance of five different supervised ML algorithms including random forest (RF), Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting (XGB), Support Vector Machine with the radial kernel (SVM), and logistic regression (LR) applied to analyze 16S rRNA sequencing data. To enhance classification accuracy and reduce overfitting, we adjusted the hyperparameters governing the sample weight distribution. Subsequently, we integrated ML algorithms with SHAP to develop an ML framework that enhances interpretability and identifies crucial features influencing the diagnosis of AD. This approach facilitates statistical and data-structure-related insights, contributing to an intuitive understanding of 16S rRNA sequencing data.
Materials and methods
Data collection
The dataset we analyzed and used to construct machine learning models was downloaded from the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/) under the accession number PRJNA501811. This BioProject is from a study conducted by Zhang et al. (23). The dataset contains 112 fecal samples: 43 from children with atopic dermatitis and 69 from the healthy control group. To characterize the composition of the gut microbiome, the V4 hypervariable region of the 16S ribosomal RNA (rRNA) genes was amplified following DNA extraction from stool specimens.
Sequences were quality filtered, clustered into amplicon sequence variants (ASV) using the Unoise2 pipeline in USEARCH (https://drive5.com/usearch/) and taxonomically classified against the Silva-123 database (http://www.arb-silva.de/). To verify if the dataset reached a sufficient sequencing depth, alpha rarefaction analysis was performed using USEARCH, and classification was carried out up to the genus level.
Taxonomic analysis
We analyzed the α-diversity (ACE and Chao1 indices) and β-diversity of the AD and HC groups. For β-diversity, both principal co-ordinates analysis (PCoA) based on the Bray–Curtis distance matrix and the ADONIS test were employed to determine the significance of the difference.
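As a rough illustration of two of the quantities named above, the sketch below computes a Chao1 richness estimate for one sample and the Bray–Curtis dissimilarity between two samples. The bias-corrected form of Chao1 is an assumption (the text does not say which form was used), the count vectors are invented, and ACE, PCoA, and ADONIS are not covered here.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate for one sample's count vector."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)   # taxa seen exactly once
    doubletons = sum(1 for c in counts if c == 2)   # taxa seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


# Made-up ASV counts for two samples (illustration only).
sample_a = [10, 0, 3, 1, 1, 2]
sample_b = [4, 6, 0, 1, 0, 2]

print(chao1(sample_a))
print(bray_curtis(sample_a, sample_b))
```

A dissimilarity of 0 means identical composition and 1 means no shared taxa; a matrix of these pairwise values is what an ordination such as PCoA would take as input.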
To identify differentially abundant taxonomic features, linear discriminant analysis effect size (LEfSe) was applied via the “microeco” R package and “trans_diff$new()” function (24). The LEfSe bar graph plotted the significant difference in taxonomical features with a linear discriminant analysis (LDA) score of more than 2.0.
Centered log-ratio
The centered log-ratio (CLR) transformation was first applied to the abundance data of the designated bacterial genera. Given an observational vector of D “count” entities (such as sequencing reads or ASVs) in a given sample, denoted as $X_i = [X_{1i}, X_{2i}, \ldots, X_{Di}]$, the CLR transformation for that sample is given by the following formula:
$$\mathrm{clr}(X_i) = \left[\ln\frac{X_{1i}}{g(X_i)},\ \ln\frac{X_{2i}}{g(X_i)},\ \ldots,\ \ln\frac{X_{Di}}{g(X_i)}\right], \qquad g(X_i) = \left(\prod_{j=1}^{D} X_{ji}\right)^{1/D}$$
where $X_i$ is the vector of features in a sample, $g(X_i)$ is the geometric mean of the “count” vector $X_i$, $X_{1i}$ is the first feature in the sample, and $X_{Di}$ is the last feature in a sample of D values.
The CLR transformation requires zero counts to be replaced first, because the logarithm of zero is undefined and a single zero count makes the geometric mean in the denominator zero. One approach is to replace “0” counts with a value smaller than the detection limit (a count of 1). Following the methodology proposed by Martín-Fernández et al. (25), we replaced each “0” count with 0.65 in our dataset, as our detection limit corresponds to a count of 1.
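The transformation and the zero-replacement step can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function name `clr_transform` and the sample counts are hypothetical, and only the replacement value of 0.65 comes from the text.

```python
import math


def clr_transform(counts, zero_replacement=0.65):
    """Centered log-ratio transform of one sample's count vector.

    Zero counts are replaced by `zero_replacement` before taking logs.
    """
    adjusted = [c if c > 0 else zero_replacement for c in counts]
    logs = [math.log(v) for v in adjusted]
    mean_log = sum(logs) / len(logs)      # log of the geometric mean g(X_i)
    return [v - mean_log for v in logs]   # ln(X_ji / g(X_i)) for each feature


# Made-up genus counts for one sample (illustration only).
sample = [120, 0, 8, 35, 0, 1]
clr_values = clr_transform(sample)
print([round(v, 3) for v in clr_values])
```

Subtracting the mean of the logs is equivalent to dividing by the geometric mean, so the CLR values of each sample sum to zero. This is why a CLR value of 0 corresponds to an abundance equal to the sample's geometric mean.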
Supervised ML modeling
Five supervised ML algorithms were trained on the bacterial taxa features using Scikit-learn: RF, LGBM, XGB, SVM, and LR.
The entire dataset was randomly divided into a training set (70%) and a testing set (30%), followed by hyperparameter optimization to improve the performance of these algorithms. Hyperparameters were tuned within the training set using the GridSearchCV class (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) in Python 3 with 10-fold cross-validation. Importantly, hyperparameter tuning relied exclusively on the training set, ensuring that no information from the testing set was used. The optimized hyperparameters for each ML model are listed in Supplementary Table S1.
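The split-then-tune workflow described above might look roughly like the following for the RF model. Everything specific here is an assumption made for illustration: the data are synthetic (only the 43/69 class sizes mirror the dataset), the parameter grid is arbitrary and is not the grid from Supplementary Table S1, and stratification, the random seeds, and AUC as the tuning score are choices of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the CLR-transformed genus table: 43 AD, 69 HC.
rng = np.random.default_rng(0)
y = np.array([1] * 43 + [0] * 69)
X = rng.normal(size=(112, 20))
X[y == 1, 0] -= 1.5  # make one feature informative

# 70/30 split; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative grid only.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, None]}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=cv, scoring="roc_auc"
)
search.fit(X_train, y_train)  # tuning sees the training set only

# The held-out test set is used once, after tuning is finished.
test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Because `GridSearchCV` is fitted on `X_train` alone, the test set plays no part in choosing hyperparameters, which is the property the paragraph above emphasizes.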
Model evaluation
We obtained accuracy, precision, sensitivity, specificity, F1 score, and the receiver operating characteristic curve with area under the curve (AUC) by calculating the confusion matrix in Python 3. These metrics assess the ML models from several aspects.
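For reference, the confusion-matrix metrics listed above can be computed as in the sketch below. The helper name `classification_metrics` and the labels are hypothetical, and AD is assumed to be the positive class (label 1).

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = AD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Made-up labels: 3 AD and 5 HC samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The AUC is not included because it depends on predicted probabilities rather than on a single confusion matrix.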
SHAP interpretability framework
ML models are often regarded as black boxes because the impact of each feature on a prediction is hard to assess, especially in sequence analysis. In our study, the SHAP algorithm (21) treats each feature as a “contributor” to the outcome prediction and explains its significance in the particular prediction made by the ML model. For an individual sample, the model prediction equals the average prediction value plus the sum of the SHAP values of all features, so each SHAP value gives the contribution of one feature from the 16S rRNA data to that individual prediction.
In this manner, the SHAP value of each feature indicates how it influences the prediction. Positive SHAP values (> 0) represent a positive effect on the predicted outcome, while negative SHAP values (< 0) represent a negative effect, which corresponds to a protective effect in our study. Global explanations calculate and rank the average SHAP values of each feature to visualize the importance of different features, while local explanations for “tree” models (XGB, LGBM, and RF) assess how SHAP values vary with CLR-transformed abundance and determine the best segmentation point by curve fitting, so that the features can be analyzed quantitatively.
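The additive property behind these explanations can be illustrated with exact Shapley values for a small hypothetical model. This brute-force sketch is not the SHAP library and not the tree-specific algorithm used for RF, LGBM, or XGB. It assumes that a feature left out of a coalition takes a single baseline value, which is a simplification of averaging over a background dataset, and `toy_model` and its coefficients are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`.

    Features outside a coalition are set to their `baseline` value.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score: falls with feature 0, rises with feature 1."""
    a, b, c = features
    return 0.5 - 0.08 * a + 0.05 * b + 0.01 * a * c


baseline = [0.0, 0.0, 0.0]   # stand-in for the "average" sample
sample = [6.5, 1.2, -0.4]    # made-up CLR-transformed abundances
phi = shapley_values(toy_model, sample, baseline)
base_value = toy_model(baseline)

# Additivity: base value + sum of contributions reproduces the prediction.
print(phi, base_value + sum(phi), toy_model(sample))
```

In this toy case the first feature receives a negative value (it lowers the score) and the second a positive one, mirroring the protective and risk-raising readings described above.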
Statistical analysis
The statistical analysis in this study was performed using R software (version 4.2.0) and Python 3. Continuous variables were reported as mean ± standard deviation for normally distributed data and as median (interquartile range) for skewed variables. Two-sample t-tests or Wilcoxon’s rank sum tests were utilized to draw statistical inferences. Prior to building the ML models, random sampling was employed to split the dataset into training (70%) and test (30%) sets. The level of significance was set at p < 0.05.
Results
Overview of the 16S rRNA gene sequencing data
We analyzed 16S rRNA gene sequencing data from all stool samples (n = 112), comprising the AD group (n = 43) and the healthy control (HC) group (n = 69). A total of 9,155,243 sequences were sorted into 2,811 ASVs and assigned to phylogenetic taxonomic levels.
The Shannon rarefaction curve was employed to verify whether the dataset reached sufficient sequencing depth. As shown in Supplementary Figure S1, with the increasing sequencing volume, the number of species did not increase significantly, indicating that the sample size was sufficient for our study and that the sequencing depth met the demands of the subsequent data analysis.
Richness and diversity of gut microbiota in ADs
AD samples (n = 43) demonstrated increased gut microbiota richness and diversity compared with healthy control (HC) samples (n = 69), as measured by the ACE index (p = 0.019) and Chao1 index (p = 0.028) (Figures 1A, B). Beta diversity was used to assess the similarity of gut flora composition between groups. As shown in Figure 1C, in the PCoA based on the Bray–Curtis distance, Co-ordinate 1 and Co-ordinate 2 accounted for 12.7% and 5.7% of the variation, respectively. ADONIS demonstrated a significant difference between the AD and HC groups (R2 = 0.02, p = 0.009) (Figure 1C).

Figure 1. Comparison of alpha and beta diversity between AD and HC groups. Each point in the figure represents a sample. (A) ACE index; (B) Chao1 index; (C) principal co-ordinates analysis, where shorter distances between samples indicate greater similarity in species composition; (D) cladogram of linear discriminant analysis (LDA) effect size (LEfSe) analysis of microbial abundance. *P<0.05.
In the LEfSe results, significant differences in microbial proportions were observed between the AD and HC groups. At the genus level, eight genera were overrepresented in HCs and three genera were overrepresented in ADs (Figure 1D). Bifidobacterium showed the most significant difference, with an LDA score greater than 4. These findings suggest that there is gut microbial dysbiosis in AD patients and that the AD and HC states each have their own characteristic microbial profile.
Model evaluation
A gut microbiota-based signature transformed by CLR can be used for predicting atopic dermatitis. Based on the 16S rRNA analysis above, we next assessed the performance of gut microbiota as biomarkers using ML models, including XGB, LGBM, RF, SVM, and LR. Model performance was evaluated by accuracy, precision, sensitivity (recall), specificity, F1 score, and AUROC. As shown in Table 1, the “tree” models performed better than SVM and LR. RF performed better than the other “tree” models in the test set. LGBM also performed well except for precision (85.70%) and specificity (81.30%), which were lower than the 100.00% achieved by RF for both metrics. The accuracy and specificity of the XGB model were higher than those of the LGBM model, whereas its sensitivity was lower than that of the other two “tree” models. These findings indicate that the choice of model can be matched to the metric that matters most in a given setting.
SHAP interpretability framework
To address the “black box” dilemma, the SHAP algorithm was applied to visualize the importance of each feature for the AD prediction made by the ML models. The SHAP global dependency plot (Figure 2) ranks the features by SHAP value and shows the top 30 features most strongly associated with the outcome in descending order. The vast majority of bacterial genera belong to four dominant phyla, and Bifidobacterium ranked as the strongest predictive factor for all prediction horizons, playing a decisive role in all models, although the SHAP values of some features were still higher in the SVM and LR models.

Figure 2. ML-SHAP global dependency plot analysis showing the distribution of feature importance in AD occurrence risk.
However, the SHAP global dependency plot is limited in how well it depicts the association between each feature and its SHAP value. To gain a deeper understanding, we therefore drew SHAP partial dependency plots for the “tree” models, as depicted in Figure 3. Because the SVM and LR models performed poorly, they were not included. In the plot, each point represents the abundance of a feature in one sample and its SHAP value, with the abscissa indicating the CLR-transformed feature value and the ordinate showing the corresponding SHAP value.

Figure 3. ML-SHAP partial dependence plot analysis showing the distribution of feature importance for AD occurrence risk.
We noted that the models offered different interpretations of the 16S rRNA data. The XGB model showed threshold-like behavior: a designated threshold separated the positive and negative relationships of a predictor with the outcome, and the greater the distance from the threshold, the stronger the effect. The LGBM model showed a clear dose–response relationship, and the RF model showed a smoother trend. For Bifidobacterium, the best segmentation points, defined as the intersection between the fitted curve and the line y = 0, were 6.65 (XGB), 6.73 (LGBM), and 6.41 (RF). These values were greater than those of the other features in the respective models, quantitatively reflecting the difference in CLR-transformed abundance of gut microbiota between the AD and HC groups.
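One simple way to locate such a segmentation point is to find where a curve of SHAP value against CLR-transformed abundance crosses zero. The sketch below uses linear interpolation between neighboring points. It is an assumption-laden stand-in for whatever curve fitting produced Figure 3: the function name `zero_crossings` is hypothetical, the data points are invented, and the input is assumed to be an already smoothed curve rather than raw, noisy per-sample SHAP values.

```python
def zero_crossings(clr_values, shap_values):
    """Return the x positions where the SHAP curve changes sign.

    Points are sorted by CLR value and each sign change is located
    by linear interpolation between the two neighboring points.
    """
    points = sorted(zip(clr_values, shap_values))
    crossings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 * y1 < 0:  # the curve passes through y = 0 in this interval
            crossings.append(x0 + (0 - y0) * (x1 - x0) / (y1 - y0))
    return crossings


# Made-up smoothed curve for a protective genus: SHAP falls as abundance rises.
clr_axis = [5.0, 6.0, 6.5, 7.0, 8.0]
shap_axis = [0.20, 0.12, 0.03, -0.05, -0.15]
print(zero_crossings(clr_axis, shap_axis))
```

Below the crossing the feature pushes the prediction toward AD, and above it the contribution becomes protective, which is the reading given to the segmentation point in the text.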
Discussion
This study aimed to explore the significantly dysregulated microbiota in the AD group compared to the HC group, using ML algorithms combined with SHAP techniques to visualize features and their respective weights. Through 16S rRNA analysis, we found significant differences in both α- and β-diversity between the AD and HC groups. Five types of machine learning algorithms, combined with the SHAP algorithm, were used to identify significantly dysregulated microbial taxa.
Statistical methods used in bioinformatics and model construction consistently indicated that Bifidobacterium, an intestinal probiotic, was the strongest predictive factor. Previous studies have shown that Bifidobacterium is enriched in healthy infants (26, 27) and markedly reduced in the gut of infants with AD (28), as well as in those with other atopic infantile diseases (27).
The pathogenesis of AD involves multiple mechanisms, including a reduction in regulatory T cells (Tregs), which can impair immune responses and lead to an imbalance in the activation of TH1/TH2 cytokines (29). Recent evidence has increasingly highlighted the vital role of intestinal microbiota in regulating the immune system (6). Alterations in the microbiome can impact both host immunity and its response to antigens, thereby contributing to the development of allergies. Bifidobacterium has been proven to induce a regulatory dendritic cell (DC) phenotype that enhances the induction of Tregs (30, 31). Patients lacking Bifidobacterium tend to exhibit a more proinflammatory Th2/Th17 profile, whereas those with abundant Bifidobacterium display a more anti-inflammatory profile (32). Furthermore, early colonization by Bifidobacterium modulates B-cell responses, with infants colonized early showing higher levels of memory B cells at 4 and 18 months (33), and increased salivary secretory IgA at 6 months (34). These findings suggest that the presence of Bifidobacterium promotes B-cell activation, maturation, and ultimately antibody production, thereby limiting immune activation and sensitization.
The synthesis and secretion of microbial metabolites constitute a pivotal mechanism through which the gut microbiota exerts its modulatory effects on immune function, thereby exerting a significant impact on overall health outcomes. SCFAs, one of the microbial metabolites, mainly produced by Bifidobacterium (35), are by-products of bacterial fermentation and abundant microbial metabolites present in the colon. Low fecal concentrations of these SCFAs (such as propionate, butyrate, and acetate) have been associated with the occurrence and development of allergic diseases (e.g., atopic dermatitis, food allergy, asthma, allergic rhinitis) (36, 37). In addition, SCFAs might potentially interact with mesenchymal stem cells (MSCs) through G protein-coupled receptors (GPCRs), modifying their differentiation potential through the inhibition of histone deacetylase (HDAC) activity, thereby inducing distinct infant immune responses (38). Furthermore, evidence suggests that SCFAs enhance intestinal epithelial integrity and regulate various immune cell populations, including dendritic cell maturation, Treg differentiation, and antibody production (39).
Previous studies have shown that the proportion of Peptoclostridium is likely related to intestinal health and recovery from imbalances (40). Anaerofilum (41, 42) and the genus Lachnoclostridium (43, 44) have been associated with an increased risk of immune diseases, consistent with the trend in Figure 3: the higher the abundance, the higher the likelihood of atopic dermatitis. In addition, there is a correlation between the Ruminococcaceae_NK4A214_group and SCFAs (45).
As for model evaluation, the “tree” models performed better. Bifidobacterium ranked as the strongest predictive factor for all prediction horizons and played the decisive role in all models. SVM and LR predicted less well, perhaps because of interference from other features. In contrast, the “tree” models may suppress this interference, which appeared as a lower weight for the other features. RF showed the best performance, in line with the findings of other studies (46, 47). LGBM and XGB, both of which use boosting to learn from the errors of previous models, have also shown good performance in other fields (48, 49).
It is always a challenge to interpret strong predictive performance and to understand why a model performs well, especially for black-box tree ensemble models. Therefore, we applied the SHAP algorithm, which evaluates the importance of each feature over all possible feature combinations, and visualized the results for every prediction model, providing insight into the most influential features (50). To represent the relationship visually, SHAP partial dependency plots were constructed, which show how the “tree” models order the samples by feature value, together with the fitted curve. This allows the relationship between abundance (or the position within the abundance distribution) and SHAP value to be observed quantitatively.
The SHAP values in all “tree” models indicate that as the proportion increases, the influence shifts from promoting to preventing, or vice versa, at a certain point. This suggests that there may be a threshold for gut microbial dysbiosis associated with AD, with different features having different thresholds. Because CLR-transformed data are centered at zero within each sample, the segmentation point, located at the intersection between the fitted curve and the line y = 0, can be defined as a quantitative reference point for the relationship between a feature and the outcome. For Bifidobacterium, this critical value is higher than that of other genera, indicating that in the HC group its anti-eczema effect can only be exerted when Bifidobacterium becomes dominant and its relative abundance exceeds the threshold. In the AD group, by comparison, the homeostasis of the intestinal flora appears to be disrupted. Ruminococcaceae_NK4A214_group, a member of the Ruminococcaceae family, showed the opposite trend to Bifidobacterium, with increased abundance reported in children who have or are likely to develop AD (28). According to the CLR formula, CLR values can be converted back to abundance values using the equation Abundance = e^(CLR value) × g(Xi), where g(Xi) is the geometric mean of the features in a sample. The reference values e^(CLR value) corresponding to the segmentation points 6.65 (XGB), 6.73 (LGBM), and 6.41 (RF) are approximately 772.78, 837.15, and 607.89, respectively. This indicates that for Bifidobacterium to exert its protective effect, its abundance must be restored to approximately 600 to 800 times the geometric mean. In contrast, the reference values for the other mentioned genera ranged from 1.2 to 1.8 times the geometric mean, which may serve as a reference for assessing the restoration of intestinal flora homeostasis.
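The back-conversion from a CLR value to a multiple of the geometric mean is a one-line exponentiation, which can be checked directly for the three Bifidobacterium segmentation points. The helper name `clr_to_abundance` and the example geometric mean are hypothetical; only the three CLR values come from the text.

```python
import math


def clr_to_abundance(clr_value, geometric_mean):
    """Invert the CLR transform for one feature: X = exp(clr) * g(X_i)."""
    return math.exp(clr_value) * geometric_mean


# Segmentation points for Bifidobacterium reported in the Results.
segmentation_points = {"XGB": 6.65, "LGBM": 6.73, "RF": 6.41}

# Setting the geometric mean to 1 gives the fold change over g(X_i).
fold_over_geometric_mean = {
    model: clr_to_abundance(value, 1.0)
    for model, value in segmentation_points.items()
}
for model, fold in fold_over_geometric_mean.items():
    print(model, round(fold, 2))

# With a made-up geometric mean of 2.5 counts, the XGB threshold would be:
print(round(clr_to_abundance(6.65, 2.5), 1))
```

Because the geometric mean differs from sample to sample, a fixed CLR threshold corresponds to a different absolute count in each sample. The fold change is the quantity that carries over.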
In summary, the SHAP algorithm can visually explain which specific characteristics related to gut microbial dysbiosis are associated with a higher (or lower) risk of AD. While machine learning algorithms have the potential to assist medical researchers in clinical and mechanistic studies, their ambiguous processes and high predictive performance create a “black box” dilemma. To address this, noninvasive gut microbiota data were analyzed, and ML algorithms combined with SHAP were used to visualize the influence of each relevant feature on the model output. This approach, supplemented by medical knowledge, offers new perspectives and profound insights for the subsequent exploration of mechanisms and practical therapeutic targets.
This study also has several limitations. The hospital-based sample size was not large enough to cover all aspects of the pathogenic mechanism in gut microbiota, and other confounders have not yet been considered, which may limit the generalizability of our findings. Future research will expand the study to include different hospitals and regions, as well as other confounders, to verify the reliability of our results.
Conclusions
The results of our study indicate that the machine learning models combined with SHAP feature attribution analysis could be used to screen key gut flora, such as Bifidobacterium, and quantify their relationships. Prospective trials are needed to validate these findings and further refine the understanding of association and causality. For researchers and clinicians, interpretable machine learning algorithms are valuable tools for extracting insights and making accurate predictions from 16S rRNA sequencing data to support precision medicine in patient care and recovery.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.
Ethics statement
Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.
Author contributions
JM: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing – original draft. YF: Data curation, Formal Analysis, Writing – original draft. SL: Formal Analysis, Visualization, Writing – original draft. LZ: Formal Analysis, Supervision, Writing – review & editing. SC: Formal Analysis, Supervision, Writing – review & editing. ZL: Formal Analysis, Supervision, Writing – review & editing. GJ: Writing – review & editing. XY: Writing – review & editing. WW: Project administration, Supervision, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was funded by the National Natural Science Foundation of China (82373600, 82073599), Special Support Program of Guangdong Province (2021TX06N107), National Key Scientific Instrument and Equipment Development Projects of China during the 14th Five-year Plan Period (2022YFF0711100) and Nutrition Science Research Foundation of BYHEALTH (TY202102002).
Acknowledgments
We appreciate all the authors and patients who participated in this study. We are also grateful to those who provided significant assistance during the writing process.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2025.1528046/full#supplementary-material
References
1. Kapur S, Watson W, Carr S. Atopic dermatitis. Allergy Asthma Clin Immunol. (2018) 14:52. doi: 10.1186/s13223-018-0281-6
2. Lugović-Mihić L, Meštrović-Štefekov J, Potočnjak I, Cindrić T, Ilić I, Lovrić I, et al. Atopic dermatitis: disease features, therapeutic options, and a multidisciplinary approach. Life (Basel Switzerland). (2023) 13:1419. doi: 10.3390/life13061419
3. Egawa G, Kabashima K. Multifactorial skin barrier deficiency and atopic dermatitis: essential topics to prevent the atopic march. J Allergy Clin Immunol. (2016) 138:350–8.e1. doi: 10.1016/j.jaci.2016.06.002
4. Nomura T, Kabashima K. Advances in atopic dermatitis in 2015. J Allergy Clin Immunol. (2016) 138:1548–55. doi: 10.1016/j.jaci.2016.10.004
5. Fonacier LS, Dreskin SC, Leung DY. Allergic skin diseases. J Allergy Clin Immunol. (2010) 125:S138–49. doi: 10.1016/j.jaci.2009.05.039
Keywords: machine learning, random forest, light gradient boosting machine, extreme gradient boosting, SHAP value, partial dependence plot, interpretable machine learning
Citation: Ma J, Fang Y, Li S, Zeng L, Chen S, Li Z, Ji G, Yang X and Wu W (2025) Interpretable machine learning algorithms reveal gut microbiome features associated with atopic dermatitis. Front. Immunol. 16:1528046. doi: 10.3389/fimmu.2025.1528046
Received: 15 November 2024; Accepted: 07 April 2025;
Published: 01 May 2025.
Edited by:
Kaijian Hou, Shantou University, China
Reviewed by:
Hai-Feng Pan, Anhui Medical University, China
Joana Costa, University of Porto, Portugal
Yu Liu, Chinese Academy of Medical Sciences and Peking Union Medical College, China
Copyright © 2025 Ma, Fang, Li, Zeng, Chen, Li, Ji, Yang and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Guiyuan Ji; Xingfen Yang; Wei Wu
†These authors have contributed equally to this work