Edited by: Ying Wang, Shanghai Institute of Immunology, Shanghai Jiao Tong University School of Medicine, China
Reviewed by: Zheng Sun, Brigham and Women’s Hospital and Harvard Medical School, United States; Yeshi Yin, Hunan University of Science and Engineering, China; Grégory Dubourg, IHU Méditerranée Infection, France
This article was submitted to Microbial Immunology, a section of the journal Frontiers in Microbiology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Gut microbiome dysbiosis is known to be associated with all stages of non-alcoholic fatty liver disease (NAFLD), but questions remain about how microbial profiles change with progression and how homogeneous they are across NAFLD stages. We performed a meta-analysis of three publicly available shotgun metagenomic datasets and built predictive models to determine diagnostic capacity. We found consistent microbiome shifts across NAFLD stages and identified co-occurrence patterns and core sets of new biomarkers significantly correlated with NAFLD progression. Machine learning models able to distinguish patients with any NAFLD stage from healthy controls remained predictive when applied to patients with other NAFLD stages, further supporting homogeneity across stages. Focusing on species and metabolic pathways specifically associated with progressive stages, we found that increased toxic metabolites and reduced protection from butyrate and choline contributed to advanced NAFLD. We further built models discriminating one stage from the others with an average area under the curve of 0.86. In conclusion, this meta-analysis firmly establishes generalizable microbiome dysbiosis and predictive taxonomic and functional signatures as a basis for future diagnostics across NAFLD stages.
Non-alcoholic fatty liver disease (NAFLD) is defined as the pathological accumulation of lipid droplets in >5% of hepatocytes (
Evidence is accumulating that the gut microbiome is involved in the etiology of NAFLD, while few studies have focused specifically on microbiota signatures in NAFLD at species level (
Here, we collected currently available NAFLD shotgun metagenomic datasets (
After filtering out samples with incomplete diagnostic information, we considered 39 patients with NAFL, 39 patients with NASH, 15 patients with fibrosis, 14 patients with cirrhosis, and 120 healthy controls (
Normalization and overview of the NAFLD microbiome taxonomic profiles at species level.
At a meta-analysis FDR of 0.05, we identified 99 species, out of 261 species consistently detected across studies, to be associated with general NAFLD microbiome dysbiosis, of which 47 microbial species were identified to be significantly enriched in patients and 52 microbial species were identified to be significantly depleted (
Meta-analysis identifies a core set of gut microbes strongly associated with NAFLD.
We then focused on a core set of the 38 most significant markers (FDR < 1 × 10⁻³) for further analysis (
A co-occurrence network and the correlation between gut microbes and NAFLD.
In addition to the species associated with general microbiome dysbiosis in NAFLD, we also looked for species associated with the progressive subtypes (NASH, fibrosis, and cirrhosis), which can lead to serious consequences such as hepatocellular carcinoma and liver-related death. Ten species were enriched in patients with progressive NAFLD, including
We also performed a univariate analysis of genera associated with NAFLD. The raw counts were log-cpm-SNM transformed to reduce batch effects, as described above (
Functional potential of the microbiome was also significantly associated with NAFLD samples when compared to healthy controls. We found 11,941 of the 17,426 single gene families (FDR < 0.05) detected at least once to be enriched in NAFLD patients and 12,969 to be enriched in controls at meta-analysis FDR < 0.05. We further observed 179 out of 189 metagenomically reconstructed microbial functional pathways (FDR < 0.05) to be at least once control-enriched, and only 10 to be enriched in NAFLD patients at all stages of NAFLD. The disordered metabolic pathways showed an abnormal glycolipid metabolism, such as glycolysis, glyoxylate bypass, tricarboxylic acid (TCA) cycle, as well as fatty acid elongation, oxidation, and degradation, and these have been reported in several intestinal and metabolic disorders of multiple etiologies, such as colorectal cancer (
Microbial metabolic pathways altered in NAFLD. The significance of gut microbial species derived from blocked two-sided Wilcoxon tests (+ indicates statistical significance FDR < 0.05). In the generalized fold-change color scale, yellow represents microbial pathways that were increased in the NAFLD group compared with the healthy control group, while purple represents pathways that were decreased in the NAFLD group compared with the healthy control group. NAFL, non-alcoholic fatty liver; NASH, non-alcoholic steatohepatitis.
To evaluate the utility of the metagenomic gut microbiome signature for the detection of NAFLD, we tested its diagnostic accuracy between patients and healthy controls and across the four stages of NAFLD by training stochastic gradient-boosting machine (GBM) models. In patient-versus-control validation using normalized species-level taxonomic abundances, we observed area under the receiver operating characteristic curve (AUROC) scores ranging from 0.9861 to 1.0000 and area under the precision-recall curve (AUPR) scores ranging from 0.8723 to 1.0000 (
Taxonomic classification models generalize across stages by GBM.
Among the features used in the model validating patients and controls,
Ranking relevance of each species in the predictive models for each stage.
Our study was performed across multiple datasets and populations, through a combined analysis of fecal NAFLD metagenomes from three publicly available datasets. Differences in metagenomic approach and study design, such as sample collection and preservation, DNA extraction methodology, and sequencing platform, all affect the composition of downstream sequence data. We therefore first quantified the effect of study-associated heterogeneity on microbiome composition. The sequencing platform was the same in all three studies (Illumina HiSeq), while the DNA extraction methods differed. Although all three studies stated that samples were rapidly frozen to −80°C, technical differences due to human factors remained. The technical variation in each downloaded dataset, such as that arising from sampling and DNA extraction, was therefore treated collectively as a batch effect and dealt with at the outset. Although these effects cannot be completely eliminated, they were greatly reduced (
Researchers are often more likely to focus on the difference between disease and healthy controls, while the commonality between related diseases is often neglected. Here, we identified a core gut microbiome signature for general NAFLD microbiome dysbiosis instead of disease-stage-specific links (
Broadly applicable, non-invasive methods for diagnosing the stage of NAFLD are currently not available. The identification of microbial biomarkers for NAFLD may enable the design of non-invasive diagnostic tools. We developed machine learning models able to distinguish patients with any stage of NAFLD from healthy controls with an average performance of 0.99 AUROC when validated on datasets excluded from the training of the model (
Although the experimental cohorts in this study were relatively small, the analysis of patients with different stages of NAFLD presents a distinct opportunity for studying the general NAFLD-associated and stage-specific microbiome. By combining multiple cohorts of potentially low generalizability, it is possible to obtain a better representation of the spectrum of NAFLD cases and controls. At present, research on gut microbes is still very limited, and we still know little about the role of different strains in different situations. Even some known probiotics can be opportunistic pathogens. Therefore, this study combined data from three studies to identify potential candidate bacteria that contribute to disease development. These bacteria have been poorly studied, and functional studies are needed to explore their role in disease. With appropriate methodology, artifactual findings due to batch effects present in any individual dataset can be avoided. In addition, the identification of pathogenic and beneficial microbial species might lead to novel therapies for severe forms of NAFLD. Given the limited accuracy of serum markers, the expense of MRE technologies, and the invasiveness of liver biopsy, a gut microbiome test is more convenient and feasible for disease screening. Our discovery of a gut microbiome-derived signature that accurately identifies the stage of NAFLD lays the foundation for non-invasive microbial diagnostic tests to supplement existing screening.
We used PubMed to search for studies that published fecal shotgun metagenomic data of human NAFLD patients and healthy controls. Raw FASTQ files were downloaded for the three included studies from the European Nucleotide Archive (ENA) using the following ENA identifiers: ERP015847 for
The stage of NAFLD was diagnosed by liver biopsy. Biopsies were assessed for the following three parameters: steatosis was graded 0–3, lobular inflammation was graded 0–3, and ballooning was graded 0–2. Fibrosis was classified into five stages from 0 to 4. NAFL patients had fat accumulation in the liver (steatosis) involving at least 5% of hepatocytes on routine stains, without lobular inflammation, ballooning, or fibrosis. NASH was defined as a pattern consistent with steatohepatitis, including at least 5% steatosis, lobular inflammation, and ballooning, with or without peri-sinusoidal fibrosis (fibrosis stage 1). The fibrosis stage comprised periportal fibrosis (fibrosis stage 2) and bridging fibrosis (fibrosis stage 3). Cirrhosis was defined as stage 4 fibrosis.
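The staging rules above can be written down as a small decision function. The following is only an illustrative sketch, not the authors' code: the `Biopsy` record and `classify_stage` function are hypothetical names, steatosis grade 1 or higher is assumed to correspond to at least 5% steatosis, and the fibrosis stage is assumed to take precedence over the other scores. Score combinations that the text does not cover are returned as "unclassified" rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biopsy:
    """Hypothetical record of the histological scores described in the text."""
    steatosis: int      # graded 0-3 (assumption: grade >= 1 means >= 5% steatosis)
    inflammation: int   # lobular inflammation, graded 0-3
    ballooning: int     # graded 0-2
    fibrosis: int       # staged 0-4


def classify_stage(b: Biopsy) -> str:
    """Map biopsy scores to NAFL, NASH, fibrosis, cirrhosis, or unclassified."""
    in_range = (0 <= b.steatosis <= 3 and 0 <= b.inflammation <= 3
                and 0 <= b.ballooning <= 2 and 0 <= b.fibrosis <= 4)
    if not in_range:
        raise ValueError("biopsy score outside its grading range")

    # Assumption: the fibrosis stage is checked first.
    if b.fibrosis == 4:
        return "cirrhosis"
    if b.fibrosis in (2, 3):
        return "fibrosis"

    # From here on the fibrosis stage is 0 or 1.
    if b.steatosis == 0:
        return "unclassified"
    if b.inflammation > 0 and b.ballooning > 0:
        return "NASH"  # with or without stage 1 fibrosis
    if b.inflammation == 0 and b.ballooning == 0 and b.fibrosis == 0:
        return "NAFL"
    return "unclassified"
```

For example, `classify_stage(Biopsy(steatosis=2, inflammation=1, ballooning=1, fibrosis=1))` returns `"NASH"`, while the same scores with `fibrosis=3` return `"fibrosis"`.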
Participants were included in the study if they met the following criteria: (1) 18 years or older, (2) fat accumulation in the liver (steatosis) involving at least 5% of hepatocytes on routine stains, (3) no evidence of other acute or chronic liver disease, and (4) absence of regular or excessive use of alcohol. Patients were excluded from the study if they met any of the following criteria: (1) clinical or histological evidence of alcoholic liver disease; and (2) clinical or biochemical evidence of liver diseases other than NAFLD, including hepatitis B, hepatitis C, alpha-1 antitrypsin deficiency, hemochromatosis, Wilson’s disease, autoimmune hepatitis, polycystic liver diseases, cholestatic liver diseases, and vascular liver diseases. Patients in the three studies who met the above conditions and had a clear liver biopsy diagnosis were included in this study.
The liver imaging and liver biochemistry results of all healthy controls were in the normal range. Physical examination; routine examination of blood, urine, and stools; preoperative serological tests; liver function; renal function; electrolyte; liver ultrasound; electrocardiogram; and chest X-ray results were checked in the healthy controls to exclude any abnormal samples, such as clinical or biochemical evidence of liver diseases, chronic illnesses associated with hepatic steatosis, use of drugs known to cause hepatic steatosis, and presence of systemic infectious illnesses.
Fecal metagenomic shotgun sequences were quality filtered using Trimmomatic (
Cognizant of how technical variation and heterogeneous ethnicity between studies could confound our results, we normalized the data to remove batch effects before further analysis. In brief, we transformed our discrete taxonomic count data to approximately normally distributed log-count per million (log-cpm) data, which models and removes the data’s heteroskedasticity, and then performed supervised normalization (SNM) on the data to remove significant batch effects while preserving biological effects (
Since microbiome data are characterized by non-Gaussian distributions with excessive dispersion, we performed non-parametric significance testing with blocked Wilcoxon rank-sum tests, as implemented in the R “coin” package (
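As a rough illustration of the first normalization step, the sketch below computes log-cpm values from raw counts. It assumes the commonly used voom-style formula, log2((count + 0.5) / (library size + 1) × 10^6); whether the authors used exactly these offsets is an assumption, the function name `log_cpm` is hypothetical, and the supervised normalization (SNM) step is not reproduced here.

```python
import math


def log_cpm(counts, prior=0.5):
    """Return log2 counts-per-million for a samples-by-taxa table of raw counts.

    counts: list of per-sample lists of non-negative counts.
    prior:  small offset that keeps zero counts finite (0.5 is an assumption).
    """
    transformed = []
    for sample in counts:
        library_size = sum(sample)  # total counts in this sample
        transformed.append([
            math.log2((count + prior) / (library_size + 1.0) * 1e6)
            for count in sample
        ])
    return transformed
```

Because each sample is scaled by its own library size, samples sequenced at different depths become comparable before any batch correction is applied.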
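To make the idea of a blocked rank-sum test concrete, here is a minimal standard-library sketch. It is not the "coin" implementation: it ranks values within each block, sums the case rank sums over blocks, and uses a normal approximation for a two-sided p-value. The function names are hypothetical, and the assumption is that the blocking factor is the study of origin.

```python
import math
from collections import defaultdict


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def blocked_wilcoxon(values, is_case, blocks):
    """Stratified rank-sum test; returns (z, two-sided p) by normal approximation."""
    by_block = defaultdict(list)
    for value, case, block in zip(values, is_case, blocks):
        by_block[block].append((value, bool(case)))

    stat = expected = variance = 0.0
    for members in by_block.values():
        n = len(members)
        n_case = sum(1 for _, case in members if case)
        if n_case == 0 or n_case == n:
            continue  # a block with only one group carries no information
        ranks = midranks([value for value, _ in members])
        mean_rank = sum(ranks) / n
        stat += sum(r for r, (_, case) in zip(ranks, members) if case)
        expected += n_case * mean_rank
        # Permutation variance of a sum of n_case ranks drawn without replacement.
        spread = sum((r - mean_rank) ** 2 for r in ranks)
        variance += n_case * (n - n_case) / (n * (n - 1)) * spread

    if variance == 0.0:
        return 0.0, 1.0
    z = (stat - expected) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

Ranking within blocks means that a species which is simply more abundant in one study than another does not by itself produce a significant result; only consistent case-versus-control differences inside studies contribute.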
Stochastic GBM learning models were trained, automatically tuned, and tested using the GBM package (
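The evaluation reported in this article scores models by AUROC on data held out from training. The sketch below illustrates those two ingredients with the standard library only; it does not reproduce the GBM training or tuning. The function names are hypothetical, and grouping samples by dataset or stage is an assumption about how held-out sets are formed.

```python
def auroc(labels, scores):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test
```

In use, a model would be fitted on the `train` indices of each split and its predicted scores on the `test` indices passed to `auroc`, so that every reported score comes from samples the model has not seen.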
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/
TW and HX conceived and supervised the study. TW performed the taxonomic profiling, machine learning, statistical analyses, produced the figures, and wrote the manuscript with contributions from X-KG and HX. All authors discussed and approved the manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
HX was supported by the National Natural Science Foundation of China (Grant 31821003) and the China Ministry of Science and Technology (Grant 2018AAA0100300). Additional support came from the Shanghai Municipal Key Clinical Specialty (shslczdzk02602) and the Shanghai Science and Technology Development Funds (2020-SH-XY-2).
We are thankful to Zhao’s cluster for sequence preprocessing.
The Supplementary Material for this article can be found online at: