Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model

Ai, Dongmei; Pan, Hongfei; Li, Xiaoxin; Gao, Yingxin; Liu, Gang; Xia, Li C.

doi:10.3389/fmicb.2019.00826

ORIGINAL RESEARCH article

Front. Microbiol., 24 April 2019

Sec. Systems Microbiology

Volume 10 - 2019 | https://doi.org/10.3389/fmicb.2019.00826


1. Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing, China
2. School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
3. Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States

Abstract

Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing, and the mortality rate is high. New therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that the gut microbiota composition differs significantly between healthy people and CRC patients. Thus, identifying the differences between the gut microbiota of healthy people and CRC patients is fundamental to understanding these microbes' functional roles in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed the diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model for estimating the relative abundance. We found that the abundance of nine genera (Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella) was significantly different between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter were uniquely co-occurring within the CRC patients. In addition, we found that the microbial diversity of healthy controls was significantly higher than that of the CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthened the view that individual microbes as well as the overall structure of the gut microbiota were co-evolving with CRC.

Introduction

A large number of microbes colonize the human body. They form a complex microbial community, or microbiota (Tringe et al., 2005; Zhao et al., 2013; Liao et al., 2015). Among them, the gut microbiota is the most diverse, with more than 1,000 species (Kostic et al., 2012; Li et al., 2012; Ahn et al., 2013). Those microbes are involved in maintaining intestinal homeostasis through physiological processes such as metabolism, immune responses, and inflammation, all of which are essential for human health. Previous studies revealed a delicate and dynamic balance between the microbial community and the host, which is likely the result of long-term co-evolution. However, studies also observed that pathogenic changes in the structure, composition, and function of the gut microbiota can lead to various diseases, often by causing the production of abnormal metabolites (Chen et al., 2016a; Huang et al., 2017a,b). Those diseases and conditions include irritable bowel syndrome (Kipanyula et al., 2013), Crohn's disease (Sommer and Bäckhed, 2013), and colorectal cancer (CRC) (Zackular et al., 2014; Rea et al., 2018).

The mechanisms by which gut microbes influence CRC tumorigenesis (Iacob et al., 2017) are under active study. For example, researchers have recently learned that the gut microbiota plays a regulatory role in the tumor microenvironment and thus in tissue carcinogenesis (Sohn et al., 2015; Nagy-Szakal et al., 2017; Morgillo et al., 2018). Guo et al. also found that the microbiota structure and microbial metabolites can affect the body's susceptibility to CRC by directly inducing pathological conditions, such as adenoma (Guo et al., 2015). However, to further understand such interactions, it is essential to characterize and compare the gut microbiota structure of healthy controls and cancer patients. Building on that comparison, specific microbiota patterns or strain types need to be identified to provide new targets and strategies for cancer prevention and treatment (Hu et al., 2017, 2018; Zhao et al., 2018a,b,c). Therefore, in this paper, we aim to determine the microbes that are associated with CRC using a large-scale metagenomic dataset.

While metagenomics research has provided enormous amounts of data for investigating the role of the gut microbiota in cancer development and progression (Zhang et al., 2014), appropriate bioinformatics and statistical analyses are also required to accurately identify the differentially abundant microbes. Several algorithms using either parametric or non-parametric tests have been proposed to determine such species. For example, Abusleme et al. (2013) combined the Kruskal-Wallis test with the Wilcoxon rank-sum test to analyze periodontitis data and used linear discriminant analysis to identify the species with significant differences between periodontitis patients and healthy controls. Nagy-Szakal et al. used the non-parametric Mann-Whitney U test with Benjamini-Hochberg correction to show that the microbial composition in the intestines of patients with chronic fatigue syndrome differed significantly from that of healthy individuals (Nagy-Szakal et al., 2017). Peng et al. conducted beta regression on the abundance of microbes to obtain regression coefficients (Peng et al., 2016).

One particular difficulty associated with the statistical testing of differential abundance is the under-sampling or dropout (Hughes et al., 2001) of less abundant microbes caused by insufficient sequencing depth. This creates many zeros in the abundance values and leads to inaccurate differential analysis when only conventional normalization is applied. The issue can be mitigated with zero-inflated models such as zero-inflated negative binomial (ZINB) modeling (Ridout et al., 1998), an approach that is now widely adopted. For example, Paulson et al. analyzed the differential abundance in sparse, high-throughput, large-scale microbial marker gene survey data by using a zero-inflated Gaussian distribution mixture model with cumulative-sum scaling normalization (Paulson et al., 2013). Zhang et al. (2016) identified differentially abundant taxa between two or more populations by using a ZINB regression method and estimated the model parameters with the expectation-maximization (EM) algorithm. Chen and Li proposed a zero-inflated Beta regression model with two parts, a logistic regression component and a Beta regression component, for testing the association between microbial abundance and clinical covariates in longitudinal microbiome data (Chen and Li, 2016). Chen et al. proposed a robust and powerful framework for differential analysis of microbiome data based on a ZINB regression model, together with an omnibus test of all the parameters (Chen et al., 2017). The omnibus test was compared with previous methods [edgeR (Robinson et al., 2010), RAIDA (Sohn et al., 2015), DESeq2 (Love et al., 2014), and metagenomeSeq (Paulson et al., 2013)] using simulated data. RAIDA had slightly worse false discovery rate (FDR) control at a high nominal level than the omnibus test, but better FDR control than the other methods. The performance of RAIDA was close to that of the omnibus test and higher than that of the other methods. RAIDA was also more effective at controlling the false positive rate (FPR) than the other methods, including the omnibus test.

In this study, we identified the differentially abundant gut microbes between CRC and healthy samples using the Ratio Approach for Identifying Differential Abundance (RAIDA) algorithm (Sohn et al., 2015). The algorithm fits the distribution of the observed data with a modified zero-inflated lognormal (ZIL) model and estimates the statistical significance of the abundance difference with a t-test. Furthermore, we used the GRAMMy algorithm (Xia et al., 2011) to estimate and analyze the relative abundance of gut microbes and the diversity of the microbial communities. Finally, we constructed and analyzed a microbial association network based on all healthy, small adenoma, large adenoma, and CRC samples.

Materials and Methods

Two Metagenomics Datasets

Our first gut metagenomics dataset was downloaded from the European Nucleotide Archive (ENA) database (accession number ERP005534) (Table 1). The dataset (Zeller et al., 2014) consists of 156 samples from France (61 healthy, 27 small adenoma, 15 large adenoma, and 53 CRC samples). Samples with an adenoma diameter smaller than 10 mm were classified as small adenoma, while those with a diameter larger than 10 mm were classified as large adenoma.

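Table 1. Number of experimental samples.

Group	Subgroup	Number of samples
Healthy control		61
Adenoma	Small (<1 cm)	27
Adenoma	Large (>1 cm)	15
Colorectal cancer, early stage	Stage I	15
Colorectal cancer, early stage	Stage II	7
Colorectal cancer, late stage	Stage III	10
Colorectal cancer, late stage	Stage IV	21
Total		156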

Our second gut metagenomics dataset was also downloaded from the ENA database (accession number ERP008729) (Zeller et al., 2014). The dataset comprised 156 samples from Austria: 63 healthy samples, 47 adenoma patient samples, and 46 CRC patient samples.

A Modified ZIL Model

We estimated the relative abundance of gut microbes using the GRAMMy algorithm. We then identified differentially abundant microbes with the RAIDA algorithm, which uses a modified ZIL model to account for ratios with zeros. Metagenomic data are typically sparse because of under-sampling of the microbial community or insufficient sequencing depth. The resulting abundance table therefore contains an excess of zeros, and the model assumes that most of those zeros result from insufficient sequencing depth, i.e., under-sampling of the microbial community. Based on the further assumption that most microbes are not differentially abundant, the RAIDA algorithm has been shown to consistently identify differentially abundant microbes (Sohn et al., 2015). We adapted the RAIDA model for our statistical analysis as follows.

Let γ_ij denote the observed count for microbes i and sample j, and let r_ij denote the ratio of γ_ij to γ_kj, where k represents the microbe (or a set of microbes) used as a divisor and γ_kj > 0 for all j. Here, i = 1, 2, …, n and j = 1, 2, …, m. The abundance ratio computed this way is denoted as such that:

In this study, we used ε = min(r_ij|r_ij > 0) for all i and j. The parameters θ_i = (α_i, μ_i, σ_i) were estimated by the following expectation-maximization (EM) algorithm. Given that a ratio R follows a lognormal distribution, thus:

in which, by definition, Y = logR is normally distributed with mean μ and variance σ². Let , z_ij is an unobservable latent variable that accounts for the probability of zero coming from the false state. Thus, the maximum-likelihood estimate of θ_i for the modified ZIL model, i.e., Equation (1), can be obtained by solving

where ϕ is the probability density function of a normal distribution.
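To make the setup concrete, the sketch below computes the ratios r_ij against a divisor microbe k, takes ε as the smallest positive ratio, and fits a plain zero-inflated lognormal to one microbe's ratios. It is a deliberate simplification and not the RAIDA procedure: the zero probability is taken as the observed share of zeros, and μ and σ as the mean and standard deviation of the log positive ratios, whereas the modified model described here estimates θ_i by EM. The function names and the toy count table are hypothetical.

```python
import math

def ratios(counts, divisor_row):
    """r_ij = count_ij / count_kj for every microbe i and sample j."""
    divisor = counts[divisor_row]
    if any(c <= 0 for c in divisor):
        raise ValueError("divisor counts must be positive in every sample")
    return [[c / d for c, d in zip(row, divisor)] for row in counts]

def smallest_positive(ratio_table):
    """epsilon = min(r_ij | r_ij > 0) over all microbes and samples."""
    return min(r for row in ratio_table for r in row if r > 0)

def fit_plain_zil(row):
    """Closed-form fit of an unmodified zero-inflated lognormal (assumption)."""
    logs = [math.log(r) for r in row if r > 0]
    if not logs:
        raise ValueError("need at least one positive ratio")
    alpha = 1.0 - len(logs) / len(row)            # observed share of zeros
    mu = sum(logs) / len(logs)                    # mean of the log ratios
    var = sum((y - mu) ** 2 for y in logs) / len(logs)
    return alpha, mu, math.sqrt(var)

# Toy count table: rows are microbes, columns are samples (made-up numbers).
counts = [
    [0, 4, 8, 0],      # microbe 0, absent from two samples
    [10, 20, 40, 5],   # microbe 1
    [10, 10, 10, 10],  # microbe 2, used as the divisor k
]
r = ratios(counts, divisor_row=2)
eps = smallest_positive(r)
alpha0, mu0, sigma0 = fit_plain_zil(r[0])
```

In this toy table half of microbe 0's ratios are zero, so the fitted zero probability is 0.5; a real analysis would replace the closed-form step with the EM updates of the modified model.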

Diversity Analysis

To analyze microbial diversity, alpha diversity was used to measure the differences in gut microbial structure across the following three stages: healthy, adenoma (small and large combined), and cancer. We used the Shannon diversity index to measure the alpha diversity of the gut community. The Shannon index is defined as

$$H = -\sum_{j=1}^{N} a_j \ln a_j$$

where H represents the Shannon index, N indicates the total number of microbial species detected, and a_j indicates the relative abundance of the j-th microorganism.
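The Shannon index can be computed directly from a vector of relative abundances. The following is a minimal sketch, assuming the natural logarithm and skipping taxa with zero abundance; the function name `shannon_index` and the example vectors are illustrative and not taken from the original analysis.

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(a_j * ln a_j) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain at least one positive value")
    # Normalize so the values sum to 1, in case raw counts are passed in.
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

even = shannon_index([0.25, 0.25, 0.25, 0.25])    # four equally abundant taxa
skewed = shannon_index([0.97, 0.01, 0.01, 0.01])  # one dominant taxon
```

An even community of four taxa reaches the maximum ln(4), while a community dominated by one taxon scores much lower, which is the sense in which a lower index reflects reduced diversity.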

Results and Discussion

Alpha Diversity of Gut Microbiota Predicts Colorectal Cancer Status

We computed the alpha diversity of the gut microbes of the healthy, adenoma, and CRC samples using the Shannon index and compared the groups with the rank-sum Dunn test (Figure 1). The alpha diversity was significantly lower in the CRC samples than in the healthy samples (two-tailed Dunn test, P < 0.0001) and the adenoma samples (two-tailed Dunn test, P = 0.0021). However, the alpha diversity of the healthy and adenoma samples was not significantly different (two-tailed Dunn test, P = 0.0571). To study the relationship between the probability of cancer occurrence and the alpha diversity, we performed a logistic regression of CRC status on the Shannon index. The regression showed that the Shannon index is a significant predictor of CRC status (univariate logistic model, P < 0.05). The fitted logistic regression model was as follows:

$$P = \frac{1}{1 + e^{\,4.563d - 17.546}}$$

i.e., logit(P) = −4.563d + 17.546, where P is the probability of being CRC, and d is the Shannon diversity index. The relationship between the probability of cancer occurrence and the Shannon index of the adenoma patients is plotted in Figure S1. Our result suggested that the diversity of the microbial species in the human intestine decreases as colorectal malignancies progress, which is supported by the literature (Ahn et al., 2013).
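As a worked illustration of the fitted model, the sketch below converts a Shannon index into a predicted probability using the two coefficients reported above. The function name and the example index values are assumptions chosen for illustration.

```python
import math

# Coefficients reported in the text: logit(P) = -4.563 * d + 17.546
SLOPE = -4.563
INTERCEPT = 17.546

def crc_probability(shannon):
    """Predicted probability of CRC for a given Shannon index d."""
    logit = SLOPE * shannon + INTERCEPT
    return 1.0 / (1.0 + math.exp(-logit))

# Shannon index at which the predicted probability is exactly 0.5.
d_half = -INTERCEPT / SLOPE

for d in (3.0, d_half, 4.5):
    print(f"d = {d:.3f} -> P(CRC) = {crc_probability(d):.3f}")
```

Because the slope is negative, the predicted probability falls as the Shannon index rises, and the crossover at P = 0.5 sits where d equals 17.546 / 4.563.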

Figure 1

Nine Genera Were Differentially Abundant in the Colorectal Cancer Gut Environment

Using the RAIDA algorithm, we identified nine microbial genera that were significantly different in abundance between the CRC patients and the controls: Anaerostipes, Coprococcus, Pseudoflavonifractor, Bilophila, Flavonifractor, Desulfovibrio, Catenibacterium, Porphyromonas, and Weissella (Figure 2A). We first observed that the abundance of Coprococcus was higher in the healthy samples than in the CRC patients. Consistent with this, Shen et al. showed that colorectal adenomas had a lower relative abundance of Bacteroides spp. and Coprococcus spp. than controls (Shen et al., 2010). The metabolic activity of butyrate-producing bacteria is the major source of butyrate in the human body. Coprococcus is among the essential butyrate-producing genera in the human body; it promotes colonic health by mediating anti-inflammatory and antitumor effects and by providing energy for colonocytes (Singh et al., 2014).

Figure 2

Also notable in our results were the genera Fusobacterium (Fusobacteriaceae) and Porphyromonas (Porphyromonadaceae), which were highly enriched in the CRC patients, as was the species Bilophila wadsworthia. Sulfidogenic bacteria, including Desulfovibrio, Fusobacterium, and Bilophila wadsworthia, likely participate in the development of CRC by producing hydrogen sulfide (Ridlon et al., 2016; Dahmus et al., 2018). Bilophila wadsworthia was additionally reported to cause a systemic inflammatory response in a preclinical mouse study (Zhou et al., 2017).

Interestingly, we also observed that the abundances of Eubacterium hallii, Anaerostipes hadrus, and Eubacterium ventriosum (Figure 2B) were significantly higher in the healthy samples than in the CRC samples. E. hallii and A. hadrus can utilize glucose and the fermentation intermediates acetate and lactate to form butyrate and hydrogen, and both are considered important microbes in maintaining intestinal metabolic balance (Christina et al., 2016).

We also found that the abundance of Flavonifractor was higher in the healthy samples than in the CRC samples, which is in agreement with Anand et al. (2016). Anaerostipes likewise had a significantly lower abundance in the CRC samples, which agrees with previous studies (Peters et al., 2016; Mori et al., 2018). Neither Catenibacterium nor Gardnerella (Bifidobacteriaceae) was present in the CRC patient samples, which is supported by Chen et al. (2012).

We tested whether the nine differentially abundant genera are viable biomarkers for distinguishing healthy individuals from CRC patients. We trained a random forest classifier on the first metagenomic dataset using 5-fold cross-validation, rotating so that 80% of the data served as the training set and the remaining 20% as the testing set in each round. The classifier achieved an area under the curve (AUC) of 0.9333.
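The evaluation protocol, though not the random forest itself, can be sketched in plain Python: a rotating 5-fold split in which each fold serves once as the 20% test set, and a rank-based AUC computed from predicted scores. The sample count of 114 assumes that only the 61 healthy and 53 CRC samples enter the classifier, which the text does not state; the labels and scores are made up, and the function names are hypothetical.

```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Assumed sample count: 61 healthy + 53 CRC samples.
splits = list(five_fold_splits(114))

# Made-up labels (1 = CRC) and classifier scores for six held-out samples.
toy_auc = auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.6, 0.5, 0.1])
```

In practice the scores passed to `auc` would be the class probabilities that the trained classifier assigns to the held-out fold, and the reported AUC would summarize the five rounds.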

Microbial Co-occurrence Network Evolves With CRC Development

Weiss et al. compared eight methods for establishing association networks and recommended filtering out extremely rare OTUs prior to network construction (Weiss et al., 2016). According to Figure 7 of that paper, SparCC should be used when the inverse Simpson effective number of species (n_eff) is below 13, because SparCC maintains high precision on abundance tables with low n_eff. In our data, however, n_eff is 27.9 (>13) and the OTU abundance table is more than 50% sparse. We therefore calculated the correlation between species using the Pearson correlation coefficient (PCC) (Pearson, 1909) and conducted an association network analysis to identify the co-occurring intestinal microbes under different CRC states. All significant co-occurrences (PCC > 0.5) were found to be within the same genera, such as Bifidobacterium, Bacteroides, and Bilophila (Figure 3). Furthermore, both Bifidobacterium and Bacteroides were previously identified by us to have significant differences in abundance between healthy controls and CRC patients (Figure 3A). It is thus reasonable to infer that these bacteria are pathogenic as a group, because a change of abundance in one of them can result in changes of abundance across the entire clique. Our observation supports the theory that CRC follows a disrupted balance among these bacteria (Brennan and Garrett, 2016; Yazici et al., 2017).
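The two quantities used in this step can be sketched as follows: the inverse Simpson effective number of species that guides the choice of correlation method, and a Pearson-based edge list thresholded at PCC > 0.5. The sketch omits the significance testing and rare-OTU filtering mentioned in the text, and the species names, abundance profiles, and function names are hypothetical.

```python
import math
from itertools import combinations

def inverse_simpson(abundances):
    """Effective number of species n_eff = 1 / sum(p_i^2)."""
    total = sum(abundances)
    return 1.0 / sum((a / total) ** 2 for a in abundances)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sx * sy)

def cooccurrence_edges(profiles, threshold=0.5):
    """Return (species_a, species_b, pcc) for each pair with PCC > threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges

# Made-up abundance profiles of three species across five samples.
profiles = {
    "sp_A": [0.10, 0.20, 0.30, 0.40, 0.50],
    "sp_B": [0.11, 0.19, 0.33, 0.41, 0.48],
    "sp_C": [0.50, 0.40, 0.30, 0.20, 0.10],
}
edges = cooccurrence_edges(profiles)
```

Only the pair sp_A and sp_B passes the threshold here, since sp_C is anti-correlated with both; building one such edge list per sample group (healthy, small adenoma, large adenoma, CRC) gives networks that can be compared across stages.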

Figure 3

Co-occurrence was also found among species of the genus Prevotella in the healthy, small adenoma, and large adenoma environments (Figures 3A–C); however, such co-occurrence was missing in the CRC environment (Figure 3D). Conversely, several species of the genera Streptococcus, Parvimonas, Collinsella, and Citrobacter co-occurred only in the cancer environment. Overall, we observed fewer microbial co-occurrences in the healthy environment. In the adenoma environments, by contrast, we found an increase in co-occurring pathogenic microbes. The number of co-occurring microbes was then reduced in the CRC environment. The total number of co-occurrences was relatively close between the healthy and the CRC environments; however, the microbes involved were distinct. The total number of co-occurrences might have peaked in the adenoma environments because of the co-existence of competing homeostatic and pathogenic microbial interactions in the intermediate stage.

Conclusions

We analyzed the alpha diversity of the gut microbial community of 156 healthy, adenoma, and CRC samples. We found that the alpha diversity was significantly higher in the healthy samples than in the CRC samples. We applied a modified ZIL model and identified nine genera that differed significantly between the healthy and CRC groups, i.e., Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella. We used these nine genera as input features for a random forest classifier and predicted the CRC status with a high AUC of 0.9333. Our results suggested that the community members and the overall structure of the gut microbiota are potentially effective biomarkers of CRC stages. This avenue is being actively pursued by us and other computational researchers (Chen and Yan, 2013; Chen et al., 2016b,c, 2018a,b,c; Chen and Huang, 2017), who may bring in novel strategies for preventing and curing CRC in the near future.

Statements

Author contributions

DA and YG conducted the analysis, summarized the results, and drafted the manuscript. HP, XL, and GL assisted in the data analysis and contributed to the manuscript. DA and LX conceived the study. LX supervised the manuscript writing. All authors have read and approved the final manuscript.

Funding

This work was supported by grants from the National Natural Science Foundation of China (61873027, 61370131). LX is supported by the Innovation in Cancer Informatics Fund, the American Cancer Society (132922-PF-18-184-31301-TBG), the National Institutes of Health (HG006137-07), and funds from Intermountain Healthcare.

Acknowledgments

Both DA and LX thank Professor Fengzhu Sun at the University of Southern California and LX thanks Dr. Nancy Zhang at the University of Pennsylvania and Dr. Hanlee Ji at Stanford University for their support and helpful discussions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2019.00826/full#supplementary-material

Figure S1

Logit regression prediction results of the Shannon diversity index. The blue circle in the figure represents a large adenoma sample, and the red triangle represents a small adenoma sample.

References

1. Abusleme, L., Dupuy, A. K., Dutzan, N., Silva, N., Burleson, J. A., Strausbaugh, L. D., et al. (2013). The subgingival microbiome in health and periodontitis and its relationship with community biomass and inflammation. ISME J. 7, 1016–1025. doi: 10.1038/ismej.2012.174
2. Ahn, J., Sinha, R., Pei, Z., Dominianni, C., Wu, J., Shi, J., et al. (2013). Human gut microbiome and risk for colorectal cancer. J. Natl. Cancer Inst. 105, 1907–1911. doi: 10.1093/jnci/djt300
3. Anand, S., Kaur, H., Mande, S. S. (2016). Comparative in silico analysis of butyrate production pathways in gut commensals and pathogens. Front. Microbiol. 7:1945. doi: 10.3389/fmicb.2016.01945
4. Brennan, C. A., Garrett, W. S. (2016). Gut microbiota, inflammation, and colorectal cancer. Annu. Rev. Microbiol. 70, 395–411. doi: 10.1146/annurev-micro-102215-095513
5. Chen, E. Z., Li, H. (2016). A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32, 2611–2617. doi: 10.1093/bioinformatics/btw308
6. Chen, J., King, E., Deek, R., Wei, Z., Yu, Y., Grill, D., et al. (2017). An omnibus test for differential distribution analysis of microbiome sequencing data. Bioinformatics 34, 643–651. doi: 10.1093/bioinformatics/btx650
7. Chen, W., Liu, F., Ling, Z., Tong, X., Xiang, C. (2012). Human intestinal lumen and mucosa-associated microbiota in patients with colorectal cancer. PLoS ONE 7:e39743. doi: 10.1371/journal.pone.0039743
8. Chen, X., Huang, L. (2017). LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput. Biol. 13:e1005912. doi: 10.1371/journal.pcbi.1005912
9. Chen, X., Huang, Y. A., You, Z. H., Yan, G. Y., Wang, X. S. (2016a). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi: 10.1093/bioinformatics/btw715
10. Chen, X., Ren, B., Chen, M., Wang, Q., Zhang, L., Yan, G. (2016b). NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PLoS Comput. Biol. 12:e1004975. doi: 10.1371/journal.pcbi.1004975
11. Chen, X., Wang, L., Qu, J., Guan, N.-N., Li, J.-Q. (2018a). Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 34, 4256–4265. doi: 10.1093/bioinformatics/bty503
12. Chen, X., Xie, D., Wang, L., Zhao, Q., You, Z.-H., Liu, H. (2018b). BNPMDA: bipartite network projection for MiRNA–disease association prediction. Bioinformatics 34, 3178–3186. doi: 10.1093/bioinformatics/bty333
13. Chen, X., Yan, C. C., Zhang, X., You, Z.-H. (2016c). Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinformatics 18, 558–576. doi: 10.1093/bib/bbw060
14. Chen, X., Yan, G.-Y. (2013). Novel human lncRNA–disease association inference based on lncRNA expression profiles. Bioinformatics 29, 2617–2624. doi: 10.1093/bioinformatics/btt426
15. Chen, X., Yin, J., Qu, J., Huang, L. (2018c). MDHGI: Matrix Decomposition and Heterogeneous Graph Inference for miRNA-disease association prediction. PLoS Comput. Biol. 14:e1006418. doi: 10.1371/journal.pcbi.1006418
16. Christina, E., Hans-Joachim, R., Niko, B., Christophe, L., Clarissa, S. (2016). The common gut microbe Eubacterium hallii also contributes to intestinal propionate formation. Front. Microbiol. 7:713. doi: 10.3389/fmicb.2016.00713
17. Dahmus, J. D., Kotler, D. L., Kastenberg, D. M., Kistler, C. A. (2018). The gut microbiome and colorectal cancer: a review of bacterial pathogenesis. J. Gastrointest. Oncol. 9, 769–777. doi: 10.21037/jgo.2018.04.07
18. Guo, H., Shao, Y., Menghe, B., Zhang, H. (2015). Research on the relation between gastrointestinal microbiota and disease. Microbiol. China 42, 400–410. doi: 10.13344/j.microbiol.china.140474
19. Hu, H., Zhang, L., Ai, H., Zhang, H., Fan, Y., Zhao, Q., et al. (2018). HLPI-ensemble: prediction of human lncRNA-protein interactions based on ensemble strategy. RNA Biol. 15, 797–806. doi: 10.1080/15476286.2018.1457935
20. Hu, H., Zhu, C., Ai, H., Zhang, L., Zhao, J., Zhao, Q., et al. (2017). LPI-ETSLP: lncRNA–protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction. Mol. Biosyst. 13, 1781–1787. doi: 10.1039/C7MB00290D
21. Huang, Y. A., You, Z. H., Chen, X., Huang, Z. A., Zhang, S., Yan, G. Y. (2017a). Prediction of microbe–disease association from the integration of neighbor and graph with collaborative recommendation model. J. Transl. Med. 15:209. doi: 10.1186/s12967-017-1304-7
22. Huang, Z. A., Chen, X., Zhu, Z., Liu, H., Yan, G. Y., You, Z. H., et al. (2017b). PBHMDA: path-based human microbe-disease association prediction. Front. Microbiol. 8:233. doi: 10.3389/fmicb.2017.00233
23. Hughes, J. B., Hellmann, J. J., Ricketts, T. H., Bohannan, B. J. (2001). Counting the uncountable: statistical approaches to estimating microbial diversity. Appl. Environ. Microbiol. 67, 4399–4406. doi: 10.1128/AEM.67.10.4399-4406.2001
24. Iacob, T., Tătulescu, D. F., Dumitraşcu, D. (2017). Therapy of the postinfectious irritable bowel syndrome: an update. Clujul Med. 90, 133–138. doi: 10.15386/cjmed-752
25. Kipanyula, M. J., Etet, P. F. S., Vecchio, L., Farahna, M., Nukenine, E. N., Kamdje, A. H. N. (2013). Signaling pathways bridging microbial-triggered inflammation and cancer. Cell. Signal. 25, 403–416. doi: 10.1016/j.cellsig.2012.10.014
26. Kostic, A. D., Gevers, D., Pedamallu, C. S., Michaud, M., Duke, F., Earl, A. M., et al. (2012). Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 22, 292–298. doi: 10.1101/gr.126573.111
27. Li, Q., Wang, C., Tang, C., Li, N., Li, J. (2012). Molecular-phylogenetic characterization of the microbiota in ulcerated and non-ulcerated regions in the patients with Crohn's disease. PLoS ONE 7:e34939. doi: 10.1371/journal.pone.0034939
28. Liao, B., Wang, S., Zhang, J., Hongjing, Y. U. (2015). Role of gut microbiota in human diseases. Chin. J. Gastroenterol. 20, 126–128. doi: 10.3969/j.issn.1008-7125.2015.02.015
29. Love, M. I., Huber, W., Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550. doi: 10.1186/s13059-014-0550-8
30. Morgillo, F., Dallio, M., Della Corte, C. M., Gravina, A. G., Viscardi, G., Loguercio, C., et al. (2018). Carcinogenesis as a result of multiple inflammatory and oxidative hits: a comprehensive review from tumor microenvironment to gut microbiota. Neoplasia 20, 721–733. doi: 10.1016/j.neo.2018.05.002
31. Mori, G., Rampelli, S., Orena, B. S., Rengucci, C., De Maio, G., Barbieri, G., et al. (2018). Shifts of faecal microbiota during sporadic colorectal carcinogenesis. Sci. Rep. 8:10329. doi: 10.1038/s41598-018-28671-9
32. Nagy-Szakal, D., Williams, B. L., Mishra, N., Che, X., Lee, B., Bateman, L., et al. (2017). Fecal metagenomic profiles in subgroups of patients with myalgic encephalomyelitis/chronic fatigue syndrome. Microbiome 5:44. doi: 10.1186/s40168-017-0261-y
33. Paulson, J. N., Stine, O. C., Bravo, H. C., Pop, M. (2013). Differential abundance analysis for microbial marker-gene surveys. Nat. Methods 10:1200. doi: 10.1038/nmeth.2658
34. Pearson, K. (1909). Determination of the coefficient of correlation. Science 30, 23–25. doi: 10.1126/science.30.757.23
35. Peng, X., Li, G., Liu, Z. (2016). Zero-inflated beta regression for differential abundance analysis with metagenomics data. J. Comput. Biol. 23, 102–110. doi: 10.1089/cmb.2015.0157
36. Peters, B. A., Dominianni, C., Shapiro, J. A., Church, T. R., Wu, J., Miller, G., et al. (2016). The gut microbiota in conventional and serrated precursors of colorectal cancer. Microbiome 4:69. doi: 10.1186/s40168-016-0218-6
37. Rea, D., Coppola, G., Palma, G., Barbieri, A., Luciano, A., Del Prete, P., et al. (2018). Microbiota effects on cancer: from risks to therapies. Oncotarget 9:17915. doi: 10.18632/oncotarget.24681
38. Ridlon, J. M., Wolf, P. G., Gaskins, H. R. (2016). Taurocholic acid metabolism by gut microbes and colon cancer. Gut Microbes 7, 201–215. doi: 10.1080/19490976.2016.1150414
39. Ridout, M., Demétrio, C. G., Hinde, J. (1998). Models for count data with many zeros, in Proceedings of the XIXth International Biometric Conference: International Biometric Society Invited Papers (Cape Town), 179–192.
40. Robinson, M. D., McCarthy, D. J., Smyth, G. K. (2010). edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140. doi: 10.1093/bioinformatics/btp616
41. Shen, X. J., Rawls, J. F., Randall, T., Burcal, L., Mpande, C. N., Jenkins, N., et al. (2010). Molecular characterization of mucosal adherent bacteria and associations with colorectal adenomas. Gut Microbes 1, 138–147. doi: 10.4161/gmic.1.3.12360
42. Singh, N., Gurav, A., Sivaprakasam, S., Brady, E., Padia, R., Shi, H., et al. (2014). Activation of Gpr109a, receptor for niacin and the commensal metabolite butyrate, suppresses colonic inflammation and carcinogenesis. Immunity 40, 128–139. doi: 10.1016/j.immuni.2013.12.007
43. Sohn, M. B., Du, R., An, L. (2015). A robust approach for identifying differentially abundant features in metagenomic samples. Bioinformatics 31, 2269–2275. doi: 10.1093/bioinformatics/btv165
44. Sommer, F., Bäckhed, F. (2013). The gut microbiota—masters of host development and physiology. Nat. Rev. Microbiol. 11, 227–238. doi: 10.1038/nrmicro2974
45. Tringe, S. G., Von Mering, C., Kobayashi, A., Salamov, A. A., Chen, K., Chang, H. W., et al. (2005). Comparative metagenomics of microbial communities. Science 308, 554–557. doi: 10.1126/science.1107851
46. Weiss, S., Van Treuren, W., Lozupone, C., Faust, K., Friedman, J., Deng, Y., et al. (2016). Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J. 10, 1669–1681. doi: 10.1038/ismej.2015.235
47. Xia, L. C., Cram, J. A., Chen, T., Fuhrman, J. A., Sun, F. (2011). Accurate genome relative abundance estimation based on shotgun metagenomic reads. PLoS ONE 6:e27992. doi: 10.1371/journal.pone.0027992
48. Yazici, C., Wolf, P. G., Kim, H., Cross, T.-W. L., Vermillion, K., Carroll, T., et al. (2017). Race-dependent association of sulfidogenic bacteria with colorectal cancer. Gut 66, 1983–1994. doi: 10.1136/gutjnl-2016-313321
49. Zackular, J. P., Rogers, M. A., Ruffin, M. T., Schloss, P. D. (2014). The human gut microbiome as a screening tool for colorectal cancer. Cancer Prev. Res. 7, 1112–1121. doi: 10.1158/1940-6207.CAPR-14-0129
50. Zeller, G., Tap, J., Voigt, A. Y., Sunagawa, S., Kultima, J. R., Costea, P. I., et al. (2014). Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10:766. doi: 10.15252/msb.20145645
51. Zhang, X., Mallick, H., Yi, N. (2016). Zero-inflated negative binomial regression for differential abundance testing in microbiome studies. J. Bioinform. Genom. 2, 1–9. doi: 10.18454/jbg.2016.2.2.1
52. Zhang, Z., Liu, C. H., Zhao, X. H. (2014). Research advance of human gut microbiome and related diseases. Chin. Bull. Life Sci. 26, 768–772. doi: 10.13376/j.cbls/2014108
53. Zhao, Q., Liang, D., Hu, H., Ren, G., Liu, H. (2018a). RWLPAP: random walk for lncRNA-protein associations prediction. Protein Pept. Lett. 25, 830–837. doi: 10.2174/0929866525666180905104904
54. Zhao, Q., Yu, H., Ming, Z., Hu, H., Ren, G., Liu, H. (2018b). The Bipartite network projection-recommended algorithm for predicting long non-coding RNA-protein interactions. Mol. Ther. 13, 464–471. doi: 10.1016/j.omtn.2018.09.020
55. Zhao, Q., Yue, Z., Hu, H., Ren, G., Wen, Z., Liu, H. (2018c). IRWNRLPI: integrating random walk and neighborhood regularized logistic matrix factorization for lncRNA-protein interaction prediction. Front. Genet. 9:239. doi: 10.3389/fgene.2018.00239
56. Zhao, Y., Wu, J., Li, J. V., Zhou, N.-Y., Tang, H., Wang, Y. (2013). Gut microbiota composition modifies fecal metabolic profiles in mice. J. Proteome Res. 12, 2987–2999. doi: 10.1021/pr400263n
57. Zhou, F., Long, W., Hao, B., Ding, D., Ma, X., Zhao, L., et al. (2017). A human stool-derived Bilophila wadsworthia strain caused systemic inflammation in specific-pathogen-free mice. Gut Pathog. 9:59. doi: 10.1186/s13099-017-0208-7

Summary

Keywords

gut microbiota, colorectal cancer, zero-inflated lognormal model, association network, microbial diversity

Citation

Ai D, Pan H, Li X, Gao Y, Liu G and Xia LC (2019) Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model. Front. Microbiol. 10:826. doi: 10.3389/fmicb.2019.00826

Received

15 January 2019

Accepted

01 April 2019

Published

24 April 2019

Volume

10 - 2019

Edited by

Qi Zhao, Liaoning University, China

Reviewed by

Quan Chen, Icahn School of Medicine at Mount Sinai, United States; Lingling An, University of Arizona, United States

Copyright

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dongmei Ai, aidongmei@ustb.edu.cn; Li C. Xia, l.c.xia@stanford.edu

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
