Microbiome Big-Data Mining and Applications Using Single-Cell Technologies and Metagenomics Approaches Toward Precision Medicine
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
With the development of high-throughput sequencing technologies and a variety of bioinformatics tools, the microbiome is no longer “microbial dark matter.” In this review, we first summarize the current analytical strategies used for big-data mining, such as single-cell sequencing and metagenomics. We then provide insights into the integration of these strategies, which offers significant advantages in describing the microbiome from multiple aspects. Moreover, we discuss the correlation of the gut microbiome with host organs and diseases, underscoring the importance of big-data mining in clinical practice. Finally, we propose new ideas about the trend of big-data mining in the microbiome using multi-omics approaches and single-cell sequencing. The integration of multi-omics approaches and single-cell sequencing can provide a full understanding of the microbiome at both the macroscopic and microscopic levels, thus contributing to precision medicine.
Strategies for Big-Data Mining
The human gut microbiome correlates strongly with human health and disease through its influence on human metabolism, nutrition, physiology, and immune function (Hooper and Gordon, 2001; Bäckhed et al., 2005; Manichanh et al., 2012). Hence, the characterization of the human gut microbiome, as well as its correlation with diseases, has attracted a great number of researchers. However, the human gut microbiome consists of approximately 15,000 to 36,000 species of bacteria (Frank et al., 2007), with the total number of bacterial cells ranging from 10¹³ to 10¹⁴, which is of the same order as the number of human cells (3.0 × 10¹³) (Sender et al., 2016). The gut microbiome also contains more than 100 times as many genes as the approximately 25,000 genes in humans (Gill et al., 2006). Given the scale of these data, sequencing is a more promising technology for mining the gut microbiome than traditional culture-based methods. Sequencing is the precondition for obtaining the raw genetic material of the gut microbiome and is followed by genome assembly and taxonomic and functional annotation. Several strategies are currently used for big-data mining in microbial communities from different perspectives, as follows (Table 1).
Amplicon sequencing targets specific microbial marker genes, such as 16S ribosomal RNA for bacteria and the Internal Transcribed Spacer (ITS) for fungi. This sequencing method mainly answers “who is there” in an uncultured microbial community by assigning reads to reference sequences. However, the low taxonomic resolution of amplicon sequencing (it generally cannot reach the species or strain level), as well as its inability to provide functional annotation, largely limits its application. The current solution to this problem is to combine amplicon sequencing with metagenomic sequencing. Researchers can first use relatively low-cost amplicon sequencing to gain a preliminary understanding of the composition of the targeted microbial community and to formulate a hypothesis. Subsequently, they can perform metagenomic sequencing to test the hypothesis from the perspective of both phylogeny and function.
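The idea of assigning amplicon reads to reference sequences can be illustrated with a toy example. The following is a minimal sketch, assuming a simple shared k-mer score; the reference fragments, the taxon names, and the `classify_read` helper are hypothetical, and production pipelines rely on curated marker-gene databases and more robust classifiers.

```python
from collections import Counter

def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, references, k=4, min_shared=0.5):
    """Assign a read to the reference sharing the largest fraction of its k-mers.

    Returns "unassigned" when no reference shares at least `min_shared`
    of the read's k-mers.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unassigned"
    best_taxon, best_score = "unassigned", 0.0
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else "unassigned"

# Hypothetical marker-gene fragments, not real 16S sequences.
references = {
    "taxon_A": "ACGTACGTTAGCCGATAGGCTA",
    "taxon_B": "TTGACCGGTAACCGTTGGAACC",
}
reads = ["ACGTTAGCCGAT", "CCGGTAACCGTT", "GGGGGGGGGGGG"]

# Tally of reads per taxon: a toy "who is there" profile.
profile = Counter(classify_read(r, references) for r in reads)
```

The resulting tally answers only “who is there”; it carries no functional information, which is the limitation noted above.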
The shotgun metagenomic sequencing process consists of DNA extraction from all cells in a community, DNA fragmentation, DNA sequencing, and sequence analysis such as marker gene analysis, binning, or contig assembly to obtain the taxonomic composition. Metagenomic sequencing can shed light not only on “who is there,” at a resolution down to the strain level, but also on “what are they doing.” Protein-coding metagenomic reads can be identified for functional annotation in various ways, including gene fragment recruitment, protein family classification, and de novo gene prediction (Sharpton, 2014). Metagenomic sequencing has three main disadvantages. First, the short reads produced by next-generation sequencing complicate sequence assembly, especially when multiple strains are present (Sczyrba et al., 2017). For instance, closely related genomes in a community can behave like genome-sized approximate repeats. Second, metagenomic sequencing cannot obtain high genome coverage and might even miss the genomes of low-abundance microbes, owing to the high genomic richness and evenness in a community (Mende et al., 2016). Third, the functional genes of a microbe cannot be fully linked to its phylogeny. There are two solutions to these problems. First, long-read sequencing can resolve ambiguity in sequence assembly (Bertrand et al., 2019). A recent method named OPERA-MS (Bertrand et al., 2019), a hybrid metagenomic assembler that combines nanopore long reads with Illumina short reads, improves the accuracy of strain-resolved assembly and yields genomes with higher coverage. The second solution is to combine metagenomics with single-cell sequencing, which can reconstruct how DNA is compartmentalized into cells and link functions to their corresponding species (Tolonen and Xavier, 2017).
The first step of single-cell sequencing is to isolate individual cells using serial dilution, microfluidics, flow cytometry, micromanipulation, or encapsulation in droplets (Bäckhed et al., 2005). The following steps include DNA extraction, whole-genome amplification, DNA sequencing, and sequence analysis such as alignment and assembly. Because high-throughput sequencing requires micrograms of DNA, whereas a bacterial cell generally contains only femtograms, the minute amount of DNA in a cell must be amplified (Xu and Zhao, 2018). For this purpose, multiple displacement amplification (MDA) (Dean et al., 2002), a DNA amplification method that is not based on the polymerase chain reaction, uses random hexamer primers annealed to the template and the high-fidelity DNA polymerase of the Bacillus subtilis phage phi29 (Blanco et al., 1989). The phi29 DNA polymerase works under moderate isothermal conditions, with high strand-displacement activity and an inherent 3′–5′ proofreading exonuclease activity, thus ensuring sufficient genome coverage with a lower amplification error rate for the subsequent sequencing analysis.
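The gap between femtograms and micrograms can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a 5 Mb bacterial genome, an average molar mass of about 650 g/mol per base pair of double-stranded DNA, and a 1 microgram input requirement; these three values are illustrative assumptions rather than figures from the cited studies.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # assumed average g/mol per base pair of double-stranded DNA

def genome_mass_fg(genome_size_bp):
    """Approximate mass of one genome copy in femtograms."""
    grams = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg

def fold_amplification(genome_size_bp, required_ug=1.0):
    """Fold amplification needed to reach `required_ug` from one genome copy."""
    required_fg = required_ug * 1e9  # 1 microgram = 1e9 femtograms
    return required_fg / genome_mass_fg(genome_size_bp)

mass = genome_mass_fg(5_000_000)       # assumed 5 Mb genome
fold = fold_amplification(5_000_000)   # assumed 1 microgram requirement
print(f"one genome copy: {mass:.1f} fg; required amplification: {fold:.1e}-fold")
```

Under these assumptions a single genome copy weighs only a few femtograms, so an amplification on the order of 10⁸-fold is needed before sequencing.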
The major advantage of single-cell sequencing is that it can generate a high-quality genome for species with low abundance, which might be lost by the metagenomic sequencing. Additionally, this method can discriminate and validate the functions of individuals within the community, linking these functions to specific species. Moreover, the single-cell sequencing can simultaneously recover bacterial genomes and extrachromosomal genetic materials in a cell, dissecting virus–host interactions at cell level (Yoon et al., 2011). Single-cell sequencing has already led to many novel findings such as the discovery of bacteria with an alternative genetic code (Campbell et al., 2013), the ability to observe which gut microbial cells use host-derived compounds (Berry et al., 2013), and the ability to quantify the absolute taxon abundances of the gut microbiome (Props et al., 2017).
However, single-cell sequencing also has the following limitations. First, cell sorting is a complicated and time-consuming process. Isolating cells from solid samples such as swabs, biopsies, and tissues remains challenging (Tolonen and Xavier, 2017). Second, the amplification step using MDA might amplify contaminating DNA. Contaminating DNA mainly comes from tainted specimens at the cell-sorting step, contaminated reagents or laboratory apparatus, and microbes in the environment. The solution is to keep the work area strictly clean and take extra precautions. In addition, the reaction volume can be moderately reduced to increase the ratio of target DNA to contaminating DNA. Moreover, contaminating DNA can be partly removed by aligning the reads to reference sequences of potential contaminants, such as human and environmental DNA. The third limitation is that the MDA procedure causes highly uneven read coverage and increased formation of chimeric reads that link nonadjacent template sequences; thus, conventional genome-assembly algorithms are not suitable for single-cell data. The solution for uneven read coverage is to normalize the reads by trimming them according to their k-mer depth, an approach that has been integrated into several assembly algorithms such as SPAdes (Bankevich et al., 2012). The solution for chimeric reads is to identify and remove them. Because reference genomes are lacking for many cells, contigs from metagenomic sequencing can serve as the reference for identifying chimeras.
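Normalizing reads by k-mer depth can be sketched with a simplified, digital-normalization-style filter. This is an illustrative stand-in under stated assumptions, not the procedure implemented in SPAdes: a read is kept only while the median depth of its k-mers among previously kept reads is below a cutoff. The function name, the toy reads, and the values of `k` and `max_depth` are hypothetical.

```python
from collections import defaultdict
from statistics import median

def normalize_by_kmer_depth(reads, k=5, max_depth=3):
    """Keep a read only while the median depth of its k-mers is below max_depth."""
    counts = defaultdict(int)  # k-mer -> number of occurrences in kept reads
    kept = []
    for read in reads:
        read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not read_kmers:
            continue  # a read shorter than k carries no k-mer information
        if median(counts[km] for km in read_kmers) < max_depth:
            kept.append(read)
            for km in read_kmers:
                counts[km] += 1
    return kept

# Toy data: one over-amplified region (10 identical reads) and one sparse region.
over_amplified = ["ACGTACGGTCA"] * 10
sparse = ["TTGACCGGATC"]
kept = normalize_by_kmer_depth(over_amplified + sparse)
```

In this toy run the over-amplified region is capped at three reads while the sparsely covered read is retained, which flattens the coverage profile before assembly.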
The Integration of Single-Cell Genomics and Metagenomics
Metagenomics represents the collective genomes of all microbes in an environment, while single-cell genomics refers to the genomes of individual cells, which may or may not contain the full genetic repertoire of the microbiota. Hence, the integration of these two technologies can compensate for each other’s shortcomings (Figure 1). For instance, metagenomic reads and contigs can improve the genome assembly of single-cell genomics (Mende et al., 2016). Conversely, single-cell genomes can serve as scaffolds for the comparison or recruitment of metagenomic sequences when reference genomes are unavailable (Swan et al., 2013; Roux et al., 2014). Several studies have generated much-improved microbial genome assemblies from a variety of microbial communities by integrating single-cell genomics and metagenomics (Dupont et al., 2012; Nobu et al., 2015). The disadvantage of this integration is that the potential errors of both methods accumulate, so more sophisticated methods are required to handle them.
Figure 1 The integration of single-cell sequencing and metagenomics makes them complement each other. Single-cell sequencing could provide metagenomics with reference scaffolds, while metagenomics could ameliorate the genome assembly of single-cell sequencing.
The Integration of Metagenomics and Three-Dimensional Genomics
Metagenomics can quantify the genetic material of a microbial community, while Hi-C sequencing can identify all chromatin interactions of the community, producing a three-dimensional (3D) genome that captures both the genetic content and the topological chromatin structure as digital information (Belaghzal et al., 2017). The integration of metagenomics and 3D genomics can fully display the composition and structure of the genomes of a microbial community. Moreover, a recent study performed Hi-C at the single-cell level to capture the 3D genomes of individual cells (Nagano et al., 2017).
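How Hi-C turns chromatin interactions into digital information can be sketched as a binned contact matrix. The example below is a minimal illustration, assuming that the two ends of each Hi-C read pair have already been mapped to positions on a single sequence; the positions, the bin size, and the `contact_matrix` helper are hypothetical.

```python
def contact_matrix(read_pairs, genome_length, bin_size):
    """Count Hi-C read pairs between genomic bins; returns a symmetric matrix."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

# Hypothetical mapped positions (bp) of paired Hi-C read ends on one sequence.
pairs = [(100, 250), (120, 9_800), (5_100, 9_900), (150, 9_700)]
m = contact_matrix(pairs, genome_length=10_000, bin_size=2_500)
for row in m:
    print(row)
```

High counts far from the diagonal, such as between the first and last bins here, indicate regions that are distant in sequence but close in three-dimensional space.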
Microbial Multi-Omics Analysis
With advances in high-throughput sequencing technologies and bioinformatics approaches, researchers are now able to perform comprehensive analyses of microbial communities, termed “multi-omics analysis.” This analysis integrates the metagenome, metatranscriptome, metaproteome, and metabolome. The metagenome reveals the taxonomic composition of a microbial community and its predicted functions. The metatranscriptome, metaproteome, and metabolome can confirm the predicted functions, further unveiling how microbes work in a community. Each of these omics layers provides significant information about a microbial community from a different perspective. For instance, the microbial communities of twins with Crohn disease have been analyzed at the phylogenetic, functional, and metabolic levels, using 16S sequencing (Dicksved et al., 2008; Willing et al., 2009; Willing et al., 2010), metagenomics, proteomics (Erickson et al., 2012), and metabolomics (Jansson et al., 2009). Subjects with Crohn disease harbor a microbial community with lower microbial diversity, depletion of Faecalibacterium prausnitzii, and lower expression levels of proteins involved in butyrate metabolism (Erickson et al., 2012). At the metabolite level, thousands of metabolites, such as the bile acids (BAs), which were detected at higher levels in diseased subjects, can distinguish healthy subjects from subjects with Crohn disease (Jansson et al., 2009). Therefore, the integration of these omics is necessary for fully characterizing a microbial community. In a recent study, researchers correlated the process of permafrost thawing with microbial composition and functions using multi-omics analysis (Hultman et al., 2015).
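Statements about “lower microbial diversity” rest on a quantitative diversity measure. One common choice is the Shannon index, sketched below; whether the cited studies used this particular index is an assumption, and the taxon counts are hypothetical rather than data from those studies.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical taxon counts for two communities with the same number of reads.
even_community = [25, 25, 25, 25]    # four equally abundant taxa
skewed_community = [85, 5, 5, 5]     # one dominant taxon

h_even = shannon_index(even_community)
h_skewed = shannon_index(skewed_community)
print(f"even: {h_even:.3f}, skewed: {h_skewed:.3f}")
```

The community dominated by a single taxon scores lower than the even community, which is the kind of difference a diversity comparison between patient and control samples would report.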
The Connection Between Microbiota and the Human Body
Dietary intake (Wu et al., 2011; Liu et al., 2018) and environmental exposures such as the administration of antibiotics (Pérez-Cobas et al., 2012; Raymond et al., 2016) can strongly influence the human gut microbiota. The gut microbiota then responds to these factors, producing signals that modulate distal human organs, including the liver (Khalsa et al., 2017), brain (Dinan and Cryan, 2017), and lung (Budden et al., 2017), as described in Figure 2. Both the structural components of microbes and the metabolites they produce can serve as signal molecules. These signals can affect the metabolism of distal organs either directly or by signaling through nerves or hormones from the gut (Schroeder and Bäckhed, 2016).
Figure 2 Communications between the gut microbiome and distal organs. Various factors such as environmental exposure and dietary intake can modulate the gut microbiota. Changes in the gut microbiota affect distal organs through signal molecules, which consist of microbial structural components such as lipopolysaccharide (LPS) and microbial metabolites such as short-chain fatty acids (SCFAs).
The gut microbiota was confirmed to adjust liver metabolism (Kim et al., 2007; Khalsa et al., 2017). BAs, for example, derived from cholesterol in the liver, can be modified by microbiota in the distal small intestine and colon (Schroeder and Bäckhed, 2016). Primary BAs will be deconjugated by the ileal gut microbiota after they are secreted into the small intestine, which makes them manage to escape the reabsorption and then be subjected to further chemical modification by colonic microbiota (Midtvedt, 1974; Swann et al., 2011). BAs are capable of activating nuclear receptors such as farnesoid X receptor (FXR) and G-protein–coupled receptors (GPCRs), which are associated with host metabolism (Fiorucci et al., 2009). The activation of FXR can suppress the rate-limiting step in BA synthesis through a gut microbiota–liver feedback loop, thus controlling the BA levels (Kim et al., 2007). Additionally, TGR5, one of GPCRs, predominately recognizes secondary BAs, which is associated with increased thermogenesis in brown adipose tissue (Broeders et al., 2015). The adjustment of the gut microbiota on the liver is important, while the response of liver cells is important as well, which can be described using single-cell sequencing. A recent study used single-cell RNA sequencing on T cells from hepatocellular carcinoma patients to identify 11 T-cell subsets with special molecular and functional properties, thus contributing to the prediction of their clinical responses in liver cancer (Zheng et al., 2017).
The association between the brain and other organs depends on complex pathways involving the dual autonomic nervous system and the endocrine system. The gut–brain axis is defined to encompass afferent and efferent neural, endocrine, and nutrient signals between the central nervous system and the gastrointestinal system (Romijn et al., 2008). Several studies have shown that the gut microbiota influences brain morphology and the stress response and is even implicated in stroke (Schroeder and Bäckhed, 2016) via the gut–brain axis. As for brain morphology, most studies have been performed in mice owing to the challenges of studying humans. Comparisons between germ-free mice and colonized mice have shown that the gut microbiota causes alterations in the structural integrity of the amygdala and hippocampus (Luczynski et al., 2016). Germ-free mice displayed increased hippocampal neurogenesis and hypermyelination of the prefrontal cortex (Hoban et al., 2016). Moreover, the more permeable blood–brain barrier (BBB) in germ-free mice suggests that the gut microbiota is also capable of modulating the BBB (Braniste et al., 2014). With respect to the stress response, Bifidobacterium longum was observed to activate the vagus nerve to reduce anxiety-like behavior independently of brain-derived neurotrophic factor (Bercik et al., 2011). Moreover, different community members may have distinct influences on the stress response. For instance, when young germ-free mice with an originally elevated stress response were colonized with Bifidobacterium infantis at an early developmental stage, the stress response diminished, but when they were colonized with enteropathogenic Escherichia coli, the stress response was aggravated (Sudo et al., 2004). As for stroke, 87% of cases are ischemic, caused by interruption of the blood supply to the brain. One study showed that ischemic brain injury in mice can be reduced by antibiotic-induced alterations in the gut microbiota (Benakis et al., 2016), which suggests a potential therapeutic approach. The characterization of brain cells is important for further exploration of the gut–brain axis. Recently, a study performed single-cell sequencing integrated with multi-omics on the human brain, providing new insights into complex processes in the brain (Lake et al., 2018).
The concept of the gut–lung axis has emerged in recent years, and its mechanisms still require further investigation. First, dietary intake can shape both the gut microbiota and the airway microbiota (Marsland et al., 2015). On the one hand, dietary fiber intake leads to an increased level of short-chain fatty acids (SCFAs), which is associated with shifts in both the gut microbiota and the airway microbiota (Trompette et al., 2014). On the other hand, a high-fat diet has been found to correlate with compositional changes in the intestinal microbiota and elevated allergic airway inflammation (Myles et al., 2013). Second, the gut–lung axis involves several interactions among the microbiota, metabolites, immune cells, and the lung. Bacterial metabolites such as SCFAs can reach other organs via the bloodstream and exert their anti-inflammatory properties there. Additionally, microbial seeding from the intestinal microbiota into the airways enables these bacteria to act on local immune cells and shape their responses (Marsland et al., 2015). Moreover, migrating immune cells are capable of acquiring information directly from the microbiota and the concomitant local cytokine response to adjust the inflammatory response, which shapes immune responses at distal sites such as the lung (Trompette et al., 2014; Budden et al., 2017). Scientists have correlated allergic asthma, one of the lung diseases, with the gut microbiota. One study showed that a fecal transplant from a child at risk of asthma into germ-free mice resulted in severe lung inflammation after challenge with ovalbumin (Arrieta et al., 2015). Moreover, the impact of recurrent antibiotic treatment on the diversity of the microbiota early in life (Fouhy et al., 2012) has been found to correlate strongly with the development of an asthmatic phenotype later in life (Fanaro et al., 2003). Many mechanisms in the gut–lung axis remain unknown, and elucidating them may reveal potential therapeutic approaches against lung diseases.
Microbiota and Clinical Medicine
The intestine is a critical organ in the human body, whose functions include the uptake of nutrients and water. The intestinal barrier (Figure 3) prevents the transfer of harmful substances and pathogens. Pathogenic bacteria may disrupt this barrier, resulting in increased intestinal permeability. Enteropathogenic E. coli (EPEC), for instance, causes a loss of enterocyte microvilli and the formation of a raised pedestal structure for firm bacterial attachment (Lapointe et al., 2009). In addition, enterohemorrhagic E. coli also possesses an attaching and effacing locus but has less profound effects on the barrier (Kaper and Nataro, 2004). Moreover, enteroaggregative E. coli and enterotoxigenic E. coli can cause diarrhea through effects on chloride secretion in the intestinal epithelium (Dubreuil, 2012). Single-cell sequencing helps to identify pathogenic microbes in the intestinal lumen. Immunoglobulin A (IgA), the main antibody isotype produced at mucosal surfaces, can bind these pathogenic microbes in the intestinal lumen. The IgA-coated cells are then sorted using a fluorescent anti-IgA antibody, followed by 16S rDNA sequencing to identify the isolated pathogenic microbes (Palm et al., 2014). Furthermore, metagenomic sequencing can also be performed on these isolated microbes to identify the basis of immunogenic differences between and within microbes. Similarly, elevated IgG coating of gut bacteria has also been observed in patients with sepsis and Crohn disease (Zeng et al., 2016). Therefore, single-cell sequencing is a promising method for correlating microbes with the host immune response for precision medicine (Tolonen and Xavier, 2017).
Figure 3 Intestinal barrier and affecting factors. The intestinal barrier, as an essential barrier against harmful pathogens and substances in the intestine, mainly consists of the mucus layer, the epithelial layer, and the underlying lamina propria. The intestinal lumen contains antimicrobial peptides (AMPs), secreted IgA, and commensal bacteria, which prevent the colonization of pathogens. A mucus layer covers the intestinal surfaces as a physical barrier. The epithelium is composed of a single layer of cells sealed by tight junction proteins such as occludin and claudin, which inhibit paracellular passage. M cells and intraepithelial lymphocytes are also found in this layer. The lamina propria harbors many immune cells. Factors including food allergens, lipopolysaccharides (LPS), and pathogenic bacteria such as EPEC affect intestinal barrier function.
The risk of thrombosis has been observed to correlate with the plasma levels of trimethylamine (TMA)–N-oxide (TMAO) in humans (Zhu et al., 2016). In particular, the gut microbiome is critically involved in the generation of TMAO (Tang et al., 2013). The gut microbiome can process certain dietary nutrients such as phosphatidylcholine, choline, and carnitine to produce TMA, which is absorbed in the gut and converted in the liver to TMAO by hepatic flavin-containing monooxygenases (Tilg, 2016). In humans, foods such as meat and eggs have been associated with an increased risk of major cardiovascular events in patients with proven coronary heart disease (Tang et al., 2013). In addition, administration of antibiotics can markedly reduce the plasma levels of TMAO.
Hepatitis B Virus
Hepatitis B virus (HBV), as one of the most common infectious agents worldwide, has been associated with the gut microbiome (Chou et al., 2015). Scientists have found that viral clearance heavily depends on the age of exposure. According to the control experiments of adult and young mice, the results showed an immune-tolerating pathway to HBV that prevailed in young mice with immature gut microbiota. After the establishment of gut bacteria, the mature gut microbiota in adult mice stimulated liver immunity, resulting in rapid HBV clearance (Chou et al., 2015). Therefore, full understanding of the interaction of virus–host may help us with the therapy for HBV. The single-cell sequencing can serve as a powerful method to explore the virus–host interaction (Labonte et al., 2015).
Depressive episodes correlate with dysregulation of the hypothalamic–pituitary–adrenal (HPA) axis (Barden, 2004), and the resolution of depressive symptoms correlates with normalization of the HPA axis (Heuser et al., 1996; Nickel et al., 2003). The gut microbiota has been shown to play a part in both the programming of the HPA axis early in life and stress reactivity over the life span (Foster and Neufeld, 2013). The stress response system is functionally immature at birth and then develops throughout the postnatal period, which coincides with intestinal bacterial colonization. Stress can increase intestinal permeability, providing bacteria with an opportunity to translocate across the intestinal mucosa and directly access both immune cells and neuronal cells of the enteric nervous system (Gareau et al., 2008; Teitelbaum et al., 2008).
The gut microbiota has been recently observed to be associated with human immunodeficiency virus (HIV) disease progression (Vujkovic-Cvijin et al., 2013). Scientists identified a dysbiotic mucosal-adherent community enriched in Proteobacteria and depleted of Bacteroidia members that were associated with markers of mucosal immune disruption, T-cell activation, and chronic inflammation in HIV-infected subjects. This dysbiotic community was evident among HIV-infected subjects undergoing highly active antiretroviral therapy (Vujkovic-Cvijin et al., 2013). Furthermore, the extent of dysbiosis correlated with two established markers of disease progression including the activity of the kynurenine pathway of tryptophan catabolism and plasma concentrations of the inflammatory cytokine interleukin 6 (Vujkovic-Cvijin et al., 2013). Hence, a link between mucosal-adherent colonic bacteria and immunopathogenesis during progressive HIV infection deserves better investigations.
Microbes have been reported to correlate with a number of cancers of the human stomach (Helicobacter pylori), liver (Opisthorchis viverrini, Clonorchis sinensis), and bladder (Schistosoma haematobium) (Bhatt et al., 2017). H. pylori infections, for instance, can lead to gastritis and gastric ulcers (Marshall et al., 1984), which are considered precursors of gastric cancer. Nevertheless, H. pylori was also observed to protect against esophageal adenocarcinoma by influencing stomach pH and ameliorating acid reflux (Vaezi et al., 2000). Hence, because microbes participate in multiple biological processes, their oncogenicity should be examined and determined by multi-omics approaches.
The Trend of Big-Data Mining for Microbiome
In the past, owing to limited abilities to obtain and process microbial big data, scientists were not able to obtain a full understanding of the microbiota. Neither the sequencing technologies nor the analysis tools could cope with the high-dimensional complexity of the intestinal microbiota. Nowadays, high-throughput sequencing technologies, together with enabling methods such as MDA (Dean et al., 2002) for single-cell sequencing and numerous analysis tools such as QIIME for 16S sequencing data (Caporaso et al., 2010) and MetaPhlAn (Segata et al., 2012) for metagenomics data, make it possible to unveil the microbiota from various perspectives. Integrating the current sequencing methods will be necessary to conduct comprehensive studies of the microbiota in the future. First, taxonomic information at various levels can be obtained by amplicon sequencing and metagenomic sequencing. Second, functional annotation can be predicted by metagenomics and confirmed by multi-omics, including the metagenome, metatranscriptome, metaproteome, and metabolome. Third, the connection between the functions and phylogeny of a single microbial cell can be established by single-cell sequencing. Finally, the interactions between all chromosomes can be detected by Hi-C sequencing. The integration of these methods can answer the questions “who is there,” “what are they doing,” and “how are they doing it” at the macroscopic level of overall microbial composition and at the microscopic level of the single microbial cell and even the single chromosome. Comprehensive analysis of big data, followed by rigorous in vivo and in vitro experiments, is required to determine the causal role of microbes in clinical diseases for precision medicine. Moreover, a standard pipeline for integrating these methods, once established, could produce a huge number of data sets. Big data sets collected across continents would provide spatial characteristics, and those from long-term investigations would provide temporal characteristics.
KN conceived the review framework. MC and LC conducted the literature review. MC made the figure illustration. MC and LC wrote the manuscript. KN reviewed and revised the manuscript.
This work was partially supported by the National Key R&D Program of China (grant 2018YFC0910502) and the National Natural Science Foundation of China (grants 61103167, 31271410, and 31671374).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The handling editor declared a past co-authorship with one of the authors KN.
The authors would like to thank Pengshuo Yang, PhD, Maozhen HAN, PhD, Chaoyun Chen, PhD, Chaofang Zhong, PhD, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, for their discussions about this work.
Arrieta, M. C., Stiemsma, L. T., Dimitriu, P. A., Thorson, L., Russell, S., Yurist-Doutsch, S., et al. (2015). Early infancy microbial and metabolic alterations affect risk of childhood asthma. Sci. Transl. Med. 7 (307), 307ra152–307ra152. doi: 10.1126/scitranslmed.aab2271
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19 (5), 455–477. doi: 10.1089/cmb.2012.0021
Belaghzal, H., Dekker, J., Gibcus, J. H. (2017). Hi-C 2.0: an optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65. doi: 10.1016/j.ymeth.2017.04.004
Benakis, C., Brea, D., Caballero, S., Faraco, G., Moore, J., Murphy, M., et al. (2016). Commensal microbiota affects ischemic stroke outcome by regulating intestinal γδ T cells. Nat. Med. 22 (5), 516–523. doi: 10.1038/nm.4068
Bercik, P., Park, A., Sinclair, D., Khoshdel, A., Lu, J., Huang, X., et al. (2011). The anxiolytic effect of Bifidobacterium longum NCC3001 involves vagal pathways for gut–brain communication. Neurogastroenterol. Motil. 23 (12), 1132–1139. doi: 10.1111/j.1365-2982.2011.01796.x
Berry, D., Stecher, B., Schintlmeister, A., Reichert, J., Brugiroux, S., Wild, B., et al. (2013). Host-compound foraging by intestinal microbiota revealed by single-cell stable isotope probing. Proc. Natl. Acad. Sci. U.S.A. 110 (12), 4720–4725. doi: 10.1073/pnas.1219247110
Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A. H. Q., Kumar, M. S., Li, C., et al. (2019). Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37 (8), 937–944. doi: 10.1038/s41587-019-0191-2
Blanco, L., Bernad, A., Lazaro, J. M., Martin, G., Garmendia, C., Salas, M. (1989). Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J. Biol. Chem. 264 (15), 8935–8940.
Braniste, V., Al-Asmakh, M., Kowal, C., Anuar, F., Abbaspour, A., Tóth, M., et al. (2014). The gut microbiota influences blood–brain barrier permeability in mice. Sci. Transl. Med. 6 (263), 263ra158–263ra158. doi: 10.1126/scitranslmed.3009759
Broeders, E. P., Nascimento, E. B., Havekes, B., Brans, B., Roumans, K. H., Tailleux, A., et al. (2015). The bile acid chenodeoxycholic acid increases human brown adipose tissue activity. Cell. Metab. 22 (3), 418–426. doi: 10.1016/j.cmet.2015.07.002
Budden, K. F., Gellatly, S. L., Wood, D. L., Cooper, M. A., Morrison, M., Hugenholtz, P., et al. (2017). Emerging pathogenic links between microbiota and the gut–lung axis. Nat. Rev. Microbiol. 15 (1), 55–63. doi: 10.1038/nrmicro.2016.142
Campbell, J. H., O’Donoghue, P., Campbell, A. G., Schwientek, P., Sczyrba, A., Woyke, T., et al. (2013). UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. Proc. Natl. Acad. Sci. U.S.A. 110 (14), 5540–5545. doi: 10.1073/pnas.1303090110
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7 (5), 335–336. doi: 10.1038/nmeth.f.303
Chou, H.-H., Chien, W.-H., Wu, L.-L., Cheng, C.-H., Chung, C.-H., Horng, J.-H., et al. (2015). Age-related immune clearance of hepatitis B virus infection requires the establishment of gut microbiota. Proc. Natl. Acad. Sci. U.S.A. 112 (7), 2175–2180. doi: 10.1073/pnas.1424775112
Dean, F. B., Hosono, S., Fang, L., Wu, X., Faruqi, A. F., Bray-Ward, P., et al. (2002). Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl. Acad. Sci. U.S.A. 99 (8), 5261–5266. doi: 10.1073/pnas.082089499
Dicksved, J., Halfvarson, J., Rosenquist, M., Jarnerot, G., Tysk, C., Apajalahti, J., et al. (2008). Molecular analysis of the gut microbiota of identical twins with Crohn’s disease. ISME J. 2 (7), 716–727. doi: 10.1038/ismej.2008.37
Dupont, C. L., Rusch, D. B., Yooseph, S., Lombardo, M. J., Richter, R. A., Valas, R., et al. (2012). Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. ISME J. 6 (6), 1186–1199. doi: 10.1038/ismej.2011.189
Erickson, A. R., Cantarel, B. L., Lamendella, R., Darzi, Y., Mongodin, E. F., Pan, C., et al. (2012). Integrated metagenomics/metaproteomics reveals human host–microbiota signatures of Crohn’s disease. PLoS One 7 (11), e49138. doi: 10.1371/journal.pone.0049138
Fanaro, S., Chierici, R., Guerrini, P., Vigi, V. (2003). Intestinal microflora in early infancy: composition and development. Acta Paediatr. Suppl. 91 (s441), 48–55. doi: 10.1111/j.1651-2227.2003.tb00646.x
Fiorucci, S., Mencarelli, A., Palladino, G., Cipriani, S. (2009). Bile-acid–activated receptors: targeting TGR5 and farnesoid-X-receptor in lipid and glucose disorders. Trends Pharmacol. Sci. 30 (11), 570–580. doi: 10.1016/j.tips.2009.08.001
Fouhy, F., Guinane, C. M., Hussey, S., Wall, R., Ryan, C. A., Dempsey, E. M., et al. (2012). High-throughput sequencing reveals the incomplete, short-term recovery of infant gut microbiota following parenteral antibiotic treatment with ampicillin and gentamicin. Antimicrob. Agents Chemother. 56 (11), 5811–5820. doi: 10.1128/AAC.00789-12
Frank, D. N., St. Amand, A. L., Feldman, R. A., Boedeker, E. C., Harpaz, N., Pace, N. R. (2007). Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc. Natl. Acad. Sci. U.S.A. 104 (34), 13780–13785. doi: 10.1073/pnas.0706625104
Gill, S. R., Pop, M., DeBoy, R. T., Eckburg, P. B., Turnbaugh, P. J., Samuel, B. S., et al. (2006). Metagenomic analysis of the human distal gut microbiome. Science 312 (5778), 1355–1359. doi: 10.1126/science.1124234
Heuser, I. J., Schweiger, U., Gotthardt, U., Schmider, J. (1996). Pituitary–adrenal-system regulation and psychopathology during amitriptyline treatment in elderly depressed patients and normal comparison subjects. Am. J. Psychiatry. 153 (1), 93. doi: 10.1176/ajp.153.1.93
Hoban, A., Stilling, R., Ryan, F., Shanahan, F., Dinan, T., Claesson, M., et al. (2016). Regulation of prefrontal cortex myelination by the microbiota. Transl. Psychiatry 6 (4), e774. doi: 10.1038/tp.2016.42
Hultman, J., Waldrop, M. P., Mackelprang, R., David, M. M., McFarland, J., Blazewicz, S. J., et al. (2015). Multi-omics of permafrost, active layer and thermokarst bog soil microbiomes. Nature 521 (7551), 208–212. doi: 10.1038/nature14238
Jansson, J., Willing, B., Lucio, M., Fekete, A., Dicksved, J., Halfvarson, J., et al. (2009). Metabolomics reveals metabolic biomarkers of Crohn’s disease. PLoS One 4 (7), e6386. doi: 10.1371/journal.pone.0006386
Khalsa, J., Duffy, L. C., Riscuta, G., Starke-Reed, P., Hubbard, V. S. (2017). Omics for understanding the gut–liver–microbiome axis and precision medicine. Clin. Pharmacol. Drug. Dev. 6 (2), 176–185. doi: 10.1002/cpdd.310
Kim, I., Ahn, S.-H., Inagaki, T., Choi, M., Ito, S., Guo, G. L., et al. (2007). Differential regulation of bile acid homeostasis by the farnesoid X receptor in liver and intestine. J. Lipid. Res. 48 (12), 2664–2672. doi: 10.1194/jlr.M700330-JLR200
Labonte, J. M., Swan, B. K., Poulos, B., Luo, H., Koren, S., Hallam, S. J., et al. (2015). Single-cell genomics–based analysis of virus–host interactions in marine surface bacterioplankton. ISME J. 9 (11), 2386–2399. doi: 10.1038/ismej.2015.48
Lake, B. B., Chen, S., Sos, B. C., Fan, J., Kaeser, G. E., Yung, Y. C., et al. (2018). Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36 (1), 70–80. doi: 10.1038/nbt.4038
Lapointe, T. K., O’Connor, P. M., Buret, A. G. (2009). The role of epithelial malfunction in the pathogenesis of enteropathogenic E. coli–induced diarrhea. Lab. Invest. 89 (9), 964. doi: 10.1038/labinvest.2009.69
Liu, H., Han, M., Li, S. C., Tan, G., Sun, S., Hu, Z., et al. (2018). Resilience of human gut microbial communities for the long stay with multiple dietary shifts. Gut, 1–2. doi: 10.1136/gutjnl-2018-317298
Luczynski, P., Whelan, S. O., O’Sullivan, C., Clarke, G., Shanahan, F., Dinan, T. G., et al. (2016). Adult microbiota-deficient mice have distinct dendritic morphological changes: differential effects in the amygdala and hippocampus. Eur. J. Neurosci. 44 (9), 2654–2666. doi: 10.1111/ejn.13291
Mende, D. R., Aylward, F. O., Eppley, J. M., Nielsen, T. N., DeLong, E. F. (2016). Improved environmental genomes via integration of metagenomic and single-cell assemblies. Front. Microbiol. 7, 143. doi: 10.3389/fmicb.2016.00143
Myles, I. A., Fontecilla, N. M., Janelsins, B. M., Vithayathil, P. J., Segre, J. A., Datta, S. K. (2013). Parental dietary fat intake alters offspring microbiome and immunity. J. Immunol. 191 (6), 3200–3209. doi: 10.4049/jimmunol.1301057
Nagano, T., Wingett, S. W., Fraser, P. (2017). Capturing three-dimensional genome organization in individual cells by single-cell Hi-C. Methods. Mol. Biol. 1654, 79–97. doi: 10.1007/978-1-4939-7231-9_6
Nickel, T., Sonntag, A., Schill, J., Zobel, A. W., Ackl, N., Brunnauer, A., et al. (2003). Clinical and neurobiological effects of tianeptine and paroxetine in major depression. J. Clin. Psychopharmacol. 23 (2), 155–168. doi: 10.1097/00004714-200304000-00008
Nobu, M. K., Narihiro, T., Rinke, C., Kamagata, Y., Tringe, S. G., Woyke, T., et al. (2015). Microbial dark matter ecogenomics reveals complex synergistic networks in a methanogenic bioreactor. ISME J. 9 (8), 1710–1722. doi: 10.1038/ismej.2014.256
Palm, N. W., de Zoete, M. R., Cullen, T. W., Barry, N. A., Stefanowski, J., Hao, L., et al. (2014). Immunoglobulin A coating identifies colitogenic bacteria in inflammatory bowel disease. Cell 158 (5), 1000–1010. doi: 10.1016/j.cell.2014.08.006
Pérez-Cobas, A. E., Gosalbes, M. J., Friedrichs, A., Knecht, H., Artacho, A., Eismann, K., et al. (2012). Gut microbiota disturbance during antibiotic therapy: a multi-omic approach. Gut 62, 1591–1601. doi: 10.1136/gutjnl-2012-303184
Props, R., Kerckhof, F.-M., Rubbens, P., De Vrieze, J., Sanabria, E. H., Waegeman, W., et al. (2017). Absolute quantification of microbial taxon abundances. ISME J. 11 (2), 584. doi: 10.1038/ismej.2016.117
Raymond, F., Ouameur, A. A., Déraspe, M., Iqbal, N., Gingras, H., Dridi, B., et al. (2016). The initial state of the human gut microbiome determines its reshaping by antibiotics. ISME J. 10 (3), 707–720. doi: 10.1038/ismej.2015.148
Roux, S., Hawley, A. K., Torres Beltran, M., Scofield, M., Schwientek, P., Stepanauskas, R., et al. (2014). Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics. Elife 3, e03125. doi: 10.7554/eLife.03125
Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Droge, J., et al. (2017). Critical assessment of metagenome interpretation—a benchmark of metagenomics software. Nat. Methods 14 (11), 1063–1071. doi: 10.1038/nmeth.4458
Segata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., Huttenhower, C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9 (8), 811–814. doi: 10.1038/nmeth.2066
Sudo, N., Chida, Y., Aiba, Y., Sonoda, J., Oyama, N., Yu, X. N., et al. (2004). Postnatal microbial colonization programs the hypothalamic–pituitary–adrenal system for stress response in mice. J. Physiol. 558 (1), 263–275. doi: 10.1113/jphysiol.2004.063388
Swan, B. K., Tupper, B., Sczyrba, A., Lauro, F. M., Martinez-Garcia, M., Gonzalez, J. M., et al. (2013). Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean. Proc. Natl. Acad. Sci. U.S.A. 110 (28), 11463–11468. doi: 10.1073/pnas.1304246110
Swann, J. R., Want, E. J., Geier, F. M., Spagou, K., Wilson, I. D., Sidaway, J. E., et al. (2011). Systemic gut microbial modulation of bile acid metabolism in host tissue compartments. Proc. Natl. Acad. Sci. U.S.A. 108 (Supplement 1), 4523–4530. doi: 10.1073/pnas.1006734107
Tang, W. W., Wang, Z., Levison, B. S., Koeth, R. A., Britt, E. B., Fu, X., et al. (2013). Intestinal microbial metabolism of phosphatidylcholine and cardiovascular risk. N. Engl. J. Med. 368 (17), 1575–1584. doi: 10.1056/NEJMoa1109400
Teitelbaum, A. A., Gareau, M. G., Jury, J., Yang, P. C., Perdue, M. H. (2008). Chronic peripheral administration of corticotropin-releasing factor causes colonic barrier dysfunction similar to psychological stress. Am. J. Physiol. Gastrointest. Liver Physiol. 295 (3), G452–G459. doi: 10.1152/ajpgi.90210.2008
Trompette, A., Gollwitzer, E. S., Yadava, K., Sichelstiel, A. K., Sprenger, N., Ngom-Bru, C., et al. (2014). Gut microbiota metabolism of dietary fiber influences allergic airway disease and hematopoiesis. Nat. Med. 20 (2), 159–166. doi: 10.1038/nm.3444
Vaezi, M. F., Falk, G. W., Peek, R. M., Vicari, J. J., Goldblum, J. R., Perez, G. I., et al. (2000). CagA-positive strains of Helicobacter pylori may protect against Barrett’s esophagus. Am. J. Gastroenterol. 95 (9), 2206–2211. doi: 10.1111/j.1572-0241.2000.02305.x
Vujkovic-Cvijin, I., Dunham, R. M., Iwai, S., Maher, M. C., Albright, R. G., Broadhurst, M. J., et al. (2013). Dysbiosis of the gut microbiota is associated with HIV disease progression and tryptophan catabolism. Sci. Transl. Med. 5 (193), 193ra91. doi: 10.1126/scitranslmed.3006438
Willing, B., Halfvarson, J., Dicksved, J., Rosenquist, M., Jarnerot, G., Engstrand, L., et al. (2009). Twin studies reveal specific imbalances in the mucosa-associated microbiota of patients with ileal Crohn’s disease. Inflamm. Bowel Dis. 15 (5), 653–660. doi: 10.1002/ibd.20783
Willing, B. P., Dicksved, J., Halfvarson, J., Andersson, A. F., Lucio, M., Zheng, Z., et al. (2010). A pyrosequencing study in twins shows that gastrointestinal microbial profiles vary with inflammatory bowel disease phenotypes. Gastroenterology 139 (6), 1844–1854.e1. doi: 10.1053/j.gastro.2010.08.049
Wu, G. D., Chen, J., Hoffmann, C., Bittinger, K., Chen, Y. Y., Keilbaugh, S. A., et al. (2011). Linking long-term dietary patterns with gut microbial enterotypes. Science 334 (6052), 105–108. doi: 10.1126/science.1208344
Yoon, H. S., Price, D. C., Stepanauskas, R., Rajah, V. D., Sieracki, M. E., Wilson, W. H., et al. (2011). Single-cell genomics reveals organismal interactions in uncultivated marine protists. Science 332 (6030), 714–717. doi: 10.1126/science.1203163
Zeng, M. Y., Cisalpino, D., Varadarajan, S., Hellman, J., Warren, H. S., Cascalho, M., et al. (2016). Gut microbiota–induced immunoglobulin G controls systemic infection by symbiotic bacteria and pathogens. Immunity 44 (3), 647–658. doi: 10.1016/j.immuni.2016.02.006
Zheng, C., Zheng, L., Yoo, J. K., Guo, H., Zhang, Y., Guo, X., et al. (2017). Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169 (7), 1342–1356.e16. doi: 10.1016/j.cell.2017.05.035
Keywords: big data, microbiome, metagenomics, single-cell sequencing, precision medicine
Citation: Cheng M, Cao L and Ning K (2019) Microbiome Big-Data Mining and Applications Using Single-Cell Technologies and Metagenomics Approaches Toward Precision Medicine. Front. Genet. 10:972. doi: 10.3389/fgene.2019.00972
Received: 12 May 2019; Accepted: 12 September 2019; Published: 09 October 2019.
Edited by: Jialiang Yang, Geneis (Beijing) Co. Ltd, China
Reviewed by: Fengfeng Zhou, Jilin University, China
Xuefeng Cui, Tsinghua University, China
Minxian Wang, Broad Institute, United States
Copyright © 2019 Cheng, Cao and Ning. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kang Ning, firstname.lastname@example.org