Gut Microbial Diversity Assessment of Indian Type-2-Diabetics Reveals Alterations in Eubacteria, Archaea, and Eukaryotes

Diabetes in India has distinct genetic, nutritional, developmental and socio-economic aspects; owing to the fact that changes in gut microbiota are associated with diabetes, we employed semiconductor-based sequencing to characterize gut microbiota of diabetic subjects from this region. We suggest consolidated dysbiosis of eubacterial, archaeal and eukaryotic components in the gut microbiota of newly diagnosed (New-DMs) and long-standing diabetic subjects (Known-DMs) compared to healthy subjects (NGTs). Increased abundance of phylum Firmicutes (p = 0.010) and Operational Taxonomic Units (OTUs) of Lactobacillus (p < 0.01) were observed in Known-DMs subjects along with the concomitant graded decrease in butyrate-producing bacterial families like Ruminococcaceae and Lachnospiraceae. Eukaryotes and fungi were the least affected components in these subjects but archaea, except Methanobrevibacter were significantly decreased in them. The two dominant archaea viz. Methanobrevibacater and Methanosphaera followed opposite trends in abundance from NGTs to Known-DMs subjects. There was a substantial reduction in eubacteria, with a noticeable decrease in Bacteroidetes phylum (p = 0.098) and an increased abundance of fungi in New-DMs subjects. Likewise, opportunistic fungal pathogens such as Aspergillus, Candida were found to be enriched in New-DMs subjects. Analysis of eubacterial interaction network revealed disease-state specific patterns of ecological interactions, suggesting the distinct behavior of individual components of eubacteria in response to the disease. PERMANOVA test indicated that the eubacterial component was associated with diabetes-related risk factors like high triglyceride (p = 0.05), low HDL (p = 0.03), and waist-to-hip ratio (p = 0.02). Metagenomic imputation of eubacteria depict deficiencies of various essential functions such as carbohydrate metabolism, amino acid metabolism etc. in New-DMs subjects. Results presented here shows that in diabetes, microbial dysbiosis may not be just limited to eubacteria. Due to the inter-linked metabolic interactions among the eubacteria, archaea and eukarya in the gut, it may extend into other two domains leading to trans-domain dysbiosis in microbiota. Our results thus contribute to and expand the identification of biomarkers in diabetes.


INTRODUCTION
The eubacterial assemblage associated with the human body together with other microbes like archaea, eukaryotes and fungi are referred to as "microbiota." Trillions of these microbes that live in our distal gut are believed to be co-evolving with their hosts (Ley et al., 2008). Within the gut, microbes interact amongst themselves and their host; together, their metagenomes contain genes that act as a repertoire of metabolic functions which influence human health (Clemente et al., 2012). Recent studies have revealed that the gut microbiota is subjected to variations in the host's diet (Turnbaugh et al., 2009), genotype (Spor et al., 2011) and health status (Cénit et al., 2014). Any perturbation in the delicate balance between microbial consortia and host results in "dysbiosis, " sometimes leading to severe ailments in the host. Thus, gastrointestinal disorders such as inflammatory bowel disease (Frank et al., 2007) and colitis (Lucke et al., 2006), as well as metabolic disorders such as obesity  and diabetes (Qin et al., 2012;Karlsson et al., 2013;Zhang et al., 2013) are found to be associated with the distinct pattern of gut microbiota in which certain OTUs/species are present in different proportions.
Although studies on gut microbiota are largely dominated by eubacteria, in recent years, studies on gut-inhabiting archaea, (Scanlan et al., 2008;Gaci et al., 2014) fungi (Dollive et al., 2012;Wang et al., 2014) and eukaryotes (Pandey et al., 2012;Grattepanche et al., 2014) are being conducted to understand their distribution and possible role in human health. Thus, archaea such as genus Methanobrevibacter has been linked with human diseases like obesity (Million et al., 2012) and periodontitis (Lepp et al., 2004). Fungi residing in the gut are associated with diseases such as colorectal adenomas (Luan et al., 2015) and, Crohn's disease (Li Q. et al., 2014). Similarly, eukaryotes in the gut are found to be very complex and correlated with human diseases (Gouba et al., 2014). Thus, besides the fact that reports on gut archaea, fungi and eukaryotes are lagging, studies such as these are a clear indication that these microbes together with eubacteria forms a very complex ecosystem in the gut and their functional role in human health and diseases needs to be evaluated thoroughly.
Studies conducted in Indian population suggest compositional differences in gut microbiota and how it differs from the western population (Patil et al., 2012;Bhute et al., 2016). Therefore, considering the unique gut microbial features of Indian population  efforts to define extents of perturbation in gut microbial communities of diabetic subjects from India will help us to decipher the association between gut microbial composition and diabetes. India is one of the global capitals of diabetes with an estimated 69.1 million diabetic patients in the year 2015 (International Diabetes Federation, 2015). The explosive epidemic of diabetes in India is incompletely explained, although various contributing factors are suggested. Compared to diabetic patients in the western world, Indian diabetic patients have unique and paradoxical characteristics. These include possible heightened genetic predisposition (Ramachandran et al., 2012), intrauterine undernutrition (thrifty phenotype) leading to epigenetic predisposition (Yajnik, 2001), the manifestation of diabetes at an earlier age and at a much lower body mass index (BMI) compared to white Caucasians (Yajnik, 2004). Diabetes seems to be precipitated in this population by rapid economic and nutritional transition and rural-urban migrations (Anjana et al., 2011).
Based on above facts, we hypothesized that the dysbiosis in gut microbiota may not be limited to just eubacteria but other two domains (Archaea and Eukarya) too are disturbed due to the disease condition or vice-versa. In the present study, we investigated the composition of the intestinal microbiota of newly diagnosed (New-DMs) and long-standing diabetic subjects (Known-DMs) and compared it with normal glucose tolerant subjects (NGTs). We used Ion torrent PGM sequencing technology, to analyse eubacterial and archaeal 16S rRNA gene, 18S rRNA gene from eukaryotes and fungal ITS tagged amplicon from fecal samples.

Participants and Sample Collection
We studied 49 adults, who are parents of children in the Pune Children Study (PCS) conducted by Diabetes Unit of KEM Hospital Research Centre (Yajnik et al., 1995). They have been followed up since 1995 along with their children with serial glucose tolerance testing. The present study refers to clinical and metabolic follow-up in 2009. The study and the experimental protocols followed were approved by Ethics Committee of KEM Hospital Research Centre, Pune, India (Study number 0,847), and separate written informed consent was obtained from each participant. Inclusion criteria in NGTs group was the absence of any apparent acute or chronic disorders. New-DMs were the participants that were diagnosed with type 2 diabetes during the routine check-up, were not on anti-diabetic treatment until sample collection and free from any acute and chronic illness. Known-DMs subjects were known cases of type 2 diabetes in PCS cohort, were on anti-diabetic treatment at least for the past 1 year and free from any acute and chronic illness. General exclusion criteria for all three groups were subjects undergoing dietary intervention, use of antimicrobial in past 3 months and major surgeries of the gastrointestinal tract. All participants were admitted to Diabetes Unit the evening before the investigations. Anthropometry was measured by trained observers according to standard protocols. The following morning, fasting blood specimens were assessed for plasma glucose, insulin and lipids. Sixteen known diabetic subjects underwent only fasting and postbreakfast glucose measurements. In the remaining subjects, an oral glucose tolerance test (75 g anhydrous glucose) was carried out according to the (Alberti and Zimmet, 1998) protocol. Fecal samples were collected from all participants in a sterile container and preserved at −80 • C until DNA extraction.

Measurement of Biochemical Parameters
Plasma glucose, cholesterol, HDL-cholesterol, and triglyceride concentrations were measured using standard enzymatic methods (Hitachi 902, Germany). Between-batch coefficients of variation for all these assays were <3% in the normal range. Plasma insulin was measured using Delfia technique (Victor 2, Wallac, Turku, Finland). Overweight was defined as BMI ≥25 kg/m 2 and <30 kg/m 2 , and obesity as BMI ≥30 kg/m 2 . Diabetes mellitus was diagnosed if fasting plasma glucose ≥126 mg/dl or 120-min plasma glucose ≥200 mg/dl. Hypercholesterolaemia was defined as plasma total cholesterol ≥200 mg/dl, hypertriglyceridaemia as plasma triglyceride concentration ≥150 mg/dl and low HDL-cholesterol as HDL-cholesterol concentration <40 mg/dl for men and <50 mg/dl for women. Hypertension was defined as systolic blood pressure (SBP) ≥130 mmHg or diastolic blood pressure (DBP) ≥85 mmHg.

Sequencing of 16S rRNA Gene Amplicons
Total community DNA was extracted from each fecal sample using QIAmp DNA Stool Mini kit (Qiagen, Madison USA) as per manufacturer's protocol. The PCR amplification and sequencing of resulting amplicons was performed as described earlier . Briefly, the concentration of extracted DNA was measured using Nanodrop-1000, (Thermo Scientific, USA). DNA concentration was normalized to 100 ng/µl and used as a template for amplification of 16S rRNA gene. PCR was set up in 50 µl reaction using AmpliTaq Gold PCR Master Mix (Life Technologies, USA) and with 16S rRNA V3 region specific bacterial universal primers: 341F (5 ′ -CCTACGGGAGGCAGCAG-3 ′ ) and 518R (5 ′ -ATTACCGCGGCTGCTGG-3 ′ ) ( Bartram et al., 2011). Following conditions were used for PCR: initial denaturation at 94 • C for 4 min, followed by 20 cycles of 94 • C for 1 min, 56 • C for 30 s, and 72 • C for 30 s with final extension at 72 • C for 10 min. PCR products were purified using Agencourt AMPure XP DNA purification Bead (Beckman Coulter, USA). Resulting PCR products were end-repaired and ligated with sample-specific barcode adaptor as explained in Ion Xpress TM Plus gDNA Fragment Library Preparation user guide. Prior to sequencing, fragment size distribution and molar concentrations of amplicons were assessed on Bioanalyser 2,100 (Agilent Technologies, USA) using High Sensitivity DNA Analysis Kit. All amplicons were diluted to the lowest molar concentration and pooled into sets of 10 samples.

Sequencing of Archaeal 16S, Eukaryotic 18S and Fungal ITS Genes
The archaeal 16S, eukaryotic 18S and fungal ITS1 genes were PCR amplified using primers listed in Supplementary Table  1. The resulting PCR products were purified using Agencourt AMPure XP DNA purification Bead (Beckman Coulter, USA) and quantified using Nanodrop-1000 (Thermo Scientific, USA). Then, PCR products of all NGTs samples (n = 19), all New-DMs (n = 14) and all Known-DMs (n = 16) were pooled by mixing equal quantities of concentration normalized PCR products. This way we obtained three pools for each group, NGTs, New-DMs and Known-DMs for archaeal 16S rRNA, eukaryotic 18S rRNA and fungal ITS1 genes. All the pooled samples were then sequenced using Ion Torrent PGM. Since fungal ITS amplicons varied in length, we fragmented 100 ng of it with Ion Shear Enzyme mix (Ion Xpress Plus Fragment Library preparation kit, Life Technologies) for 20 min and 200 bp size fragments were selected before adapter ligation step (Tang et al., 2015).

Sequence Processing and Bioinformatics Analysis of Eubacterial 16S rRNA Gene Amplicons
All PGM quality-approved reads from 49 samples were exported as sample specific fastq files and pre-processed in Mothur pipeline (Schloss et al., 2009) with following conditions: (1) minimum length-150 bp, (2) maximum length-200 bp, (3) maximum homopolymer-5, (4) maximum ambiguity-0, and (5) average quality score-20. This way we derived a total of 2.1 million high-quality amplicon reads from 49 samples; subsequently, these reads were pooled as single FASTA file for further analysis in QIIME: Quantitative Insights Into Microbial Ecology (Caporaso et al., 2010). Briefly, reads were binned into Operational Taxonomic Units (OTUs) at 97% sequence similarity using UCLUST algorithm and single sequence from each OTU was picked out for further analysis. The PyNAST algorithm was used to align representative sequences against Greengenes core set; all unaligned and chimeric sequences were excluded from alignment and downstream analysis. Then lane masking was applied to the alignment to retain conserved regions of 16S rRNA and a phylogenetic tree was inferred using FastTree 2.1.3. Additionally, all reads were assigned to the lowest possible taxonomic rank by utilizing RDP Classifier 2.2 with a confidence score of 80%. Alpha diversity measures such as Chao1 index (Chao, 1984) and Shannon index (Shannon, 1948) were inferred. Phylum level abundance data and alpha diversity indices were compared among the three groups using the non-parametric tests such as Wilcoxon sum rank test and Kruskal-Wallis rank sum test. To assess beta diversity among three study groups, we applied phylogenetic distance based UniFrac (both unweighted and weighted) analysis and the results are visualized as Principal coordinate plots. To determine differentially abundant OTUs among the three groups, OTU table was filtered such that at least 8 sample will have that OTU to be retained in the OTU table. Kruskal-Wallis rank sum test was then applied to filtered OTU table containing 1969 OTUs. We next applied supervised machine learning approach (Random Forest) to identify OTUs that were indicators of community differences in three groups. This was done by estimating the amount of error introduced if a particular OTU is removed from a group of indicator OTUs and assigning it an importance score.

Clustering of Samples into Enterotypes
To understand whether disease state has any effect on the composition of enterotypes, we applied original measurements proposed by Arumugam et al. (2011) and as detailed at http:// enterotyping.embl.de (Arumugam et al., 2014) to partition the samples into distinct enterotypes clusters. Briefly, the genus level abundance data was segregated according to the three categories, imported in R and clustered using partitioning around medioid (PAM) algorithm followed by determination of optimal number of clusters by utilizing Calinski-Harabasz (CH) index. Finally, results of between class (BC) analyses were visualized as principal component analysis. Additionally, taxa that influenced partitioning of samples into enterotypes (drivers of enterotype) were identified based on their abundance in a particular enterotype.

Bioinformatics Analysis of Archaeal 16S, Eukaryotic 18S and Fungal ITS Genes
Most of the steps for analysis of pooled archaeal 16S, eukaryotic 18S and fungal ITS1 genes were similar as that of eubacterial 16S rRNA gene, except for the fact that QIIME compatible SILVA_111 database (Quast et al., 2013) for archaeal 16S and eukaryotic 18S amplicons and QIIME compatible UNITE_12_11 database (Kõljalg et al., 2013) for fungal ITS amplicon was used during the OTU picking step.

Prediction of Ecological Relationships
To predict ecological relationships among gut microbiota, microbial association network showing co-occurrence and coexclusion pattern was built as described before (Faust et al., 2012). Briefly, genus level abundance data was imported to CoNet plugin (version 1.0.4 beta) in Cytoscape 3.0.0 environment (Shannon et al., 2003). To produce association network, 100 top and bottom edges were used with two measures of similarity (Pearson and Spearman) and three measures of dissimilarity (Bray-Curtis, Hellinger, and Kullback-Leibler). Spurious correlations due to compositional structure of relative abundances were avoided by bootstrapping and renormalization and resulting networks were combined using Simes method followed by Benjamini-Hochberg-Yekutieli false discovery rate (FDR) correction with FDR cut-off of 0.05. Finally, all unstable edges outside the 95% confidence interval of bootstrap distribution score were removed and network was visualized and suitably edited.

Metagenomic Imputation
For metagenomic imputation, amplicon sequences were binned into OTUs at 97% similarity using closed-reference OTU picking in QIIME. The resulting OTU table was filtered such that at least 8 samples will have that OTU to retain it in OTU table. Resulting OTU table was then analyzed using online tool PICRUSt (Langille et al., 2013) at http://huttenhower.sph.harvard.edu/ galaxy/. PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states) is a computational tool that uses marker gene data for prediction of functional composition of metagenome. Briefly, OTU abundance table was first normalized for 16S rRNA copy number against known gene copy number for each OTU. Functional predictions were categorized into KEGG pathways and an annotated table of predicted gene family counts (KOs) for each sample using predict metagenome option. Gene family table then categorized by function and further statistical analysis was performed in STAMP v2.0.1 (Parks and Beiko, 2010).

Validation of Eubacterial Amplicon Library Results
Owing to their high precision, quantitative PCR-based assays were performed using group-specific primers to validate the major findings of eubacterial disturbances. Absolute quantification of total bacteria, phylum Bacteroidetes and genus Lactobacillus was performed using SYBR green master mix (Applied biosystems Inc. USA) on 7,300 Real-time PCR system from Applied Biosystems Inc. (USA). The primers and PCR conditions used for qPCR assays were described earlier (Suryavanshi et al., 2016). The findings of qPCR results were analyzed using Kruskal-Wallis test, bacterial groups with a p-value less than 0.05 were considered as significantly different among the three groups.

Additional Statistical Analysis
Biochemical and anthropometric parameters were expressed as mean (SD) and ANOVA test is used to compare differences among the study groups. Different type of data generated through QIIME was imported and analyzed in ade4, vegan and ggplot2 packages within R software (R Core Team, 2013) environment. In addition, the relationship between biochemical parameters and microbiota were assessed using PERMANOVA: permutational multivariate analysis of variance test (Anderson and Walsh, 2013). Covariance between biochemical parameters dataset and genus abundance dataset was performed by using co-inertia analysis (Dray et al., 2003), these two datasets were connected to each other owing to the presence of same subjects.

Availability of Data
Raw sequences generated in the present study are deposited to NCBI SRA under accession number SRP041693.

Summary of Biochemical Parameters
Biochemical and anthropometric characteristics are shown in Table 1. Out of the 49 participants, 19 were NGTs, 14 were New-DMs and 16 were Known-DMs. In the total study group, 8 participants were obese and 28 were overweight. Twelve participants had hypercholesterolemia, 16 had hypertriglyceridemia, 45 had low HDL and 8 were hypertensive.

Altered Eubacterial Diversity and OTU Composition of Diabetic Subjects
We obtained and analyzed 4,111 eubacterial OTUs among the three study groups. Analysis of alpha diversity indices revealed that overall diversity in New-DMs was noticeably reduced and both expected (Chao1, p = 0.095) and observed (Observed Species, p = 0.047) species diversity indices were lowered in New-DMs and Known-DMs subjects ( Figure 1A). Out of eight bacterial phyla detected, Proteobacteria (p = 0.026) were significantly lowered, Firmicutes (p = 0.010) were significantly higher while Bacteroidetes (p = 0.098) showed decreased trend in abundance in New-and Known-DMs subjects ( Figure 1B). Kruskal-Wallis test (without post-hoc analysis) revealed the presence of 83 significantly different OTUs (p < 0.01) of which Prevotella copri, Faecalibacterium prausnitzii and Lachnospiraceae OTUs were enriched in NGTs whereas Lactobacillus ruminis OTUs were found enriched in Known-DMs (Figure 2A). Moreover, 2 OTUs belonging to genus streptococcus were abundant in New-DMs. Interestingly, the OTUs assigned to P. copri and Lachnospiraceae were found to be negatively correlated with fasting glucose (Supplementary  Table 2). Using UniFrac distance based PCoA biplots, we demonstrate substantial segregation of the subjects into three groups based on the presence/absence (Unweighted UniFrac, Figure 2B) and abundance of specific bacterial taxa (Weighted UniFrac, Figure 2C).We thus suggest that the presence of discrete clusters of samples in PCoA biplot is an indication of unique bacterial community structure in the three study groups. We further observed that OTUs belonging to order Bacteroidales, family Lachnospiraceae and phylum Bacteroidetes and genus Prevotella were determinative taxa for segregation of NGTs from New-DMs and Known-DMs subjects on PCoA biplots. It was noted that Lactobacillus was the crucial contributor for segregation of Known-DMs from rest of the samples and thus confirms the findings of Kruskal-Wallis test (performed above) demonstrating enrichment of L. ruminis in these subjects.
After the Random Forest analysis, top 20 OTUs were considered as highly discriminative among the three groups ( Figure 3A).
Considering the fact that multiple hypothesis testing was not applied during the Kruskal-Wallis test, we speculate that some of the differences may be overstated. Therefore, we performed qPCR-based absolute quantification to support our major findings of decreased total bacterial count and phylum Bacteroidetes; and increased abundance of genus Lactobacillus in New-and Known-DMs subjects ( Figure 3B).

Disease State Has Profound Effect on Composition of Enterotypes
We were able to stratify the gut microbial communities of NGTs, New-DMs as well as Known-DMs subjects into three distinct enterotypes (E) (Figure 4 and Supplementary Figure 1). As observed earlier by Arumugam et al., healthy (NGTs) subjects ( Figure 4B) grouped into three enterotypes (E1-Bacteroidetes, E2-Prevotella, and E3-Ruminococcus). However, notable compositional changes were observed in enterotypes of both New-DMs ( Figure 4C) and Known-DMs ( Figure 4D) compared to enterotypes of NGTs subjects. Based on the abundance of the different genera we found that all three enterotypes in these subjects were found to driven by members of Firmicutes (New-DMs: E1-Lachnospira, E2-Streptococcus, and E3-Weissella & Known-DMs: E1-Veillonella, E2-Lachnospira and E3-Lactobacillus). Notably, the E2 (five subjects) in New-DMs and E3 (eight subjects) in Known-DMs were dominated by taxa that were being enriched in these subjects.

Archaeal, Eukaryotic, and Fungal Dysbiosis
We generated 109,561 good quality archaeal 16S rRNA amplicon reads from three pools of samples (NGTs, New-DMs, and Known-DMs); which clustered into 65 OTUs belonging to Euryarchaeota and Thaumarchaeota phyla. The former being the most dominated phylum occupying more than 99% reads of all three groups. We noticed the gradual increase in Methanobrevibacter (which was also the most abundant taxa in all groups) and associated decrease in Methanosphaera abundance from NGTs to New-DMs to Known-DMs subjects. From the three pools of Eukaryotic sequence data, we obtained 41,959 good quality sequences that clustered into 383 OTUs and could be assigned to four phyla: Chloroplastida, Metazoa, Stramenopiles, and Metamonada. Members of Stramenopile especially members of genus Blastocystis were found abundant in all groups. Fungi, particularly members belonging to Saccharomycetales were abundant in New-DMs compare to NGTs and Known-DMs. For fungal ITS data, we could obtain 106,185 reads that clustered into 871 OTUs belonging to phyla Ascomycota being most dominant followed by Basidiomycota and Zygomycota to be least dominant. From the Ascomycota group; Aspergillus and Emericella, the two alternative forms of the same fungus predominated most of the sequences (Figure 5).

Altered Microbial Composition Is Associated with Clinical Parameters
To analyse the effect of different biochemical and anthropometric measurements on sampled microbiota among the three groups, we used PERMANOVA and Co-inertia analysis. After applying PERMANOVA test, we discovered that HDL (p = 0.03), triglyceride (p = 0.05), and waist-hip ratio (p = 0.02) to be associated with OTU diversity across all samples (Supplementary Table 3). In the case of Known-DMs, we found HDL (p = 0.01), and in the case of New-DMs, oral glucose tolerance test (p = 0.05) to have an influence on distinct OTU diversity. Further, the covariance between genus abundance and clinical and anthropometric parameters were examined using co-inertia analysis (1,000 permutations) of these datasets, resulting in a relationship with RV coefficient = 0.219, P = 0.196 between the datasets (Figure 6). Similar and subsequent analysis were not performed on simulated datasets of Archaeal, Eukaryotic and Fungal datasets.

Eubacterial Interaction Network
Microbiome network containing a total of 108 nodes connected with 174 edges together representing 46% co-occurrence and 54% of mutual exclusion interactions were obtained. Further, to measure the scale-freeness of the network, we used fitted power law and obtained correlation of 0.6 with the R-square value of 0.723 (Supplementary Figure 2). This network reveals that the patterns observed were disease state specific, i.e., majority of the edges were found clustering within one study group providing a clue that individuals in each group have distinctly interacting microbiome composition (Figure 7 and Supplementary Table 4). We then filtered the network to retain nodes positively interacting with each other, assuming that microbes represented by these nodes will stay together in a given community. In the filtered network of positively interacting genera, we noticed that a cluster of Lachnospira, Ruminococcus, Faecalibacterium, Roseburia, Oscillospira, Parabacteroides, and Bulleidia decomposed from NGTs to New-DMs then to Known-DMs (Supplementary Figures  3-5). We also noted negative interactions of Lactobacillus in Known-DMs.
FIGURE 7 | Significant co-occurrence and co-exclusion relationships at genus level. Each node represents a bacterial genus; size of the node is proportional to the abundance of the genus and colored according to diabetes status (Red: Known-DMs, Yellow: New-DMs, and Green: NGTs). Each edge represents co-occurrence/co-exclusion relationships, edge width is proportional to the significance of supporting evidence, and color indicates sign of the association (red: negative, green: positive).

Deficient Metabolic Activities in New-DMs as Revealed by Imputed Metagenome
Having identified the compositional changes in microbiota with respect to diabetes state, we tested whether these changes are accompanied with selectively fostering or lacking particular functional capabilities of gut microbiota. Similarities and differences in metabolic capabilities in gut microbiota were evaluated by making the pair-wise comparison between the diabetes statuses using two-sided Welch's t-test. Compared to NGTs, the metagenome of New-DMs was found augmented with glycerolipid metabolism, fructose and mannose metabolism, pentose phosphate pathway, galactose metabolism, glycolysis/gluconeogenesis and arginine and proline metabolism. Concurrently, these subjects were found to be deficient in many important metabolic activities such as carbohydrate metabolism (including carbohydrate digestion and absorption, TCA cycle, oxidative phosphorylation, glycan biosynthesis and metabolism, glycosyltransferases), amino acid metabolism (including metabolism of glycine, serine, threonine, histidine), vitamin B metabolism (including folate, biotin, pyridoxine metabolism), glutathione metabolism and other functions (Supplementary Figure 6). Compared to Known-DMs, New-DMs were deficient of carbohydrate digestion and absorption, glycosyltransferases and glutathione metabolism (Supplementary Figure 7). Conversely, they were enriched with functions unrelated to carbohydrate or amino acid or lipid metabolism compared to NGTs (Supplementary Figure 8).

DISCUSSION
The present study is first to report perturbation in the gut microbiota of Indian diabetic subjects across the three domains of life. Considering the unique characteristics of Indian diabetic subjects, understanding their gut microbiota will be important to understand the possible role of gut microbiota in affecting these characteristics. Members of eubacteria such as P. copri, Lachnospiraceae and Ruminococcaceae families were found significantly abundant in NGTs subjects. Known-DMs subjects exhibited increased abundance of Firmicutes and OTUs belonging to genus Lactobacillus. These organisms were seen to have an effect on the segregation of samples in both unweighted and weighted UniFrac based PCoA biplots. Fungi prevailed in New-DMs; in particular, genera Aspergillus, Candida, and Saccharomyces were found enriched in these subjects. We also observed the progressive decline in butyrate-producing bacteria from NGTs to Known-DMs subjects. These variations in gut microbiota were associated with diabetes risk factors such as fasting glucose, high triglycerides, low HDL and fasting insulin. Additionally, synergistic or antagonistic interactions occurring in gut microbiota were found specific to the stage of glucose intolerance. Using PICRUSt, we predicted that the gut microbiome of New-DMs subjects was metabolically disturbed and was lacking in many necessary functions. Increased Firmicutes and proportionate decrease in Bacteroidetes is linked with more energy harvesting and storage in ob/ob animals . Analogous to animal studies, human obesity is also found to be linked with higher Firmicutes to Bacteroidetes ratio . Our finding of increased abundance of Firmicutes in known-DMs is in agreement with previous reports (Karlsson et al., 2013) but not with findings of Larsen and co-workers, who reported a decrease in the proportion of Firmicutes (Larsen et al., 2010). Association of Firmicutes with obesity and diabetes could operate through insulin resistance which is a common attribute of both the conditions (Pandolfi et al., 1994).
Analysis of differentially abundant OTUs revealed that NGTs were highly enriched with Prevotellaceae, Lachnospiraceae, and Ruminococcaceae families. Members belonging to Prevotellaceae such as genus Prevotella contribute significantly to interindividual variation in gut microbiota (Arumugam et al., 2011) and increased proportions of Prevotella are associated with the diet rich in plant-derived complex carbohydrates and fibers such as the diet in Indians (De Filippo et al., 2010). Additionally, a study in which subjects were kept of dietary interventions (barley kernel-based bread, which is considered as a rich source of fibers), showed that there was a significant increase in P. copri and that it was found to be associated with improvement in glucose metabolism in these subjects (Kovatcheva-Datchary et al., 2015). Strikingly, several studies on type 1 diabetes, a pathophysiologically different disorder related to persistent hyperglycemia, are also reporting reduced levels of Prevotella in newly diagnosed as well as longstanding type 1 diabetic subjects (Mejía-León et al., 2014;Alkanani et al., 2015;Mejía-León and Barca, 2015). At this moment we could speculate that this could just be a coincidence or indeed it is linked with hyperglycemia per se which is a common attribute of type 1 and type 2 diabetes. Members of families Lachnospiraceae and Ruminococcaceae are known producers of short-chain fatty acids (SCFAs) such as acetate and butyrate. These SCFAs are known to confer many health benefits; individuals lacking bacterial families producing SCFAs suffer from many diseases (Morgan et al., 2012). Interestingly, we observed decreasing trends in the richness of these bacterial families with progressive deterioration of glucose tolerance (from NGTs to New-DMs to Known-DMs subjects). Presence of these families in the gut may be essential to foster a "healthy state, " and their depletion might have a role in diabetes development (Remely et al., 2014). Thus, we hypothesize that the decreased abundance of P. copri and concomitant loss of short chain fatty acids producers in New-and Known-DMs subjects could be linked with glucose intolerance in these subjects as these organisms were found to be negatively correlated with fasting glucose in our analyses.
We also found that Known-DMs were enriched with genus Lactobacillus consistent with previous studies on diabetic subjects in different populations of the world (Larsen et al., 2010;Lê et al., 2012). Karlsson and co-workers have also demonstrated enrichment of lactobacilli-derived metagenomic clusters (MGCs) in type 2 diabetic patients that they found positively correlating with fasting glucose and HbA 1c . Another large-scale study dealing with the characterization of over 170 Lactobacillus species from oral cavity showed a higher prevalence of lactobacilli in diabetic subjects (Teanpaisan et al., 2009) and this increase in Lactobacillus species has been linked with increased salivary glucose in children with diabetes (Karjalainen et al., 1996). Lactobacillus ruminis, which we found significantly abundant in Known-DMs subjects, is a member of indigenous gut microflora (O' Donnell et al., 2015). It has approximately 16 carbohydrate utilization pathways including those for utilization of glucose, fructose, mannose, galactose, starch, and sucrose . Thus, as reported earlier, the catabolic flexibility of this organism toward varied dietary carbohydrates is evident . Above facts taken together, indicate that enrichment of the lactobacilli in gastrointestinal tract of diabetic subjects could be a consequence of higher than usual concentration of glucose, which needs to be confirmed.
Besides this, we also show the gradation of NGTs, New-DMs and Known-DMs samples on UniFrac biplots. These UniFrac biplots were plotted using phylogenetic distance which is calculated utilizing unique branch-lengths i.e., only those branches that lead to descendants from one or the other sample but not both samples in a phylogenetic tree were considered (Lozupone et al., 2011). Hence, we believe that segregation of the samples is robust and could be because of the above mentioned compositional differences in bacterial communities in these subjects. We next attempted to group study participants into distinct clusters based on the presence of unique and dominant gut microbial communities called "enterotypes" (Arumugam et al., 2011). Currently, the concept of enterotype is generating a lot of debate; different groups have different opinions about the presence or absence of such discrete cluster in human gut microbiome (Knights et al., 2014;Moeller et al., 2015). Although, it has been shown earlier that during identification of enterotypes, various factors influence clustering of subjects into distinct enterotypes (Koren et al., 2013); we feel that it is beyond the reach of this article to deal with theories of formation of enterotypes and associated factors affecting their formation, hence, we performed this analysis as originally proposed (Arumugam et al., 2011). We find substantial changes in major contributors of enterotype in New-and Known-DMs subjects compared to NGTs subjects. Especially, we observed E2 in New-DMs and E3 in Know-DMs subjects to be driven by Streptococcus and Lactobacillus respectively. These findings are important because clustering of subjects based on the presence of unique and predominated taxa could help us in identifying disease-related biomarkers, thus it can find its implications in microbiome-based diagnostics (Knights et al., 2014).
We next looked into archaeal diversity in the three sample groups; Methanobrevibacter and Methanosphaera were the most prevalent genera. Methanobrevibacter smithii (M. smithii) and Methanosphaera stadtmanae are well adapted to the human gut environment, interestingly, the latter has acquired most of these adaptations through inter-domain lateral gene transfer (Samuel et al., 2007;Lurie-Weinberger et al., 2012). As perceived by us and reported in a previous study , Methanobrevibacter smithii has been represented in large proportion along with increased Firmicutes; it was involved in increased energy harvest through polysaccharide degradation. Further, the same study noted that this attribute was transmissible such that microbiota transplantation from obese donor to lean germ-free mice lead to the gain in body fat.
Additionally, Methanobrevibacter smithii directs polysaccharide utilization by gut inhabitants, leading to the formation of large pools of SCFAs which is later used by M. smithii for methanogenesis in the gut with a consequent increase in host adiposity (Samuel and Gordon, 2006). Thus, Methanobrevibacter smithii can be a therapeutic target to avoid obesity and associated complications such as diabetes (Samuel et al., 2007).
Based on the work we carried out and several other similar studies, gut eukaryotes and fungi appear to be important components of the human gut. Such studies are crucial in the light of the involvement of these organisms in human diseases both inside and outside of gastrointestinal tract (Cui et al., 2013). Morphological and molecular phylogenetic-based classification of eukaryotes show that all eukaryotes originate from one of the six super-groups and that most of them are microscopic in nature (Adl et al., 2005). Although for decades human-associated eukaryotes are considered harmful to their host, recent examination of eukaryotic communities in the gut are amending our understanding of this generally neglected component (Hamad et al., 2012;Pandey et al., 2012;Parfrey et al., 2014). Studies such as these and our findings suggest that Blastocystis and fungi such as Ascomycota and Basidiomycota are predominant in the human gut. Fungi such as Candida albicans, Aspergillus fumigatus, and Saccharomyces are opportunistic pathogens known to be exaggerated in immunecompromised people (Li E. et al., 2014;Gouba and Drancourt, 2015). Fungal species mentioned above have also been associated with various diseases in type 1 (Soyucen et al., 2014) and type 2 diabetic subjects (Aly et al., 1991;Nowakowska et al., 2004) and are probably because of the high blood glucose level in these subjects. Thus, marked enrichment of fungi belonging to these and other genera in New-DMs subjects are likely due to the poor glycemic control in these subjects.
We investigated associations between clinical parameters and OTU richness using permutational multivariate analysis of variance (PERMANOVA). PERMANOVA is considered a powerful technique in detecting changes in community structure in response to environmental parameters (Anderson and Walsh, 2013). We observed that HDL, triglyceride and waist-hip ratio as largest contributors to the observed variation in OTU richness. Such correlations between risk factors for diabetes and variation in microbes in the gut have been previously reported (Zhang et al., 2013) and are also reflected in our dataset. Thus, it could be relevant in the microbiome-phenotype associations, since, low HDL and high triglycerides are typical features of dyslipidaemia found in T2D and known risk factors for cardiovascular disease (Mooradian, 2009).
We used network analysis to capture specific ecological interactions among the eubacterial consortium in relation to diabetes status. Such interaction networks can predict the outcome of community alterations (Faust et al., 2012) and be helpful in designing intervention studies aimed at altering complex microbial communities to restore the healthy state. In essence, we are not demonstrating complete coverage of all microbial interactions in the gut; but analyzing the interactions among microbes in the gut will help us understand how these communities develop or evolve in response to altered physiological and/or metabolic state such as diabetes. We thus highlight two characteristic features of this network: (1) the nature of the interactions observed were diabetes state specific and (2) the disintegration of the microbial cluster of genera: Lachnospira, Ruminococcus, Faecalibacterium, Roseburia, Oscillospira, Parabacteroides, Bulleidia from NGTs to New-DMs to Known-DMs. Almost all these genera include known beneficial species having the ability to produce SCFAs as mentioned earlier. Importantly, metagenomic linkage clusters (MLGs) belonging to these butyrate-producing genera were found enriched in non-diabetic controls in diabetes associated metagenomic study (Qin et al., 2012).
Finally, with the bioinformatics tool PICRUSt (Langille et al., 2013) which predicts functional composition using marker gene data, we had an opportunity to look into imputed metagenomebased discrete functional alteration in the eubacterial component of our study subjects. We observed that New-DMs were severely depleted with metabolic functions involved in carbohydrate metabolism, amino acid metabolism, various cofactor synthesis and oxidative stress management. Although PICRUSt can accurately predict metagenomic functions, it is limited to those sequences that can be accurately mapped to existing Greengenes database and does not consider sequences from novel microbial lineages (Langille et al., 2013). Thus, our explanation on imputed metagenome is limited and interpreted cautiously.
One of the strengths of our study is the comparison of gut microbiota of different grades of glucose intolerant subjects from a cohort which is has been followed for the past 20 years, this allowed a confident separation between newly diagnosed and known diabetic subjects. The participants are from the similar socioeconomic background and have a predominantly vegetarian diet. The age and gender distribution in the three groups were similar. One of the limitations of this study is that we were unable to describe sequential events in gut microbiota from healthy to diabetic state due to the cross-sectional design of this study. Another limitation of the study is the relatively small number of participants from one part of the country. Given the diversity in lifestyles, dietary habits, and social-economic status in the country, this study underscores a need for nationwide longitudinal studies. Our study is subject to inherent biases introduced by the use of high-throughput 16S rRNA gene amplicon sequencing. These include the region of 16S rRNA gene sequenced, set of primers used for gene amplification and use of sequence database for taxonomic assignments of the amplicon reads.
In conclusion, our results add to the growing literature suggesting an association between gut microbiota and diabetes. Broad similarities between our results and literature reports suggest that our measurements are reliable and support consistent association across populations. Additionally, we have broadened the boundaries of diabetes associated gut microbiota by providing the consolidated description on eubacterial, archaeal, and eukaryotic dysbiosis in these subjects. Given the peculiarities of diabetes in Indians, these results suggest an important avenue be further explored for causality and possible interventions to prevent or modify the course of diabetes and related disorders. We anticipate the need for subsequent studies describing differences in gut microbial communities of diabetic patients from different populations and identification of relevant population specific biomarkers.

AUTHOR CONTRIBUTIONS
YS, SG, and CY contributed to conception, design, and coordination of the study and to the critical revisions of the manuscript for important intellectual content. SJ and CY were involved in subject recruitment and sample collection. SB acquired and processed the fecal samples for 16S rRNA amplicon sequencing. SB performed detailed bioinformatics analysis. SB and MS performed archaeal, eukaryotic and fungal amplicon sequencing. SB prepared the first draft of the manuscript and contributed to the critical revisions of the manuscript for important intellectual content. SB and SJ undertook statistical analysis and interpretation of results. SJ contributed to the critical revisions of the manuscript for important intellectual content. All authors gave final approval of the version to be published.