Characterizing the avian gut microbiota: membership, driving influences, and potential function

Birds represent a diverse and evolutionarily successful lineage, occupying a wide range of niches throughout the world. Like all vertebrates, avians harbor diverse communities of microorganisms within their guts, which collectively fulfill important roles in providing the host with nutrition and protection from pathogens. Although many studies have investigated the role of particular microbes in the guts of avian species, there has been no attempt to unify the results of previous, sequence-based studies to examine the factors that shape the avian gut microbiota as a whole. In this study, we present the first meta-analysis of the avian gut microbiota, using 16S rRNA gene sequences obtained from a range of publicly available clone-library and amplicon pyrosequencing data. We investigate community membership and structure, as well as probe the roles of some of the key biological factors that influence the gut microbiota of other vertebrates, such as host phylogeny, location within the gut, diet, and association with humans. Our results indicate that, across avian studies, the microbiota demonstrates a similar phylum-level composition to that of mammals. Host bird species is the most important factor in determining community composition, although sampling site, diet, and captivity status also contribute. These analyses provide a first integrated look at the composition of the avian microbiota, and serve as a foundation for future studies in this area.


INTRODUCTION
The role of the gut microbiota in shaping the health and physiology of vertebrate hosts is a well-established, highly exciting area in microbiology. The diversity and function of microbes in the gastrointestinal (GI) tract is an area of ongoing research, with recognized roles for the vertebrate microbiota in nutrition (Jin et al., 1998;Preest et al., 2003;Turnbaugh et al., 2006;Angelakis and Raoult, 2010;Stanley et al., 2012), gut development (Stappenbeck et al., 2002;Rahimi et al., 2009;Zhang et al., 2011;Cao et al., 2012) and regulation of host physiology (Bäckhed et al., 2004;Björkholm et al., 2009;Meinl et al., 2009). 16S rRNA gene sequencing has been employed in a range of studies to assess the diversity and phylogenetic relationships of gut microbes and this has proven to be a powerful tool for understanding the factors that shape microbial communities, due to both its informative and predictive potential. A secondary benefit of the 16S rRNA gene is that, in addition to reporting the results of findings in scientific journals, it is customary to deposit the primary sequence data into publicly available databases which allow for a second wave of meta-study. By aggregating data from a variety of sources or environments, researchers have been able to discern large-scale patterns in microbial ecology, analysing the bacterial communities of mammalian (Ley et al., 2008a) and fish (Sullam et al., 2012) guts, as well as across other non-biological factors (Lozupone and Knight, 2007;Chu et al., 2010;Shade et al., 2013). One area that has arguably not undergone such a revolution is that of the avian microbiota. While several notable exceptions exist, such as commercially farmed broiler chickens and turkeys as well as the South American hoatzin, the majority of avian systems have not been studied outside of immediate pathogenic concerns.
Similar to other vertebrates, the GI tract of birds is colonized by a community of microbes, with a density as high as 10 11 c.f.u/g in the hindgut (Barnes, 1972). The role of microbes in the avian gut has long been a topic of study, with groundbreaking research throughout the 1960's identifying the role of bacteria in starch degradation and volatile fatty-acid production within the bird gut (Bolton, 1965;Annison et al., 1968;Pritchard, 1972). From a microbiological perspective, there are two major areas of interest in the bird gut. The crop, a muscular pouch located at the start of the alimentary tract, is associated with the breakdown of starch (Shaw, 1913;Pritchard, 1972;Vispo and Karasov, 1997;Pacheco et al., 2004), and microbially mediated fermentation of lactate (Bolton, 1962(Bolton, , 1965Pritchard, 1972;Moore et al., 2004). Cellulolytic microbes have occasionally been observed in avian crops (Shetty et al., 1990;Domínguez-Bello et al., 1993), but significant bacterial cellulolysis has only been reported in the hoatzin (Grajal et al., 1989;Domínguez-Bello et al., 1993), with only low levels of cellulose fermentation reported for other birds (Clemens et al., 1975;Cutler et al., 2005). The ceca are the sites of recycling of urea (Barnes, 1972;Mead, 1989;Vispo and Karasov, 1997;Preest et al., 2003), retention of water (McNab, 1973) and fermentation of carbohydrates (Józefiak et al., 2004). It has been observed that a cellulose-rich diet leads to increased size of the ceca (Leopold, 1953;McNab, 1973;Miller, 1976;Duke et al., 1984;Redig, 1989;Stevens and Hume, 1998), but there is contradictory evidence for the direct utilization of cellulose in the avian hindgut (Barnes, 1972;McNab, 1973;Mead, 1989).
With the rise of 16S rRNA gene sequencing a large portion of avian microbiology has shifted from microbial physiology to the diversity and phylogeny of avian gut microbes. Specific studies have addressed areas of avian microbial ecology, such as the variation in microbial diversity along the GI tract (Bjerrum et al., 2006;Gong et al., 2007;Torok et al., 2008;Waite et al., 2012), the influence of diet (Rubio et al., 1998;Blanco et al., 2006;Torok et al., 2008;Janczyk et al., 2009;Hammons et al., 2010), age (Van Der Wielen et al., 2002;Godoy-Vitorino et al., 2010;Van Dongen et al., 2013) or other host-specific factors (Zhu et al., 2002;Lucas and Heeb, 2005;Banks et al., 2009;Benskin et al., 2010;Wienemann et al., 2011). While there is extensive evidence that microbial colonization of the GI tract brings benefits to the host bird (Jin et al., 1998;Torok et al., 2008;Angelakis and Raoult, 2010;Torok et al., 2011;Zhang et al., 2011;Cao et al., 2012;Stanley et al., 2012), there are also pathways through which the normal colonization of microbes can be of detriment to the host (Ford and Coates, 1971;Potti et al., 2002;Cao et al., 2012;Singh et al., 2013). Although there are many published studies exploring aspects of the avian microbiota, it has evidently been uncommon for authors to publish their sequence data to an archive, somewhat limiting the potential for avian metastudies. As an example of this, in their 2008 meta-analysis of the vertebrate microbiota Ley et al. had access to rich clone-library data from insects (19 studies), humans (20 studies) and other vertebrate species (23 studies, including five from birds) (Ley et al., 2008b). In 2012, Sullam et al. identified for analysis 24 pre-existing clonelibraries derived from fish guts (Sullam et al., 2012). By contrast, in the same year Kohl only identified eight avian libraries with any significant microbiota data (Kohl, 2012). A survey of the recent literature has shown that the picture of the avian microbiota has since improved significantly, with the continued usage of clone-libraries and incorporation of amplicon pyrosequencing into existing study systems ( Table 1).
In order to gain new insights into the avian gut microbiota, we sought to amalgamate the existing knowledge and determine whether patterns detected in individual studies were consistent across avians as a whole. To achieve this goal we collected publicly available data from NCBI GenBank and MG-RAST and reanalyzed the data using established bioinformatics pipelines.

DATA ACQUISITION AND QUALITY CONTROL
Clone-library data were obtained from GenBank through a comprehensive literature survey, followed by the retrieval of clone-library sequence data of interest. Short amplicon data from next-generation sequencing studies were obtained from MG-RAST and the NCBI Sequence Read Archive (hereafter referred to as short-read data) by browsing for the publicly available data sets. Data sources are as reported in Table 1, with the exception of the database provided by Wei and colleagues (Wei et al., 2013), which was excluded from analysis as their data overlapped significantly with sequences obtained from original studies.
All downloaded data were re-analyzed using mothur version 1.32.1 (Schloss et al., 2009). For short-read data, flowgrams were trimmed to a single length then denoised. Where flowgrams were not available, sequences were trimmed using the trim.seqs command, removing the barcode and primer sequences and discarding sequences with an average quality score of less than 25, or sequences with a homopolymer run of greater than eight bases. All sequence data were then aligned, screened for chimeras with uchime (Edgar et al., 2011) and classified against the Greengenes taxonomy using the naïve Bayesian method (Desantis et al., 2006;Wang et al., 2007). Sequences that could not be classified to domain level, or were classified as Cyanobacteria, were removed from the dataset as they likely represent ingested plant material. Chimeric sequences and sequences that could not be aligned were also removed from the data set.
For data obtained from clone libraries it is common practice to simply upload representative sequences to GenBank, rather than the complete dataset. In order to account for the loss of abundance information from the original clone libraries, taxonomic classification was reported by calculating operational taxonomic units (OTUs) of 97% sequence similarity for each sample and assigning taxonomy using the classify.otu command in mothur. Although short-read data does contain the data from the complete sequencing run, studies did not always utilize the same 16S rRNA gene region and so could not be directly compared. In lieu of OTU generation, genus-level phylotypes were constructed using the sequence classification. For short-read data, the phylotype table was rarefied to a depth of 1500 data points and Shannon and Simpson diversity indices calculated.

CORRELATING METADATA TO COMMUNITY STRUCTURE
For clone data, sequences were trimmed to an 800 bp overlapping region and a phylogenetic tree constructed using the clearcut neighbor-joining algorithm (Evans et al., 2006) for UniFrac analysis. Sequences less than 800 bp in length were discarded, resulting in the loss of three avian samples compared with the previous classification. Due to the potential bias in relative abundance incurred by the selective uploading of data, only unweighted UniFrac distance was calculated. For short-read data there was no contiguous region of sequence common to all samples, so analysis was performed by constructing genus-level phylotypes of the classified data. Community differences were calculated using Jaccard (presence/absence) and Yue-Clayton theta (abundance) distance by randomly subsampling each community to 1500 sequences 20,000 times and averaging the community distances across iterations.
Metadata regarding the host, sample type, animal diet and captivity status were recorded and their impact on community differences compared using the vegan package (version 2.0-8) (Oksanen et al., 2013) in the R software environment (R. Core Team, 2012). Samples were grouped according to the following categories: host animal, diet and captivity status. Diet consisted of three categories-carnivore, herbivore and grain-fed-that reflected a "typical" diet of the host. When dividing animals based on diet, the distinction was made between an herbivorous diet (leaves and green plant material, such as eaten by the kakapo and hoatzin) and grain-fed diet (pelleted feed, such as found in farmed chickens) due to the different nutrient content and availability in these diets. Captivity status consisted of simply dividing samples into those animals that are wild or farm-raised. For short-read data the study that provided the data was also used as a test for how much the dynamics of the study itself shaped the data. This factor could not be applied to the clone-library data as not every original study uploaded sequences with sufficient information to recapture biological replication with the sequence data.
Permutational multivariate analysis of variance (PERMAN-OVA) with linear model fitting was performed (Anderson, 2001;McArdle and Anderson, 2001) in R. Samples were grouped according to each metadata factor and tested for how well the grouping accounted for the variation between samples using the "Adonis" function of the vegan package (Oksanen et al., 2013), measured as R 2 . A significance value (p-value) was generated by comparing the obtained R 2 to that obtained from 1000 random permutations of the data. For factors with a statistically significant fit, constrained canonical analysis (CCA) was performed (Ter Braak, 1986) using the factor as the constraining variable to isolate the contribution of that factor to the microbial community.

FUNCTIONAL PREDICTION OF GUT MICROBIOTA
Following quality control of short-read data, sequences were mapped to OTUs using closed-reference OTU picking in QIIME 1.80 . 16S rRNA gene abundance levels were then normalized against the known gene copy number for that OTU and function predictions made based on OTU membership using PICRUSt (Langille et al., 2013). Functional predictions were categorized into KEGG pathways and statistical analysis performed using STAMP v2.0 (Parks and Beiko, 2010). Data were partitioned by metadata factors and differences in relative abundance tested using ANOVA, followed by post-hoc Games-Howell test with the Benjamini-Hochberg FDR used as a multiple testing correction (Benjamini and Hochberg, 1995). For testing the presence of genes involved in cellulose digestion, KEGG data were screened for pathways that mapped to COGs involved in cellulolysis and data extracted. Pair-wise comparisons were performed using Welch's t-test (Welch, 1947) with the Benjamini-Hochberg FDR.

TAXONOMIC CLASSIFICATION OF OTUs
Quality-control of sequence data yielded a high number of highquality sequences, of varying length, from a subset of the studies reported in Table 1, (Tables 2A,B). Consistent with the microbiota of vertebrates in general, the avian gut microbiota appears to harbor mostly OTUs belonging to Bacteroidetes, Firmicutes, and Proteobacteria (Figure 1). Members of the phylum Firmicutes were present in all samples analyzed, while Proteobacteria and Bacteroidetes were also widespread (Proteobacteria: 90% of clone samples, 100% of short-read samples; Bacteroidetes: 80% of clone samples, 87% of short-read samples). These three phyla are commonly observed within gut environments, and specific lineages of these phyla are frequently studied for their symbiotic roles, for example Bacteroides thetaiotaomicron starch degradation in humans (Dongowski et al., 2000;Xu et al., 2003;Sears, 2005), and Lactobacilli-associated bile salt hydrolase activity in mice and chickens (Tannock et al., 1989;Tanaka et al., 1999;Knarreborg et al., 2002). To a lesser extent, Actinobacteria (65% of clone samples, 89% short-read samples) and Tenericutes (65%  of clone samples, 58% short-read samples) were also reasonably common throughout the data. Within the short-read data, a higher proportion of unclassified OTUs was observed, which may be due to a lack of phylogenetic resolution due to shorter read length. Alternatively, it has been shown that the use of the adapter/barcode construct in a single-step PCR, as is commonplace in pyrosequencing studies, can negatively affect taxonomic classification (Berry et al., 2011).

FACTORS SHAPING THE AVIAN MICROBIOTA: STUDY vs. HOST
PERMANOVA testing of the short-read data set revealed that the largest factor contributing to the shaping of the microbiota was the study itself (Table 3). This finding may be a real result, as most studies focused on a single bird geographically isolated from other studies (i.e., the "study" variable is the product of host and location), or may be an artefact resulting from the specific DNA extraction and PCR techniques involved (Boom et al., 1990;Suzuki and Giovannoni, 1996;Martin-Laurent et al., 2001;Sipos et al., 2007;Berry et al., 2011;Kennedy et al., 2014). In order to resolve this issue, we hypothesized that if the host species was truly driving the differences observed between studies, then the phylogenetic differences between taxonomically similar bacterial lineages within each study would be smaller between studies with a closely related host bird. Alternatively, a study that investigated a range of host birds would have greater within-study variation than a study that investigated a single host. We identified three studies that sequenced overlapping regions of the bacterial 16S rRNA gene ( Table 1, Unno, 2010, Dewar, 2013 andPRJEB3384) and observed that two bacterial genera were conserved across all three studies, namely Bacteroides and Clostridium. Sequences associated with these taxa were extracted from the main dataset and unweighted UniFrac distances were calculated between each biological replicate. The within-and between-study UniFrac distances are reported in Figure 2 and, consistent with our prediction, the within-study and betweenstudy difference was similar when the data originated from a closely related host (Figure 2, Dewar, 2013, LittlePenguin andDewar, 2013. LittlePenguin). By contrast, the differences between Dewar, 2013and Unno, 2010, and LittlePenguin and Unno, 2010 were higher than the within-group difference for Clostridium and elevated compared to the penguin/penguin comparisons for Bacteroides. The within-group differences were higher for Unno, 2010-Bacteroides than for other groups, but this may be a result of the Unno, 2010 study itself analysing several different birds.
Although the different methodologies employed in the various studies are likely to have some impact on the results, we concluded that this was overshadowed by the impact of the host organism and proceeded to analyse other metadata factors.

FACTORS SHAPING THE AVIAN MICROBIOTA: BIOLOGICAL FACTORS
Standard ecological diversity indices revealed varying degrees of microbial diversity among the birds studied (Table 2B). In agreement with our previous observations of low microbial diversity within the kakapo hindgut (Waite et al., 2012(Waite et al., , 2013, the diversity estimators for kakapo were among the lowest observed. Consistent with previously reported mammalian findings (Ley et al., 2008a), and with more targeted avian studies (Zhu et al., 2002;Lucas and Heeb, 2005;Banks et al., 2009;Benskin et al., 2010), the host organism was the strongest driver of community structure in the clone-library data and second strongest in the short-read data ( Table 3). Other factors were still significantly associated with shaping the gut community but their fit to the data was lower. The fit for any particular factor across the data was quite low (Table 3), which is likely a result of compounding variables from the individual studies, rather than a real lack of influence of these factors. In order to account for this variation, CCA was used to visualize patterns in the data that could be accounted for by the factor of interest. Results are summarized in Figure 3 and show clear clustering of data for clone samples, but weak clustering for short-read data (Figure 4). This lack of resolution within the short-read data is likely due to the loss of OTU phylogenetic information due to non-overlapping 16S rRNA gene regions between studies. Due to the lack of phylogenetic relationship between OTUs, each OTU is considered equally different from every other OTU (Lozupone and Knight, 2005) and hence evolutionary information is lost.

FUNCTIONAL PREDICTION OF THE GUT MICROBIOTA
Ultimately, the study of microbial communities is of little biological value unless the functional potential of the community, or individual members, is considered. Statistical testing revealed differences in many predicted functional pathways when data were partitioned by host, but this finding was ignored as it is a likely side-effect of 16S rRNA prediction (i.e., if the 16S rRNA-defined communities differ between hosts, the metagenomic prediction based on 16S rRNA community is also likely to differ). Metagenomes were instead partitioned by diet, captivity and gut location sampled and these categorizations of data revealed interesting differences in functional capability ( Table 4).
Captive birds were predicted to have a microbiota with enhanced capability for carbohydrate metabolism and a lower rate of microbial genes associated with infectious disease. When comparing predicted metagenomes by diet, the microbiota of carnivores was predicted to have a greater capability for amino acid and energy metabolism when compared to herbivores, a finding previously reported in mammals (Muegge et al., 2011). The grain-fed microbiota was predicted to have a higher capability for carbohydrate metabolism than that of herbivores. Genes involved in lactate production were predicted in all samples, which is not surprising as lactate is a known by-product of microbial activity in the ceca and is a major metabolic precursor for glucose in avians (Brady et al., 1979;Ogata et al., 1982;Franson et al., 1985). These findings provide support for the fitting of metadata categories to the samples, as the factors that contribute to shaping the microbiota were also supported by known functional roles of these microorganisms.
Partitioning of data by sample site revealed several key influences on the predicted functionality of the microbiota. For example, genes grouping into the KEGG grouping "signaling molecules and interaction" were lowest in faecal samples. This grouping includes an array of genes involved in cell adhesion molecules and cytokine receptors and is likely to be involved in host/bacteria interactions. Genes involved in carbohydrate metabolism were at their lowest in foregut samples from kakapo, and elevated in the hindgut, consistent with the fact that most birds utilize their hindgut/cecum for carbohydrate fermentation (McNab, 1973;Mead, 1989). Interestingly, the influence of diet did not match differences in the predicted ability of the microbiota to degrade cellulose. Between the three diet groupings, β-1,4-endoxylanase was more abundant in carnivorous birds than herbivorous birds. β-xylosidase activity was predicted to be higher in grain-fed birds than strictly herbivorous birds, while xylanase was higher in herbivorous birds than grain-fed (Table 4). When taken as a proportion of the total cellulolytic potential, the microbiota of carnivorous birds had a higher predicted occurrence of βxylosidase than that of herbivorous birds, and a higher occurrence of Cellulase M than grain-fed birds. Between the non-carnivorous birds, Cellulase M and xylanase accounted for a higher proportion of cellulolytic potential in the herbivorous birds, and β-glucosidase and β-xylosidase in grain-fed birds. These genes

Frontiers in Microbiology | Microbial Symbioses
May 2014 | Volume 5 | Article 223 | 6 FIGURE 2 | Unweighted UniFrac distances for within-and between-study comparisons. Distances were calculated by extracting reads classified as Clostridium (top) and Bacteroidetes (bottom) from each sample and constructing neighbor-joining phylogenetic trees based on average-neighbor distances between aligned sequences. Differences between each pair of samples were categorized as being the distance between samples from the same study or from different studies and plotted accordingly (blue = within study, orange = between study). The study "Dewar, 2013" investigated the faecal microbiota from little, king, macaroni, and gentoo penguins. The study "LittlePenguin" investigated the faecal microbiota of little penguins, and "Unno, 2010" the microbiota of a chicken, duck, and goose from a farm.
were detected in a range of bacterial phyla within the avian gut, but particular bacterial families were enriched in the gut microbiota, likely contributing to these differences in relative gene abundance. Of the PICRUSt OTUs that carried cellulolytic potential, members of the Bifidobacteriaceae, Bacteroidaceae, and Lactobacillaceae were highly represented in metagenomes which exhibited elevated β-xylosidase and β-glucosidase levels. Leuconostocaceae were enriched in predicted metagenomes with elevated Cellulase M and β-xylosidase. Interestingly, higher abundance of xylanase genes was pre-dominantly associated with abundance of the Enterobacteriaceae, which may reflect the influence of the Proteobacteria-rich kakapo microbiota. When normalized to a proportion of the total cellulolytic gene abundance, predicted proportions of β-1,4-endoxylanase were not significantly different between dietary groupings. Although not necessarily intuitive, these findings are supported by previous observations that the cellulolytic potential of the avian hindgut is minimal (Barnes, 1972;McNab, 1973;Mead, 1989), and correlates with the observation that cellulolytic pre-digestion of feed boosts energy harvest and weight gain (Józefiak et al., 2006;Yu et al., 2008;Cowieson et al., 2010;Mathlouthi et al., 2011;Ghahri et al., 2012;Ribeiro et al., 2012) in farmed broiler chickens. Caution must be taken in interpreting these predictions, as a recent study has shown that the functional capabilities of the gut microbiota are dependent on community membership as well as genetic potential (Berry et al., 2013). Furthermore, the PICRUSt prediction framework can only account for sequences that can be accurately mapped to the existing database, with no provision for sequences representing novel, or unstudied, bacterial lineages. Nevertheless, the framework provided high-level predictions that were consistent with the known state of avian microbiology and therefore represents an excellent pathway for generation of novel hypotheses and for general annotation of 16S rRNA gene amplicon studies.
In summary, we have conducted a comprehensive metaanalysis of publicly available avian microbiota sequences and tested whether, despite notable differences in physiology between avians and mammals, the factors that drive community structure are the same. We show that the avian host species is the strongest factor in determining community composition and decoupled  this effect from potential study bias where the data allowed. Finally, we have analyzed the potential functional profiles of 16S rRNA gene amplicon data and found that the genomic potential predicted of the communities fits well with the existing literature, and is therefore an excellent platform to leverage these data into new hypotheses and lines of inquiry.