A Metagenomics Investigation of Carbohydrate-Active Enzymes along the Gastrointestinal Tract of Saudi Sheep

The digestive microbiota of humans and of a wide range of animals has recently become amenable to in-depth studies due to the emergence of DNA-based metagenomic techniques that do not require cultivation of gut microbes. These techniques are now commonly used to explore the feces of humans and animals under the assumption that such samples are faithful proxies for the intestinal microbiota. Sheep (Ovis aries) are ruminant animals particularly adapted to life in arid regions and in particular Najdi, Noaimi (Awassi), and Harrei (Harri) breeds that are raised in Saudi Arabia for milk and/or meat production. Here we report a metagenomics investigation of the distal digestive tract of one animal from each breed that (i) examines the microbiota at three intestinal subsites (small intestine, mid-colon, and rectum), (ii) performs an in-depth analysis of the carbohydrate-active enzymes genes encoded by the microbiota at the three subsites, and (iii) compares the microbiota and carbohydrate-active enzyme profile at the three subsites across the different breeds. For all animals we found that the small intestine is characterized by a lower taxonomic diversity than that of the large intestine and of the rectal samples. Mirroring this observation, we also find that the spectrum of encoded carbohydrate-active enzymes of the mid-colon and rectal sites is much richer than that of the small intestine. However, the number of encoded cellulases and xylanases in the various intestinal subsites was found to be surprisingly low, indicating that the bulk of the fiber digestion is performed upstream in the rumen, and that the carbon source for the intestinal flora is probably constituted of the rumen fungi and bacteria that pass in the intestines. In consequence we argue that ruminant feces, which are often analyzed for the search of microbial genes involved in plant cell wall degradation, are probably a poor proxy for the lignocellulolytic potential of the host.


INTRODUCTION
The digestive tract of humans and animals is home to a diverse ecosystem of trillions of microbial cells (Qin et al., 2010;Costello et al., 2012). The digestive microbiota confers abilities that are not encoded by the mammalian genome. In particular, the intrinsic (viz. encoded by the host genome) digestive capabilities of mammals is extremely reduced and is essentially limited to starch, sucrose, and lactose (Cantarel et al., 2012;El Kaoutari et al., 2013a). One of the key roles of the digestive microbiota is to help the host digest its diet, and in particular the large array of complex carbohydrates found in the mammalian diet. The digestive microbiota of humans and of a wide range of animals has recently become amenable to in-depth studies due to the emergence of DNAbased metagenomic techniques that do not require cultivation of gut microbes. Technological progresses offer the ability to explore both taxonomical and functional profiles of the microbial DNA extracted from microbial communities with an everincreasing depth. The most numerous microbiota investigations are certainly those of the human digestive microbiota, which has been linked to health and disease by numerous studies.
The digestive microbiota of other mammals has also been studied extensively, essentially by sampling feces and subjecting them to an analysis of the taxonomical diversity via 16S RNA sequencing (Ley et al., 2008). In comparison, fewer studies have attempted to address the question of the functional digestive potential, although Muegge et al. (2011) have shown that diet shapes both the taxonomical and the functional profiles of mammalian microbiota. The major drivers of these taxonomical and functional shifts correspond to the digestive tract itself (multichamber foregut vs. hindgut fermenters) and to the actual food composition (Sanders et al., 2015). With the exception of starch, plant-derived polysaccharides are left undigested by the mammalian enzymes and are broken down and fermented by their digestive microbiota. This is particularly important for herbivorous animals that harvest their carbohydrate solely from non-starchy plant food, and especially for the species that live in harsh conditions where a maximum of energy must be derived from small amounts of plant feed. This digestion of plant derived complex carbohydrates is notoriously difficult, because of the recalcitrant nature of most plant carbohydrates that have evolved to be resistant to enzymatic digestion (Mba Medie et al., 2012) and because the variety of bonds to cleave is such that no single genome can encode the corresponding variety enzymes necessary to break plant polysaccharides to fermentable sugars (El Kaoutari et al., 2013a).
Sheep (Ovis aries) are exclusively herbivorous mammals that were among the earliest domesticated farm animals. Of the several hundred breeds of sheep identified by the United Nations Food and Agriculture Organization, three breeds of sheep are commonly kept as livestock in Saudi Arabia, namely Najdi, Noaimi (Awassi), and Harrei (Harri) sheep (Supplementary Figure 1). These breeds tolerate the hot and cold weather of the kingdom. In the wild these sheep graze on short green grasses, clover and hay. These three breeds are used for meat production, but Najdi and Noaimi (Awassi) sheep are also used for milk production. Like all sheep, these animals have a typical ruminant gastrointestinal tract (GIT) including four fermentation chambers, composed of the rumen, the reticulum, the omasum and the abomasum followed by a small intestine and a large intestine (Figure 1). It is widely accepted that the breakdown and fermentation of roughage is performed in the first four fermentation chambers (Herrero et al., 2013;Morgavi et al., 2013), but few studies focus on the distal gut except for feces. In order to evaluate the exact digestive role of downstream sheep intestines, we have performed a metagenomics investigation of one Najdi, one Noaimi (Awassi), and one Harrei (Harri) sheep, which examined the microbiota at three GIT subsites (small intestine, mid-colon, and rectum) for each animal. We have also performed an in-depth analysis of the carbohydrateactive enzymes genes encoded by the microbiota in order to evaluate the lignocellulolytic potential at these three subsites.

Sample Collection
Three sheep (one Najdi, one Noaimi, and one Harrei) were obtained at Jeddah's slaughterhouse market. The samples used in this study were taken immediately after animal sacrifice, following best practice veterinary care. The animals used in this study were 12-20 months old and were fed a dried mix of Ammophila arenaria, Medicago sativa, Hordeum vulgare, and Sorghum bicolor for 10-25 days at the slaughterhouse market before sacrifice. A schematic drawing of the sheep digestive system and of the sampling sites is presented in Figure 1. About 30-35 g sample from mid-small and mid-large intestine and from the rectum region were collected into sterile 50 ml Falcon tubes. The intestine samples were mostly semi liquid while the rectal samples were often solid. Immediately after collection the samples were kept refrigerated for 24-28 h then immersed in AquaStool kit solution (MultiTarget Pharmaceuticals LLC, Colorado, USA) according to the manufacturer's instructions.

DNA Extraction
Bacterial DNA was extracted and purified using the commercial QIAamp DNA Stool Mini Kit (Qiagen: https://www.qiagen.com/) according to the manufacturer's protocol. A quantity of 200 mg of each animal sample was used for DNA extraction. Quantification and assessment of extracted DNA was carried out using a NanoDrop ND-1000 spectrophotometer (NanoDrop, Inc.). The extracted materials were stored at −80 • C until use. The concentration of the extracted DNA is reported in Supplementary Table 1.

Metagenomics Analyses
The samples were submitted to LGC Genomics GmbH (Berlin, Germany) for all sequencing tasks. The sequencing technology chosen was Illumina MiSeq V3 delivering 300 bp paired-end reads. All sequence reads were deposited in the Sequence Read Archive (Leinonen et al., 2011).

16S rDNA Amplicon Sequencing and Data Pre-Processing
The V3-V4 regions of the bacterial 16S rRNA gene were amplified using two universal primers 341F (CCTACGGGN GGCWGCAG) and 785R (GACTACHVGGGTATCTAAKCC) complemented with sample-specific barcodes for multiplexing during sequencing. The primary sequencing output data were demultiplexed using the Illumina bcl2fastq 1.8.4 tool to generate the FASTQ files corresponding to each library. Before clipping the barcode and the adapter sequences, reads with the following criteria were filtered out: (i) reads with more than one mismatch per barcode, (ii) reads with missing barcodes, one-sided barcodes, or conflicting barcode pairs and (iii) reads with a final length <100 bases.
Additional filtering criteria were considered before primer detection and clipping. Thus, only reads with pairs of primers (Fw-Rev or Rev-Fw) and reads with less than three mismatches per primer were kept for further analysis. Furthermore, all reads were turned into forward-reverse primer orientation after removing the primer sequences. The forward and reverse reads were combined using BBMerge v34.48 (http://bbmap. sourceforge.net/). Read quality was controlled using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for each sample. Reads with an average Phred quality score <33 or containing ambiguous bases were removed.
Operational Taxonomic Units (OTU) clustering has been performed using Mothur v1.35.1 using a 97% identity threshold (cluster.split method). The representative sequences of each OTU were queried against the Ribosomal Database Project release 11.4 (https://rdp.cme.msu.edu/) using BLAST with the following parameters: E-value of 1 or less and percent identity of 90% or higher. Next, the OTU diversity was analyzed using QIIME v1.9.0 (http://qiime.org/) following the standard workflow to determine the membership of all OTUs to different taxonomic levels and groups. The within-sample (alpha diversity and Shannon's diversity index H) as well as between-sample diversity (beta) were calculated using the R package "vegan" (Oksanen et al., 2017). Downstream analysis and result visualization were done with custom R scripts based on the "ggplot2" package (Wickham, 2009). Hierarchical clustering between samples was performed using "hclust" function of R (R Core Team, 2016) with "complete" method applied to a distance matrix derived from the beta diversity using the "betadiver" function from the "vegan" package.

Metagenomic Shotgun Sequence Data Analysis
The pre-processing of the shotgun sequencing output data was performed following the same workflow as described above. Assembly of the reads was performed using Trinity (Grabherr et al., 2011) with default parameters. Supplementary Table 2 reports the sequencing and assembly statistics for each sample.

Carbohydrate-Active Enzymes Analysis from Metagenomics Data
The detection of the assembled DNA sequences that encode carbohydrate-active enzymes was done using FASTY (Pearson et al., 1997) against a custom sequence library made of the catalytic domains of ∼150,000 sequences of glycoside hydrolase (GH), polysaccharide lyase (PL), glycosyl transferases (GT), carbohydrate esterases (CE), auxiliary activity (AA), and of their associated carbohydrate-binding modules (CBMs) derived from the carbohydrate-active enzymes database (www.cazy.org; Lombard et al., 2014). We retained as positive hits those that had 30% identity or more and an e-value of 10 −6 or better with a sequence in the library.

RESULTS AND DISCUSSION
Community Structure and Diversity Derived from 16S Analysis A total of 625,937 16S rRNA sequences were obtained from the three intestinal sites sampled (small intestine, large intestine, and rectum) of three sheep. The obtained 16S rRNA sequences represented 7,147 unique OTUs which could be assigned to 24 phyla, 69 order, and 186 genera. Overall, the bacterial communities were mostly dominated by Firmicutes (42.5% of the total sequences and 72% of different OTUs) followed by Bacteroidetes (21.5% sequences and 13.9% OTUs), Proteobacteria (11.5% of total sequences and 2.6% of OTUs), Candidate Division TM7 (7.7% of sequences and 3.4% of OTUs), Actinobacteria (2.5% of total sequences and 2.8% of OTUs), and Planctomycetes (6% of sequences and 1.5% of OTUs). Other phyla were represented in minor proportions ranging from 2% to <0.1% of sequences and OTUs ( Table 1). The taxonomic profiles were globally similar between the three animals along the intestinal tract especially at the phylum level. The core intestinal tract microbiota of the three sheep was composed of the major phyla including Firmicutes, Bacteroidetes, Actinobacteria, Candidate division TM7 and Proteobacteria.
The large intestine and rectal microbiota profiles of the Najdi sheep were largely similar despite minor taxonomic differences (Supplementary Figure 2). In fact, the rectal microbiota of this animal was characterized by higher number of OTUs within Firmicutes especially Clostridiales order with 1,152 different OTUs vs. 845 OTUs in large intestine. Bacteroidales (main order of Bacteroidetes) were similarly represented in both large intestine and rectum while this phylum was more abundant in the large intestine (33.7% of sequences) than in the rectum (17.6% of sequences). The diversity within the three GIT subsites in the three animals is given in Supplementary Figure 4. The small intestine of Najdi sheep had a unique taxonomic profile compared to the large intestine and rectum, with a very low diversity (α-diversity index 39.5) and a particularly high number of sequences that belong to Proteobacteria with 66% of total sequences corresponding to Escherichia and Shigella genera. Lactobacillus genus (Firmicutes) was also highly represented in small intestine of Najdi sheep with 29% of total sequences (Supplementary Table 3).
Samples derived from the large intestine and rectum of the Noaimi sheep had very similar taxonomic profiles at both phylum and order level. They were characterized by the predominance of Firmicutes (mainly Clostridiales), Bacteroidetes (mainly Bacteroidales) and Planctomycetes (mainly Planctomycetales). The number of sequences and OTUs at the Order level are given in Supplementary Table 4.
Here again the small intestine had a distinct taxonomic profile compared to other intestinal subsites. It was characterized by a low diversity of different bacterial groups (only 555 OTUs in total and α-diversity index 92.1) and a particularly low relative abundance of Bacteroidetes members which represent <1.5% of sequences in total. In addition, the small intestine was enriched in other phyla such as Actinobacteria, Tenericutes, Proteobacteria, and Candidate division TM7 (new candidate phylum) compared to the other subsites (Figure 2 and Table 2).
The microbial communities within the intestinal tract of the Harrei sheep followed the same trend as with the two other sheep with the highest microbial diversity residing in the large intestine and rectum, and a very low abundance of Bacteroidetes in the small intestine. We observed the highest abundance of Candidate division TM7 and Cyanobacteria in the small intestine compared to the two other subsites (Figure 2 and Table 2).
The distribution of microbial communities at a given intestinal subsite thus appears to be stable across the three animals studied here. The microbial diversity and the relative abundance were correlated to the body subsite. Overall, the small intestine was characterized by low microbial diversity and abundance compared to the large intestine and rectum (Supplementary Figure 4). At the phylum level, the most striking observation is the very low abundance of Bacteroidetes in the small intestine of the three sheep compared to the other intestinal subsites. The large intestine and rectal bacterial communities were characterized by a high predominance of members of Firmicutes (Clostridiales) and Bacteroidetes (Bacteroidales).
Hierarchical clustering based on beta diversity between different sheep and intestinal subsites confirmed the results described above. Supplementary Figure 3 shows that sheep microbiota diversity in large intestine and rectum were similar to each other (especially within the same animal) while all samples from small intestine showed a particular profile and clustered together.
Finally we note that a large portion of identified OTUs from different samples did not belong to any known or candidate phylum and was flagged as "unclassified." The proportion of unclassified OTUs varied between ∼8% in the small intestine  to ∼18% in the large intestine of Harrei sheep (Figure 2 and Table 2).

CAZyme Analysis (Functional Analysis)
The reads of each sample were assembled into contigs. While the final number of reads obtained for each sample was of the same order of magnitude (between 1.5 and 2.9 million reads), the number of assembled contigs was lower and with a larger average length for the three small intestine samples (Supplementary Table 2), suggesting that the small intestine of sheep may be characterized by a lower organismal concentration than the other two subsites. By contrast there was no major difference in the number of assembled contigs and in the contig length for the large intestine or rectal samples.
Because carbohydrate-active enzymes often have a modular structure where the catalytic domain is appended to a variable number of other domains that may be catalytic or carbohydratebinding, all contigs were analyzed by pairwise alignment to a library of ∼150,000 individual functional domains (modules) derived from the five main categories of modules described in the carbohydrate-active enzymes database, namely glycoside hydrolases (GH), polysaccharide lyases (PL), carbohydrate esterases (CE), glycosyltransferases (GT), auxiliary activity (AA), and CBMs (Lombard et al., 2014). From a total number of ∼1.5 million contigs, 59,337 genes encoding carbohydrateactive enzymes were detected (average ∼6,600 per body site), representing ∼4% of the total of contigs that were assembled. The count of CAZy families for each body site and each animal is given in Supplementary Table 5. No similarity to enzymes of  β-N-acetylglucosaminidase GH20 765 1,088 §Taken from Hess et al. (2011). *This work; average of large intestine and rectum for the three animals, scaled to the same overall number of GHs as in Hess et al. (2011). the AA category could be detected, a finding not unexpected given that AA enzymes are oxygen-utilizing oxidoreductases and the three samples subsites are strictly anaerobic. The relative abundance of the families of CAZymes involved in glycosidic bond cleavage [the glycoside hydrolases (GHs) and polysaccharide lyases (PLs)] was computed and the resulting profiles are shown in Figure 3 for the 32 most abundant families. Regardless of the animal, the profile of the GHs and PLs of the small intestine samples is much lower than that in the other sites likely due to the lower concentration of bacteria, a situation similar to that encountered in the human duodenum, where only a very limited number of CAZymes can be found due to the lower concentration of resident bacteria (Angelakis et al., 2015). On the other hand, when the 32 most abundant families were ranked by their average abundance across subsites, the same families tended to be found at the same rank, regardless of the animal (Figure 3) for the large intestine and rectal subsites. By contrast the family profile of the three small intestine samples was markedly different, with a much lower average value and different rank for the most abundant families. This observation mirrors what was observed upon 16S analysis, with the large intestine and rectal subsites being much richer and diverse than the small intestine.
We have then examined the putative function encoded by the most abundant carbohydrate-active enzymes families in the large intestine and rectal samples, viz. the sites with the most abundant profiles (Figure 3). In these sites, regardless of the animal and for both sites, GH13 is consistently the most abundant family, reflecting the universal role of starch/glycogen as a central nutrient source and reserve macromolecule. The other most abundant families include families with miscellaneous substrates (GH2 and GH3), and more specific families like GH43 (α-L-arabinofuranosidases), GH20 (Nacetyl β-glucosaminidases), GH78 (α-L-rhamnosidases), GH97 (starch), GH92 and GH38 (α-mannosidases), GH77 (amylomaltase), GH29 and GH95 (α-L-fucosidases), GH36 (α-galactosidases), CE4 and CE1 (deacetylases), GH33 (sialidases and other ulosonic acid hydrolases), GH25 (lysozymes). Interestingly, the relative abundance of enzymes that target cellulose and xylan is lower than what was found in a large scale metagenomics investigation of the cow rumen (Hess et al., 2011;

CONCLUSIONS
Most studies of the digestive microbiota focus on animal (or human) feces, which are assumed to be representing the diversity (taxonomical and functional) of the microbes involved in the breakdown of complex carbohydrates and glycans. Fewer studies focus on the rumen of herbivores and even less on the intestines. Our study reveals a remarkable functional stability of the CAZyme profiles across the large intestine and rectal sites of all examined sheep. This functional resilience is in agreement with earlier observations made on the human gut microbiota (El Kaoutari et al., 2013b). The functional redundancy in the digestive microbiota ruminants can explain the adaptability of these animals toward various types of feeds and to rapid dietary changes.
The families of enzymes that target cellulose and xylan, were found to be less abundant in the sheep distal gut sites than in metagenomics studies of the cow rumen. Inversely, families of enzymes involved in peptidoglycan (for instance GH25) and fungal polysaccharides (for instance GH16, GH20) were found to be more abundant than in the rumen. Taken together these observations strengthen the idea that the bulk of cellulose breakdown is performed in the upperpart of the gastrointestinal tract (i.e., in the rumen) and that the carbon sources of the distal intestinal flora may comprise fungal and bacterial cells that have passed from the rumen into the intestines, along with residual fibers. An unexpected consequence is the realization that ruminant feces, which are often analyzed for the search of microbial genes involved in plant cell wall degradation, are probably a poor proxy for the lignocellulolytic potential of their host.

ETHICS STATEMENT
The animals were purchased at the slaughter house of Jeddah, Saudi Arabia, immediately after sacrifice.

AUTHORS CONTRIBUTIONS
Conceptualization, BH, SA, ER, and HA; Sample collection, SA; DNA extraction, AE; Computer analyses of carbohydrate-active enzymes, ED and VL; Data analysis, BH, AE, VL; Writing, BH with help from AE, SA, and VL.