Edinburgh Research Explorer Identification, Comparison, and Validation of Robust Rumen Microbial Biomarkers for Methane Emissions Using DiverseBreeds and Basal Diets

to select cattle that emit less methane (CH 4 ), which is a potent greenhouse gas. It is known that methane production (g/kgDMI) and to an extent the microbial community is heritable and therefore biomarkers can offer a method of selecting cattle for low methane emitting phenotypes. In this study a wider range of Bos Taurus cattle, varying in breed and diet, was investigated to determine microbial communities and genetic markers associated with high/low CH 4 emissions. Digesta samples were taken from 50 beef cattle, comprising four cattle breeds, receiving two basal diets containing different proportions of concentrate and also including feed additives (nitrate or lipid), that may inﬂuence methane emissions. A combination of partial least square analysis and network analysis enabled the identiﬁcation of the most signiﬁcant and robust biomarkers of CH 4 emissions (VIP > 0.8) across diets and breeds when comparing all potential biomarkers together. Genes associated with the hydrogenotrophic methanogenesis pathway converting carbon dioxide to methane, provided the dominant biomarkers of CH 4 emissions and methanogens were the microbial populations most closely correlated with CH 4 emissions and identiﬁed by metagenomics. Moreover, these genes grouped together as conﬁrmed by network analysis for each independent experiment and when combined. Finally, the genes involved in the methane synthesis pathway explained a higher proportion of variation in CH 4 emissions by PLS analysis compared to phylogenetic parameters or functional genes. These results conﬁrmed the reproducibility of the analysis and the advantage to use these genes as robust biomarkers of CH 4 emissions. Volatile fatty acid concentrations and ratios were signiﬁcantly correlated with CH 4 , but these factors were not identiﬁed as robust enough for predictive purposes. Moreover, the methanotrophic Methylomonas genus was found to be negatively correlated with CH 4 . Finally, this study conﬁrmed the importance of using robust and applicable biomarkers from the microbiome as a proxy of CH 4 emissions across diverse production systems and environments.


INTRODUCTION
Recent metagenomic analyses have highlighted the exciting opportunity that rumen microbial biomarkers of methane (CH 4 ) emissions could enable the selection by breeding of cattle which emit less CH 4 and ultimately may lower agricultural greenhouse gas (GHG) emissions. Ross et al. (2013) highlighted that this approach may surpass current prediction accuracies that are based on the host genome, especially for traits that are difficult to measure and largely influenced by the gut microbiome. Methane has a large impact on global warming, being 28-fold more potent as a GHG than carbon dioxide (CO 2 ) (IPCC, 2014). It is one of the main anthropogenic sources (IPCC, 2014) and ruminants are major producers of CH 4 , accounting for 37% of total GHG from agriculture in the UK (Cottle et al., 2011). Methane results as an end product of anaerobic microbial fermentation in the rumen and it significant negative economic and environmental impacts on animal production (Johnson and Johnson, 1995). A limited number of archaeal taxa within Euryarchaeota are methane producers and the genes involved in this process are wellcharacterized (Thauer et al., 2008;Leahy et al., 2010;Borrel et al., 2013). The hydrogenotrophic pathway catalyzing the conversion of CO 2 to methane is dominant in the rumen, and occurs in Methanobrevibacter spp. (Hook et al., 2010;Danielsson et al., 2017). However, methylotrophic methanogenesis also occurs in the Methanomassilliicoccales group , converting methylamine or methanol derived from digestion of feed constituents to methane (Poulsen et al., 2013;Vanwonterghem et al., 2017). In addition, more work is needed to identify the bacterial populations interacting with methanogens for H 2 or involved in different metabolic pathways associated with lactate or volatile fatty acids (VFA) including propionate, butyrate, or acetate which are known to impact differently methane emissions (Moss et al., 2000;Janssen, 2010;Wanapat et al., 2015;Kamke et al., 2016). For example, Megasphaera elsdenii is the major rumen bacterium involved in the acrylate pathway converting lactate to propionate and, in the absence of lactate, producing acetate and butyrate but not propionate from glucose (Hino et al., 1994;Russell and Wallace, 1997). Higher abundance of bacteria populations involved in propionate metabolism is associated with reduced methane emissions compared to acetate metabolism because more H 2 is utilized per mole VFA thus reducing availability for methane production (Janssen, 2010;Wanapat et al., 2015). Methanotrophic populations within both archaea and bacteria are known to metabolize methane as a carbon and energy source but the impact of such populations in the rumen seems likely to be minor (Parmar et al., 2015;Wallace et al., 2015).
Strategies to lower methane emissions in animal production are becoming an important field of research with the aims to enhance fermentation end-products that are useful to the host and reduce GHG emissions (Immig et al., 1996;Knapp et al., 2014). It is well-known that diet has an impact on the microbial community composition and the genes carried by these populations Henderson et al., 2015). Diets with a higher content of concentrate (e.g., grain) compared to a forage diet (e.g., grass and silages) tend to produce lower methane emissions. For example, Giger-Reverdin and Sauvant (2000) observed that maximum methane emissions occurred between 30 and 40% of grain-based concentrate in the diet. Many feed additives have been explored for their impact on methane emissions. Addition of nitrate or polyunsaturated lipids (e.g., from rapeseed or linseed oil) to the diet showed promising results (Veneman et al., 2015;Guyader et al., 2016). The percentage of concentrate as constituent of the diet strongly affected this inhibitory effect (Duthie et al., 2017). Mechanisms behind this effect are partly explained by the possible inhibition of H 2 producers in the presence of oil whilst nitrate is thought to act as a competitor with methanogens for H 2 and may also be toxic to methanogens (Guyader et al., 2015). Besides the use of different diets or additives, recent research has identified links between the rumen microbiome and the host animal (Roehe et al., 2016;Duthie et al., 2017;Malmuthuge and Guan, 2017) and it has been established that host genetics influences methane emissions (Pinares-Patiño et al., 2013;Herd et al., 2014). The rumen microbiome may be the link between host genetics and methane emissions. Therefore, the impact of basal diets, additives and breeds on the microbiome may be considered and evaluated for the identification of robust biomarkers of CH 4 emissions.
Until now, proxies to predict methane emission phenotypes based on rumen samples including phylogenetic, genomic, or metabolomic markers have not been considered to be robust and accurate, and are also expensive (Negussie et al., 2017). This limitation has been partly attributed to the low number of ruminants studied for the identification and validation of biomarkers. There are inherent difficulties comparing the results of direct quantitation of methanogens using qPCR across different studies due to differences in sampling methods or primer target (McCartney et al., 2013). Quantitative PCR has produced conflicting results when correlating absolute methanogen abundance with CH 4 emissions (Mosoni et al., 2011;Morgavi et al., 2012). However, a stronger correlation was obtained calculating relative abundance between the Archaea and Bacteria abundance (A:B ratio) in rumen digesta samples .
Reliable knowledge about the relationship between CH 4 emissions and both the microbiome and the metabolites released is very important for improving the identification of biomarkers (McCartney et al., 2013;Ross et al., 2013).
Metagenomics permits the identification of all genes comprising the microbiome and enables taxonomic characterization of the microbial population. Metagenomics has been confirmed to be a powerful method for studying the rumen microbiome (Roehe et al., 2016;Wallace et al., 2017). Roehe et al. (2016) identified 20 genes as biomarkers of methane emissions using a combination of metagenomics and partial least square analyses. Moreover, the same authors showed that these genes clustered together within a genetic network providing a proof of principle about the feasibility of breeding selection by targeting these genes within the rumen microbiome. These preliminary results were obtained on a limited number of beef cattle (n = 8) selected as extreme methane emitters (low or high) and fed with two basal diets (forage or concentrate). Therefore, the possibility to use a large scale method like metagenomics on a set of data from different breeds of beef cattle fed different diets and coupled with VFA monitoring is a great opportunity to identify and validate robust biomarkers of CH 4 emissions.
The aim of this study was (i) to evaluate the effect of two basal diets, and additives on the rumen microbiome of a selection of four beef livestock breeds and to identify robust biomarkers of CH 4 emissions associated with the microbiome or microbial activities, (ii) the identification of robust biomarkers of CH 4 emissions associated with data from the microbial community composition, the relative abundance of microbial populations, the relative abundance of genes within the microbiome or VFA concentrations, all data collected from three independent experiments, and (iii) the comparison of these biomarkers to identify those highly correlated with CH 4 emissions across diverse breeds and diets and the evaluation of the possibility of implementing a breeding strategy using these microbial biomarkers from the rumen microbiome.

Ethics Statement
This study was conducted at the Beef and Sheep Research Centre of Scotland's Rural College (SRUC, Edinburgh, UK). The experiment was approved by the Animal Experiment Committee of SRUC and was conducted in accordance with the requirements of the UK Animals (Scientific Procedures) Act 1986.

Animals, Experimental Design, and Diets
In our previous study (Wallace et al., 2015;Roehe et al., 2016), data on feed efficiency and methane emissions (measured using respiration chambers) were obtained from a 2 × 2 factorial design experiment of breed types and diets using 72 steers from a two-breed rotational cross between Aberdeen Angus (AA) and Limousin (LIM) and completed in 2011. Similar experiments were carried out using purebred Luing (LU) and crossbred Charolais (CH) steers in 2013 and Aberdeen Angus (AA) and Limousin (LIM) rotational crossbred steers in 2014. The data in this study were obtained from samples from those experiments whereby animals with extreme high and low methane emissions (2011) or feed conversion efficiency (2013 and 2014) were selected for whole genome sequencing. The breed type were balanced within experiment comprising 4 AA and 4 LIM in , 9 LU and 9 CH in 2013, and 12 AA and 12 LIM in 2014. Methane emissions were measured individually for 48 h in respiration chambers  and based on this result, 25 animals were considered as low CH 4 emitters whilst the other 25 animals were classified as high CH 4 emitters. The average CH 4 emissions (g/kg DMI) between Low and High CH 4 emitters were significantly different as shown in Figure 1. The animals were offered two complete diets ad libitum consisting (g/kg DM) of ∼500 forage to 500 concentrate or 80 forage to 920 concentrate which are subsequently referred to as forage and concentrate diets, respectively. Nitrate, lipids, or the combination of both were also added to the basal diet and were compared with the control fed with the same diet without additive. The detailed diet composition and proximate analysis has been reported previously by Rooke et al. (2014) and CHx, all samples from Charolais (n = 12); LIMx, all samples from Limousin (n = 13); Luing: all samples from Luing (n = 12). **P < 0.01. Duthie et al. (2016Duthie et al. ( , 2017. Animals were fed ad libitum during the entire experiment including in the respiration chamber. A single sample of rumen fluid for VFA analysis (expressed as molar proportions) was taken by stomach tube (naso ruminal sampling) within 1 h of cattle leaving the chambers in the 2011 experiment. VFA were determined in 2013 and 2014 using samples collected directly at the abattoir. As recommended by Terré et al. (2013), we compared the VFA profiles between samples rather than total VFA concentrations because of the different methods for rumen sampling applied. The acetate-topropionate ratio was calculated and considered as a proxy for H 2 generation.
Samples were obtained from a total of 50 animals balanced for breed type and diet and including the eight post mortem samples previously studied in Roehe et al. (2016), (Table S1).

Genomic Analysis
The animals were fed ad libitum until they left the farm and thereafter slaughtered within 2 h in a commercial abattoir where two rumen fluid samples (∼50 mL) were taken immediately after the rumen was opened to be drained. The main advantage to collect rumen contents after slaughter is to obtain samples representative of both solid and liquid phases. DNA was extracted from the rumen digesta samples following the protocol described in Rooke et al. (2014).
Illumina TruSeq libraries were prepared from genomic DNA and sequenced on an Illumina HiSeq 2500 instrument (2011 samples) and on an Illumina HiSeq 4000 instrument (2013 and 2014 samples) by Edinburgh Genomics (Edinburgh, UK). Bioinformatics analyses using the two sets of data followed the same procedure as previously described in Wallace et al. (2015). Briefly, functional genes including the genes detailed in this study were identified using KEGG genes database (http://www.kegg. jp). Genes with a relative abundance greater than 0.001% were carried forward for downstream analysis.
For 16S rRNA gene analysis, the genomic reads were aligned to the Greengenes database (Desantis et al., 2006) using Novoalign (www.novocraft.com) and also using the Kraken database (Wood and Salzberg, 2014).
Parameters were adjusted such that all hits were reported that were equal in quality to the best hit for each read, and allowing up to a 10% mismatch across the fragment. Further details are included in Wallace et al. (2015). These data can be downloaded from the European Nucleotide Archive under accession PRJEB10338 and PRJEB21624.

Statistical Analysis
Statistical analysis of the metagenomics samples was based on the complete sample profiles as expressed by the pattern of metagenomic reads classified within KEGG ortholog groups with >90% similarity and belonging to a single KEGG ortholog (KO) groups and the relative abundance (percentage) of individual KO group in each profile. Principal coordinate analysis (PCoA) was carried out using Gen-Stat 16th edition (VSN International Ltd, UK) to identify the factors explaining differences observed in the microbial community (phylum level) between samples. Relative abundance of microbial populations and functional genes, Archaea-to-Bacteria (A:B) ratio, Firmicutes-to-Bacteroidetes (F:B) ratio as an indicator of degradation activities carried by the two main phyla in rumen and acetate-to-propionate ratio were compared using General Linear Models and P-values were Bonferroni corrected for multiple testing (SPSS Statistics 22, IBM, USA).
In a network analysis using BioLayout Express3D (Freeman et al., 2007), we identified the distinct functional clusters of microbial genes for each experiment. These networks consist of nodes representing microbial genes and the connecting edges determining the functional linkages between these genes.
Partial least squares analysis (PLS, Version 9.1 for Windows, SAS Institute Inc., Cary, NC, USA) was used to identify the most correlated microbial populations (at the phylum or genus level) or microbial genes associated with methane emissions. This method was successfully applied for the identification of microbial biomarkers in Wallace et al. (2015) and Roehe et al. (2016). The PLS analysis accounted for multiple testing and the correlation between microbial populations or genes as microbial parameters. In addition to microbial parameters, the model included the diet effect (abiotic effector) and additionally the breed type effect (host genetics effect). The model selection was based on the variable importance for projection (VIP) criterion (Wold, 1995), whereby microbial parameters with a VIP<0.8 contribute little to the prediction. Finally, a comparison between different factors identified as highly correlated with CH 4 emissions and therefore considered as potential biomarkers were tested by PLS analysis. In this study, biomarkers of CH 4 emissions will be considered as robust when a similar result is observed across diverse diets and breeds and by comparing all potential biomarker together. A robust biomarker may strengthen the confidence of identifying low-vs. highemitting cattle. Those factors identified to be significant from the microbial community composition, the relative abundance of microbial populations or genes or VFA concentrations. All samples without VFA measurements were removed (N1, N3, N7, and RR41).
The residual methane emissions were calculated using a General Linear Model including diet and breed into the model and measured methane data as dependent variable. These residual methane emissions are thus corrected for diet and breed and were centered and standardized and only used when biomarkers were compared together.
Spearman's correlation analysis was also carried out to determine which factors (the same factors tested by PLS) are correlated with CH 4 emissions using SPSS Statistics 22. P-values ≤ 0.05 were considered significant and tendencies were represented (P-values < 0.1).

Factors Influencing the Differences Observed in Methane Emissions
Several grouping conditions were tested using methane emission values from three independent trials (Figure 1). Average CH 4 emissions were 20.89 ± 0.75 g/kg dry matter intake based on measurements from 50 animals. CH 4 emissions were 1.48-fold higher in the high-CH 4 group (P < 0.001). CH 4 emissions were also higher in animals fed the forage compared to concentrate basal diet (P < 0.001).

Change in Microbial Community Composition between CH 4 Emitters and Diet Treatments
Using the Kraken database for the identification of the 16S rRNA sequences (Phylum level) within the 50 metagenomics datasets, the difference observed within the microbial community composition represented 36.9% over the first two principal coordinate analysis axes ( Figure S2) and 45.2% when the third axis was included (data not shown). The most abundant bacterial phyla (on average) identified were Firmicutes (42.8%), Bacteroidetes (38.6%), Proteobacteria (6.6%), Fibrobacteres (4.9%), and Actinobacteria (2.4%) representing on average 95.3% of the total community ( Figure  S3). Proteobacteria was the only dominant phylum significantly different with a higher abundance in low-CH 4 samples compared to high-CH 4 samples (P = 0.03). Lower abundant phyla such as Deinococcus-Thermus (0.12%, P = 0.006), Chlorobi (0.07%, P = 0.003), Kiritimatiellaeota (0.01%, P = 0.02), Verrucomicrobia (0.12%, P = 0.04), and Calditrichaeota (0.003%, P = 0.01) were also identified as significantly different and generally with a higher abundance in high-CH 4 samples except for Calditrichaeota. Comparing the effect of forage or concentrate diets, a limited number of bacterial phyla (n = 4/32) were affected, which were based on their relative abundance in the rumen minor populations (Table S2). In general, the relative abundance of microbial populations impacted by additives was higher in control treatment except Calditrichaeota and Proteobacteria, the latter being 1.2-fold higher in presence of nitrate compared to the control concentrate treatment (Table  S2). On average, the Firmicutes-to-Bacteroidetes ratio was at 1.22 and not significantly different between methane emitters or diet treatments. Euryarchaeota were not impacted by nitrate or RSC in either concentrate or forage diets.
The archaeal community represented 5.33 ± 0.37% of the total microbial community based on 16S rRNA sequences and higher Shannon diversity was characterized using the Kraken database compared to Greengenes as shown in Figure 2A, with the former identifying more methanogenic groups capable of utilizing acetoclastic, hydrogenotrophic, and methylotrophic pathways to produce methane ( Figure S4). The hydrogenotrophic pathway was highly represented in the rumen content of both high-and low-methane emitting animals, mostly in high emitters and represented on average 96.8% of total methanogens ( Figure S4A). The relative abundance of total methanogens was double in high emitters compared to low emitters. This result was explained by the significant dominance of several populations including Methanobrevibacter (on average 94% of the methanogens), Methanobacterium, Methanococcus, and Methanoculleus species (Figure 2A). On the other hand, the dominant methylotrophic methanogen belonging to Methanomassiliicoccales order was identified as Candidatus Methanomethylophilus, with a relative abundance 7-fold significantly higher in the rumen microbiome of lowmethane emitters compared to high-methane rumen samples (Figure 2A). Finally, the dominant acetoclastic methanogen was Methanosarcina species and represented on average 0.4% of total methanogens (Figure 2A). Overall, Shannon diversity index calculated for total microbial community did not show any significant differences between groups of methane emitters or diet. Focusing on methanogens, a higher diversity in low emitters was confirmed with a Shannon diversity index of 0.55 compared to 0.28 in high emitters (P < 0.001). Effect of the additives on the relative abundance of methanogen populations was not significant whilst methylotrophic methanogenic populations were on average 2.27-fold more abundant in the concentrate diet supplemented with nitrate compared to the control condition and only on average 1.21-fold higher in forage diet supplemented with nitrate compared to the control treatment.
Using the Greengenes annotation, both methanogen diversity (Shannon index H) and composition were lower and only represented by three dominant genera. However, the general results on the dominant populations, methanogen diversity and the importance of methylotrophic methanogens in low-methane emitters were the same but it has to be considered that using this database the minor populations (e.g., acetoclastic methanogens) were not recovered ( Figure 2B and Figure S4B).
Methanotrophic populations were also identified when using the Kraken database, representing a limited part of the microbial community and being about 70-fold less abundant than methanogens (on average 0.1 ± 0.01%). This microbial group was highly dominated by three methanotrophic bacteria including the genus Methylobacterium and to a lesser extent Methylomonas and Methylomicrobium genera. However, only the Methylomonas genus was different between emitters (P = 0.005) or diet treatments (P = 0.005) with a relative abundance 1.7-fold higher in low-compared to high-methane emitters. Finally, the diversity of methanotrophic organisms was greater in high emitters (P = 0.02) compared to low emitters and there was no effect of diet or additives on methanotrophic populations.

Identification of Additional Phylogenetic Biomarkers of Methane Emissions
The Archaea:Bacteria ratio was calculated for each sample and a positive correlation (P < 0.001) was confirmed by linear regression with methane emissions overall (Figure 3). This correlation was weaker when samples were grouped based on diet-being significant (P < 0.01) for the concentrate but not the forage diet ( Figure S5). Interestingly, a positive correlation between CH 4 emissions and the relative abundance of Euryarchaeota was confirmed (F = 0.567, P = 0.003) but only when studying high emitters.
Partial Least Square analysis including in the model diet and breed effects showed that the relative abundances of 31 microbial genera were negatively correlated with methane emissions ("Reducing effects on methane emissions" group in Table 1). There were 56 genera positively correlated (including 16 highly positively correlated) with methane ("Increasing effects on methane emissions" group in Table 1) and 40 genera considered as positively correlated with methane emissions but showing a low regression coefficients ("Low effect on methane emissions" in Table 1). Moreover, the result generated by PLS and including the 56 genera, breed type and diet effects, explained 50% of the variation in CH 4 emissions. One main result is that bacterial populations showed higher VIP value compared to methanogens including the most abundant genus Methanobrevibacter and four other hydrogenotrophic methanogens present at lower abundance including Methanosphaera genus. Bacteria producing butyrate (e.g., Butyrivibrio and Pseudobutyrivibrio spp.) or CO 2 were positively correlated with CH 4 emissions, contrasting with those associated with amino acid (e.g., Acidaminococcus and Allisonella species) and lactate metabolism (e.g., Megasphaera and Lactobacillus genera) or populations consuming hydrogen (e.g., Dehalococcoides genus). Other bacterial populations with significant VIP were known to be associated with nitrogen (Nitrosococcus or Nitrobacter spp.) or sulfur cycles or those classified in the average group were halotolerant populations or potentially involved in organic matter breakdown, or syntrophic activities (e.g., Syntrophobotulus genus).

Validation of Functional Genes as Biomarkers of Methane Emissions
The main result from the network analysis is that most of the same genes directly involved in methane emissions were found  over three independent trials and in one or two closed clusters. For example, these genes grouped within a single cluster (C1) for the 2013 samples or two clusters for the 2011 samples (C3 and C6) and 2014 samples (C3 and C5) (Figures 4A-C). Overall, 202 genes representing different microbial functions were identified using KEGG in these clusters including those known to be involved in methane emissions (n = 37). However, only 27 genes associated with [high or low] methane emissions were detected in the three experiments. A PLS analysis using 202 genes ("general analysis") was carried out and the results are summarized in Table 2. As a result, 37 genes were identified as important to predict methane emissions in cattle and as part of a model including breed type and diet effects explained 62% of the variation in methane emissions. The most abundant of these were either subunits of the methyl coenzyme M reductase gene catalyzing the final step of CH 4 synthesis pathway mcrABG (K00399, K00401 K00402) encoding for or genes associated with hydrogenase activity, such as formate dehydrogenase, tetrahydromethanopterin S-methyltransferase, formylmethanofuran dehydrogenase (K00123, K00125, K00577, K00580, K00581, and K00584) or energy synthesis (V-type H+-transporting ATPase) (K02117 and K02118). The former enzymes are associated with the hydrogenotrophic pathway while the genes encoding for heterodisulfide reductase (K03389, K03390) and associated with low emitters, part of the methylotrophic methanogenic pathway. All these genes were significantly higher in high-emitting rumen samples compared to low-emitters (P < 0.02). Finally, the genes with a higher VIP were not those encoding for the final reaction leading to CH 4 emissions but were associated with the transfer of the methyl group (e.g., K06937) or hydrogen (e.g., K02117 and K02118). In parallel, a similar PLS analysis was carried out but only using the genes (n = 36) known to be directly involved in the methane emissions pathway (Table S3). As a result, the percentage of variation in methane emissions explained by these genes increased (65%) compared to the general analysis (62%). Moreover, the genes with a higher VIP were not those encoding for methyl-coenzyme M reductase (Table S3) as observed in the general analysis.

Comparison between the Different Biomarkers Tested and Correlation with CH 4 Emissions
Potential biomarkers were compared together by PLS analysis to evaluate the factors highly correlated with CH 4 emissions (Table S4). Residual CH 4 emissions data were estimated to remove the effect of diets and breeds and to allow the comparison of the potential biomarkers identified by PLS as significantly correlated with CH 4 emissions. The PLS results identified 37 factors with a VIP value > 0.80 and explaining 42% of the variation in residual CH 4 ( Table 3). Within the 37 factors, 22 individual genes mostly involved in the hydrogenotrophic methanogen pathway were identified. The other parameters identified included methanogen populations (e.g., Methanobrevibacter, Methanotorris, and Methanohalophilus genera), the Shannon diversity indices for the methanogen community, PCoA scores or six bacterial populations as well as the Archaea-to-Bacteria ratio. Finally, all the other parameters previously tested and including the Acetate-to-Propionate ratio, or the data on the methanotrophs (relative abundance) were not identified as final biomarkers. A different result was obtained when a Spearman correlation test was applied on the same set of data using non-corrected methane values and therefore still considering the effects of diet and breed (Table S5). For example, Acetate:Propionate ratio showed the highest correlation with CH 4 emissions.

Treatment Effects on Methane Emissions
In the present study, the results of three independent trials were compared and combined, and the current analysis confirmed that the constituent of the basal diet was strongly and significantly associated with CH 4 emissions. The proportion of dietary forage to concentrate content in the diet as previously identified by Roehe et al. (2016) and Rooke et al. (2014). The dietary additives used in this study as a strategy to lower CH 4 emissions did not show significant results contrasting with previous works identifying nitrate and supplementary lipid as some of the most promising methane mitigation additives in ruminants Olijhoek et al., 2015;Guyader et al., 2016) while variations in response were detected (Yang et al., 2016). Factors that could explain these differences included the use of a reduced number of rumen samples from animals initially selected for lowand high-feed conversion and also the variability in the basal diet composition.

Identification of Functional Genes as Biomarkers of Methane Emissions
In this combined analysis, most of the genes previously identified by Wallace et al. (2015) and Roehe et al. (2016) were in general also identified in this study (n = 19/20) by PLS analysis over the three independent experiments and confirmed as strong biomarkers of CH 4 emissions. Most of these genes were involved in the hydrogenotrophic methane synthesis pathway and grouped in one cluster or two attached clusters over the three independent experiments as previously highlighted by Roehe et al. (2016). This study is one of the first confirming the importance of genes encoding for heterodisulfide reductase in the rumen over the genes associated with methylamine compounds or methanol conversion to accomplish the first step of methylotrophic methanogenic pathway (Buan and Metcalf, 2010;Borrel et al., 2013). Although genes encoding for heterodisulfide reductase were confirmed to be significantly correlated with methane in the rumen of low emitters, the result of the biomarker comparison did not identify those genes as robust biomarkers of CH 4 emissions. This result confirmed the dominance of hydrogenotrophy over methylotrophy in the rumen (Hook et al., 2010;Danielsson et al., 2017) but also highlighted that both pathways are important in explaining methane emissions (Poulsen et al., 2013). Interestingly, these genes associated with high VIP value were in the upper part of the pathway and encode for methyltransferase, hydrogenase, or dehydrogenase activities but not directly the genes (e.g., mcrA) encoding for the methyl coenzyme M reductase system the final step in methane production. This result tends to confirm the importance of hydrogen concentration and thermodynamics affecting the microbial communities and therefore VFA production and methane emissions (Wolin et al., 1997;Rooke et al., 2014). Contrasting with Shi et al. (2014), this study confirmed a significant increase in the relative abundance of most of the genes involved in CH 4 emissions by metagenomics, but in agreement with the same authors, not only mcrA, but all genes are important in explaining higher CH 4 emissions. These results could explain weak correlations previously observed with CH 4 emissions when targeting directly 16S rRNA gene or mcrA (Morgavi et al., 2012;Tapio et al., 2017). There have been estimated unexpected negative associations of microbial gene abundances and methane emissions ( Table 2). Several reasons including bioinformatics limitation (e.g., gene annotation error in database), the presence of artifacts in the generated prediction model and a lack of biological knowledge for the genes correlated with methane emissions results into the difficulty to associate estimates obtain here with a mechanistic function. For example, it is known that different methanogen species found in rumen samples carry most of the genes identified in this study. However, some specific methanogenic species will lack a specific gene or a subunit within an operon as described in Kaster et al. (2011). Therefore, some species have a different impact on the relative abundance of a specific subunit gene compare to others within the same operon and, in consequence, on the coefficient value obtained by PLS analysis.
Although it would be of further interest to identify to which organisms these genes belong to, this is beyond the scope of this paper and has to be addressed in substantial more detail using different methodologies to provide accurate results. In addition, phylogenetic association with the functional genes studied here is still challenging and was not carried out to avoid wrong conclusions. This decision was made based on the fact that new methanogens are still discovered (see Vanwonterghem et al., 2017) and not necessarily carrying all the genes involved in the methane synthesis pathway. Furthermore, different clades have been identified and were even within the same genus (e.g., Methanobrevibacter SGMT or RO clade) differently correlated with methane emissions in the same samples . Specifically, Methanobrevibacter clade SGMT but not RO, was found more abundant in low emitters while genera within methylotrophic methanogens were enriched in high emitting cattle.

Most Important Phylogenetic Parameters Impacting on Methane Emissions
Within the taxonomic parameters tested, factors directly associated with methanogens were confirmed to be robust biomarkers, especially the relative abundance of Methanobrevibacter genus. This genus is known to be the most dominant and active in the rumen (Hook et al., 2010;Henderson et al., 2015;Tapio et al., 2017;Wang et al., 2017) and is also associated with higher CH 4 emissions as confirmed here. Using the Kraken database, a wider diversity of methanogens in the rumen was found compared to the results obtained using the Greengenes database. This confirms preliminary observations by Poulsen et al. (2013) and Henderson et al. (2015), and also highlights the importance of the reference database used to characterize metagenomics data (Siegwald et al., 2017). Sun et al. (2012) confirmed that not all methanogens are active continuously in a methanogenic environment and suggested that the availability of substrates was an important cue for population growth. For example, Methanocaldococcus spp., Methanotorris spp. and the methylotrophic methanogen Methanohalophilus spp. were three low abundance genera that were highly correlated with CH 4 emissions and identified as robust biomarkers across different diets and breeds which contrasted with the result for the main methylotrophic methanogen Candidatus Methanomethylophilus. Moreover, the possibility to use these biomarkers offers an efficient and cheaper alternative to metatranscriptomics considered as more accurate tool to predict methane emissions compared to metagenomics (Shi et al., 2014;Wallace et al., 2017). Finally, the identification of low abundance methanogen populations but not all the most abundant as robust biomarkers may also explain weaker correlations found between total methanogens and CH 4 emissions when 16S rRNA or mcrA genes were targeted by qPCR (Mosoni et al., 2011;Morgavi et al., 2012). On the other hand, this weak correlation can also be the result of methane oxidation by methanotrophs. This study is one of the first confirming a greater abundance of methanotrophic populations, especially Methylomonas genus in rumen and being significantly negatively correlated with CH 4 emissions. Genes associated with methanotrophy were not identified in this study and previously in the set of eight animals (2011 experiment) as highlighted by Wallace et al. (2015) and could be explained by not enough depth of sequencing for genes carried by very low abundant populations (0.1%). The genus Methylomonas is identified in Greengenes and Kraken databases but the last one contains a broader diversity of recently discovered microbial populations that could improve the detection of low abundance genus in rumen sample.
In terms of data directly associated with the microbial community composition, the Archaea:Bacteria ratio was confirmed as a strong biomarker of CH 4 emissions while a lower R-value (R = 0.272) was found in this study compared to Wallace et al. paper (2014) which calculated this ratio on a reduced number of cattle (R = 0.49). This difference can be explained by the initial set of eight samples representing extreme methane emitters while the other 42 samples were not specifically selected for this trait. However, as also reported by the same authors, this significant correlation was diet dependent, and was significant for concentrate fed rumen samples but not forage samples. As previously shown in sheep by Kittelmann et al. (2014), the microbial community composition (PCoA-2 in this study) even at the phylum level was confirmed as robust biomarkers of CH 4 emissions. This could be explained by an increase in the relative abundance of several bacterial populations within Firmicutes, Bacteroidetes, and Proteobacteria, mostly in low emitters as shown by the L/H ratio in Table 1. However, our study confirmed the necessity to calculate the methanogen diversity as robust biomarker instead of total microbial diversity, not significantly different between methane emitter groups in this study. These results differed from the idea developed by Shabat et al. (2016) that cattle with higher CH 4 emissions will have higher total microbiome diversity.

Link between Microbial Communities and Metabolites Released
In term of identifying links between the bacterial community containing most of the organic matter degraders and the metabolites released in rumen, it seems that the degradation activities carried out by the two most abundant bacterial phyla, Firmicutes and Bacteroidetes as evaluated using the F:B ratio (Chen et al., 2016) were not important to explain CH 4 emissions. More interestingly, this study confirmed the importance of other bacterial populations associated with production of different metabolites which directly impacted on CH 4 emissions and also showing a higher VIP value compared to methanogens ( Table 2). For example, Butyrivibrio spp. and Pseudobutyrivibrio spp. both butyrate-producing bacteria were highly correlated with high CH 4 emissions while the presence of bacteria metabolizing lactate (e.g., Megasphaera), degrading amino acids (e.g., Acidaminococcus) or competing for H 2 were negatively correlated with CH 4 emissions (Park et al., 2014;Kamke et al., 2016;Sa et al., 2016). This result is explained by the different catabolic pathways carried by these populations and directly impacting on H 2 partial pressure and subsequently on CH 4 emissions (Janssen, 2010;Kelly et al., 2010;Kamke et al., 2016;Sa et al., 2016;Tapio et al., 2017). The presence of lactateutilizing Megasphaera genus within the robust phylogenetic biomarkers and negatively correlated with CH 4 emissions, highlighted the importance of lactate metabolism controlling rumen fermentation (Counotte and Prins, 1981), production of H 2 and specific VFAs and ultimately CH 4 (Van Lingen et al., 2016). The impact that VFAs have on CH 4 emissions is established (Janssen, 2010;Wanapat et al., 2015) and the positive correlation between different VFA or acetate-to-propionate ratio and CH 4 emissions as previously stated by Shabat et al. (2016). However, none of the VFA factors were identified as strong biomarkers (Table 3) confirming some contrasting results found between VFA and CH 4 emissions and reviewed in Negussie et al. (2017). It could be explained by necessity to study the relative inter-relationships among VFA measurements and also between VFA and CH 4 yield as suggested by Palarea-Albaladejo et al. (2017). Therefore, the impact that VFA have on CH 4 emissions may be less important compared to lactate metabolism and new strategies for methane mitigation could be developed based on this finding (Jeyanathan et al., 2014).
Genera within Succinovibrionaceae known to be dominant in the digestive tract of the Tammar wallaby, which emit one quarter of the methane emissions of the cattle (Pope et al., 2011) were not identified within low emitters as previously shown by Wallace et al. (2015). At the family level, the relative abundance of Succinovibrionaceae was 1.6-fold higher in low CH 4 emitters (on average 1.3 ± 0.2) compared to high emitters (on average 0.8 ± 0.1) but associated with a weak significance level (P = 0.049). Surprisingly, these bacterial populations were not identified as robust biomarkers, probably because of the functional redundancy associated with the production or degradation of each metabolite. On the other hand, the Opitutus genus was characterized as a robust biomarker and is known to be involved in H 2 production during the fermentation of organic matter (Chin et al., 2001). Very little information exists that explains the role of the Dorea, Isosphaera, Faecalitalea, Colwellia, and Singulisphaera on CH 4 emissions but some were associated with degradation capacities in methane emitting environment (Kleindienst et al., 2016).
Finally, we agree that other potential biomarkers of CH 4 emissions like archaeol could be tested (McCartney et al., 2013) and compared with the robust biomarkers identified in this study. The same authors showed the benefit of using archaeol over qPCR method as a proxy for CH 4 emissions.
To the best of our knowledge, this is the first report identifying and comparing potential CH 4 biomarkers across a range of dietary conditions and several experiments. This study confirms the possible value of targeting functional genes using metagenomics as most of the robust biomarkers identified were genes directly involved in the hydrogenotrophic methane synthesis pathway while methylotrophic methanogens were also important in explaining CH 4 emissions. In addition, most of the genes directly involved in the methane synthesis pathway grouped in the same cluster within a functional genes network and this result was reproduced over three independent trials. Finally, this study confirm the significance of using robust and applicable biomarkers from the microbiome as a proxy of CH 4 emissions across diverse beef cattle breeds fed with different diets as an alternative for a trait that is difficult-to-measure on a large number of animals. Moreover, the use of these biomarkers for the development of molecular tools will help for the implementation of breeding strategies targeting low-methane emitter animals.

ACKNOWLEDGMENTS
We thank Irene Cabeza Luna, Andrew Southwell, Asier Zaragoza, Laura Nicoll, Lesley Deans, and Claire Broadbent for the excellent technical support and Prof Alan Walker (Rowett Institute, University of Aberdeen) for the critical reading of the manuscript.