Variation in microbial CAZyme families across degradation severity in a steppe grassland in northern China

Little is known about the effects of grassland degradation on the carbohydrate-active enzyme (CAZYme) genes responsible for C cycling. Here we used a metagenomic approach to reveal variation in abundance and composition of CAZyme genes in grassland experiencing a range of degradation severity (i.e., non-, light, moderately, and severely degraded) in two soil layers (0–10 cm, 10–20 cm) in a steppe grassland in northern China. We observed a higher CAZyme abundance in severely degraded grassland compared with the other three degradation severities. Glycoside hydrolase (GH) and glycosyltransferase (GT) were identified as the most abundant gene families. The Mantel test and variation partitioning suggested an interactive effect of degradation severity and soil depth with respect to CAZyme gene composition. Structural equation modeling indicated that total soil carbon, microbial biomass carbon and organic carbon were the three soil characteristics most important to CAZyme abundance, which suggests an interaction between degradation and soil carbon fractions in determining CAZyme gene composition. Both above- and below-ground factors linked to soil organic matter play a central role in determining the abundance of CAZyme gene families.


Introduction
Grassland soils store huge carbon stocks and thus play a vital role in global carbon sequestration (Bai and Cotrufo, 2022). The degradation of grasslands due to overgrazing and climate change (Akiyama and Kawamura, 2007;Li et al., 2016), has reduced carbon sequestration capacity and enhanced soil carbon and nitrogen losses (Zhang et al., 2011;Wang et al., 2014;An et al., 2019). The structure and function of soil microbial communities are key indicators of both carbon cycling and grassland degradation (Singh, 2015;Khan et al., 2019;Liang et al., 2019;Yu et al., 2021) because they change throughout grassland degradation, according to soil pH (Che et al., 2019), soil organic carbon concentration (SOC) (Yu et al., 2021) and soil total nitrogen concentration (TN). Grassland degradation could additionally affect soil microbial communities via a decline in the quantity or quality of litter input to the soil (Zhang et al., 2011;He and Richards, 2015). Soil microbes may also exert feedback effects on soil degradation. For example, as an important indicator of fertility, soil microorganisms can aid in the restoration of degraded grasslands by converting exogenous C into stable soil organic C, or by enhancing grassland degradation by intensifying the decomposition of soil organic carbon (Liang et al., 2017;Zhu et al., 2020). Therefore, studying the relationship between grassland degradation and soil microbial activity can help us more effectively remediate degraded grasslands.
Soil microbial communities are usually described either by their taxonomic composition or functional gene composition. Due to limited understanding of the relationship between taxonomy and function, assumptions of functional coherence are usually made for the various taxonomic groups of microorganisms (Philippot et al., 2010;Fierer et al., 2012). However, recent advances in shotgun DNA sequencing methods provide an increased potential for characterizing environmental metagenomes, enhancing the resolution of function down to the responsible genes contained within soil microbial communities. Previous studies on grassland degradation have mainly focused on the taxonomic composition of soil microbial communities (Cai et al., 2014;Che et al., 2019;Zhou et al., 2019), so the effect of grassland degradation on soil microbial function remains unclear.
Metagenomics provides a powerful molecular approach to characterize microbial genes encoding the enzymes taking part in functions such as C cycling. The carbohydrate-active enzymes (CAZymes) are the enzymes involved in the hydrolysis and biosynthesis of complex carbohydrates. CAZymes include glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), glycosyltransferases (GTs), proteins with non-catalytic carbohydrate-binding modules (CBMs), and enzymes with auxiliary activities (AA). The classification of CAZymesis based on the similarity of amino acid sequences in protein domains, which may reflect their biological functions (Cantarel et al., 2008). The GHs cleave the glycosidic bonds within carbohydrates or between a carbohydrate and a non-carbohydrate moiety, and are important in degradation of cellulose, hemicellulose, chitin and glucans. The GTs participate in the formation of glycosidic bonds. The AAs are redox enzymes that act in conjunction with CAZymes, and several AA enzymes play roles in the degradation of cellulose and lignin. The CEs participate in the decomposition of hemicelluloses. The PLs cleave uronic acidcontaining polysaccharide chains via a β-Elimination mechanism to generate an unsaturated hexenuronic acid residue and a new reducing end. Catalytically active CAZymes may contain carbohydrate-binding modules (CBMs), which are essential for effective hydrolysis because they mediate binding to cellulose, xylan, chitin or other carbohydrates (Donohoe and Resch, 2015). The CAZyme gene abundances may reflect the C cycling potential of a microbial community. Therefore, in this study annotation with the CAZy database was used to reveal the functional shifts in soil microbial communities following vegetation changes such as been done previously for the conversion of cropland to plantation (Zhang and Lv 2020), reclamation of saline-alkali soil (Yin and Zhang 2022) and afforestation of farmland (Ren et al., 2021). Previous studies have provided CAZy annotations of soil metagenomes in grasslands Noronha et al., 2017;Yeager et al., 2017). However, the interactions between soil CAZyme abundances and microbial community structure during grassland degradation is still unclear. While soil microbes mainly function as decomposers in degrading grassland, debate still exists as to whether there are other microbial functional responses to the decreased availability of litter caused by grassland degradation (Liang et al., 2017;Liang et al., 2019). Low litter input has been shown to reduce microbial CAZymes (õZifõc´akov´a et al., 2016;Ren et al., 2017;Zhong et al., 2020), but one may easily envision the case in which microorganisms invest more energy in CAZymes to maximize energy and nutrient capture from the limited supply of litter (Liang et al., 2017;Luo et al., 2017). Thus, understanding the overall status of the C cycle functional genes would be helpful in recognizing the interactions between soil microbial communities and degrading grassland.
A recent study investigated the role of CAZymes as microbial functional responses to C turnover (Cantarel et al., 2008). Presently, there are few papers on specific glycoside hydrolases (GHs) or enzymes with auxiliary activities (AA) that drives C dynamics > These have been related to polysaccharides and lignin decomposition, respectively (Llado et al., 2019;Lopez-Mondejar et al., 2020). Previous reports indicated that plant biomass was mainly degraded by cellulases or hemicellulases (GHs), while lysozymes and chitinases (GHs) were responsible for degrading bacterial and fungal biomass (Lombard et al., 2014;õZifõc´akov´a et al., 2016). It was also reported that soil substrates and environmental characteristics were important controls of the activities of microbial CAZymes, as they can impact microbial communities in soils. Therefore, soil C sources have been suggested as the drivers of the transcription of gene that encode specific CAZymes (Hu et al., 2020;Lopez-Mondejar et al., 2020). Additionally, it should also be noted that season may affect CAZyme activity, leading to changes in C cycling (õZifõc´akov´a et al., 2016).
In the present study, we collected soil samples from various portions of the Xilingol Grassland in Inner Mongolia, China, experiencing different degrees of degradation. Xilingol Grassland is a true steppe and one of the three largest natural grasslands in China. It is located in the middle of the Inner Mongolia Autonomous Region and has an area of 1,79,600 square kilometers. It is an important ecological barrier in northern China. The Xilingol Grassland transitions from meadow grassland to desert grassland as one moves from the northeast to the southwest. The most typical grassland type is distributed in the middle of the Xiling League and accounts for more than half of the Xilingol Grassland area. Since the 1970s, the Xilingol grassland has suffered from various forms of degradation, including desertification. By 2010, the area of degraded, sandy and salinized grassland reached 75.14% of the total grassland area (Bai et al., 2017). Therefore, the objectives of this study were to 1) investigate abundance of CAZyme genesin soils of the Xilingol grassland along a gradient of degradation severity, 2) document the community structure of CAZyme gene along the degradation gradient, and 3) determine the contribution of various environmental variables to variation in CAZyme genes. We hypothesized that 1) grassland degradation causes significant variation in CAZyme gene abundance and composition; 2) grassland degradation increases the abundance of functional genes involved in C decomposition; and 3) plant community type and soil carbon fractions interact to affect CAZyme gene composition.

Soil sampling
We sampled soils in the Xilingol Grassland, Inner Mongolia, China (45°50′7.74 ″N, 116°33′21.19 ″E). The average annual rainfall and temperature are~300 mm and 1.6°C. Past overgrazing resulted in severe grassland degradation in this region. Based on the grassland degradation index (Dong et al., 2019), non-degraded (ungrazed for the last 10 years, GDI = 0.93), lightly degraded Frontiers in Environmental Science frontiersin.org (GDI = 0.75), moderately degraded (GDI = 0.63) and severely degraded (GDI = 0.52) sites were chosen for study. All sites occurred within a 5 km × 5 km area to minimize differences in climate and original vegetation composition among them. Vegetation characteristics including plant species, biomass, abundance, vegetation cover, and soil properties including total nitrogen, total carbon, total phosphorus, total potassium, pH and NO 3− -N, were also determined and listed in Supplementary Table S1. At each site we established three plots (1 × 1 m), and from each plot we randomly sampled five soil cores (diameter 4 cm) after removing the humus layer. Two soil depths (0-10 cm and 10-20cm) were sampled separately but for each depth the five soil cores were pooled to form a single independent sample per plot during the growing season (August) of 2021. Twenty-four samples were obtained (4 sites × 2 depths × 3 replicate plots). Soil samples (200 g) were sieved using a 2-mm mesh to remove debris and stones. Sieved samples were stored in a −80°C freezer prior to DNA extraction and at 4°C prior to chemical analyses. We also recorded the richness and abundance of each plant species from the 1 × 1 m sampling plots.

Metagenomic analysis
A FastDNA SPIN Kit (MP Biomedicals, Santa Ana, CA, United States) was used to obtain soil genomic DNA from 500 mg soil samples based on the manufacturer's protocol. We diluted the extracted DNA 10 ngμL −1 using a spectrophotometer (NanoDrop ND-1000, NanoDrop Technologies), which was then fragmented tõ 350 bp using an ultrasonicator (Covaris M220, Gene Company Limited, China) to construct paired-end libraries using a TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, United States). Then, blunt-end fragments were linked with adapters possessing the full complement of sequencing primer hybridization sites. The Illumina HiSeq 4000 PE150 platform was employed for pair-end sequencing (Illumina Inc., San Diego, CA, United States) based on the manufacturer instructions of the HiSeq 3000/4000 PE Cluster Kit and HiSeq 3000/4000 SBS Kits.
Trimmomatic 0.36, a read-trimming tool, was used to process the raw data (Bolger et al., 2014) to remove adapters and reads of moderate quality. Sequences with Q values < 20 were excluded. Short reads were assembled with De bruijn-graph-based assembler Megahit (v1.0.6) (Li et al., 2015), using a 19.7%-40.1% mapped ratio across all the samples. The open reading frames (ORFs) of each metagenomic sample were estimated with Prodigal (v2.60) (Hyatt et al., 2012), and those of lengths >100 bp were translated into amino acid sequences according to the NCBI translation table (https://www. ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi? chapter = tgencodes#SG1). CD-HIT (http://www.bioinformatics.org/ cd-hit/) was used to cluster sequences from gene sets with a 95% sequence identity (90% coverage) into a non-redundant gene catalog, which were then mapped with Bowtie 1.1.2. Next, sam2counts 0. 91 was employed to convert the mapped results to reference sequence counts to generate a gene table for functional annotation.
CAZymes in the metagenome contigs were annotated following gene calling with pipeline dbCAN2 and non-redundant protein (Yin et al., 2012), according to the manual curation of sequence alignments from the CAZy database (July 2020). Non-redundant protein taxonomical annotations were determined using the easy-taxonomy mode and nr database of MMseqs2 (Steinegger and S¨oding, 2017). Next, we mapped the high-quality sequences of the samples on the predicted gene sequences to assess the abundance of these genes using Salmon (Patro et al., 2017). We then used the transcripts per kilobase million (TPM) mapped reads to normalize the abundance values in the metagenomes, and normalized the total sequencing number to one million reads per sample to avoid the sequencing depth bias in the comparative analysis.

Soil properties
We determine soil total nitrogen (TN), total carbon (TC), total potassium, total phosphorus, extractable phosphorus, pH, water content, NH 4 + -N, NO 3 − -N, organic carbon, microbial biomass carbon (MBC), microbial biomass nitrogen (MBN), and microbial biomass phosphorus (MBP). Soil TC and TN were determined by performing elemental analyses (Elementar Vario MACRO, Germany). An automated discrete analysis was used to quantify ammonium nitrogen (NH 4 + -N) and nitrate nitrogen (NO 3 − -N) (CleverChem 380, Germany). Total K was determined by flame photometry (Lu, 1999). Total and extractable P concentrations were determined using a spectrophotometer (UV-1600 spectrophotometer, Beijing) (Lu, 1999). The standard Walkley-Black potassium dichromate oxidation method (Nelson and Sommers, 1982) was used for organic carbon. A 1:1 ratio of soil to water was used to measure soil pH (HANNA, Padova, Italy). Additionally, we gravimetrically measured soil water content (SWC) by oven drying to constant mass at 105°C, and MBC, MBN, and MBP were determined using the chloroform fumigation extraction method.

Statistical analysis
The effect of grassland degradation severity and soil depth on CAZyme gene family read abundances were determined using twoway analyses of variance (ANOVAs). If ANOVA indicated significant effects, then the means of treatments were compared using Tukey's honestly significant difference test (p < 0.05). Two-way ANOVA and multiple comparisons were also performed on each of the glycoside hydrolase (GH), glycosyltransferase (GT), polysaccharide lyases (PL), carbohydrate esterase (CE), carbohydrate-binding module (CBM), and auxiliary activity (AA) families.
Non-metric multidimensional scaling (NMDS) analysis was performed using Bray-Curtis distances to assess variation in the CAZyme gene community structure. Significant differences in CAZyme gene communities among different degradation severities and soil depths were determined using the PerMANOVA test (Vegan package in R, Oksanen et al., 2013).
After confirming that the structure of a CAZyme gene family was significantly affected by treatment, indicator species analyses were conducted to investigate genes with significantly different abundances using the 'signassoc' function in the 'indicspecies' package (DeCáceres and Legendre 2009) according to each gene's abundance in a sample. The mode zero (site-based) was used and p-values were obtained following Sidak's correction for multiple testing.
We assessed the association of CAZyme gene composition with plant community composition or soil characteristics using the Mantel tests. Partial mantel tests were performed between CAZyme gene composition and plant community composition or soil characteristics after controlling for other factors.

Frontiers in Environmental Science frontiersin.org
Variance partitioning of Bray-Curtis distances of CAZyme gene compositions was performed using a multiple regression for distance matrices (Swenson, 2014) to assess the significance of plant community compositions and soil characteristics in determining CAZyme gene composition.
We built structural equation models (SEM) with Mantel R values as inputs in AMOS 20.0 (Arbuckle, 2011) to determine the direct and indirect effects of plant communities on soil characteristics of the CAZyme gene families. Basing on the Mantel test and variance partitioning, which showed a significant interactive effect between plant communities and soil characteristics, a conceptual model was built whereby plant communities affected soil characteristics and both types of variables jointly affected CAZyme gene composition. SEM models with observations were compared using a maximum likelihood estimation method. Based on the methodology of Hu and Bentler (1999), we assessed the goodness of fit using a root mean squared error of approximation (RMSEA) <0.05 and TLI (Tucker-Lewis Index) >0.95.

Variation in CAZyme gene abundance across grassland degradation severity and soil depth
There were a total of 1,676,180 CAZyme gene sequences (minimum 67,091; maximum 72,168; mean 69,840 per sample). GT and GH were identified as the two dominant families of genes, representing an average of 32.29% and 25.33%, respectively. Degradation severity and soil depth, and their interaction, were significant with respect to CAZyme gene sequence abundance (p < 0.05; Figure 1; Supplementary Table S2). CAZyme gene abundance was significantly different between the two soil depths (p < 0.05). For the lightly and moderately degraded grassland, the abundance of the CAZyme genes were significantly higher in the 10-20 cm depth soil than in 0-10 cm soil depth. For the non-and severely degraded grassland, CAZyme gene abundances were not different between the two soil layers. Among the different degradation severities, CAZyme gene abundance significantly differed between nondegraded and severely degraded, between lightly and severely degraded, and between moderately and severely degraded, but not among non-degraded, lightly degraded and moderately degraded grassland (Figure 1; Supplementary Table S3).

Structures of CAZyme gene composition across grassland degradation severity and soil depth
The NMDS together with the two-way PerMANOVA (p < 0.05 for degradation severity, soil depth, and their interaction) indicated a significant difference in the structure of CAZyme gene families across degradation severity and soil depth (Figure 2; Table 1).
Indicator species analysis identified 201 gene families that differed significantly in abundance across treatments (Figure 3). Of these, 100 gene families were significantly higher in severely degradation grassland, of which 52 had a soil depth exceeding 10-20 cm and 48 exceeding 0-10 cm (Figure 3). Further, 49 out of the 100 gene

Relationship between structures of CAZyme gene families and plant community and soil characteristics
Mantel tests showed a significant relationship between the structure of CAZyme gene families and plant communities (p < 0.05; Table 2) and soil characteristics (p < 0.05; Table 2), at both of the soil depths. In the partial Mantel test, after controlling for soil characteristics, plant communities showed a significant relationship with structure of CAZyme gene families (p < 0.05; Table 2). However, soil characteristics were not significantly related to CAZymes after controlling for plant community structure (p > 0.05; Table 2).
Using the distance matrix multiple regression, plant communities and soil characteristics variance partitioning was used to examine their relative influence on CAZyme gene structure. These two factors together explained 30% and 22% for CAZyme composition at soil depths of 0-10 cm and 10-20 cm, respectively (Figure 4). The interpretation strength is mainly due to the interaction of plant communities and soil characteristics, with 23% and 18% for CAZyme structure at soil depths of 0-10 cm and 10-20 cm.
We found that plant community structure and soil characteristics explained much of CAZyme structure in the SEM model for both of the soil layers (for 0-10 cm, Chi-square = 0.375, df = 3, probability level = 0.945, RSMEA = 0.001, TLI = 0.999; for 10-20 cm, Chisquare = 0.004, df = 2, probability level = 0.998, RSMEA = 0.001, TLI = 0.999) ( Figure 5). The final model explained 33% and 37% of the variation in structure of CAZyme gene families for soil depths of 0-10 cm and 10-20 cm, respectively. Total carbon, organic carbon and MBC explained significant variation in the structure of CAZyme composition. Furthermore, the SEM model also identified the interactive effect of plant community structure and soil characteristics. Plant community structure had an indirect effect on CAZyme composition by influencing soil total carbon or soil organic carbon. Soil total carbon and organic carbon also exerted an indirect effect on CAZyme composition by influencing MBC.

Discussion
Grasslands sequester large amounts of organic C and are thus important for climate regulation (Bai and Cotrufo, 2022). As grasslands become degraded due to human activity and climate change, their ability to store carbon may be compromised. We therefore set out to determine the impact of grassland degradation on enzyme families responsible for much of C cycling in ecosystems. Our study showed that the abundance of soil microbial CAZyme gene families significantly increased in the severely degraded grassland compared to the grasslands of the other three degradation  severities. Therefore, the ability to sequester C may be reduced in severely degraded grassland. Severe degradation in grassland is accompanied by a decrease in litter input to soil and a decrease in plant-derived substrates for soil microorganisms. In our study, plant biomass decreased with increased degradation; plant biomass in severely degraded grassland decreased by 76% compared to the non-degraded grassland (Supplementary  Table S1). Previous studies showed that nutrient limitations due to low litter input could restrict the growth of microbes and decrease CAZymes abundance (õZifõc´akov´a et al., 2016;Ren et al., 2017;Zhong et al., 2020). However, some studies show that microorganisms secrete more extracellular enzymes with decreased litter input (Liang et al., 2017;Liang et al., 2019), which is consistent with our results For    Table S4). This suggests that enhanced microorganism enzyme activity might be achieved at the expense of metabolic reserves (õZifõc´akov´a et al., 2016) in the severely degraded grassland. In addition, GTs were enriched in the severely degraded grassland (Supplementary Table S4). GTs are related to peptidoglycan hydrolases and synthetases of bacterial cell walls. In some circumstances, bacteria may remodel their cell walls according to environmental changes (Horcajo et al., 2012;Cava and de Pedro, 2014;Bernal-Cabas et al., 2015). Severely degraded grassland experienced a significant increase in genes associated with bacterial outer membrane and cell wall biogenesis (COG0399 and COG2982) (data were not shown here) according to the eggNOG database (Huerta-Cepas et al., 2019). As degradation increases, grasslands experience larger environmental fluctuations as a consequence of reduced vegetation cover and poorer soil nutrient status. These changes in enzyme abundance in the severely degraded grassland may function to maintain cell wall homeostasis and enable adaptation to environmental disturbance.
Our results showed that structure of CAZyme genes varied significantly across a gradient of grassland degradation (Figure 2; Table 1), which was supported by Luo et al. (2020). The variation across the degradation gradient was associated with variation in plant community structure and soil characteristics and their interaction. Further, SEM suggests that CAZyme variation was primarily driven by interaction between plant community structure and soil total carbon, organic carbon and MBC. It was also demonstrated that organic matter and energy flux had pivotal roles in ecosystems' responses to degradation above-and below-ground (Hagedorn et al., 2009;Breidenbach et al., 2022). In this study, the NMDS showed that non-degraded grassland and the severe degraded grassland are grouped for the CAZyme gene composition. In grassland, severe degradation may result from overgrazing and subsequent plant species replacement, reduced plant production, root death and decomposition. These changes could also affect the structures and functions of microbial communities for adapting to reduced SOC contents. Thus, the results of NMDS might originate from a stimulation of microorganism enzyme activity by low substrate input in the severely degraded grassland, making it similar to that in the non-degraded grassland (õZifõc´akov´a et al., 2016). On the other hand, these phenomena may further promote the loss of fine-textured soil containing SOC.

Implications, limitations, and future directions
This study provided empirical evidence that abundance and composition structure of the soil C cycling functional genes, CAZyme gene families, varied across a gradient of grassland degradation. Some CAZyme families were overrepresented in the severely degraded grassland. The result of high CAZyme abundance and low litter input may lead to reduced C sequestration. The decrease in SOC in the severely degraded grassland suggests that unstabilized organic matter of plant origin (plant litter and root exudates) is rapidly mineralized. Moreover, a recent study reported that microbial functional changes result in an irreversible degradation of Tibetan Plateau pastures (Breidenbach et al., 2022). Thus, conventional strategies excluding grazing to mitigate severe degradation is unlikely to lead to ecosystem recovery without intervention. We urgently need to seek for ecological restoration approaches for severely degraded grassland. Reseeding may be one approach to restoring severely degraded grassland, but one must be aware that conventional N fertilization often carries the risk of N leaching and pollution (Mejias et al., 2021). For the non-degraded, lightly degraded and moderately degraded grasslands, more reasonable levels of grazing (e.g., rotational grazing; Dong et al., 2022) compared to the current grazing pattern could be explored to prevent further ecosystem degradation.
An important limitation associated with this present study was that we only performed sampling on a single occasion with only three replicate of each degradation severity. Thus, it is difficult for us to Frontiers in Environmental Science frontiersin.org generalize our findings to other grassland systems or to our grassland at different times of year. Thus, more studies comprising larger temporal and spatial scales across various degraded and restored grassland ecosystems are needed. Furthermore, future studies might fruitfully consider microbial activity with respect to C sequestration using meta-transcriptome and metaproteome approaches in the prediction of C accumulation.

Data availability statement
The data presented in the study are deposited in the CNGBdb repository, accession number CNP0003968.