- ¹Department of Gastroenterology, Affiliated Hospital of Yunnan University, Kunming, China
- ²Department of Cardiology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- ³Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China
- ⁴College of Biological Sciences and Technology, Taiyuan Normal University, Jinzhong, China
- ⁵Microbiome Medicine and Advanced AI Technology, Kunming, China
- ⁶Computational Biology and Medical Ecology Lab, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- ⁷Biostatistics and Image Genetics Lab, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
The diversity-area relationship (DAR), an extension of the classic species-area relationship (SAR), provides a powerful framework for understanding how biodiversity scales across space. In this study, we applied DAR and its metagenomic counterpart (m-DAR) to investigate the spatial scaling of metagenomic genes (MGs) and metagenomic functional gene clusters (MFGCs) from seven functional databases in the gut microbiomes of individuals with inflammatory bowel disease (IBD) and healthy cohorts. Using shotgun sequencing data from 42 mucosal and 22 fecal samples from both healthy and IBD cohorts, we modeled how the diversity of MGs and MFGCs accrues with area (samples), estimating diversity scaling parameters (z), pair-wise diversity overlap (PDO), and maximal accrual diversity (MAD), which reflects the total potential diversity. We found that mucosal communities exhibited greater dissimilarity (less pair-wise diversity overlap) between individuals than fecal communities at the levels of gene richness and evenness (q = 0, 1), whereas fecal communities showed a stronger influence from dominant, abundant genes (q = 2, 3). Furthermore, healthy gut microbiomes showed greater similarity than those of IBD at the level of gene richness (q = 0), but showed greater dissimilarity at the level of abundant genes and dominant genes. Healthy gut microbiomes generally demonstrated a higher potential total diversity compared to those from IBD patients. Notably, fecal samples captured a broader range of microbial diversity than mucosal samples. Additionally, mucosal communities showed greater dissimilarity than fecal communities in almost all the MFGCs of the seven databases except ARDB, which showed the same trend as MGs. We also identified that specific functional clusters related to antibiotic resistance, such as genes for chloramphenicol and vancomycin resistance, displayed distinct scaling behaviors, suggesting their potential role in IBD pathogenesis.
These findings suggest that the gut microbiome in IBD is not merely less diverse but is restructured in its spatial architecture. The application of DAR provides a novel, quantitative approach to characterizing this dysbiosis, moving beyond simple diversity metrics to capture the spatial diversity scaling of microbial genes and functions.
Introduction
The term “metagenome” refers to the collective genetic material of microbial organisms within a given environment, which can be observed within or among individual “microbes” (single individuals or populations of individuals). Microbial biogeography describes the spatiotemporal distribution of microbial species, operational taxonomic units (OTUs; e.g., Martiny et al., 2006, Hanson et al., 2012, Costello et al., 2012, van der Gast, 2013), and, more broadly, the distribution of genes or metagenomes. These distributions can be studied both within and among individuals (Costello et al., 2012). OTUs, derived from amplicon sequencing reads, are commonly used to represent microorganisms at a taxonomic level. In contrast, operational metagenomic units (OMUs; Ma and Ellison, 2024, BMC Bioinformatics) are a recently proposed concept based on whole-genome sequencing (also known as shotgun sequencing) and represent metagenomic genes, providing a refined framework for analyzing functional diversity.
Our recent work has expanded the study of microbial diversity to include OTUs, genes, and metagenomes, with a focus on diversity-stability relationships and diseases linked to alterations in the structure and interactions of an individual’s microbiome. This research has been documented in a series of publications (Ma, 2018b, 2018c, 2019, 2020; Ma and Li, 2018; Ma et al., 2018; Ma and Ellison, 2018, 2019). Building on these efforts, we recently introduced the concept of operational metagenomic units (OMUs; Ma and Ellison, 2024, BMC Bioinformatics), which are based on the binning of sequencing reads from whole-genome metagenomic data. OMUs provide a refined framework for analyzing metagenomic diversity at the gene and functional levels. The diversity-area relationship (DAR; Ma, 2018a, 2018b, 2019), an extension of the classic species-area relationship (SAR; Connor and McCoy, 1979), is applicable to both OTUs and OMUs. In this study, we apply DAR to OTUs and its metagenomic counterpart, m-DAR, to OMUs, to investigate how microbial and metagenomic diversity fluctuate and scale across space.
Inflammatory bowel diseases (IBD), including Crohn’s disease (CD) and ulcerative colitis (UC), are thought to arise from an inappropriate immune response to gut microbes in genetically susceptible hosts. Despite extensive research, the aetiology of these chronic inflammatory disorders remains unclear. The incidence of IBD has risen significantly in the Western world since the mid-twentieth century (Molodecky et al., 2012; Rocchi et al., 2012; Hammer et al., 2016), with prevalence plateauing at up to 0.5% of the general population in developed nations. In contrast, IBD prevalence continues to increase in developing countries (Benchimol et al., 2009, 2014; Kaplan, 2015). Research into the causes of IBD has focused on host genetics, immune responses, gut microbiota, and environmental factors. A growing body of evidence highlights the consistent association between gut dysbiosis and IBD. Advances in high-throughput sequencing technologies over the past decade have further elucidated the role of the microbiome in IBD pathogenesis, revealing its functional mechanisms and underscoring its importance as a secondary organ system for the host.
In this study, we apply the DAR framework to investigate the scaling of microbial diversity (using OTUs) and metagenomic diversity (using OMUs) across space, expressed as Hill numbers (Hill, 1973; Chao et al., 2014). The DAR model (Ma et al., 2018; Ma and Ellison, 2021), an extension of the traditional Species-Area Relationship (SAR), provides a comprehensive framework for understanding how metagenomic diversity scales with the number of individuals sampled (Chen et al., 2021; Li L. W. and Ma, 2019; Li W. D. and Ma, 2019; Ma, 2018a, 2018b, 2018c, 2019; Ma and Ellison, 2024; Ma and Li, 2018; Ma et al., 2018; Xiao and Ma, 2021). For metagenomic data, we estimate m-DAR parameters and associated measures—pair-wise diversity overlap (PDO), maximal accrual diversity (MAD), and the ratio of individual- to population-level diversity (RIP)—to characterize the spatial scaling of metagenomic diversity within and among individuals. Using data from faecal and mucosal microbiomes of IBD and healthy cohorts, we illustrate our approach and analyze m-DAR profiles, PDO, and MAD to identify within- and among-individual variation and markers distinguishing healthy and IBD-associated microbiomes. This approach offers several key advantages: (1) It unifies alpha and beta diversity: The scaling parameter (z) effectively captures the rate of diversity turnover across space, integrating information often split between alpha and beta metrics. (2) It estimates total diversity potential: The model allows us to estimate the Maximal Accrual Diversity (MAD), which represents the total potential diversity of the ecosystem, including the “dark” diversity that is not yet observed but theoretically present. (3) It quantifies community overlap: It enables the calculation of Pair-wise Diversity Overlap (PDO) and the ratio of individual- to population-level diversity (RIP), direct measures of community similarity across scales.
This study holds significant implications for understanding microbial diversity in the context of IBD. First, by estimating diversity scaling rates across space for both microbial diversity (using OTUs) and functional diversity (using OMUs), we provide a quantitative framework to describe how microbial and metagenomic diversity change with scale, offering insights into the organization and distribution of microbial communities. Second, we estimate the potential (dark) diversity of genes and functionalities, which represents the portion of diversity not yet observed but theoretically present in the microbial community. This approach sheds light on the hidden diversity that may play critical roles in ecosystem functioning and disease states. Finally, we test the influence of IBD on diversity scaling and potential diversity, addressing how disease states alter the spatial organization and richness of microbial communities. These findings advance our understanding of the gut microbiome’s role in IBD and provide a foundation for future research on microbiome-based diagnostics and therapies.
Materials and methods
Study design and sample collection
A total of 22 faecal samples and 42 intestinal mucosal samples were collected from 21 couples, comprising 42 participants aged 18 to 60 years from Kunming, Yunnan, China. Healthy volunteers had no history of gastrointestinal disorders and had not used antibiotics in the year preceding sample collection or taken any medications during endoscopy. Ulcerative colitis (UC) diagnoses were confirmed using standard endoscopic, radiographic, and histopathological criteria. All UC patients were undergoing treatment with Mesalazine. Mucosal specimens were collected in the morning without prior bowel preparation. Samples were obtained 10 cm from the anal margin using disposable biopsy forceps, immediately flash-frozen in liquid nitrogen, and stored at −80 °C until DNA extraction. All data reanalyzed in this study are available in the publication by Li et al. (2021). Written and verbal informed consent was obtained from all participants.
DNA extraction, sequencing, and preprocessing
Metagenomic shotgun sequencing was conducted on all 64 samples, yielding an average of 5.95 Gbp of high-quality data per sample. A non-redundant gene catalog comprising 999,310 genes was constructed. Sequencing was performed using the Illumina platform to generate paired-end reads. Raw reads were processed to remove low-quality sequences, reads containing ambiguous bases (N), and adapter sequences. For each sample, short reads were de novo assembled using multiple k-mer sizes in parallel. Assembled contigs were validated by mapping reads back to them, and the optimal assembly was selected based on contig N50 and mapping rate.
Functional annotation
Gene functional annotation was performed by aligning sequences against several databases, including eggNOG, Nr, GO, COG, Swiss-Prot, KEGG, and ARDB. The Nr database (NCBI RefSeq Non-Redundant Protein Database) provides a comprehensive collection of non-redundant protein sequences. The eggNOG database (evolutionary genealogy of genes: Non-supervised Orthologous Groups) offers precise functional annotation through orthologous gene groups. KEGG (Kyoto Encyclopedia of Genes and Genomes) facilitates the interpretation of high-level biological functions and systems using molecular-level data. The Gene Ontology (GO) knowledgebase is a globally recognized resource for gene function information. COGs (Clusters of Orthologous Groups of proteins) enable the analysis of protein function and evolution. Swiss-Prot is a curated protein database containing detailed information on protein origin, sequence annotation, and amino acid sequences. ARDB (Antibiotic Resistance Genes Database) is a centralized resource for annotating antibiotic resistance genes.
Estimation of metagenome diversity
Following the approach of Ma and Li (2018), metagenome diversity was estimated using Hill numbers (Hill, 1973; Jost, 2007; Chao et al., 2012, 2014) to quantify the diversity of metagenomic genes (MGs) and metagenomic functional gene clusters (MFGCs). The diversity of order q is defined as:
$$ {}^{q}D = \left( \sum_{i=1}^{G} p_i^{q} \right)^{1/(1-q)} \tag{1} $$
In Equation 1, G represents the number of MGs or MFGCs, $p_i$ is the relative abundance of the i-th MG or MFGC, and q is the diversity order. For q = 0, ${}^{0}D = G$ corresponds to the richness (number) of MGs or MFGCs. For q = 1, Equation 1 is undefined and ${}^{1}D$ is taken as its limit as q approaches 1, the exponential of the Shannon entropy, which weights each gene or functional gene cluster by its frequency. For q = 2, ${}^{2}D$ emphasizes dominant (more abundant) genes or functional gene clusters, and for q = 3, ${}^{3}D$ further increases the weighting of dominant genes or clusters (Ma and Li, 2018).
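The Hill-number calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `hill_number` and the toy read counts are hypothetical, and the q = 1 case uses the standard limit (the exponential of Shannon entropy).

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number of order q for a vector of non-negative abundances."""
    x = np.asarray(abundances, dtype=float)
    x = x[x > 0]                      # absent genes contribute nothing
    if x.size == 0:
        return 0.0
    p = x / x.sum()                   # relative abundances p_i
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# Hypothetical toy sample: read counts for five genes (not data from the study)
counts = [50, 30, 10, 10, 0]
profile = {q: hill_number(counts, q) for q in (0, 1, 2, 3)}
```

In this toy sample, the value at q = 0 is simply the number of genes present, and the values shrink as q increases because the dominant genes receive more weight.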
Metagenome diversity based on MGs relies on individual genes, whereas MFGC diversity can vary depending on whether clusters are defined by metabolic functions (e.g., KEGG) or protein functions (e.g., eggNOG). Ma and Li (2018) further classified MFGC diversity into two types based on gene abundance information. Type I MFGCs ignore gene abundances and count only the presence or absence of genes in a cluster (analogous to incidence data in macrobial diversity studies; e.g., Brom et al., 2015). In contrast, Type II MFGCs incorporate both gene presence and their relative abundances.
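One possible reading of the Type I and Type II distinction is sketched below. It is an assumption made for illustration, not a transcription of Ma and Li (2018): the Type I "abundance" of a cluster is taken as the number of distinct genes present in it, and the Type II "abundance" as the summed abundance of those genes. The gene names, cluster labels, and counts are hypothetical.

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number of order q (q = 1 handled as the Shannon limit)."""
    x = np.asarray(abundances, dtype=float)
    x = x[x > 0]
    p = x / x.sum()
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

def mfgc_vectors(gene_abundance, gene_to_cluster):
    """Return (type_i, type_ii) cluster-level vectors for one sample.

    Assumed definitions: type_i counts the genes present in each cluster,
    type_ii sums the abundances of the genes in each cluster.
    """
    type_i, type_ii = {}, {}
    for gene, abundance in gene_abundance.items():
        if abundance <= 0:
            continue                  # gene absent from this sample
        cluster = gene_to_cluster[gene]
        type_i[cluster] = type_i.get(cluster, 0) + 1
        type_ii[cluster] = type_ii.get(cluster, 0.0) + float(abundance)
    return type_i, type_ii

# Hypothetical sample: three rare genes in cluster K1, two abundant genes in K2
gene_to_cluster = {"g1": "K1", "g2": "K1", "g3": "K1", "g4": "K2", "g5": "K2"}
gene_abundance = {"g1": 1, "g2": 1, "g3": 1, "g4": 50, "g5": 47}

type_i, type_ii = mfgc_vectors(gene_abundance, gene_to_cluster)
d1_type_i = hill_number(list(type_i.values()), q=1)
d1_type_ii = hill_number(list(type_ii.values()), q=1)
```

Under these assumed definitions the two types can rank the same sample differently: the toy sample looks fairly even by gene counts (Type I) but is dominated by one cluster once abundances are included (Type II).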
Fitting m-DAR models and constructing m-DAR diversity profiles
Building on Ma (2018a), who extended the classic species-area relationship (SAR) to the diversity-area relationship (DAR), Ma and Ellison (2021) employed a power law (PL) model to define the metagenomic diversity-area relationship (m-DAR):
$$ {}^{q}D = cA^{z} \tag{2} $$
In Equation 2, ${}^{q}D$ represents metagenome diversity of order q (Equation 1), A (“area”) denotes the number of sampled individuals, and c and z are fitted parameters. Here, c estimates the diversity of a single individual, while z represents the rate at which metagenome diversity increases with the number of individuals sampled. Following Plotkin et al. (2000) and Ulrich and Buszko (2003), Ma (2018a) modified Equation 2 to include a third parameter, d:
$$ {}^{q}D = cA^{z}\exp(dA) \tag{3} $$
In this “power law with exponential cutoff” (PLEC) model, d < 0 and exp(dA) eventually overwhelms the power-law term at very large values of A, so that ${}^{q}D$ approaches a maximum instead of growing without bound. An exponential decay term is reasonable because there is a finite number of people and thus a finite diversity of metagenomes. We fitted the log-transformed versions of Equations 2 and 3:
$$ \ln({}^{q}D) = \ln(c) + z\ln(A) \tag{4} $$
$$ \ln({}^{q}D) = \ln(c) + z\ln(A) + dA \tag{5} $$
The log-linear forms were used because their parameters are simpler to estimate, z is scale-invariant in Equation 4, and the ecological interpretation of z as a scaling parameter is preserved in Equation 5. On a log–log plot, z is the slope of the linearized function. Model fit was evaluated using the linear correlation coefficient (r) and the associated p-value.
Unlike natural ecosystems, human microbiomes lack a natural spatial order or environmental gradient among hosts. To address this, we randomly selected 50 (for MGs) or 100 (for MFGCs) orderings from all possible permutations of the sampled subjects. For each ordering, the m-DAR models (Equations 4, 5) were fitted. Poorly fitting models (p > 0.05) and PLEC models with a biologically infeasible Amax < 0 were excluded. The final model parameters were obtained by averaging over the remaining orderings. The relationship between diversity order q and scaling parameter z (Equation 2) defines the m-DAR profile, analogous to species diversity profiles (Ma, 2018a; Ma and Ellison, 2021).
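The fitting procedure described above can be sketched with ordinary least squares on the log-transformed models, ln(D) = ln(c) + z ln(A) for the power law and ln(D) = ln(c) + z ln(A) + dA for the PLEC model. The sketch rests on stated assumptions rather than on the authors' code: diversity is accrued by pooling (summing) gene counts over the first A samples of each random ordering, the p-value filter is omitted, and all function names and the toy count matrix are hypothetical.

```python
import numpy as np

def hill_number(abundances, q):
    """Hill number of order q (q = 1 handled as the Shannon limit)."""
    x = np.asarray(abundances, dtype=float)
    x = x[x > 0]
    p = x / x.sum()
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

def accrual_curve(sample_matrix, order, q):
    """Diversity of the pooled community after adding samples one at a time.

    Assumption: samples are pooled by summing their gene counts.
    """
    pooled = np.cumsum(sample_matrix[order], axis=0)   # row k = first k+1 samples
    return np.array([hill_number(row, q) for row in pooled])

def fit_log_linear(A, D, with_cutoff=False):
    """Least-squares fit of ln D = ln c + z ln A (+ d A when with_cutoff=True).

    Returns (coefficients, r), with coefficients ordered [ln c, z] or [ln c, z, d].
    """
    columns = [np.ones_like(A), np.log(A)]
    if with_cutoff:
        columns.append(A)
    X = np.column_stack(columns)
    y = np.log(D)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = np.corrcoef(X @ coef, y)[0, 1]                 # fitted vs observed ln D
    return coef, r

def mean_pl_parameters(sample_matrix, q, n_orders=50, seed=0):
    """Average ln c and z of the power-law model over random sample orderings."""
    rng = np.random.default_rng(seed)
    n = sample_matrix.shape[0]
    A = np.arange(1, n + 1, dtype=float)
    coefs = []
    for _ in range(n_orders):
        order = rng.permutation(n)                     # one random ordering
        D = accrual_curve(sample_matrix, order, q)
        coef, _r = fit_log_linear(A, D)
        coefs.append(coef)
    ln_c, z = np.mean(coefs, axis=0)
    return ln_c, z

# Hypothetical toy data: 8 samples x 200 genes of sparse counts (not study data)
toy = np.random.default_rng(1).poisson(0.5, size=(8, 200))
ln_c, z = mean_pl_parameters(toy, q=0, n_orders=20)
```

A fuller version would also fit the PLEC model for each ordering (`with_cutoff=True`), discard fits with p > 0.05 or a negative Amax, and average the remaining parameters, as the text describes.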
Maximal accrual diversity of metagenomes
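Ma (2018a) derived the maximal accrual diversity (MAD) in a cohort or population based on the DAR-PLEC model (Equation 3) as:
$$ {}^{q}D_{max} = c\left(-\frac{z}{d}\right)^{z}\exp(-z) = cA_{max}^{z}\exp(-z) \tag{6} $$
for which the number of individuals (Amax) reaching the maximum diversity (Dmax) is estimated as:
$$ A_{max} = -\frac{z}{d} \tag{7} $$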
The m-MAD profile is defined as the set of Dmax values corresponding to different diversity orders q. ${}^{q}D_{max}$ serves as a proxy for potential (“dark”) diversity, that is, genes or functional clusters absent locally but present in regional or global metagenomic pools (Partel et al., 2011; Real et al., 2017; Ma, 2019; Ma and Ellison, 2021).
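As a worked illustration, setting the derivative of the PLEC model D = c A^z exp(dA) to zero gives z/A + d = 0, so the maximum lies at Amax = -z/d, and substituting back gives Dmax = c (-z/d)^z exp(-z). The sketch below computes both quantities from fitted parameters. The function name and the parameter values are hypothetical, and returning `None` when no finite maximum exists is an assumption about how the NA entries might arise.

```python
import math

def mad_from_plec(ln_c, z, d):
    """Return (A_max, D_max) for the PLEC model D = c * A**z * exp(d * A).

    A finite interior maximum requires z > 0 and d < 0; otherwise None is
    returned (assumed here to correspond to an NA entry).
    """
    if z <= 0 or d >= 0:
        return None
    a_max = -z / d                                   # number of individuals at the maximum
    d_max = math.exp(ln_c) * a_max ** z * math.exp(-z)
    return a_max, d_max

# Hypothetical fitted parameters (not values from Table 1)
example = mad_from_plec(ln_c=math.log(1000.0), z=0.5, d=-0.01)
```

With these hypothetical parameters the maximum is reached at 50 individuals; a positive d or a non-positive z would put the stationary point outside the feasible range.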
Pair-wise diversity overlap
Assuming equal “areas” for sampled individuals, the scaling parameter z from the basic m-DAR model (Equations 2, 4) was used to estimate pair-wise diversity overlap (PDO). The PDO, g, between two individuals (i.e., the proportion of diversity shared between them) is:
$$ g = 2 - 2^{z} \tag{8} $$
Here, g ranges from 0 (no overlap; z = 1) to 1 (complete overlap; z = 0). The m-PDO profile is the set of g values corresponding to different diversity orders (q), approximating the similarity between pairs of human metagenomes.
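The overlap measure follows from the power law: doubling the area multiplies diversity by 2^z, while two equal areas sharing a fraction g of their diversity jointly hold (2 - g) times the diversity of one, so g = 2 - 2^z. A minimal sketch, using the q = 0 scaling parameters for MGs reported in the Results (the function name `pdo` is hypothetical):

```python
def pdo(z):
    """Pair-wise diversity overlap g = 2 - 2**z for a power-law scaling parameter z."""
    return 2.0 - 2.0 ** z

# z-values for metagenomic genes at q = 0, as reported in the Results
z_q0 = {"Mhm": 0.845, "Mpm": 0.887, "Mhf": 0.576, "Mpf": 0.618}
g_q0 = {group: pdo(z) for group, z in z_q0.items()}

# Groups ranked from most to least overlap between individuals
ranked = sorted(g_q0, key=g_q0.get, reverse=True)
```

Because g decreases monotonically with z, the ranking by overlap is the reverse of the ranking by z, which is why the PDO profile mirrors the m-DAR profile.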
Results
Metagenomic diversity-area relationships
The basic m-DAR power law (PL) model (Equations 2, 4) and the PLEC model (Equations 3, 5) both fitted the faecal and mucosal microbiome data satisfactorily (p < 0.05; Table 1; Supplementary Table S1; Figure 1). For diversity orders q = 0 and q = 1, the scaling parameter (z) in Equation 2 for metagenomic genes (MGs) was larger in the mucosal microbiome (Mhm and Mpm) than in the faecal microbiome (Mhf and Mpf). For example, at q = 0, z was 0.845 and 0.887 for Mhm and Mpm, respectively, versus 0.576 and 0.618 for Mhf and Mpf; the value for the total samples (0.720) fell between the mucosal and faecal values. For q = 2 and q = 3, however, z was smaller in the mucosal microbiome (Mhm and Mpm) than in the faecal microbiome (Mhf and Mpf). Additionally, the z-value of the total samples was greater than that of each of the four subgroups. Comparing cohorts, the z-values of the healthy cohorts (Mhm and Mhf) were lower than those of the IBD cohorts (Mpm and Mpf) at q = 0, but higher at q = 1, 2, and 3 (with the exception of the negative values observed for Mpm at q = 2 and q = 3).
Table 1. Key parameters of the m-DAR (metagenome diversity-area relationship) models fitted to metagenomic gene (MG) diversity, averaged over 100 random re-samplings.
Figure 1. The scaling parameter (z), ln(c), g, and Dmax of the m-DAR (metagenome diversity-area relationship) for the metagenomic genes (MGs) of the healthy cohorts, IBD cohorts, and the total samples. z and ln(c) are model fitting parameters; z is the diversity scaling parameter, g is the pair-wise diversity overlap (PDO), and Dmax is the maximal accrual diversity (MAD), which reflects the total potential diversity. Mhm is the mucosal microbiome of the healthy cohorts and Mpm is the mucosal microbiome of the IBD cohorts. Mhf is the faecal microbiome of the healthy cohorts and Mpf is the faecal microbiome of the IBD cohorts.
This pattern indicates fundamental differences in community organization. The higher z-values in mucosal samples at lower diversity orders (q = 0, 1) suggest that mucosal microbial communities are more dissimilar between individuals in terms of gene richness and the diversity of moderately abundant genes. In contrast, the higher z-values in faecal samples at higher diversity orders (q = 2, 3) point to a greater role of dominant, highly abundant genes in driving inter-individual differences in the luminal environment. The shift in z-values between healthy and IBD cohorts suggests that IBD is associated with increased variability in gene richness between individuals, but a more homogenized set of dominant genes.
As expected, the pair-wise diversity overlap (PDO) profile exhibited the opposite pattern to the m-DAR profile (parameter g in Table 1; Supplementary Table S1). This is because z in the m-DAR profile quantifies the dissimilarity between neighboring individuals, whereas g in the PDO profile quantifies their overlap or similarity (Equation 8). The PLEC model indicated the existence of an asymptote for MG diversity (d < 0; Dmax values in Table 1; Equations 6 and 7) and provided the number of individual subjects (Amax values in Table 1; Supplementary Table S1) required to reach it. The maximal accrual diversity (MAD) of MGs in the gut metagenome was $9.2 \times 10^{5}$ for the total samples at diversity order q = 0, which represents the estimated gene richness for this cohort. The faecal microbiome exhibited a larger MAD than the mucosal microbiome at all four diversity orders (q = 0, 1, 2, 3; Table 1; Figure 1). At q = 0, the faecal MAD was approximately four to seven times the mucosal MAD (879,346 vs. 198,752 for Mhf vs. Mhm; 779,334 vs. 104,668 for Mpf vs. Mpm). The healthy cohorts exhibited a larger MAD than the IBD cohorts, with the exception of the NA values for Mpm at q = 2 and q = 3.
The significantly higher MAD in faecal samples implies that the luminal microbiome possesses a much larger total potential diversity, or “gene pool,” than the mucosa-associated microbiome. This is consistent with the faecal microbiome representing a transient and diverse collection of microbes from throughout the gut, while the mucosal community is a more specialized, host-adapted subset. The generally greater MAD in healthy cohorts further suggests that a healthy state is associated with a greater reservoir of microbial genetic potential, which may be depleted in IBD.
Metagenome functional gene cluster (MFGC) diversity-area relationships
The MFGC-DAR PL model fitted all MFGC randomizations, as detailed in Supplementary Tables S2–S8. In general, the z-values for the mucosal microbiome (Mhm and Mpm) were greater than those for the faecal microbiome (Mhf and Mpf) at diversity order q = 0 for MFGCs from all seven databases (Supplementary Tables S2–S8). For diversity orders q > 0, the z-values followed a pattern similar to that at q = 0 for the majority of MFGCs across all seven databases, with the exception of a few instances of negative values, particularly at higher orders for Mpm. In the mucosal microbiome, the z-values of the healthy cohorts were lower than those of the IBD cohorts across all seven databases, again with the exception of instances of negative values. This was not the case for the faecal microbiome, where the direction of the difference varied between databases (Supplementary Tables S2–S8). Nevertheless, the majority of z-values were larger in healthy cohorts than in IBD cohorts across all diversity orders (Figure 2).
Figure 2. The scaling parameter (z), ln(c), g, and Dmax of the m-DAR (metagenome diversity-area relationship) for the MFGCs of the KEGG database of the healthy cohorts, IBD cohorts, and the total samples. z and ln(c) are model fitting parameters; z is the diversity scaling parameter, g is the pair-wise diversity overlap (PDO), and Dmax is the maximal accrual diversity (MAD), which reflects the total potential diversity. Mhm is the mucosal microbiome of the healthy cohorts and Mpm is the mucosal microbiome of the IBD cohorts. Mhf is the faecal microbiome of the healthy cohorts and Mpf is the faecal microbiome of the IBD cohorts.
As with the m-DAR of MGs described above, the pair-wise diversity overlap (PDO) profile exhibited the opposite pattern to the MFGC-DAR profile (Supplementary Tables S2–S8). This is because z in the MFGC-DAR profile quantifies the dissimilarity between neighboring individuals, whereas g in the PDO profile quantifies their overlap or similarity. Like the PLEC model of m-DAR, the PLEC model of MFGC-DAR indicated the existence of an asymptote for MFGC diversity (d < 0; Dmax values in Supplementary Tables S2–S8) and provided the number of individual subjects (Amax values in Supplementary Tables S2–S8) required to reach it. The faecal microbiome displayed a greater MAD than the mucosal microbiome across all four diversity orders (q = 0, 1, 2, 3) for all seven databases (Supplementary Tables S2–S8). The healthy cohorts exhibited a larger MAD than the IBD cohorts, with the exception of the NA values observed for Mpm. The MFGC-DAR models thus showed the same patterns as the m-DAR models.
Diversity-area relationships of metagenomic genes within MFGCs
We then fitted DAR models to the MGs within each MFGC. As an example, we selected the MFGCs of the ARDB database, restricted to those containing more than 10 MGs. Figure 3 shows the z-value of the DAR model for each MFGC, with the MFGCs sorted by z-value within each cohort. The MFGCs with the highest z-values in the faecal microbiome were ardb35, ardb43, and ardb17 in Mhf, and ardb35, ardb17, and ardb8 in Mpf. The MFGCs with the highest z-values in the mucosal microbiome were ardb1, ardb7, and ardb10 in Mhm, and ardb10, ardb1, and ardb25 in Mpm. In ARDB, ardb35 is “Group B chloramphenicol acetyltransferase, which can inactivate chloramphenicol. Also referred to as xenobiotic acetyltransferase,” ardb17 is “Virginiamycin A acetyltransferase, which can inactivate the target drug,” ardb1 is “VanG type vancomycin resistance operon genes, which can synthesize peptidoglycan with modified C-terminal D-Ala-D-Ala to D-alanine--D-serine,” and ardb10 is “ABC transporter system, bacitracin efflux pump.” MFGCs with different z-values may play distinct roles in the development of IBD.
Figure 3. The scaling parameter (z) of the MFGC-DAR (metagenome diversity-area relationship) for the metagenomic genes (MGs) of each ARDB MFGC for the healthy cohorts, IBD cohorts, and the total samples.
The distinct scaling behaviors of specific antibiotic resistance genes (ARGs) are noteworthy. The top-scaling ARGs in faecal samples (ardb35: chloramphenicol acetyltransferase; ardb17: virginiamycin A acetyltransferase) and mucosal samples (ardb1: vancomycin resistance operon; ardb10: bacitracin efflux pump) suggest that these resistance mechanisms may play niche-specific and potentially important roles in shaping the microbial community structure in health and disease, possibly reflecting selective pressures within the gut ecosystem of IBD patients.
Discussion
The application of the diversity-area relationship (DAR) and its metagenomic counterpart (m-DAR) to the gut microbiome has provided novel insights into the spatial scaling of microbial gene and functional diversity in both healthy individuals and those with inflammatory bowel disease (IBD). By extending the classic species-area relationship (SAR; Connor and McCoy, 1979) to metagenomic data, this study demonstrates how microbial diversity scales across individuals and highlights the distinct organizational patterns of the gut microbiome in health and disease. Our findings reveal significant differences in diversity scaling between faecal and mucosal microbiomes, as well as between healthy and IBD cohorts, underscoring the utility of DAR and m-DAR in characterizing microbial ecosystems (Ma, 2018a; Ma and Li, 2018).
One of the key findings of this study is the contrasting diversity scaling patterns between faecal and mucosal microbiomes. For metagenomic genes (MGs), the mucosal microbiome exhibited higher scaling parameters (z-values) than the faecal microbiome at diversity orders q = 0 and q = 1, indicating greater dissimilarity between individuals in mucosal communities at lower diversity orders. However, this trend reversed at q = 2 and q = 3, where the faecal microbiome showed higher z-values, suggesting a greater influence of dominant genes in faecal communities. These results align with the known ecological differences between mucosal and luminal (faecal) environments. The mucosal microbiome is shaped by host-derived factors such as immune responses and mucosal adhesion, leading to higher inter-individual variability at the level of gene richness (q = 0) and frequency-weighted diversity (q = 1). In contrast, the faecal microbiome, which is influenced by diet and transit time, tends to have a more stable core of dominant taxa, reflected in higher z-values at q = 2 and q = 3 (Ma, 2018a; Ma and Li, 2018).
The maximal accrual diversity (MAD) further highlighted these differences, with the faecal microbiome exhibiting significantly higher MAD than the mucosal microbiome across all diversity orders. This suggests that faecal samples capture a broader range of microbial diversity, likely due to their representation of transient luminal communities. In contrast, mucosal samples, while more variable between individuals, may reflect a more specialized and host-adapted subset of the microbiome. These findings have important implications for microbiome sampling strategies, as they underscore the complementary nature of faecal and mucosal samples in capturing different aspects of microbial diversity (Ma, 2018a; Ma et al., 2018).
The comparison between healthy and IBD cohorts revealed distinct scaling patterns that may reflect the underlying pathophysiology of IBD. At q = 0, IBD cohorts exhibited higher z-values than healthy cohorts, indicating greater inter-individual variability in gene richness among IBD patients. This finding aligns with the established concept of dysbiosis in IBD, where disease-associated changes lead to a less stable and more heterogeneous microbial community structure between patients (Frank et al., 2007; Gevers et al., 2014). However, at higher diversity orders (q = 1, 2, and 3), healthy cohorts showed higher z-values, suggesting that IBD cohorts have a more uniform distribution of dominant genes. This could reflect the loss of rare taxa and the expansion of dominant, potentially pathogenic species in IBD, a phenomenon previously observed in microbiome studies of IBD patients (Ma, 2018a; Ma and Ellison, 2019).
The MAD results further supported these observations, with healthy cohorts generally exhibiting higher MAD than IBD cohorts. This suggests that healthy microbiomes have a greater potential for accruing diversity, possibly due to the presence of a more balanced and resilient community. In contrast, the reduced MAD in IBD cohorts may reflect the loss of microbial diversity and functional redundancy associated with disease. These findings highlight the potential of DAR and m-DAR as tools for quantifying dysbiosis and monitoring disease progression in IBD (Ma, 2018a; Ma and Li, 2018).
The analysis of metagenomic functional gene clusters (MFGCs) provided additional insights into the functional organization of the gut microbiome. Similar to the patterns observed for MGs, mucosal microbiomes showed higher z-values than faecal microbiomes at q = 0, but this trend varied across functional databases and diversity orders. Notably, specific MFGCs associated with antibiotic resistance, such as chloramphenicol acetyltransferase (ardb35) and vancomycin resistance operon genes (ardb1), exhibited distinct scaling behaviors. These MFGCs had among the highest z-values, ardb35 in the faecal microbiome and ardb1 in the mucosal microbiome, suggesting their potential roles in shaping microbial community structure (Ma and Ellison, 2024; Ma and Li, 2018). The prominence of antibiotic resistance genes in IBD cohorts is particularly noteworthy, as it may reflect the selective pressures imposed by antibiotic use or the dysregulated immune responses characteristic of IBD. The distinct scaling behaviors of these MFGCs suggest that they may contribute to the functional dysbiosis observed in IBD, potentially influencing disease severity and treatment outcomes. Future studies could explore the functional consequences of these scaling patterns, such as their impact on microbial resilience and host–microbe interactions (Ma, 2018a; Ma and Ellison, 2019).
The application of DAR and m-DAR to metagenomic data represents a significant methodological advancement in microbiome research. By extending the SAR framework, a cornerstone of island biogeography and macroecology, to include gene and functional diversity, these models provide a unified approach for quantifying microbial diversity across spatial scales. The parameter z, which quantifies the rate of diversity accumulation, has a direct analog in ecological theory, where higher z-values are typically associated with communities in more heterogeneous environments. Our observation of higher z-values in mucosal samples at lower q-orders may therefore be interpreted as evidence of a more individualized and patchy ecological landscape at the mucosal interface, consistent with its role as a primary host–microbe interaction site. The use of Hill numbers further enhances the ecological realism of this approach (Chao et al., 2014; Ma, 2018a).
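To make the roles of the diversity order q and the scaling parameter z concrete, the following minimal Python sketch computes Hill numbers and fits the power-law form of the DAR by log-log least squares. It assumes the standard forms used in the DAR literature (D = cA^z and PDO = 2 − 2^z, as in Ma, 2018a); the function names are hypothetical, and the sketch does not reproduce the sample-accumulation and permutation steps of the actual analysis.

```python
import math

def hill_number(abundances, q):
    """Hill number of order q for a vector of non-negative abundances."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def fit_power_law(areas, diversities):
    """Least-squares fit of ln(D) = ln(c) + z * ln(A); returns (c, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(d) for d in diversities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx
    c = math.exp(mean_y - z * mean_x)
    return c, z

def pairwise_diversity_overlap(z):
    """PDO under the assumed form 2 - 2**z (1 = full overlap, 0 = none)."""
    return 2.0 - 2.0 ** z
```

In this sketch, "area" is simply the number of accumulated samples, so a larger fitted z means that each added individual contributes more new diversity and that the overlap between any two individuals is correspondingly lower.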
The inclusion of the power law with exponential cutoff (PLEC) model adds another layer of ecological realism, as it accounts for the finite nature of microbial diversity in human populations. The estimation of maximal accrual diversity (MAD) and pair-wise diversity overlap (PDO) further enriches the analytical toolkit, providing measures of potential (“dark”) diversity and inter-individual similarity, respectively. These metrics have broad applicability beyond IBD, offering new ways to study microbial diversity in other diseases, environmental microbiomes, and host-associated ecosystems (Ma, 2018a; Ma and Li, 2018).
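As an illustration of how MAD follows from the PLEC model, the sketch below assumes the usual PLEC form D(A) = cA^z exp(dA) with d < 0. Setting dD/dA = 0 gives A_max = −z/d, and substituting back gives D_max = c(−z/d)^z exp(−z). The function names and the parameter values in the usage note are hypothetical and are not estimates from this study.

```python
import math

def plec(area, c, z, d):
    """PLEC curve: D(A) = c * A**z * exp(d * A)."""
    return c * area ** z * math.exp(d * area)

def maximal_accrual_diversity(c, z, d):
    """Return (A_max, D_max) for a PLEC curve with z > 0 and d < 0."""
    if d >= 0 or z <= 0:
        raise ValueError("PLEC has an interior maximum only when z > 0 and d < 0")
    a_max = -z / d                          # area at which accrual peaks
    d_max = c * a_max ** z * math.exp(-z)   # diversity at that area (MAD)
    return a_max, d_max
```

For example, with illustrative values c = 10, z = 0.5, and d = −0.05, `maximal_accrual_diversity(10, 0.5, -0.05)` places the peak at A_max = 10 accumulated samples; the corresponding D_max equals `plec(10, 10, 0.5, -0.05)`.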
While this study demonstrates the utility of DAR and m-DAR in characterizing microbial diversity, several limitations should be acknowledged. First, the sample size, though sufficient for initial exploration, may limit the generalizability of the findings. Second, potential technical biases, including variation in sequencing depth across samples, the dependence of functional annotation on the choice of reference databases, and the inherent sampling heterogeneity between fecal and mucosal sites (e.g., differences in biomass and host DNA contamination), could influence diversity estimates and comparisons. Larger, multi-center cohorts and standardized protocols would help mitigate these issues. Third, the reliance on cross-sectional data limits causal inference. Longitudinal studies tracking changes in diversity scaling over time and in response to treatment are needed.
Future research should explore the integration of DAR and m-DAR with other omics data to provide a more holistic understanding of host–microbe interactions (Prins et al., 2024; Valdés-Mas et al., 2025; Zhang et al., 2025; Ning et al., 2023; Zhao et al., 2025). For instance, recent work has shown that circulating and tissue microRNAs, such as those identified by Tocia et al. (2023), reflect disease activity in Crohn’s disease. Combining such host-derived molecular biomarkers with our ecological scaling analysis could bridge the gap between microbial community structure and host response, significantly broadening the translational relevance of this approach. Applying these models to other microbial ecosystems could also reveal general principles of microbial diversity scaling.
In conclusion, this study demonstrates the power of DAR and m-DAR in characterizing the spatial scaling of microbial gene and functional diversity in the gut microbiome. By revealing distinct scaling patterns in fecal and mucosal microbiomes, as well as between healthy and IBD cohorts, these models provide new insights into the ecological organization of the gut microbiome and its role in health and disease. The identification of specific MFGCs associated with antibiotic resistance further highlights the potential of this approach for uncovering functional drivers of dysbiosis. As microbiome research continues to evolve, DAR and m-DAR offer promising tools for advancing our understanding of microbial ecosystems and their implications for human health (Ma, 2018a; Ma and Li, 2018).
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.
Author contributions
FY: Project administration, Writing – original draft, Investigation. JS: Writing – original draft, Formal analysis, Conceptualization, Data curation. LQ: Software, Writing – original draft, Investigation, Resources. JL: Conceptualization, Writing – original draft, Investigation. YY: Writing – original draft. WL: Writing – original draft, Investigation, Conceptualization. LL: Writing – review & editing, Formal analysis, Conceptualization. ZM: Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by grants from the Key Research and Development Program of Yunnan Province (202402AA310017) and the Basic Research Program of Shanxi Province (202203021222244).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2025.1660973/full#supplementary-material
References
Benchimol, E. I., Guttmann, A., Griffiths, A. M., Rabeneck, L., Mack, D. R., Brill, H., et al. (2009). Increasing incidence of paediatric inflammatory bowel disease in Ontario, Canada: evidence from health administrative data. Gut 58, 1490–1497. doi: 10.1136/gut.2009.188383
Benchimol, E. I., Mack, D. R., Nguyen, G. C., Snapper, S. B., Li, W., Mojaverian, N., et al. (2014). Incidence, outcomes, and health services burden of very early onset inflammatory bowel disease. Gastroenterology 147, 803–813. doi: 10.1053/j.gastro.2014.06.023
Broms, K. M., Hooten, M. B., and Fitzpatrick, R. M. (2015). Accounting for imperfect detection in Hill numbers for biodiversity studies. Methods Ecol. Evol. 6, 99–108. doi: 10.1111/2041-210X.12296
Chao, A., Chiu, C. H., and Hsieh, T. C. (2012). Proposing a resolution to debates on diversity partitioning. Ecology 93, 2037–2051. doi: 10.1890/11-1817.1
Chao, A., Chiu, C. H., and Jost, L. (2014). Unifying species diversity, phylogenetic diversity, functional diversity and related similarity and differentiation measures through Hill numbers. Annu. Rev. Ecol. Evol. Syst. 45, 297–324. doi: 10.1146/annurev-ecolsys-120213-091540
Chen, H., Yi, B., Liu, Q., Xu, X., Dai, L., and Ma, Z. S. (2021). Diversity scaling of human digestive tract (DT) microbiomes: the intra-DT and inter-individual patterns. Front. Genet. 12:724661. doi: 10.3389/fgene.2021.724661
Connor, E. F., and McCoy, E. D. (1979). The statistics and biology of the species–area relationship. Am. Nat. 113, 791–833. doi: 10.1086/283438
Costello, E. K., Stagaman, K., Dethlefsen, L., Bohannan, B. J. M., and Relman, D. A. (2012). The application of ecological theory toward an understanding of the human microbiome. Science 336, 1255–1262. doi: 10.1126/science.1224203
Frank, D. N., St Amand, A. L., Feldman, R. A., Boedeker, E. C., Harpaz, N., and Pace, N. R. (2007). Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc. Natl. Acad. Sci. USA 104, 13780–13785. doi: 10.1073/pnas.0706625104
Gevers, D., Kugathasan, S., Denson, L. A., Vázquez-Baeza, Y., Van Treuren, W., Ren, B., et al. (2014). The treatment-naive microbiome in new-onset Crohn's disease. Cell Host Microbe 15, 382–392. doi: 10.1016/j.chom.2014.02.005
Hammer, T., Nielsen, K. R., Munkholm, P., et al. (2016). The Faroese IBD study: incidence of inflammatory bowel diseases across 54 years of population-based data. J. Crohns Colitis 10, 934–942.
Hanson, C. A., Fuhrman, J. A., Horner-Devine, M. C., and Martiny, J. B. H. (2012). Beyond biogeographic patterns: processes shaping the microbial landscape. Nat. Rev. Microbiol. 10, 497–506.
Hill, M. O. (1973). Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427–432.
Jost, L. (2007). Partitioning diversity into independent alpha and beta components. Ecology 88, 2427–2439. doi: 10.1890/06-1736.1
Kaplan, G. G. (2015). The global burden of IBD: from 2015 to 2025. Nat. Rev. Gastroenterol. Hepatol. 12, 720–727. doi: 10.1038/nrgastro.2015.150
Li, L. W., and Ma, Z. S. (2019). Global microbiome diversity scaling in hot springs with DAR (diversity-area relationship) profiles. Front. Microbiol. 10:118. doi: 10.3389/fmicb.2019.00118
Li, W. D., and Ma, Z. S. (2019). Diversity scaling of human vaginal microbial communities. Zool. Res. 40, 587–594. doi: 10.24272/j.issn.2095-8137.2019.068
Li, W., Sun, Y., Dai, L., Chen, H., Yi, B., Niu, J., et al. (2021). Ecological and network analyses identify four microbial species with potential significance for the diagnosis/treatment of ulcerative colitis (UC). BMC Microbiol. 21:138. doi: 10.1186/s12866-021-02201-6
Ma, Z. S. (2018a). Sketching the human microbiome biogeography with DAR (diversity-area relationship) profiles. Microb. Ecol. 77, 821–838. doi: 10.1007/s00248-018-1245-6
Ma, Z. S. (2018b). Diversity time-period and diversity-time-area relationships exemplified by the human microbiome. Sci. Rep. 8:7214. doi: 10.1038/s41598-018-24881-3
Ma, Z. S. (2018c). Extending species-area relationships (SAR) to diversity-area relationships (DAR). Ecol. Evol. 8, 10023–10038. doi: 10.1002/ece3.4425
Ma, Z. S. (2019). A new DTAR (diversity–time–area relationship) model demonstrated with the indoor microbiome. J. Biogeogr. 46, 2024–2041. doi: 10.1111/jbi.13636
Ma, Z. S. (2020). Predicting the outbreak risks and inflection points of COVID-19 pandemic with classic ecological theories. Adv. Sci. 7, 1–15.
Ma, Z. S., and Ellison, A. M. (2018). A unified concept of dominance applicable at both community and species scale. Ecosphere. doi: 10.1002/ecs2.2477
Ma, Z. S., and Ellison, A. M. (2019). Dominance network analysis provides a new framework for studying the diversity-stability relationship. Ecol. Monogr. doi: 10.1002/ecm.1358
Ma, Z., and Ellison, A. M. (2021). Toward a unified diversity–area relationship (DAR) of species and gene diversity illustrated with the human gut metagenome. Ecosphere 12:e03807. doi: 10.1002/ecs2.3807
Ma, Z. S., and Ellison, A. M. (2024). Cross-scale scaling-law analyses for the heterogeneity and diversity of animal gut microbiomes from community to landscape. Oikos 2, e1059. doi: 10.1111/oik.10598
Ma, Z. S., and Li, L. W. (2018). Semen microbiome biogeography: an analysis based on a Chinese population study. Front. Microbiol. 9:3333. doi: 10.3389/fmicb.2018.03333
Ma, Z. S., Li, L. W., and Li, W. D. (2018). Assessing and interpreting the within-body biogeography of human microbiome diversity. Front. Microbiol. 9:1619. doi: 10.3389/fmicb.2018.01619
Martiny, J. B. H., Bohannan, B. J. M., Brown, J. H., Colwell, R. K., Fuhrman, J. A., Green, J. L., et al. (2006). Microbial biogeography: putting microorganisms on the map. Nat. Rev. Microbiol. 4, 102–112. doi: 10.1038/nrmicro1341
Molodecky, N. A., Soon, I. S., Rabi, D. M., Ghali, W. A., Ferris, M., Chernoff, G., et al. (2012). Increasing incidence and prevalence of the inflammatory bowel diseases with time, based on systematic review. Gastroenterology 142, 46–54.e42. doi: 10.1053/j.gastro.2011.10.001
Ning, L., Zhou, Y. L., Sun, H., Zhang, Y., Shen, C., Wang, Z., et al. (2023). Microbiome and metabolome features in inflammatory bowel disease via multi-omics integration analyses across cohorts. Nat. Commun. 14:7135. doi: 10.1038/s41467-023-42788-0
Partel, M., Szava-Kovats, R., and Zobel, M. (2011). Dark diversity: shedding light on absent species. Trends Ecol. Evol. 26, 124–128. doi: 10.1016/j.tree.2010.12.004
Plotkin, J. B., Potts, M. D., Yu, D. W., Bunyavejchewin, S., Condit, R., Foster, R., et al. (2000). Predicting species diversity in tropical forests. Proc. Natl. Acad. Sci. USA 97, 10850–10854. doi: 10.1073/pnas.97.20.10850
Prins, F. M., Hidding, I. J., Klaassen, M. A. Y., Collij, V., Schultheiss, J. P. D., Uniken Venema, W. T. C., et al. (2024). Limited predictive value of the gut microbiome and metabolome for response to biological therapy in inflammatory bowel disease. Gut Microbes 16:2391505. doi: 10.1080/19490976.2024.2391505
Real, R., Barbosa, A. M., and Bull, J. W. (2017). Species distributions, quantum theory, and the enhancement of biodiversity measures. Syst. Biol. 66, 453–462. doi: 10.1093/sysbio/syw072
Rocchi, A., Benchimol, E. I., Bernstein, C. N., Bitton, A., Feagan, B., and Panaccione, R. (2012). Inflammatory bowel disease: a Canadian burden of illness review. Can. J. Gastroenterol. 26, 811–817. doi: 10.1155/2012/984575
Tocia, C., Dumitru, A., Mateescu, B., Negreanu, L., State, M., Cozaru, G. C., et al. (2023). Tissue and circulating MicroRNA-31, MicroRNA-200b, and MicroRNA-200c reflects disease activity in Crohn's disease patients: results from the BIOMIR study. J. Gastrointestin. Liver Dis. 32, 30–38. doi: 10.15403/jgld-4656
Ulrich, W., and Buszko, J. (2003). Self-similarity and the species-area relation of Polish butterflies. Basic Appl. Ecol. 4, 263–270. doi: 10.1078/1439-1791-00139
Valdés-Mas, R., Leshem, A., Zheng, D., Cohen, Y., Kern, L., Zmora, N., et al. (2025). Metagenome-informed metaproteomics of the human gut microbiome, host, and dietary exposome uncovers signatures of health and inflammatory bowel disease. Cell 188, 1062–1083.e36. doi: 10.1016/j.cell.2024.12.016
van der Gast, C. J. (2013). Microbial biogeography and what Baas Becking should have said. Microbiol. Today 40, 108–111.
Xiao, W. M., and Ma, Z. S. (2021). Inter-individual diversity scaling analysis of the human virome with classic diversity-area relationship (DAR) modeling. Front. Genet. 12:627128. doi: 10.3389/fgene.2021.627128
Zhang, J., Mak, J. W. Y., and Ng, S. C. (2025). Gut microbiome in IBD: past, present and the future. Gut :gutjnl-2025-335626. doi: 10.1136/gutjnl-2025-335626
Keywords: inflammatory bowel disease (IBD), diversity-area relationship (DAR), metagenomic genes (MGs), metagenomic functional gene clusters (MFGCs), maximal accrual diversity (MAD)
Citation: Yu F, Song J, Qi L, Liu J, Yang Y, Li W, Li L and Ma ZS (2026) Gene and function diversity-area relationships in the inflammatory bowel disease fecal and mucosal microbiome. Front. Microbiol. 16:1660973. doi: 10.3389/fmicb.2025.1660973
Edited by:
Xinming Tang, Chinese Academy of Agricultural Sciences (CAAS), China
Reviewed by:
Meng Zhang, Inner Mongolia Agricultural University, China
Luana Alexandrescu, County Clinical Emergency Hospital of Constanta, Romania
Copyright © 2026 Yu, Song, Qi, Liu, Yang, Li, Li and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Lianwei Li, lilianwei@mail.kiz.ac.cn; Zhanshan (Sam) Ma, ma@vandals.uidaho.edu
†These authors have contributed equally to this work
Fubing Yu1†