Diversity Scaling Analysis of Chinese Gut Microbiomes Across Ethnicities and Lifestyles

Diversity scaling (changes) of human gut microbiome is important because it measures the inter-individual heterogeneity of diversity and other important parameters of population-level diversity. Understanding the heterogeneity of microbial diversity can be used as a reference for the personalized medicine of microbiome-associated diseases. Similar to diversity per se, diversity scaling may also be influenced by host factors, especially lifestyles and ethnicities. Nevertheless, this important topic regarding Chinese populations has not been addressed, to our best knowledge. Here, we fill the gap by applying a recent extension to the classic species–area relationship (SAR), i.e., diversity–area relationship (DAR), to reanalyze a large dataset of Chinese gut microbiomes covering the seven biggest Chinese ethnic groups (covering > 95% Chinese) living rural and urban lifestyles. Four DAR profiles were constructed to investigate the diversity scaling, diversity overlap, potential maximal diversity, and the ratio of local to global diversity of Chinese gut microbiomes. We discovered the following: (i) The diversity scaling parameters (z) at various taxon levels are little affected by either ethnicity or lifestyles, as exhibited by less than 0.5% differences in pairwise comparisons. (ii) The maximal accrual diversity (potential diversity) exhibited difference in only about 5% of pairwise comparisons, and all of the differences occurred in ethnicity comparisons (i.e., lifestyles had no effects). (iii) Ethnicity seems to have stronger effects than lifestyles across all taxon levels, and this may reflect the reality that China has been experiencing rapid urbanization in the last few decades, while the ethnic-related genetic background may change relatively little during the same period.


INTRODUCTION
The microbes that inhabit in and on the human body constitute the human microbiota. Thanks to the development of the Human Microbiome Project (HMP), we have opened up a new era in the study of human microbes (Turnbaugh et al., 2007). As a vital component of the human microbiome, the human gut microbiome has been extensively studied. Many studies exposed the truth that the gut microbiome has an important influence on the occurrence or development of inflammatory bowel disease, type 2 diabetes, obesity, epilepsy, and other diseases (Hollister et al., 2014;Ni et al., 2017;Heintz-Buschart and Wilmes, 2018;Canfora et al., 2019;Dahlin and Prast-Nielsen, 2019;Gurung et al., 2020;Rackaityte and Lynch, 2020). As far as we know, interindividual heterogeneities in diversity scaling and abundance also existed in healthy populations (Arumugam et al., 2011;Human Microbiome Project Consortium, 2012;Zhang et al., 2015). Since we were born, we have been constantly building our own microbial network. Factors such as diet, lifestyle, genetics, geography, and ethnic origin all play significant roles in shaping the human gut microbiomes (Yatsunenko et al., 2012;Miller et al., 2016;De Filippo et al., 2017;Kabwe et al., 2020).
A typical analysis of unweighted UniFrac principal coordinates for 314 individuals showed that individual microbial communities were clustered mainly based on their ethnicity/geography rather than lifestyle, and the principal component analysis and principal coordinates analysis (PCoA) revealed that there is a large inter-individual compositional variation . In the process of identifying the relative contributions of various factors, a recent study has found that, compared with host genetics, "environment" has a stronger ability to shape the human gut microbiome, and more than 20% of the inter-individual heterogeneities in microorganisms can be attributed to diet, drugs, and anthropometric measurements (Rothschild et al., 2018). By analyzing the gut microbiome of 7,009 individuals from 14 districts in one Chinese province, He et al. (2018) found that the geographical location of the host had the strongest correlation with gut microbiota variation. Researchers investigated the fecal microbiome of 2,084 different ethnic individuals and found that individuals living in the same city tended to have similar gut microbiome characteristics, regardless of ethnic origins (Deschasaux et al., 2018). Similar to bacterial communities, fungi in urban and rural South African individuals also exhibit different community structures (Kabwe et al., 2020). The human gut microbiome has been extensively and intensively studied from various aspects in the last decade, especially the ethnic differences among different races/ethnicities (Yap et al., 2011;Yatsunenko et al., 2012;Li et al., 2014;Obregon-Tito et al., 2015;Gomez et al., 2016;Gupta et al., 2017;Gaulke and Sharpton, 2018;Lin et al., 2020;Zuo et al., 2020;Sun et al., 2021). A recent study found that 20 host factors were significantly associated with human enterovirus variation, with geographic factors having the greatest impact and ethnically diverse diets associated with specific viral species (Zuo et al., 2020).
Understanding the relationship between human microbiome and various factors has far-reaching significance such as promoting personalized precision medicine (Herd et al., 2018). With the deepening of our understanding of the microbiome and advances in science and technology, microbiome-based diagnostic applications have been proposed in the diagnosis and prognosis of inflammatory bowel disease, pre-screening of colorectal cancer, treatment selection of melanoma, and early diagnosis and risk assessment of metabolic and cardiovascular diseases (Konstantinov et al., 2013;Tang et al., 2013;Dubinsky and Braun, 2015;Bouter et al., 2017;Versalovic et al., 2017;Routy et al., 2018). Based on metagenomic analysis, Yu et al. (2017) have even explored the potential of fecal microbiome as biomarkers for early diagnosis of colorectal cancer.
The objective of this study is to investigate the effects of ethnicities and lifestyles on the diversity scaling of the Chinese gut microbiomes. To achieve this objective, we choose to reanalyze a large dataset of over 300 Chinese covering the top seven most populous Chinese ethnic groups living rural or urban lifestyles, which was originally published by Zhang et al. (2015). Methodically, we apply the diversity-area relationship (DAR) approach extended by Ma (2018Ma ( , 2019b to reanalyze the Zhang et al. (2015) datasets. As an extension to the classic speciesarea relationship (SAR), the DAR model adopted Hill numbers as diversity measures to compensate for the limitation that SAR considers only species richness but ignores species richness in practical application.

The Chinese Gut Microbiome Dataset
We performed a secondary analysis of the Chinese gut dataset of 314 healthy young volunteers published by Zhang et al. (2015). The unrelated volunteers came from urban (145 samples) and rural (169 samples) areas in nine provinces of China, covering seven ethnic groups, namely, the Bai, Han, Kazakh, Mongol, Tibetan, Uyghur, and Zhuang, aged between 18 and 35 years. Pyrosequencing was performed on the V5-V6 region of the 16S ribosomal RNA (rRNA) gene of bacteria and archaea from 314 fecal samples, and a total of 5,102,015 high-quality sequences were generated. After operational taxonomic unit (OTU) classification by QIIME v-1.2.1 (Quantitative Insights Into Microbial Ecology, Caporaso et al., 2010), the microbial abundance information at four levels of phylum, family, genus, and species was finally obtained.

Analysis Design Schemes
In order to better compare the influence of ethnicities and lifestyles on individual gut microbiome, before the DAR analysis, we developed four schemes: (1A) the 314 individuals were divided into two cohorts according to their lifestyles (rural or urban) and compared; (1B) the 314 individuals were divided into 14 cohorts according to their ethnicities and lifestyles, and the urban and rural cohorts with the same ethnicity were compared; (2A) the 314 individuals were divided into seven cohorts according to their ethnicities with rural and urban combined and compared in pairs; and (2B) the 314 individuals were divided into 14 cohorts according to their ethnicities and lifestyles, and the different ethnic cohorts with the same lifestyle were compared (see Figure 1 for more details). In addition, we also integrated all the samples into one cohort (the total cohort) for model fitting. in order to avoid the possible deviation caused by any sorting, we sorted all samples of each cohort, randomly selected 100 results for DAR model fitting, and finally calculated the mean value as the parameter of the DAR model. The DAR model used here is an extension of the classic SAR model (Plotkin et al., 2000;Ulrich and Buszko, 2003;Tjørve, 2009). Hill number diversity was introduced into the DAR model, which extended the traditional SAR model of species abundance to a more comprehensive level of diversity and quantified the variation of community diversity on the spatial scale (Chao et al., 2012(Chao et al., , 2014aMa, 2018Ma, , 2019b. Diversity and area conform to the power law model: where q D means the diversity of order q, A is the area, and c and z are heterogeneity parameters. And on that basis, Ma further extended the general power law to the PLEC model (Ma, 2018(Ma, , 2019b: where d is a parameter that is usually negative in the DAR models and exp(dA) is responsible for the exponential decay of the PLEC model.
In order to estimate the parameters of the DAR model, linear transformation was made above the power law equation: The linear correlation coefficient R and p-value can be used to evaluate the efficacy of the fitting of the model. Based on above models, four diversity scaling profiles were defined for characterizing the biogeography maps of the microbial community: (i) The DAR profile describes the relationship between diversity scaling parameter (z) and diversity order (q) of the DAR model. The DAR profile can be used to quantify the changes of diversity scaling under different diversity order (q).
(ii) The PDO (pairwise diversity overlap) profile represents the relationship between the diversity overlap parameter g and diversity order (q) of the DAR-PL model.
where D A is the diversity of area A, D 2A is the diversity of area 2A, and z is the heterogeneity parameter of the DAR model.
(iii) The MAD (maximum accrual diversity) profile describes the relationship between maximum accrual diversity and the diversity order (q) and can be used to estimate the regional or global maximum theoretical value of microbial community diversity: where A max =−z/d , which represents the area when the diversity is maximized.
(iv) The LGD (the ratio of local diversity to global accrual diversity) profile represents the ratio of sample diversity to global diversity and it can be used to estimate the proportion of the microbial community within a region on a global scale : where c is the parameter of the DAR-PL model at the diversity order of q and D max corresponding to the diversity order can be generated according to Equation (6). After the relevant parameters of all profiles were obtained, we used a permutation (randomization) test to verify the difference of these parameters among all cohorts (Ma, 2019b;Li and Ma, 2019;Ma and Li, 2019).

Diversity-Area Relationship Model Fitting
After fitting the DAR-PL and DAR-PLEC models for each cohort of the Chinese gut microbiome dataset, we got all parameters of the DAR analysis, including the DAR (z), PDO (g), MAD (D max ), and LGD profiles. Table 1 shows the DAR parameters of all cohorts at the species level (for the parameters of other levels, see Supplementary Table 1). We observed that all cohorts fitted the DAR-PL and DAR-PLEC models well, with a p-value < 0.05. In the 100 times of reordering model fitting, especially when the diversity order q is 0, the 100 times of DAR-PL model fitting were all successful, with a p-value of 0.000. However, with the increase of the order of diversity, some model fittings failed. Similar situations also appeared in DAR-PLEC model fitting. Table 2 lists the results (percentages with significant differences) of the permutation tests for the differences in the parameters of the DAR models for the pairwise comparisons of the Chinese gut microbiome dataset (see Supplementary Table 2 for specific results).

Four Diversity-Area Relationship Profiles
(i) DAR profile: The diversity scaling parameter (z) was defined as the DAR profile, which quantitatively describes community similarity. The higher the diversity scaling parameter z is, the greater are the differences between individuals in the community.
With the gradual refinement of the taxonomy level, the interindividual heterogeneity in almost all the cohorts of four schemes progressively increased, as well as the total cohort (phylum: z = 0.169; family: z = 0.255; genus: z = 0.288; species: z = 0.309, when diversity order q = 0). Based on Scheme 1A, for the rural and urban cohorts, from the phylum level to the species level, the urban individuals had a higher diversity heterogeneity at diversity order q = 0. The parameter of diversity scaling (z) of each cohort increased gradually and always kept the trend of the rural cohort being smaller than the urban cohort at q = 0 (Figure 2). We then used Scheme 1B to further determine the influence of lifestyles on the diversity scaling of Chinese gut microbiome. We compared the DAR profiles of urban and rural individuals in the same ethnic group, and there was no statistically significant difference. In order to detect the impact of ethnicities on the DAR profiles, we set up 1,008 pairwise comparisons from phylum to species levels when diversity order q = 0-3 according to Schemes 2A and 2B (Scheme 2A: seven ethnic cohorts with urban and rural combined, 336 pairwise comparisons; Scheme 2B: cohorts with the same lifestyles but different ethnicities, 672 pairwise comparisons). We noticed that there were four pairwise comparisons of the DAR profiles showing statistically significant differences at the species level when diversity order q = 0 (Bai vs. Han; Bai vs. Zhuang; Bai-Rural vs. Han-Rural; Bai-Rural vs. Zhuang-Rural cohorts). Figure 3 displays the DAR profiles of all cohorts from Schemes 2A and 2B at the species level when diversity order q = 0. Different from lifestyles, we found that ethnicities influenced the diversity heterogeneity of the Chinese gut microbiome significantly.
(ii) PDO profile: The parameter of pairwise diversity overlap (g) was defined as the PDO profile, which also quantitatively describes the community similarity. The larger the parameter g is, the higher the overlap is, that is, the smaller the diversity heterogeneity among individuals in the community. In contrast to the DAR profile, with the gradual refinement of the taxonomy level and the rise of the diversity order, the PDO profile of each cohort showed a general downward trend (Figure 4). According to Scheme 1A, we noticed that the PDO profile presented a trend of the rural cohort being higher than the urban cohort. As for Scheme 1B, we found the PDO profiles of urban and rural cohorts of the same ethnic group were similar at various taxonomy levels when diversity order q = 0, except for the Bai nationality. Based on Scheme 2A, we noticed that the PDO profiles of the other ethnic groups, except the Bai, were relatively close. Figure 5 displayed the PDO profiles of all cohorts from Schemes 2A and 2B at the species level when diversity order q = 0.
(iii) MAD profile: The maximum accrual diversity (D max ) was defined as the MAD profile. When diversity order q = 0, D max is the maximum estimated diversity within a community, which means the maximum number of phylum, family, genus, and species. The D max of the total cohort was always the largest in all cohorts at the four taxonomy levels when diversity order q = 0. According to Scheme 1A, we found that the MAD profile of the rural cohort was lower than that of the urban cohort at the phylum level, but the opposite trend was shown at other taxonomy levels (Figure 6). The MAD profiles of the rural and urban cohorts of each ethnic group did not change as consistently as the rise of the taxonomy level based on Scheme 1B. In addition, we found that the potential maximum species number of the Bai nationality was lowest in seven ethnic groups according to Schemes 2A and 2B, although the numbers of potential maximum phyla, families, and genera of the Bai nationality were similar to  those of other ethnic groups. Figure 7 shows the MAD profiles of seven ethnic groups at the species level when diversity order q = 0 (Schemes 2A and 2B). By doing permutation tests for 1,136 pairwise comparisons based on four schemes (1A-2B), we found that only one pairwise comparison (the D max of the Bai-Rural vs. 029.9) differed significantly at the species level when diversity order q = 0. It was undeniable that ethnicity had a statistically significant impact on the potential maximal species richness in the Bai-Rural cohort and the Han-Rural cohorts. Especially with the increase of the diversity order, significant differences were found in more than 50 pairwise comparisons of two schemes (2A and 2B) according to the results of Table 2. These findings suggested that although ethnicities did not influence the potential maximal species richness of most communities, it significantly affected the potential maximal diversity of dominant phylum/family/genus/species. Because when q > 0, the MAD profile described the diversity of dominant phylum/family/genus/species. Interestingly, we did not observe similar effects of lifestyles on the MAD profile. (iv) LGD profile: The LGD profile estimated the ratio of the microbiome diversity within the local scale on a global scale, FIGURE 2 | The DAR profiles of the rural and urban cohorts at phylum, family, genus, and species levels when diversity order q = 0 (Scheme 1A).
FIGURE 3 | The DAR profiles of all cohorts from Schemes 2A and 2B at the species level when diversity order q = 0 (the same number of "*, **, ***, ****" indicates a significant difference in pairwise comparison).
Frontiers in Microbiology | www.frontiersin.org TABLE 2 | The results (percentages with significant differences) of the permutation tests for the differences in the parameters of the DAR models for the pairwise comparisons of Chinese gut microbiome datasets [see Supplementary Table 2  that is, the ratio of human gut microbiome diversity in the global ecosystem. In Scheme 1A, the LGD profile of the rural cohort is higher than that of the urban cohort at the first three taxonomy levels. According to Scheme 1B, we did not observe similar trends. Figure 8 exhibited the LGD ratio of urban and rural cohorts for each race at the species level when diversity order q = 0. At the species level, except for the Bai, Kazakh, and Mongol nationalities, the LGD profiles of other ethnic groups were smaller in the rural areas than in the urban areas; especially for the Tibetans, we have found a significant difference between Tibetan-Rural and Tibetan-Urban cohorts. Based on Scheme 2A, we found that the LGD profiles of all cohorts decreased with the refinement of taxonomy level, except for the Bai cohort. the LGD ratio of all cohorts for Schemes 2A and 2B at the species level.

CONCLUSIONS AND DISCUSSION
We applied the DAR model to explore the influence of ethnicity and lifestyle on the Chinese gut microbiome, by extending the SAR model from species richness to general biodiversity metrics (in Hill numbers) and providing tools to estimate some key biodiversity parameters such as maximal accrual diversity or potential diversity (also known as dark diversity), which includes diversities that are absent locally but present in the regional species pool (so that they may appear at some time point) (Ma, 2019a). Our study tested whether ethnicities and lifestyles influence the DAR parameters based on four design schemes (Figure 1).
Firstly, we found that there were significant differences in diversity scaling (DAR profile) in only 0.40% (4/1,008) of pairwise comparisons on ethnic groups (i.e., Bai vs. Han; Bai vs. Zhuang; Bai-Rural vs. Han-Rural; and Bai-Rural vs. Zhuang-Rural). The 0.40% pairwise comparisons with significant differences all occurred when diversity order q = 0, but there was no significant difference when q > 0. In contrast, lifestyles had no significant effect on DAR profiles at all diversity orders. In general, ethnicity and lifestyle have no effect on DAR profiles. These results also suggested that the structure of the dominant species in the Chinese gut microbiome might be stable and not affected by ethnicities or lifestyles, possibly because these species (i.e., Eubacterium rectale, Faecalibacterium prausnitzii, and Veillonella atypical) are related to important functions (i.e., metabolism) (Sokol et al., 2008;Berni Canani et al., 2012;Han et al., 2020). Moreover, there were significant differences only at the species level and none at higher taxonomy levels. These findings indicated that compared with the species level, higher taxonomy levels of Chinese gut microbiome were more stable without the influences of ethnicities and lifestyles. The PDO profiles estimated the pairwise diversity overlaps between two communities. In the comparison of the PDO and DAR profiles, it is not hard to see that the two profiles show reciprocal patterns (Figures 4-7). This is because the PDO profile is a precise function of the DAR profile (Equation 5), and the PDO profile provides a more intuitive and convenient measure of community overlap than the DAR profile.
Secondly, for the MAD profile, only one pairwise comparison (Bai-Rural vs. Han-Rural) differed significantly at the species level when diversity order q = 0, and 60 pairwise comparisons differed significantly when diversity order q > 0. A total of 1,008 pairwise comparisons were used to detect the impact of ethnicities, and 6.1% (61/1,008) of the comparisons showed significant differences. The numbers of pairwise comparisons with significant differences at phylum, family, genus, and species levels were 37, 4, 8, and 10, respectively. Similar to findings in  Frontiers in Microbiology | www.frontiersin.org FIGURE 7 | The MAD profiles of all cohorts from Schemes 2A and 2B at the species level when diversity order q = 0 ("*" indicated a significant difference between pairwise comparisons).

FIGURE 8 | The
LGD profiles of all cohorts from Scheme 1B at the species level when diversity order q = 0 ("*" indicated a significant difference between pairwise comparisons). the DAR profiles, lifestyles had no significant effect on MAD profiles at all diversity orders. We speculated that the main reason for the difference between Bai-Rural and Han-Rural might be that the individuals of the Han cohort came from six countries in four provinces, while the individuals of the Bai cohort came from only two countries in one province. This results in a significant difference of A max between Bai-Rural and Han-rural in the DAR models, which in turn affects the MAD profile. In addition, Li et al. (2015) found that the infection risk of Toxoplasma gondii in the Bai nationality of Dali was much higher than the national average, and they believed that the habit of eating raw pork and/or liver might be a potential risk of infection with T. gondii. We considered that this special dietary habit might also affect gut microbes. Of course, gut microbes are affected by many factors, and the specific reasons may need further investigation.
In addition, for the LGD profile, we noted that 0.79% (8/1,008) pairwise comparisons showed that ethnicities had a significant influence on LGD profiles (Bai vs. Han; Bai vs. Mongol; Bai vs. Zhuang; Bai-Rural vs. Han-Rural; Bai-Rural vs. Tibetan-Rural; Bai-Rural vs. Zhuang-Rural; Han-Urban vs. Tibetan-Urban; and Tibetan-Urban vs. Zhuang-Urban). The 0.78% (1/128) pairwise comparisons showed that lifestyles had a significant impact on LGD profile (Tibetan-Rural vs. Tibetan-Urban). Same as DAR profiles, there were significant differences on LGD profiles at the species level when diversity order q = 0, and there is no significant difference when q > 0 or at other taxonomy levels.
Across the three profiles, lifestyle affected only 0.78% of pairwise comparisons in the LGD profiles and not the other two profiles, while ethnicity affected 0.40, 6.1, and 0.79% pairwise comparisons in DAR, MAD, and LGD, respectively. These findings suggested that the influence of lifestyle on DAR parameters was negligible, and the influence of ethnicity was only reflected in few pairwise comparisons. Ethnicity seems to have stronger effects on DAR parameters than lifestyle, which is also consistent with the research of Zhang et al. (2015) that subjects mainly clustered by their ethnicity/geography, not lifestyle. These might be the result of China's rapid urbanization during the last few decades, which has brought rural and urban lifestyles closer together. The influence of ethnicity on these profiles may be attributed to its genetic effect. Existing studies have also shown that racial/ethnic background strongly influences metabolism and microbes (Human Microbiome Project Consortium, 2012;Dehingia et al., 2017). A study in the United Kingdom involving 416 pairs of twins also suggested that host genes determine gut microbiome and obesity phenotype (Goodrich et al., 2014). In addition, the effect on DAR parameters was mainly concentrated when diversity order q > 0, and we deduced that this is due to the small sample size of this study or other confounding factors, such as geography. Therefore, the future studies may try to collect more samples of the Chinese gut microbiome in more areas to verify and expand our conclusions.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.mg-rast.org/, mgp1538.

AUTHOR CONTRIBUTIONS
ZM and LD designed the study and revised the manuscript. WX and DG performed the data analysis and wrote the draft. HC and YQ performed the data curation. All authors approved the submission.