Contrasting Diversity and Composition of Human Colostrum Microbiota in a Maternal Cohort With Different Ethnic Origins but Shared Physical Geography (Island Scale)

Colostrum is an important vehicle for the transfer of commensal bacteria from mother to newborn and has a strong impact on the newborn’s health after birth. However, the composition of the colostrum microbiome is highly heterogeneous owing to geographic factors and ethnicity (maternal, cultural, and subsistence factors). By analyzing a full-length 16S rRNA gene sequencing dataset of colostrum from 97 healthy mothers (60 Han, 37 Li) on Hainan Island, China, we characterized ethnic differences in the colostrum microbiome within a maternal cohort of different ethnic origins but shared physical geography. The richness of the colostrum microbial community was higher in Han women than in Li women, but the Shannon and inverse Simpson (invsimpson) indices did not differ significantly between the two groups. Distance-based visualization showed an obvious ethnicity-associated structural segregation of the colostrum microbiota. The relative abundance of Firmicutes was higher in the Han group than in the Li group, whereas the opposite was true for Proteobacteria. At the genus level, the most dominant members in the Han and Li groups were Acinetobacter and Cupriavidus, respectively, both common environmental bacteria, although skin-derived Staphylococcus and Streptococcus remained subdominant taxa. Cupriavidus lacunae was the most dominant species in the Li group, accounting for 26.10% of the total bacterial community, but only 3.43% in the Han group, where Staphylococcus petrasii was the most dominant species (25.54%), indicating that the human colostrum microbiome is susceptible to local living-environment factors. Hence, the ethnic origin of individuals may be an important factor to consider in human milk microbiome research and in assessing its potential clinical significance during the perinatal period in ethnically diverse societies, even within a small geographic scale.


INTRODUCTION
Colostrum is the first milk a baby receives after birth, and it is usually produced within 4-5 days after the mother has given birth (Fernández and Rodríguez, 2020; Stinson et al., 2021). Human milk (HM) contains considerable amounts of beneficial nutrients and biologically active factors (Andreas et al., 2015; Williams et al., 2017), and it is universally considered the optimal source of nutrition for almost all healthy infants (Hunt et al., 2011; Lloyd-Price et al., 2016). In addition, increasing evidence shows that HM contains a diverse range of microbes (Oikonomou et al., 2020; Zimmermann and Curtis, 2020), which has important health implications for both mothers (mammary gland health) and infants (protection from diarrheal and respiratory diseases) (Lyons et al., 2020). Studies have shown that the relative abundance of the potentially beneficial genera Lactobacillus and Bifidobacterium is significantly higher in exclusively breastfed infants than in mixed-fed and formula-fed infants (Fehr et al., 2020; Lyons et al., 2020). Therefore, breastfeeding is one of the best feeding regimes for newborn infants.
The maternal gut is thought to be the most important source of the bacteria detectable in HM (via an entero-mammary pathway). Strikingly, however, more than 1,300 species and 3,500 operational taxonomic units of bacteria have been reported in HM, implying that bacterial diversity in breast milk may be even higher than in infant or maternal feces (Zimmermann and Curtis, 2020). Not all the bacteria detected in breast milk are inherent inhabitants of the mammary gland; a fairly large number of them come from environmental exposure (the skin microbiota of the mother and the oral cavity of the infant), leading to significant differences between ethnic groups and even between individuals (Pannaraj et al., 2017). According to existing data, colostrum is characterized by higher diversity and greater disparity in microbiome composition than mature milk.
Depending on the source of the bacteria, multiple factors could contribute to shaping the milk microbiota (Andreas et al., 2015; Zimmermann and Curtis, 2020). On the one hand, the microbial composition and diversity of breast milk may be influenced by maternal characteristics, including ethnicity (Deschasaux et al., 2018; Xu et al., 2020; Shafiee et al., 2022), age at pregnancy, body mass index (BMI) (Cabrera-Rubio et al., 2012), mode of delivery (Cabrera-Rubio et al., 2012), parity, and intake of intrapartum antibiotics or probiotics (de Andrade et al., 2021). On the other hand, several studies have reported differences in the microbiota composition of HM across geographic locations, as is also seen in the human skin metagenome (Gupta et al., 2017). Geography bundles together multiple factors that can alter the microbiota, including environmental (temperature, humidity, and altitude), population genetic, and cultural factors. In microbiome studies, host surface-associated microbiomes have been found to respond strongly to variations in environmental factors (Woodhams et al., 2020). Therefore, it is reasonable to speculate that bioclimatic factors shape the composition of the human breast milk microbiome by acting on skin microorganisms.
Modern molecular techniques, especially next-generation sequencing (NGS), are more sensitive and less biased than culture-based methods and have been adopted to characterize the composition and diversity of the human microbiome using the 16S rRNA gene (Bardanzellu et al., 2017). To date, most studies have used short variable regions of the 16S rRNA gene to profile the human breast milk microbiota, such as the V4 or V4-V5 region (Kumar et al., 2016; Ojo-Okunola et al., 2018) and the V1-V3 region (Williams et al., 2017). Because of the short read length, the composition of breast milk microbes cannot be documented exactly (Jost et al., 2012; Walker et al., 2015). By contrast, full-length 16S rRNA gene amplicon sequencing can achieve a more accurate representation by providing species-level microbiome data (Lopez Leyva et al., 2021).
From an evolutionary perspective, human milk mediates an interplay between a mother and her infant. The various components of colostrum have a great impact on the newborn's health after birth. Among them, human milk oligosaccharides (HMOs) are thought to play a role in preventing the adhesion of pathogenic bacteria and in orchestrating the development of the microbiota (Cheema et al., 2022; Sprenger et al., 2022). In particular, bacteria in colostrum can trigger an anti-inflammatory response by stimulating the production of specific cytokines, gradually promoting the maturation of the newborn's immune system, although most of them might not become residents of the infant gut microbiota. However, the composition of the colostrum microbiome is highly heterogeneous, depending on geographical and ethnic variations (maternal, cultural, and subsistence factors) (Gupta et al., 2017). Therefore, parsing the appreciable disparity in the colostrum microbiome between subpopulations in the same locality helps to clarify the clinical significance of the breast milk microbiota in the perinatal period.
In the present study, we analyzed NGS datasets of the full-length bacterial 16S rRNA gene from colostrum samples in a maternal cohort containing two ethnic groups, comprising 97 healthy mothers (60 Han and 37 Li) from Hainan Island, the southernmost province of China. Historically, the Li ethnic group is an indigenous people living mostly in rural areas, whereas most of the Han Chinese are immigrants who live in cities or towns. We aimed to gain insight into the colostrum microbiome patterns of different sub-populations sharing the same physical climate within a narrow region (island scale) and to assess how ethnicity (maternal, cultural, and subsistence factors) influences the microbiota in the breast milk of healthy mothers.

Sample Collection
In this study, a total of 97 postpartum mothers (18-41 years old, with an average age of 28 years) were recruited. All volunteers were long-term residents of Hainan. Among them, 37 mothers were of Li ethnicity and 60 were of Han ethnicity. Demographic data on the mothers' BMI, delivery mode, and use of antibiotics and probiotics during pregnancy are summarized in Table 1. At the time of sample collection, the mothers were informed about the study and signed an informed consent form on behalf of themselves and their family members. In addition, this study was approved by the ethics committee of Shihezi University.
Colostrum samples were collected into sterile tubes by manual expression using sterile gloves after nipples and areolas were cleaned with a swab soaked in sterile water or saline (Rodriguez-Cruz et al., 2020); the first 1-2 mL of milk was discarded to avoid contamination from the environment as much as possible (Douglas et al., 2020). Then, 5-15 mL of milk was collected and was immediately frozen and stored at −80 °C until DNA extraction.

DNA Extraction
The FastPure Bacteria DNA Isolation Mini Kit (Vazyme, Nanjing, China) was used to extract DNA from breast milk, with slight modifications and in combination with glass bead beating (Lyons et al., 2021). About 1 mL of breast milk was centrifuged at 12,000 rpm (∼13,400 × g) for 10 min at 4 °C; the fat was removed with a sterile cotton swab and the supernatant was discarded (Ojo-Okunola et al., 2020). Lysozyme (100 mg/mL) was added to the centrifuge tube, which was incubated at 37 °C for 30 min for enzymatic lysis; 0.25 g of zirconium beads (0.1 mm) was then added, and a cell/tissue disruptor was used to physically break the cell walls. After disruption, 250 µL of Buffer GB was added, and the sample was mixed by shaking and incubated at 70 °C for 10 min. Next, 4 µL of RNase A was added and the digestion solution was heated at 65 °C for 10 min to remove RNA and obtain DNA that was as pure as possible. Proteinase K (20 mg/mL) was then added and the sample was incubated at 58 °C for 30 min to allow full enzymatic activity (Hunt et al., 2011). Column purification then followed the kit instructions. Each DNA pellet was resuspended in 50-100 µL of Elution Buffer. The DNA was quantified using a NanoDrop ND-2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, United States), and the remaining DNA was stored at −20 °C until the next step.
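As a side note on the centrifugation step, the reported speed (12,000 rpm) and force (∼13,400 × g) are linked by the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The short sketch below back-calculates the rotor radius implied by the two reported values; the rotor radius itself is not stated in the text, so the result is only an inference, and the function names are illustrative.

```python
# Standard relation between rotor speed and relative centrifugal force (RCF):
#   RCF (x g) = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def implied_radius_cm(rpm: float, rcf: float) -> float:
    """Rotor radius (cm) implied by a reported rpm / x g pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


if __name__ == "__main__":
    # Values reported in the protocol; the radius is inferred, not reported.
    radius = implied_radius_cm(12_000, 13_400)
    print(f"implied rotor radius: {radius:.1f} cm")
    print(f"round trip: {rcf_from_rpm(12_000, radius):.0f} x g")
```

Reporting the force in × g, as the protocol does, keeps the step reproducible on centrifuges with a different rotor.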

Raw Sequence Analysis
The raw data files were converted to FASTQ format using CCS v4.0.0 [software for generating highly accurate single-molecule consensus reads (HiFi reads)]. A Perl script was used to split the reads according to the barcode sequences at both ends, remove the barcodes, and then reorient reverse-complemented reads to the forward direction according to the primer sequences. Raw sequences were processed using a pipeline combining USEARCH 11.0 (Linux 64-bit) and QIIME2. High-quality reads, selected using the default values in USEARCH, were binned into amplicon sequence variants (ASVs) by denoising (error correction) with UNOISE3, through an open-reference strategy. Taxonomy was assigned to the ASVs using the Naive Bayes classifier of the Ribosomal Database Project (RDP) against the Greengenes database, and a feature table was generated for subsequent analysis.

Diversity Analysis and Significant Difference Analysis Between Ethnic Groups
Alpha diversity indices were calculated in QIIME2 from rarefied samples, using the Chao1 and ACE indices for richness and the Shannon and inverse Simpson (invsimpson) indices for diversity; statistics and box plots of the differences were produced on the Personalbio GenesCloud platform (https://www.genescloud.cn/login). Beta diversity was calculated using the Bray-Curtis distance, and principal coordinates analysis (PCoA) was performed. Venn analyses were also conducted using an R package. Statistical differences between groups were analyzed using ANOVA (Liu et al., 2021). Cytoscape v3.8.2 was used to draw the network diagram. The Mann-Whitney U-test was used for diversity and taxonomic comparisons between groups at different levels (phylum, genus, and species). Based on the standardized matrix and grouping information, STAMP 2.1.3 was used to identify differential genera and species. Linear discriminant analysis (LDA) effect size (LEfSe) analysis was performed at http://www.ehbio.com/Cloud_Platform/front (Liu et al., 2021). IBM SPSS Statistics 26 was used to calculate p-values, and p < 0.05 was considered statistically significant.
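For readers unfamiliar with the indices named above, the following minimal sketch shows how Chao1, Shannon, and inverse Simpson can be computed from one sample's ASV count vector. It is an illustration only, not the QIIME2 implementation used in the study: the bias-corrected form of Chao1 and the natural-log base for Shannon are assumptions, and ACE is omitted for brevity.

```python
import math
from typing import Sequence


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)  # ASVs seen exactly once
    doubletons = sum(1 for c in counts if c == 2)  # ASVs seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i); natural log is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts: Sequence[int]) -> float:
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical rarefied count vector for a single sample.
    sample = [10, 5, 5, 1, 1, 2, 0]
    print(chao1(sample), shannon(sample), inverse_simpson(sample))
```

Chao1 depends only on the rare ASVs (singletons and doubletons), whereas Shannon and inverse Simpson weight ASVs by relative abundance, which is why richness can differ between groups while the two diversity indices do not.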

Description of the Study Population
The socio-demographic characteristics of the Li (n = 37) and Han (n = 60) mothers are summarized in Table 1. The mean maternal age in both groups was 28 years. The average BMI was 27.1 and 25.5 for women in the Li and Han ethnic groups, respectively. Table 2 shows the results of the significance analysis of the associations between ethnicity and other mother-related factors. BMI was significantly associated with ethnicity, as was the mode of delivery (p < 0.05). There were highly significant differences between the Li and Han groups in the mother's lifestyle and in the use of intrapartum antibiotics (Chi-square, both p = 0.001). There were no significant differences in maternal age or parity between the Li and Han ethnic groups (all p > 0.05).
We also conducted a multivariate analysis of variance; the results are shown in Supplementary Table 1. Considering the distribution of the samples and its effect on the statistical test, we do not discuss the effects between the two groups in detail.

DNA Sequencing and Filtering
A total of 859,638 16S rRNA raw reads were generated from the 97 samples. After filtering out low-quality sequences, 859,345 sequences with lengths of 1,200-1,500 bp were retained. All 859,345 high-quality sequences were clustered into ASVs at 100% sequence similarity using Quantitative Insights Into Microbial Ecology (QIIME2) software. The average number of high-quality sequences per sample was 17,543, and a total of 789 ASVs were detected across the 97 samples.

Diversities of Bacterial Communities of Colostrum Across Two Ethnic Groups
Based on the number of ASVs, the alpha diversity of the colostrum microbiota was calculated under groupings by different maternal-related factors (Supplementary Table 2). Ethnicity, age, and lifestyle had a significant effect on the microbial richness of colostrum, and ethnicity had the most significant effect. When samples were grouped by ethnicity, the Chao1 index of the colostrum microbiome in Han mothers (151.54 ± 60.86) was significantly (p = 0.001) higher than that in Li mothers (106.75 ± 40.06) (Figure 1A), and the same trend was observed for the ACE index (Li 108.99 ± 39.46 vs. Han 152.65 ± 59.29) (Figure 1B). However, we did not find a significant difference in the Shannon index (Li 2.76 ± 0.69 vs. Han 2.82 ± 0.81) or the invsimpson index (Li 10.15 ± 4.97 vs. Han 11.15 ± 9.14) (Figures 1C,D). To investigate the taxonomic structural distinctiveness of colostrum microbial communities between the Li and Han ethnic groups, beta-diversity analysis was conducted based on the Bray-Curtis distance (Figure 1E). There was an obvious structural separation between the two ethnic groups.
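The Bray-Curtis distance underlying this beta-diversity comparison can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the abundance vectors are hypothetical, the function names are illustrative, and the PCoA step that turns the distance matrix into an ordination is not shown.

```python
from typing import Dict, List, Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must cover the same ASVs")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two empty samples are treated as identical (a convention chosen here).
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Pairwise Bray-Curtis matrix, rows and columns in insertion order."""
    names = list(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]


if __name__ == "__main__":
    # Hypothetical ASV counts for three samples (not study data).
    toy = {"Han_1": [6, 7, 4], "Han_2": [10, 0, 6], "Li_1": [0, 1, 20]}
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```

A value of 0 means two samples have identical abundance profiles and 1 means they share no ASVs, so tight within-group and large between-group distances appear as separated clusters in the ordination.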

Impact of Ethnic and Delivery Mode on Microbial Diversity in Colostrum
When we grouped all the data by mode of delivery (vaginal vs. cesarean), the diversity of the cesarean group was higher than that of the vaginal group (Supplementary Table 2). The samples were further divided by ethnicity and delivery mode into four groups (Han_vaginal, Han_cesarean, Li_vaginal, and Li_cesarean). The Chao1 and ACE indices showed significant differences in colostrum microbial α-diversity between mothers of the Li_vaginal group and the Han_vaginal group (Figures 2A,B). In terms of richness, both indices (Chao1 and ACE) were highest in the Han_cesarean group, followed by the Han_vaginal group. In contrast to the Han ethnic group, the diversity index of the Li_vaginal group was higher than that of the Li_cesarean group. The Shannon and invsimpson indices showed no significant difference among the four groups (Figures 2C,D). β-diversity analysis showed that groups of the same ethnicity clustered together: the Han_vaginal group clustered with the Han_cesarean group, and the Li_vaginal group clustered with the Li_cesarean group (Figure 2E). In conclusion, PCoA showed an obvious division in the diversity of colostrum microbes between the Li and Han ethnic groups regardless of delivery mode.

Colostrum Microbiota Compositional Analysis
We profiled the bacterial composition of colostrum microbiota between different groups at the level of phylum, genus, and species. Bacterial taxa with a relative abundance of less than 1% in individual samples were categorized into the "others" group.
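The pooling rule described above can be sketched as follows, assuming each sample is represented as a mapping from taxon name to relative abundance (fractions summing to 1). The function name and the example values are hypothetical.

```python
from typing import Dict


def collapse_rare_taxa(
    profile: Dict[str, float], threshold: float = 0.01, label: str = "others"
) -> Dict[str, float]:
    """Pool taxa below `threshold` relative abundance into one `label` bin."""
    collapsed: Dict[str, float] = {}
    rare_total = 0.0
    for taxon, abundance in profile.items():
        if abundance < threshold:
            rare_total += abundance  # taxon below 1% in this sample
        else:
            collapsed[taxon] = abundance
    if rare_total > 0:
        collapsed[label] = rare_total
    return collapsed


if __name__ == "__main__":
    # Hypothetical phylum-level profile for one sample (not study data).
    sample = {
        "Firmicutes": 0.55,
        "Proteobacteria": 0.44,
        "Deinococcus-Thermus": 0.006,
        "Bacteroidetes": 0.004,
    }
    print(collapse_rare_taxa(sample))
```

Because the rule is applied per sample, a taxon can be shown individually in one sample and pooled into "others" in another.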

Core Microbiota Analysis of Colostrum Based on Amplicon Sequence Variant Level
The core composition of colostrum bacteria and the group-specific ASVs or species with a relative abundance of more than 0.1% were screened from the 97 samples. The overlapping area of the circles in the Venn diagram represents the core microbiome, which is generally defined as a group of microbiome members shared across similar habitats. As shown in the Venn diagram, a total of 32 ASVs were observed as common ASVs, 96 ASVs in the Han ethnic group, and 60 ASVs in the Li ethnic group (Figure 3A). The 32 ASVs were assigned to eight core genera: Staphylococcus, Acinetobacter, Streptococcus, Cutibacterium, Cupriavidus, Enterobacter, Rhodopseudomonas, and Paucibacter. Among the 16 ASVs belonging to Staphylococcus, 14 could be classified to the species level: Staphylococcus pseudoxylosus (ASV_59 and ASV_83) and Staphylococcus petrasii (ASV_3, ASV_4, ASV_18, ASV_34, ASV_62, ASV_65, ASV_66, ASV_74, ASV_75, ASV_82, ASV_124, and ASV_282). The four ASVs of Acinetobacter were divided into two species: Acinetobacter courvalinii (ASV_5 and ASV_10) and Acinetobacter oleivorans (ASV_21 and ASV_78). Among the three ASVs belonging to Streptococcus, ASV_29 and ASV_72 were classified as Streptococcus himalayensis, while ASV_9 could not be classified to the species level. ASV_47 and ASV_60 both belonged to Cutibacterium acnes. In addition, Cutibacterium modestum (ASV_26) was also detected in our study. Other ASVs that could be identified included Cupriavidus lacunae (ASV_1), Cupriavidus nantongensis (ASV_80), Enterobacter bugandensis (ASV_15), Rhodopseudomonas boonkerdii (ASV_31), and Paucibacter oligotrophus (ASV_6).
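The Venn-style screening described above amounts to a set intersection. The sketch below illustrates the idea under one stated assumption: an ASV counts as present in a group when its mean relative abundance across that group's samples exceeds 0.1%. The exact screening rule used in the study is not specified beyond the threshold, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def abundant_asvs(
    samples: List[Dict[str, float]], min_mean_abundance: float = 0.001
) -> Set[str]:
    """ASVs whose mean relative abundance across samples exceeds the cutoff."""
    if not samples:
        return set()
    totals: Dict[str, float] = {}
    for sample in samples:
        for asv, abundance in sample.items():
            totals[asv] = totals.get(asv, 0.0) + abundance
    # ASVs absent from a sample contribute zero to the group mean.
    return {a for a, t in totals.items() if t / len(samples) > min_mean_abundance}


def venn_partition(
    group_a: List[Dict[str, float]], group_b: List[Dict[str, float]]
) -> Dict[str, Set[str]]:
    """Split abundant ASVs into shared (core) and group-specific sets."""
    a, b = abundant_asvs(group_a), abundant_asvs(group_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # Hypothetical relative abundances (fractions), not study data.
    han = [{"ASV_1": 0.03, "ASV_3": 0.25, "ASV_9": 0.0005}, {"ASV_1": 0.04, "ASV_3": 0.26}]
    li = [{"ASV_1": 0.26, "ASV_15": 0.05}, {"ASV_1": 0.27, "ASV_15": 0.03, "ASV_3": 0.0001}]
    print(venn_partition(han, li))
```

Using a prevalence-based rule (for example, presence in a minimum fraction of samples) instead of a mean-abundance rule would generally give a different core set, so the criterion matters when comparing core microbiota across studies.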

Microbial Signatures in Different Ethnic Group Samples
Linear discriminant analysis effect size (LEfSe) analysis of ASVs with an average relative abundance of > 0.01% was further conducted to detect microbial signatures in the colostrum of the Han and Li ethnic groups. Figure 4A is a histogram of the LDA value distribution, showing taxa with an LDA score greater than 3.0. The significant biomarkers across the Han and Li groups were mainly distributed in Proteobacteria, and only a few were distributed in Actinobacteria, Firmicutes, and Deinococcus-Thermus. The differential taxa in the Li group all belonged to Proteobacteria. In the Li group, the LDA score was highest for Burkholderiales; in the Han group, it was highest for Acinetobacter. The LEfSe cladogram analysis revealed that 39 biomarkers at different classification levels differed significantly between the two groups (Figure 4B). Notably, we did not find any differential biomarkers at the species level.
Using STAMP with Benjamini-Hochberg FDR correction, an extended error bar plot showed the differences between the two groups. Overall, 17 distinct genera and 25 distinct species were identified. Cupriavidus and Enterobacter were the two most significant genera in the Li ethnic group. Staphylococcus and Actinobacteria were the two most abundant genera in the Han ethnic group (Figure 5A). There were 11 species with higher abundance in the Li ethnic group and 14 species with higher abundance in the Han ethnic group (Figure 5B). Cupriavidus lacunae was the dominant and most distinct species in the Li group. Enterobacter hormaechei was also enriched in the Li group. The differential species in the Han group were Staphylococcus petrasii and Acinetobacter proteolyticus.
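The false discovery rate correction applied in this comparison is presumably the Benjamini-Hochberg procedure offered by STAMP. A minimal sketch of that procedure is given below for orientation; it is a generic implementation with hypothetical p-values, not the code run by STAMP.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-taxon p-values from a two-group test (not study data).
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f} -> q = {q:.3f}")
```

A taxon is then reported as differential when its adjusted value falls below the chosen threshold (0.05 in this study's significance criterion), which limits the expected share of false positives when many genera and species are tested at once.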
To obtain more accurate taxonomic results at the species level, we examined the representative sequences of the five ASVs identified as members of the genus Bifidobacterium. Owing to the resolving power of the 16S rRNA gene for distinguishing bacterial species, four of the five ASVs could be assigned to the species level: Bifidobacterium castoris (ASV_225), Bifidobacterium longum (ASV_461), and Bifidobacterium scaligerum (ASV_307 and ASV_647). ASV_607 could be classified only to the genus Bifidobacterium. Bifidobacterium was not detected in the colostrum of the Li ethnic group. In the Han ethnic group, by contrast, Bifidobacterium was detected in 35 samples, a detection rate of nearly 60%. Overall, five samples (HNb15, HNb21, HNb27, HNb30, and HNb41) contained B. castoris, B. longum, and B. scaligerum, with mean relative abundance ranging from 0.05 to 0.90%.

DISCUSSION
The breast milk microbiome can have a profound impact on human health by affecting the establishment of the neonatal intestinal flora and the development of the immune system (Fernández and Rodríguez, 2020; Yi and Kim, 2021). Some of the ethnic variations in microbiome structure have been attributed to differences in host genetics and innate/adaptive immunity, while in many other cases, maternal factors (age, BMI, mode of delivery, etc.), cultural features (diet, hygiene, environmental exposure, etc.), and subsistence factors overshadow genetics (Gupta et al., 2017). We focused on ethnicity, which reflects the highly diverse demographic character of the Chinese population (Table 1). A total of 97 Li and Han mothers who were long-term residents of Hainan, China, were selected; their milk was collected within 2-5 days after delivery and used to compare the composition and diversity of the colostrum microbiota.
Maternal factors, including pre-gestational BMI, age, mode of delivery, and other related factors, have been proposed to influence colostrum microbiota composition (Zimmermann and Curtis, 2020). Our results showed that BMI and parity had no significant effect on the alpha-diversity of the colostrum microbial community (Supplementary Table 1). Regarding delivery mode, we found a higher alpha-diversity in the colostrum of women who delivered by cesarean section, which was consistent with the results of a study on the diversity of breast milk microbes in Taiwan and Mainland China. However, two other studies (84 and 393 participants, respectively) did not confirm this (Kumar et al., 2016; Moossavi et al., 2019). Previous studies also reported higher alpha-diversity and richness in the HM microbiota of women receiving intrapartum antibiotics (Hermansson et al., 2019). Our results suggest that intrapartum antibiotics had no significant influence on the diversity of the colostrum microbiota. This could be attributed to the fact that women who have cesarean sections have a high rate of antibiotic use during the perinatal period. It is now generally believed that the establishment of the human gut microbiota is influenced by the host's genetics, diet, and environmental exposure. The maternal gut is thought to be the most important source of bacteria in HM (via an entero-mammary pathway). Therefore, the mother's diet might influence HM microbiota diversity by modifying the composition of the maternal gut microbiota (Biagi et al., 2017; Padilha et al., 2019).
Most studies have consistently reported Firmicutes and Proteobacteria to be the most predominant phyla in both mature milk and colostrum (Sakwinska et al., 2016; Biagi et al., 2017). However, at the genus and species levels, the reported composition of the breast milk microbiome differs significantly, with many genera found in less than 10% of studies. To distinguish stable and permanent microbiome members within the highly complex colostrum microbiota, which includes thousands of different species, we used the concept of the core microbiome (Lemanceau et al., 2017; Toju et al., 2018). Owing to the resolution limits of DNA-based analyses, the core microbiota has been predominantly defined using genus-level discrimination of a population. Nevertheless, a core microbiota of seven to nine bacterial genera has often been proposed based on sample abundance (in studies of intestinal microbes, environmental microbes, and other related fields). In our study, using full-length 16S rRNA amplicon sequencing, Staphylococcus, Acinetobacter, Streptococcus, Cutibacterium, Cupriavidus, Enterobacter, Rhodopseudomonas, and Paucibacter (Table 3) were selected as the core microbiota of the 97 maternal colostrum samples from Hainan province according to relative abundance. Among them, Cupriavidus, the most abundant genus in the Li ethnic group, is often found in soil (Estrada-de Los Santos et al., 2014), and isolates of Cupriavidus lacunae have been recovered from pond-side soil (Feng et al., 2019). Similarly, Paucibacter is an environmental bacterium found in aquatic sediment.
Environmental exposure during the perinatal period (the skin microbiota of the mother and the oral cavity of the infant) may be the main reason for broad differences in the breast milk microbiome. Based on published data, colostrum displays higher diversity and greater disparity in microbiome composition than mature milk across geographically different populations, and is characterized by a higher prevalence of environmental bacteria. Indeed, the oral and skin microbiomes are the next most diverse. In the case of the skin microbiome, rural and urban Chinese populations show variation in the abundance of some taxa, such as Trabulsiella and Propionibacterium (Gupta et al., 2017). Generally, host surface-associated microbiomes, such as the skin microbiome, might respond strongly to variations in bioclimatic factors and may thereby shape the composition of the human breast milk microbiome (Woodhams et al., 2020). Hainan Island, China, has a tropical monsoon climate that is hot and humid year-round with abundant rainfall. Interestingly, some thermotolerant environmental taxa first isolated from hot springs, such as Thermus amyloliquefaciens, Thermus caldifontis, and Meiothermus luteus (Yu et al., 2015; Habib et al., 2017; Khan et al., 2017), were found in most of the Hainan colostrum samples. Moreover, another peculiarity of our data was the prevalence of other soil bacteria, such as the genus Agrobacterium, in colostrum samples from Li mothers. In fact, multiple previous studies showed that about half of the dominant genera in colostrum are environmental bacteria ubiquitous in soil and water, such as Pseudomonas, Rhizobium, Acinetobacter, and Alcaligenes (Drago et al., 2017; Toscano et al., 2017). Consequently, some of the ethnic variations in the colostrum microbiome could be attributed to differences in cultural and subsistence features such as diet, hygiene, and labor practice.
This result is also consistent with the fact that most of the recruited Li mothers live in rural areas and are engaged in farming. In other words, in most studies of the breast milk microbiome, alcohol disinfection does not effectively prevent the detection of skin-associated microorganisms that probably derive from the surrounding environment (soil and vegetation).
Given the abundance of exogenous bacteria in breast milk, the most prevalent genera in the breast milk microbiota are generally distinct from the most prevalent genera of the infant gut (Pannaraj et al., 2017; Fehr et al., 2020). However, there is a consensus that the first beneficial bacteria to enter the infant's gut should come from colostrum. In particular, the commensal bacteria in colostrum could be selected to serve as seeds for newborns to initially establish a healthy gut microbiome. Several studies have shown that Bifidobacterium and Lactobacillus are highly prevalent in the gut microbiota of infants and are considered to be transmitted from mother to infant shortly after birth through breastfeeding, which can effectively avert irritable bowel syndrome and contribute to the development and balance of the infant intestinal flora (Yassour et al., 2018). Therefore, these potentially probiotic commensal bacteria in colostrum, especially Bifidobacterium and Lactobacillus, are of particular interest.
In the existing literature, Bifidobacterium and Lactobacillus have been reported only sporadically in a few colostrum samples, or not at all (Gupta et al., 2017). In our study, ASVs corresponding to the family Lactobacillaceae were retrieved from about 48.5% of colostrum samples, with a mean relative abundance of about 0.28%. According to an NCBI BLAST homology search of the full-length 16S rRNA gene sequences, ASVs belonging to genera of the revised family Lactobacillaceae were retrieved, including Limosilactobacillus, Lactobacillus, Lacticaseibacillus, Levilactobacillus, Lactiplantibacillus, and Leuconostoc (Zheng et al., 2020). Taxa identified at the species level were Limosilactobacillus reuteri, Limosilactobacillus caviae, Lactobacillus colini, Lactobacillus acidophilus, Lacticaseibacillus rhamnosus, and Levilactobacillus bambusae. Surprisingly, Bifidobacterium was not detected in the colostrum of the Li ethnic group, whereas the detection rate was nearly 60% in the Han group, with mean relative abundance ranging from 0.05 to 0.9%. The Bifidobacterium species identified mainly included B. longum, B. castoris, and B. scaligerum. To date, most NGS studies of breast milk based on different variable regions of the 16S rRNA gene have reported these taxa only at the genus level, with markedly different results. For example, in a study based on the V4 variable region of the 16S rRNA gene, the average relative abundances ranged from 0.1 to 1% for Bifidobacterium and from 0.1 to 0.3% for Lactobacillus (Padilha et al., 2019). In another study based on the V3-V4 region of the 16S rRNA gene, investigators reported average relative abundances of around 2% for Bifidobacterium and Lactobacillus in the first weeks after delivery (Murphy et al., 2017).
Intriguingly, an ASV affiliated with Akkermansia (an emerging candidate probiotic), assigned to Akkermansia glycaniphila, was also detected in eight colostrum samples of the Han ethnic group (0.1%). This was the first report that Akkermansia was detected in breast milk (Ouwerkerk et al., 2016).
To the best of our knowledge, this is the first study to reveal the composition and diversity of the colostrum microbiome in different ethnic groups living in a narrow geographical area on an island scale. We sought to understand the influence of ethnicity on the colostrum microbiome in sub-populations with shared physical geography by minimizing differences in environmental factors. In fact, it is hard to tease out the relative contributions of geography and ethnicity to the breast milk microbiome, because the two are intertwined. Our study has limitations concerning the sample size and cohort populations. In future work, we will recruit multiple cohorts, including cohorts of the same ethnic group with different subsistence and living environments, and different ethnic groups sharing similar subsistence and living environments.

CONCLUSION
In the present study, by analyzing a full-length 16S rRNA gene sequencing dataset of colostrum from 97 healthy mothers (60 Han, 37 Li) on Hainan Island, China, we show ethnic differences in the colostrum microbiome in a maternal cohort with shared physical geography. The analysis based on the Bray-Curtis distance showed an obvious ethnicity-associated structural segregation of the colostrum microbiota. The human colostrum microbiome is susceptible to local living-environment factors, although skin-derived Staphylococcus and Streptococcus are still subdominant taxa. Environmental exposure during the perinatal period may be the main reason for broad differences in the colostrum microbiome. Consequently, colostrum displays higher diversity and greater disparity in microbiome composition than mature milk and is characterized by a higher prevalence of environmental bacteria. In addition, despite their low relative abundance and inter-population differences, potentially probiotic bacteria do exist in colostrum, especially Bifidobacterium and Lactobacillus. Our results suggest that the ethnic origin of individuals may be an important factor to consider in HM microbiome research and in assessing its potential clinical significance during the perinatal period in ethnically diverse societies, even at a small geographic scale. Finally, further research is needed to tease out the relative contributions of geography and ethnicity to the breast milk microbiome.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI - PRJNA845888, SRR19548162-SRR19548258.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the First Affiliated Hospital, Shihezi University School of Medicine (KJ2022-080-01). The patients/participants provided their written informed consent to participate in this study.