Comparison of Two Approaches for the Metataxonomic Analysis of the Human Milk Microbiome

Recent work has demonstrated the existence of large inter-individual and inter-population variability in the microbiota of human milk from healthy women living across variable geographical and socio-cultural settings. However, no studies have evaluated the impact that variable sequencing approaches targeting different 16S rRNA variable regions may have on the human milk microbiota profiling results. This hampers our ability to make meaningful comparisons across studies. In this context, the main purpose of the present study was to re-process and re-sequence the microbiome in a large set of human milk samples (n = 412) collected from healthy women living at diverse international sites (Spain, Sweden, Peru, United States, Ethiopia, Gambia, Ghana and Kenya), by targeting a different 16S rRNA variable region and reaching a larger sequencing depth. Despite some differences between the results obtained from both sequencing approaches were notable (especially regarding alpha and beta diversities and Proteobacteria representation), results indicate that both sequencing approaches revealed a relatively consistent microbiota configurations in the studied cohorts. Our data expand upon the milk microbiota results we previously reported from the INSPIRE cohort and provide, for the first time across globally diverse populations, evidence of the impact that different DNA processing and sequencing approaches have on the microbiota profiles obtained for human milk samples. Overall, our results corroborate some similarities regarding the microbial communities previously reported for the INSPIRE cohort, but some differences were also detected. Understanding the impact of different sequencing approaches on human milk microbiota profiles is essential to enable meaningful comparisons across studies. Clinical Trial Registration www.clinicaltrials.gov, identifier NCT02670278.

Recent work has demonstrated the existence of large inter-individual and inter-population variability in the microbiota of human milk from healthy women living across variable geographical and socio-cultural settings. However, no studies have evaluated the impact that variable sequencing approaches targeting different 16S rRNA variable regions may have on the human milk microbiota profiling results. This hampers our ability to make meaningful comparisons across studies. In this context, the main purpose of the present study was to re-process and re-sequence the microbiome in a large set of human milk samples (n = 412) collected from healthy women living at diverse international sites (Spain, Sweden, Peru, United States, Ethiopia, Gambia, Ghana and Kenya), by targeting a different 16S rRNA variable region and reaching a larger sequencing depth. Despite some differences between the results obtained from both sequencing approaches were notable (especially regarding alpha and beta diversities and Proteobacteria representation), results indicate that both sequencing approaches revealed a relatively consistent microbiota configurations in the studied cohorts. Our data expand upon the milk microbiota results we previously reported from the INSPIRE cohort and provide, for the first time across globally diverse populations, evidence of the impact that different DNA processing and sequencing approaches have on the microbiota profiles obtained for human milk samples. Overall, our results corroborate some similarities regarding the microbial communities previously reported for the INSPIRE cohort, but some differences were also detected. Understanding the impact of different sequencing approaches on

INTRODUCTION
Despite the fact that human milk has long been considered sterile, research conducted over the last decade has provided convincing evidence that this biological fluid harbors a rich microbial community under all physiological circumstances (Martıń et al., 2006;Martıń et al., 2007;Solis et al., 2010;Fernández et al., 2013). Milk's microbial community contains an important arsenal of bacterial and fungal species of substantial interest as they likely play crucial roles in the maintenance of maternal and infant health (as reviewed in Boix-Amoroś et al., 2019;Moossavi et al., 2020;Stinson et al., 2020). For instance, they likely seed the breastfed infant's gastrointestinal (GI) tract, initiating the assembly of a mature healthy human GI microbiota, and orchestrating the innate immunity maturation and programming that will condition infant health outcomes in the short and long term (as reviewed in Milani et al., 2017). The milk microbiota in conjunction with other bioactive factors present in human milk, such as oligosaccharides and immune factors, have been deemed responsible for many of the long-term, health promoting effects associated with exclusive breastfeeding in early life, which correlate with reduced incidence of chronic inflammatory and metabolic conditions in infancy and adulthood (Huërou-Luron et al., 2010;Roger et al., 2010;Musilova et al., 2017).
Indeed, a growing literature supporting the crucial roles exerted by the human milk microbiota in maternal and infant health, in conjunction with advances in high-throughput sequencing (HTS) technologies, have led to a rapidly growing interest in the study of this microbial community and its variation, mainly in relation to maternal and infant factors. In this regard evidence suggests the existence of a strong interindividual variation in the composition of the human milk microbiota across different populations, in relation to variable delivery factors (Cabrera-Rubio et al., 2016;Hoashi et al., 2016;Asbury et al., 2020), lactation stage (Khodayar-Pardo et al., 2014), maternal conditions (either chronic pre-pregnancy situations or those developed during pregnancy) (LeMay-Nedjelski et al., 2020;Volery et al., 2020;Wan et al., 2020), lifestyle habits Padilha et al., 2020), psycho-social and economic conditions (Ojo-Okunola et al., 2019), and infant health outcomes (Demmelmair et al., 2020) as summarized previously in Ruiz et al. (2019) and Lackey et al. (2019). More recently, a combination of HTS and culturomic approaches has further supported the existence of a diversity of viable bacterial cells in healthy human milk wider than previously anticipated, and has offered novel opportunities to conduct mechanistic studies on the metabolic potential of this microbial community (Schwab et al., 2019;Togo et al., 2019;Treven et al., 2019).
While most investigators have undertaken research on this field with the aim to identify imbalances or dysbiosis states under specific maternal/infant conditions or in relation to health outcomes, very little effort has been aimed at delineating the structure of a healthy human milk microbiota, even though defining the normal baseline of a given microbiota is essential for a comprehensive understanding into the variation associated with different health outcomes (Bäckhed et al., 2012). Moreover, early studies on the structure of the healthy human milk microbiota conducted on either US (Hunt et al., 2011), Finnish (Cabrera-Rubio et al., 2012, Mexican-American (Davéet al., 2016), Chinese (Li et al., 2017), and Canadian populations , suggested the existence of significant variation across populations. For instance while most studies concur regarding the identification of a few "core" and dominant bacterial genera including mainly Staphylococcus and Streptococcus species, other representative microbial organisms belonging to the lactic acid bacteria group (Lactobacillus, Lactococcus, Leuconostoc, Weisella), or typical skin inhabitants such as Propionibacterium (Cutibacterium) or Corynebacterium are not universally detected across all populations analyzed despite appearing as predominant in some. For instance, Lactococcus, Leuconostoc, and Weisella appear to be dominant taxa in Finnish women (Cabrera-Rubio et al., 2012), while in Chinese and Taiwanese women the dominant populations included Pseudomonadaceae and Lactobacillaceae (Li et al., 2017). In addition, some research such as that reported by Moosssavi and colleagues, have identified different milk biome "types" even within the same Canadian cohort, supporting the existence of intrapopulation variability . However, most of these studies have been conducted on a limited number of samples or population groups, and have employed non-standardized procedures which could have introduced important biases in the microbiota profiling (Panek et al., 2018). These facts have impeded our ability to make meaningful comparisons across studies and prevented our ability to understand genuine biological variation in the microbiota inherent in milk produced by healthy women across populations.
It is also worth remarking that, whereas some efforts to achieve standardization of sample processing and analysis in the context of the human GI microbiota have been reported as recently reviewed (Wu et al., 2019), such initiatives have not yet been tackled in the context of the human milk microbiota which, due to its intrinsic physiological and microbiological characteristics , might be strongly affected by variable collection and analytical processing.
We recently attempted to fill these knowledge gaps by reporting a large, cross-sectional study on the healthy human milk microbiome across a cohort of over 400 healthy lactating women from selected geographically diverse populations living across three different continents, by using standardized sample collection and processing approaches . As expected, we found substantial variation in the milk microbiome among cohorts. In the present work, we provide even more insight into the impact that variable sample processing and data analysis pipelines might have on the study and interpretation of the milk microbiota landscape, through extraction and sequencing the same set of milk samples using an amplicon approach targeting a different 16S rRNA variable region with greater sequencing depth, and performing a comparative analysis with the previously reported dataset.

Design, Setting, and Sampling
The design of the cross-sectional, epidemiological, multi-cohort study has been described in detail Ruiz et al., 2017;Lackey et al., 2019;Lane et al., 2019 Figure 1). Milk was collected as described previously . Briefly, following skin cleaning twice with single use and using gloved hands, milk was manually expressed and collected into disposable sterile containers, with the exception of milk samples from USC, USW, SW and PE cohorts, which were pump-expressed by using an electric pump and sterile disposable containers. Milk from each woman was aliquoted into two samples for metataxonomic analysis; one was shipped on dry ice to the University of Idaho (USA), and the second was shipped on dry ice to the Complutense University of Madrid (Spain); in both locations the samples were immediately frozen at −20°C. Due to unreliable access to electricity supply and/or freezers, milk samples collected from the ETR cohort were immediately mixed at a 1:1 ratio with Milk Preservation Solution (Norgen Biotek, Ontario) and frozen within 6 days as it has previously been demonstrated capable to preserve bacterial DNA integrity for at least two weeks (Lackey et al., 2017). Whereas milk microbiome data garnered from methods utilized at the University of Idaho have been published , here we report results from a companion analysis conducted in our laboratory in Spain. We also compare our results to those previously reported using different analytical and bioinformatic approaches.

DNA Extraction From Milk
For the DNA isolation from all sample cohorts, with the exception of ETR, approximately 1 mL of each sample was used for DNA extraction following the method described by Castro et al. (2019). Briefly, milk samples were thawed on ice and centrifuged (13,000 rpm, 10 min at 4°C), the lipid and supernatant layers were removed and the cell pellet was resuspended in 500 µl TE50 buffer. This solution was further processed for DNA isolation by performing an enzymatic lysis adding 100 µl of an enzymatic mix containing 5 mg/ml of lysozyme (Sigma-Aldrich), 1.5 KU/ml mutanolysin (Sigma-Aldrich) and 120 U/ml lysostaphyn and incubating the samples for 1 hour at 37°C. Subsequently samples were subjected to physical lysis by bead beating with FastPrep Fp120 (Thermo Scientific, Waltham, MA) and glass bead matrix tubes (3 cycles × 60 s, speed 6) in step 4. Finally, DNA was purified by using a modified version of the Qiamp DNA mini kit (Qiagen) columns whereby 100 µl of 3 M sodium acetate pH 5.5 were added to the lysate prior to its addition to the column. Extracted DNA was eluted in 22 mL of nuclease-free water and stored at -20°C until further analysis. Purity and concentration of each extracted DNA was initially estimated using a NanoDrop 1000 spectrophotometer (NanoDrop Technologies, Inc., Rockland, USA). Negative controls (blanks) were added during the extraction to account for possible contaminants introduced during sample manipulation and DNA isolation. In the particular case of ETR samples, only 250 µl of milk samples were used for DNA isolation since they were the only samples treated with a preservation solution and, under this circumstance, the manufacturer of the preservation solution and companion milk DNA isolation kit recommends not to use a volume larger than 250 µl (Milk DNA Preservation and Isolation Kit, NorgenBiotek, Throlod, Canada).

Sequencing of Microbial DNA and Bioinformatic Analysis
The V3-V4 hypervariable region of the 16S rRNA was amplified by PCR and sequenced as previously described (Klindworth et al., 2013). Briefly, universal primers S-D-Bact-0341-b-S-17 (CCTACGGGNGGCWGCAG) and S-D-Bact-129 0785-a-A-21 (GACTACHVGGGTATCTAATCC) were used to amplify the V3-V4 hypervariable region of the 16S rRNA and then, barcodes were appended to 3' and 5' terminal ends of the PCR amplicons in a second PCR-reaction in order to allow separating forward and reverse sequences. The pooled, purified and barcoded DNA amplicons were sequenced using the Illumina MiSeq 2 x 300 bp paired-end protocol (Illumina Inc., San Diego, CA, USA) following the manufacturer's recommendations at the facilities of Parque Cientıfco de Madrid (Tres Cantos, Spain) (Klindworth et al., 2013).
Raw sequence data were demultiplexed and quality filtered using Illumina MiSeq Reporter analysis software. Microbiome bioinformatics were performed with QIIME 2 2019.1 (Bolyen et al., 2019). Denoising was performed with DADA2 (Callahan et al., 2016). The forward reads were truncated at position 277 by trimming the last 15 nucleotides, while the reverse ones were truncated at the 250 nucleotides by trimming the last 15 nucleotides, in order to discard positions for which nucleotide median quality were Q20 or below. Samples with less than 1000 sequences (n= 10) were excluded from further analysis.
Taxonomy was assigned to ASVs using the q2-featureclassifier (Bokulich et al., 2018) classify-sklearn naïve Bayes taxonomy classifier against the SILVA 138 reference database . Subsequent bioinformatic analysis was conducted using R version 3.5.1 (R Core Team, 2013; https:// www.R-project.org). The decontam package version 1.2.1 (Davis et al., 2018) was used in order to identify, visualize and remove contaminating DNA with two negative control samples.
A set of "core" genera were characterized for each sample type both in the overall dataset and within each cohort. To be included in the core taxa, a genus must have been represented with a relative abundance higher than 0.1% in, at least, 90% of the samples from one or more cohorts. The 4 most abundant phyla from all the milk samples were selected as most abundant phyla, the rest were included in the "minor_phyla" group and the sequences whose phyla were unknown were grouped in the "unclassified_phyla" group. The 18 most abundant genera from all the milk samples were selected as most abundant genera, the rest were included in the "minor_genera" group and the sequences whose genera were unknown were grouped in the "unclassified_genera" group.

Comparison of the Results With Those Obtained With the Same Set of Samples but Using a Different Metataxonomic Approach
The results of a previous different metataxonomic approach with the same set of milk samples has been published . Methods used for DNA extraction, amplification, assessment of DNA quality, sequencing and statistical analyses are detailed in that publication. Briefly, a dual-barcoded, twostep 30-cycle polymerase chain reaction (PCR) was conducted to amplify the V1-V3 hypervariable region of the 16S rRNA bacterial gene. For the first step, a 7-fold degenerate forward primer targeting position 27 and a reverse primer targeting position 534 (positions numbered according to the Escherichia coli rRNA gene) were used as described previously (Carrothers et al., 2015). For the second step, a unique barcoded primer pair with Illumina adaptors attached was added to each sample. Sequences were obtained using an Illumina MiSeq (San Diego, CA) v3 paired-end 300-bp protocol for 600 cycles at the University of Idaho IBEST Genomics Resources Core. However, due to the quality of the ends of the reverse reads, few reads were able to be merged and thus only the forward reads were used in the analyses.
A comparison of data obtained from both metataxonomic studies (V1-V3 reported by Lackey and colleagues (2019) versus V3-V4 studies conducted in this work) was made at genus and phylum levels using the SILVA 132 reference database since it was the one used in the first study. A schematic representation highlighting the main differences in the methodological and analytical strategies followed in the study reported by Lackey and colleagues, as compared to the sequencing approach conducted herein with the same dataset of samples is provided in Figure 1.

Statistical Analysis
Quantitative data were expressed as the median and interquartile range (IQR). Differences between groups were assessed using Kruskal-Wallis tests and pairwise Wilcoxon rank sum tests to calculate comparisons between groups. Bonferroni corrections were made to control for multiple comparisons. A table of amplicon sequence variants (ASVs) counts per sample was generated, and bacterial taxa abundances were normalized to the total number of sequences in each sample. Alpha diversity was studied with the Shannon and Simpson diversity indexes with the R vegan package (Version:2.5.6) (Oksanen et al., 2012). Principal coordinates analysis (PCoA) was used to evaluate beta diversity and to plot patterns of bacterial community diversity through a distance matrix containing a dissimilarity value for each pairwise sample comparison. Quantitative (relative abundance) and qualitative (presence/absence) analyses were performed with the Bray-Curtis index and binary Jaccard index, respectively. Analysis of variance of the distance matrices were performed with the "nonparametric MANOVA test" Adonis with 999 permutations as implemented in the R vegan package to reveal statistical significance. For multilevel pairwise Adonis comparisons, the method used for p-value correction was Holm-Bonferroni method with "pairwiseAdonis" R package (version 0.4) (Martıńez, 2020). Heatmap hierarchical clustering was performed by using the Euclidean distance and complete hclust_method. The median values for the Shannon diversity index oscillated between 3.48 (ETR) and 2.44 (USC) while those for the Simpson diversity index ranged between 0.91 (ETR) and 0.93 (GBU) (Supplementary Table 1). Overall, there was a significant effect of cohort on both diversity indices (p = 0.001 and p = 0.001, respectively). A comparison at the continent level showed significant differences on both diversity indices when the African cohorts were compared (p<0.002 and p<0.0002, respectively) since samples collected in ETR and GBU exhibited a higher diversity than those obtained in KE. In contrast, no significant differences were found when the different European and US cohorts were compared (p = 0.099 and p = 0.33, respectively) (Supplementary Table 1).

Metataxonomic
The overall analysis of the beta diversity, calculated according to the relative abundance of ASVs (Bray-Curtis distance) and the presence/absence of ASVs sequences (binary Jaccard distance matrix), indicated that the profiles of bacterial genera of the different cohorts apparently clustered into different groups (p < 0.001 and p < 0.001, respectively; PERMANOVA) and the ETR cohort was observed to be more clearly separated from the other locations with both distance metrics (Figures 2A, B). Besides, when beta diversity of the samples was evaluated according to the continent where the different samples were collected from, the ordination based on relative abundance of ASVs (Bray-Curtis distance) revealed differences between the African and European samples (p < 0.001) and, also, between the African and American samples (p < 0.001). In contrast, differences between the European and American cohorts did not reach statistical significance (p = 0.059). In relation to the analysis of the beta diversity according to the presence/absence of ASVs (binary Jaccard distance matrix), all the comparisons (African vs. European samples, African vs. American samples, and European vs. American samples) revealed the existence of significant differences (p < 0.001). Among African countries all the cohorts showed significant differences for both the relative abundance and the presence/absence of ASVs (p < 0.05). In the European and American countries, only the pairwise comparisons between the SW and USC (for both the relative abundance and the presence/absence); and SW and USW (for relative abundance) showed no statistical differences. The posthoc Holmes-Bonferroni correction of the multilevel pairwise Adonis comparisons increased the p-values of most of the previously significant pairwise comparisons from <0.05 to 0.05, with the exception of the comparison between GBU and GBR, which p-value changed from 0.04195804 to 0.17 and, therefore, it lost the statistical significance (Supplementary Table 2). It must be highlighted that the Holmes-Bonferroni correction involved a high number of comparisons (n=55), rendering it as a very exigent statistical test.
Comparison of the mean distances of samples to the centroids using PCoA plots based either on the Bray-Curtis dissimilarity index or on the Jaccard's coefficient of each cohort, also revealed the existence of significant differences (p<0.001) ( Figures 2C, D). Notably, GN was the cohort displaying a higher beta-dispersion, in terms of relative abundance while ETU displayed the highest beta-dispersion in terms of presence/absence of ASVs, suggesting the existence of a greater intrapopulation heterogeneity of microbiota profiles among the samples from these two cohorts, as compared to other populations analyzed in this study; whereas ETR, SW and USC displayed the smallest distances to centroids in terms of relative abundance data, revealing these cohorts present the most homogeneous microbiota profiles.
A total of 46 phyla were identified in the milk samples, with Firmicutes, Proteobacteria, Actinobacteriota (formerly Actinobacteria) and Patescibacteria being the most abundant. There was a significant effect of cohort on Firmicutes, Proteobacteria, Actinobacteriota, Patescibacteria and the group of "unclassified_phyla" (Table 1). Again, the ETR cohort showed greater differences with respect to the rest of the cohorts since it exhibited the lowest relative abundance of Firmicutes and the highest concerning Proteobacteria, Actinobacteriota and Patescibacteria (Table 1). An initial assessment of potentially dominant patterns in the bacteriological profile of the milk samples is shown in the heatmap plot presented in Figure 3. Overall, there was no clear separation between the milk samples from women of the different cohorts; however, a clear separation could be observed on the basis of the relative abundance of sequences belonging to the genera Staphylococcus and Streptococcus (Figure 3), which seems to be independent of the continent or the cohort. In addition, the clustering analysis suggested that ETR samples separation could be driven by the genera Rhizobium, Achromobacter, Corynebacterium and Stenotrophomonas ( Figure 3). A boxplot of the 9 most abundant genera (including the group of the unclassified ones) found in each location is shown in Supplementary Figure 2.
There was an effect of continent on the phyla Proteobacteria and Patescibacteria. More specifically, the abundance of A B FIGURE 1 | Schematic representation highlighting the main differences in the methodological and analytical strategies followed in the study reported by Lackey and colleagues, as compared to the sequencing approach conducted herein with the same dataset of samples. Proteobacteria was higher in African cohorts than the other cohorts (p < 0.001) while that of Patescibacteria was higher in the European samples than in those from the two other continents (p < 0.001). At the genus level, American and European cohorts seemed to be more similar to each other than to the African ones ( Table 2). African cohorts were characterized by a lower Streptococcus abundance (p < 0.001) and a higher Lactobacillus abundance (p < 0.001) (Tables 2 and 3). The relative abundances of the genera Rhizobium and Corynebacterium in ETR samples was statistically higher than in any other cohort (p < 0.001) while the contrary was observed for the genus Rhodanobacter (p < 0.001) (Tables 1 and 3). When European and American cohorts were compared, the relative abundance of the phylum Actinobacteriota was found to be significantly higher in USW cohort than in PE, SP and USC cohorts (p < 0.001), while the relative abundance of the phylum Patescibacteria was higher in SP cohort than in the PE, USW and USC cohorts (p < 0.001) ( Table 4). At the genus level, the relative abundance of the genus Streptococcus was significantly higher in the samples from the PE cohort than in those from the USC, SW and USW cohorts (p < 0.001) ( Overall, the comparison of the results obtained by Lackey et al. (2019) targeting the V1-V3 region and those obtained in this work, targeting the V3-V4 region, showed notable differences among the most abundant phyla and genera. However, some of these differences were the result of different nomenclatures used by SILVA 132 (used in Lackey et al., 2019) and SILVA 138 (used in the present study) database versions, as is the case of Actinobacteriota and Actinobacteria; or the genera Propionibacterium and Cutibacterium, in which the pipeline considers them as different microorganisms.
To avoid this bias and facilitate comparison among both studies, we re-analyzed our sequences on the V3-V4 regions with the same database used by Lackey and colleagues (SILVA 132). For this comparison reads that could not be classified to the genus level were included. Finally, a total of 26,461,984 high quality reads were used to perform this comparison (Supplementary Table 3).   Using the same reference database (SILVA 132) and posttaxonomic bioinformatic analysis pipeline yielded some specific differences in the alpha diversity results. In general, the V3-V4 16S rRNA region study showed a higher alpha-diversity [Shannon and Simpson indices 2.03 (1.46-2.49) and 0.77 (0.59-0.85) respectively] than V1-V3 study [Shannon and Simpson indices 1.74 (1.21-2.28) and 0.71 (0.53-0.81) respectively] (p < 0.001) (Figures 4A, B). In addition, PCoA plots based on the Bray-Curtis dissimilarity index ( Figure 4C) and on the Jaccard's coefficient ( Figure 4D) revealed differences in the beta diversity results.
In contrast, both sequencing approaches led to highly concordant results in relation to some individual phyla and genera, allowing similar comparisons across cohorts. For instance, in terms of individual phyla, in both studies Firmicutes, Proteobacteria, and Actinobacteriota collectively represented >90% of those identified (Table 5). Besides, both 16S region studies showed that the relative abundance of Firmicutes was lower in ETR than in all cohorts (p < 0.001), that Proteobacteria was relatively more abundant in milk collected in ETR than in all other cohorts (p < 0.001), and that Actinobacteriota was more abundant in ETR, ETU, GBR, and GBU than in GN, USC, SP, and PE (p ≤ 0.001). The higher relative abundance of Bacteroidetes in KE than GN was also detected in both studies (p < 0.001). There was also an effect of cohort on the "other"  and "minor_phyla" (this work) category; which relative abundance in GBU was higher than in ETR, ETU, GN, PE, SP, SW, and USC (p ≤ 0.001) according to both strategies.
At the individual genus analysis, both approaches found a higher relative abundance of Rhizobium, Achromobacter, and Corynebacterium in milk collected in ETR than all other cohorts. Besides, statistical differences were found in almost all pairwise comparisons (p < 0.05) with ETR V1-V3 region strategy, except for Corynebacterium with ETU. Both V1-V3 and V3-V4 approaches showed that African cohorts had the lowest abundance of Streptococcus (p < 0.05) while Peruvian milk bacterial communities had the highest relative abundance of this genus as compared to all the other cohorts (p < 0.05) in a pairwise comparison.
In our work, Lactobacillus had a higher relative abundance among some African cohorts (particularly in ETR, GBR, GBU and GN) than all other cohorts but these differences were not statistically significant. Interestingly, in agreement with these observations, Lackey et al. (2019) also analyzed the fecal microbiome of the breastfed babies and found a higher relative abundance of Lactobacillus in feces of ETR, GBR and GBU than in samples from the PE, SP, SW and US cohorts.
Finally, in relation to alpha and beta diversity analysis, both approaches found that African samples, and particularly those from ETR, displayed the highest alpha diversity as assessed by the Shannon and Simpson indices (Figures 5A,  B). Comparison of the mean distances of samples to the centroids using PCoA plots based either on the Bray-Curtis dissimilarity index or on the Jaccard's coefficient of each cohort, revealed no differences between both approaches (p=0.87 and p=0.37, respectively).    On the other hand, some differences were observed among the results obtained by Lackey et al. (2019) and those obtained in this work ( Table 5). The relative abundance of the phylum Proteobacteria detected in our work was higher than in Lackey et al. (2019) [13.84 (6.24-26.73) vs 9.18 (3.47-25.32), p = 0.002]. On the contrary, the phylum Actinobacteriota was higher in Lackey et al. (2019) than in our work [13.23 (3.95-25.09) vs 9.93 (5.39-16.65), p = 0.003] ( Table 5), whereas the medians of the Shannon or Simpson diversity indices were higher in this work (p < 0.001; Kruskal-Wallis tests with Bonferroni correction) (Figures 4 and 5). It is also worth noting that, despite finding comparable microbial community structures with both sequencing approaches, significant differences were found in the detection of the relative abundance of some bacterial genera which can be of great importance for the maternalinfant health, such as Staphylococcus, Streptococcus, Bifidobacterium and Lactobacillus (Table 5).

DISCUSSION
Findings from this study expand upon the milk microbiota profiling previously reported from the INSPIRE cohort , which is the most comprehensive study to date on the topic. In addition, our findings provide, for the first time and across globally diverse populations, evidence of the impact of different DNA processing and sequencing approaches on the microbiota profiles obtained for human milk samples. While comparison of sequencing approaches and DNA isolation procedures has been long studied in the context of its impact on the GI microbiome (Hill et al., 2016;Rintala et al., 2017;Panek et al., 2018), little effort has been devoted to achieving standardization of optimal procedures to tackle the study of the human milk microbiome. However, such standardization is required to enable meaningful comparisons of the datasets generated across different studies of these important microbial communities, especially considering particular intrinsic characteristics which may impose additional challenges for microbiome studies. First, human milk has a relatively low density of bacterial cells (between 10 3 to 10 5 cfu mL -1 ), can have a relatively high concentration of immune cells (and thus human DNA), and a high fat concentration that might entrap some bacterial cells. In addition, there exists an important risk of cross contamination of the samples with skin samples during collection or due to exposure to breast pumps; all of these factors can potentially introduce biases in microbiota studies (Boix-Amoroś et al., 2016;Moossavi and Azad, 2019). In this context, our hypothesis is that DNA isolation, library generation, sequencing approach and data analysis can significantly impact the human milk microbiota profiles obtained through HTS surveys. For this purpose, the same set of human milk samples previously studied by means of sequencing the 16S rRNA V1-V3 regions reported by Lackey and colleagues (2019) was herein reextracted and re-sequenced using different methods. For this study, the milk samples were extracted using the QIAamp DNA Stool Mini Kit with additional mechanical bead beating and using an amplicon sequencing approach that targeted a different 16S rRNA variable region (V3-V4) and that achieved higher sequencing depth. For the purposes of comparison, both datasets were downstream processed through identical post-taxonomy bioinformatics pipelines and reads generated in the present study (V3-V4 regions) were taxonomically re-assigned against the same reference database used in Lackey et al. (2019).
Overall, the dataset presented herein confirms the existence of large inter-individual and inter-populations variations in the healthy human milk microbiota across diverse geographical and ethnical populations as previously reported in several studies (Kumar et al., 2016;Vaidya et al., 2017). In agreement to the dataset previously reported by Lackey and colleagues on the same set of samples, this variation was evident between geographically distant but also, to some extent, between neighboring populations. For instance, both sequencing approaches found that the microbial communities present in African cohorts were dissimilar to those found in the European and American cohorts. The African cohorts also displayed higher diversity of microbial taxa, with ETR being the population harboring the most distinctive microbiota fingerprint. It must be highlighted that due to limitations in electricity access at ETR, this set of samples was preserved at room temperature in a preservation solution immediately following collection and thus this data must be interpreted with caution. While the utilization of a different preservation and DNA isolation procedure might have introduced bias in the microbiota profiles detected for this particular cohort which could be partially responsible for the high dissimilarity exhibited by the ETR cohort as compared to the rest of the dataset, at this moment we cannot rule out to what extent these dissimilarities are due to genuinely biological differences. Nonetheless, it is worth remarking that other African cohorts also displayed significantly different betadiversities as compared to American and European cohorts, particularly in terms of presence/absence of taxa, and that these observations were consistent among both 16S rRNA sequencing datasets. Besides, ETR together with GBU and GN were the cohorts displaying the highest alpha-diversity indices across all the studied cohorts. Overall, these observations agree with the overall reported loss of diversity in human microbiomes from populations with westernized and industrialized lifestyles (Segata, 2015), further suggesting that common genetic, environmental and/or lifestyle factors might have influenced a differential composition in the human milk microbiota in the cohorts under study. In line with "The Hygiene Hypothesis", these observations likely reflect a broad exposure to a wider array of microorganisms in some African cohorts as opposed to other European or American cohorts where westernized practices such as antibiotic utilization, water sanitation or reduced contact with animals, among others, may have reduced diversity in the human-associated microbial communities and the concomitant increase in non-communicable chronic diseases (Sonnenburg et al., 2016;Vandegrift et al., 2017). Further, even neighboring populations with different lifestyles, such as those represented by rural and urban communities of Ethiopia and rural and urban communities of Gambia, still exhibited significant differences in the diversity and/or structure of their respective milk microbiotas. These results support those reported by other authors when comparing the milk microbiota in urban and rural populations in India and China (Li et al., 2017;Vaidya et al., 2017). These observations reinforce the notion that microbial exposure, environmental factors and lifestyle habits strongly impact the assemblage of the human milk microbiome  and, due to the influence of these microbial communities on seeding the infant GI microbiome; such factors might have decisive implications in infant health outcomes (Browne et al., 2019). It is also worth noting that some African cohorts including ETR, GN and GBU presented taxa that were exclusively associated to their respective cohorts. For instance, GN samples were the only ones in which Akkermansia and Butyricicoccus were detected, both groups representing bacteria with attributed health promoting effects in models of inflammatory bowel disease (Eeckhaut et al., 2013;Ferrer-Picón et al., 2020), and likely representing candidates to develop prospective next-generation probiotics (O'Toole et al., 2017). The representation in the milk microbiota of other taxa traditionally including commensal microbes with attributed health promoting effects, such as Lactobacillus and Bifidobacterium, was also higher among African cohorts; and this result was independent on the sequencing approach employed. These observations strengthen recent research trends that defend the necessity to capture and preserve the microbial diversity from globally diverse human populations, as they might include taxa that could help mitigate non-communicable and chronic human diseases highly prevalent in industrialized and urbanized western populations (Dominguez-Bello et al., 2018;Sonnenburg and Sonnenburg, 2019).
The current dataset also supports the existence of a few universal core taxa consisting of dominant bacterial groups such as Staphylococcus and Streptococcus; as well as some other taxa that, despite being present in over 70% of the analyzed samples, exhibited minor relative abundances such as Cutibacterium, Corynebacterium, Rhodanobacter and Bifidobacterium. Some population-specific core taxa were also identified for some cohorts, in accordance with the results reported by Lackey et al., although population-specific core taxa appeared different depending on the sequencing approach. Moreover, the ETR core included Rhizobium, Bifidobacterium and Acinetobacter, the last two genera not present in the ETR core microbiota based on V1-V3 results. The prevalence of these bacterial groups likely drives the beta-diversity differentiation of this particular cohort from the other populations analyzed.
In conclusion, the V3-V4 approach enabled us to capture larger alpha-diversities, although the dissimilarity structures across cohorts, in terms of relative abundance of individual taxa, were relatively comparable in both datasets. Other reports have demonstrated that variable 16S rRNA regions can differently impact the taxa detected and thus the overall microbial community structures depicted, the largest differences being detected at lower taxonomic ranks (Bukin et al., 2019). Prior studies have described that V1-V3 regions may capture higher taxonomic diversities than V3-V4 when assessing oral and fecal samples (Zheng et al., 2015). This contrasts with the results observed in the present study with human milk samples, where V3-V4 revealed the highest alphadiversity indices, which also yielded a higher representation of low abundance groups although at expenses of yielding a higher representation of unclassified minority phyla. However, at this point we cannot conclude whether this is due to a better resolution of this 16S rRNA region in this particular ecological niche, characterized by a relative lower complexity than the human gut, or to differences in the DNA extraction method, the sequencing depth and/or the criteria to exclude or trim the sequences after the quality analysis. For instance, differences in the levels of Proteobacteria or Actinobacteriota levels detected through both approaches, could be the result of differences in the selected primers' efficiency to amplify those groups, whereas an overall increased alpha diversity in the V3-V4 approach could either be the result of more efficient amplification of unrelated bacteria with selected universal primers and/or of the higher sequencing depth achieved. Although none of the short 16S rRNA hipervariable regions can provide the taxonomic resolution achieved by full length 16S rRNA sequencing, they can still provide meaningful information on the composition and structure of human associated microbial populations, specifically when reaching sufficient sequencing depths.
Remarkably, despite all the methodological differences between the two approaches, both were able to delineate a similar structure for the human milk microbiome of the different cohorts from which samples were collected, in terms of overall most abundant taxa and the differences these presented among cohorts. These results agree with previous results where patterns of predictions compared among different pipeline analysis were comparable provided that the sequencing depth and choice of NGS remain similar (Rajan et al., 2019), and suggest that the main inter-population and intra-population differences previously reported for the milk microbiome of the INSPIRE cohort are genuine as they could be corroborated through an independent different sequencing approach.
In addition, it must be highlighted that new versions of databases may introduce an important bias when trying to compare results within the same laboratory or among different laboratories. As an example, some phyla, such as Actinobacteriota or Patescibacteria, that appear as relevant for human milk microbiome using Silva 138 did not appear as such by using the previous version Silva 132. Discordances at the genus level may also arise, for example in relation to Rhodanobacter versus Dyella or to Corynebacterium_1 versus Corynebacterium. Thus, to conduct meaningful comparisons across datasets researchers should consider reanalyzing raw reads through common pipelines and reference databases.

DATA AVAILABILITY STATEMENT
The datasets generated in this study can be found in the SRA repository (https://www.ncbi.nlm.nih.gov) under Bioproject Accesion Number PRJNA693239.

ETHICS STATEMENT
All study procedures were approved by the overarching Washington State University Institutional Review Board (#13264) and at each study location, consent was obtained from each participating woman. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
LR, CA, CG-C, EJ, KL, EK-M, EK, and SM conducted the research. MKM, CM, JF, DS, SEM, AP, DG, GO, RP, LB, MAM, JW, and JR designed the research. LR, CA, and JR wrote the manuscript. LR, CA, and JR had primary responsibility for the final content of the manuscript. LR and CA analyzed the data. All authors contributed to the article and approved the submitted version.