Characterization of Supragingival Plaque and Oral Swab Microbiomes in Children With Severe Early Childhood Caries

The human oral cavity harbors one of the most diverse microbial communities with different oral microenvironments allowing the colonization of unique microbial species. This study aimed to determine which of two commonly used sampling sites (dental plaque vs. oral swab) would provide a better prediction model for caries-free vs. severe early childhood caries (S-ECC) using next generation sequencing and machine learning (ML). In this cross-sectional study, a total of 80 children (40 S-ECC and 40 caries-free) < 72 months of age were recruited. Supragingival plaque and oral swab samples were used for the amplicon sequencing of the V4-16S rRNA and ITS1 rRNA genes. The results showed significant differences in alpha and beta diversity between dental plaque and oral swab bacterial and fungal microbiomes. Differential abundance analyses showed that, among others, the cariogenic species Streptococcus mutans was enriched in the dental plaque, compared to oral swabs, of children with S-ECC. The fungal species Candida dubliniensis and C. tropicalis were more abundant in the oral swab samples of children with S-ECC compared to caries-free controls. They were also among the top 20 most important features for the classification of S-ECC vs. caries-free in oral swabs and for the classification of dental plaque vs. oral swab in the S-ECC group. ML approaches revealed the possibility of classifying samples according to both caries status and sampling sites. The tested site of sample collection did not change the predictability of the disease. However, the species considered to be important for the classification of disease in each sampling site were slightly different. Being able to determine the origin of the samples could be very useful during the design of oral microbiome studies. This study provides important insights into the differences between the dental plaque and oral swab bacteriome and mycobiome of children with S-ECC and those caries-free.


INTRODUCTION
The oral cavity harbors one of the most diverse microbial communities within the human body (Stearns et al., 2011). A variety of oral niches (non-shedding tooth surfaces, tongue, cheek, hard and soft palates, and gingival sulcus) provide different levels of oxygen, nutrients, salivary flow, and masticatory forces (Hall et al., 2017). Each of these microenvironments allows the colonization of unique and adapted microbial communities. Therefore, the microbial composition of each oral site is expected to differ significantly from that of the others.
Usually, the oral microbiota exists in a homeostatic balance with the host and contributes to the development of the immune system. However, once this balance is disturbed, some microbial species can overgrow and diseases associated with site-specific microbes such as periodontitis (subgingival microbiota), dental caries (supragingival microbiota), and oral candidiasis (oral mucosal and salivary microbiota) may occur (Lamont et al., 2018; Vila et al., 2020). Therefore, it is important to select the most appropriate sampling site for the study and/or diagnosis of each oral infectious disease. Recent studies have shown that the SARS-CoV-2 virus, which causes coronavirus disease 2019 (COVID-19), can be detected in saliva (Fernandes et al., 2020). It has been reported that salivary glands can be an important reservoir of the virus (Xu et al., 2020b). Consequently, the presence of a high SARS-CoV-2 viral load in saliva could make saliva a suitable diagnostic specimen for COVID-19. This also supports the importance of exploring different sampling options for the diagnosis of infectious diseases (Fernandes et al., 2020; Sapkota et al., 2020; Xu et al., 2020a).
It has been known since the nineteenth century that oral microbes play a crucial role in the development of dental caries (Russell, 2009). However, the establishment of new technologies, such as next generation sequencing (NGS) and machine learning algorithms, has provided a unique opportunity for an enhanced understanding of the role of oral microbes (bacteria, fungi, and viruses) in caries development and progression.
As dental caries continues to be one of the most prevalent chronic diseases among children worldwide, there is a clear need for a deeper understanding of how oral microbial communities and their interactions could impact children's oral health. The terms early childhood caries (ECC) and severe ECC (S-ECC) were first introduced in the 1990s (Ismail and Sohn, 1999). ECC is described as any caries experience in the primary dentition of children younger than 6 years of age. S-ECC is the severe form of ECC and has an important effect on children's development and well-being (Pierce et al., 2019;Folayan et al., 2020).
We hypothesized that the microbial (bacterial and fungal) profile of dental plaque significantly differs from that of oral swabs, and because the dental biofilm is in closer contact with the tooth surface, it would provide a better prediction model for caries onset. To test this hypothesis, first we characterized the differences between the dental plaque and oral swab bacterial and fungal microbiota in children with S-ECC and those caries-free. Second, we analyzed which of those commonly used sampling sites (dental plaque and oral swab) would provide a better model for the classification of S-ECC vs. caries-free, using machine learning approaches. Third, we further evaluated whether the observed differences between the microbial profiles of the samples could be used for the differentiation between the sampling sites (dental plaque vs. oral swab) to assist researchers during the design of oral microbiome studies. This is one of the first studies to explore the oral microbiome profiles to classify oral sites.
Abbreviations: ASVs, Amplicon Sequence Variants; FDR, False Discovery Rate; HOMD, Human Oral Microbiome database; ITS1, Internal Transcribed Spacer 1; PCoA, Principal Coordinates Analysis; PERMANOVA, Permutational Multivariate Analysis of Variance using distance matrices; S-ECC, Severe Early Childhood Caries.

Study Population
In this cross-sectional study, eighty children < 72 months of age were recruited between December 2017 and August 2018. Among those, 40 had S-ECC, according to the American Academy of Pediatric Dentistry definition (AAPD, 2020), and 40 were caries-free. Children with S-ECC were recruited at the Misericordia Health Centre (MHC), Winnipeg-MB, Canada, on the day of their full-mouth rehabilitative dental surgery under general anesthesia. Caries-free children were recruited from the community. Caries-free children had a dmft (cumulative score of the number of decayed, missing, or filled primary teeth) index equal to zero and had no incipient lesions. To confirm the caries-free status, a dental examination was performed by R.J.S. at the Children's Hospital Research Institute of Manitoba by means of visual/tactile examination using artificial light and no radiographs. Inclusion criteria: children less than 72 months of age who were caries-free (dmft = 0) or have been diagnosed with S-ECC (based on the American Academy of Pediatric Dentistry definition). Exclusion criteria: children older than 72 months of age, use of antibiotics, and children who did not satisfy the case definition of S-ECC.
Based on the power analysis published by La Rosa et al. (2012) at 5% significance level, with 40 samples per group and the average number of reads of 50,000 per sample our study would achieve a power > 97%. This study protocol was approved by the University of Manitoba's Health Research Ethics Board (HREB # HS20961-H2017:250) and by the MHC, Winnipeg, MB, Canada. Written informed consent was provided by the parents or legal caregivers (de Jesus et al., 2020). This work follows the STROBE guidelines checklist for cross-sectional studies (Supplementary Table).

Sample Collection
Due to the young age of the participants and their inability to spit saliva, oral swab samples were collected with a sterile polyester-tipped applicator (Fisher Scientific) by swabbing the buccal mucosa and anterior floor of the mouth under the tongue. The oral swabs were stored in RNAprotect Reagent (Qiagen, Cat. # 74324, Hilden, Germany) at −80 °C until further analysis. Supragingival plaque samples were collected from all available tooth surfaces with a sterile interdental brush (Agnello et al., 2017; de Jesus et al., 2020). They were dislodged into the RNAprotect Reagent (Qiagen, Cat. # 76506, Hilden, Germany) and stored at −80 °C until further analysis. For simplicity, supragingival plaque samples are referred to as dental plaque.

DNA Extraction and 16S and ITS1 rRNA Amplicon Sequencing
Total DNA was extracted from 160 samples (80 oral swabs and 80 dental plaque samples) using QIAamp DNA mini kit (Qiagen, Hilden, Germany) following manufacturer's protocol. An additional enzymatic digestion step with lysozyme treatment (20 µg/ml lysozyme in a buffer containing 20 mM Tris HCl, pH 8; 1.2% Triton X 100; 2 mM EDTA) was performed before DNA extraction from dental plaque samples.

Bioinformatics and Statistical Analysis
The sequences were received as demultiplexed, barcode-removed, paired-end FASTQ files. The quality control analysis was performed with FastQC v0.11.8 (Andrews, 2010). The sequences were then imported and analyzed with QIIME2 2018.11 (Bolyen et al., 2019). The 16S paired-end sequences were quality trimmed, filtered to remove ambiguous and chimeric sequences, and merged using DADA2 implemented in QIIME2, resulting in the amplicon sequence variant (ASV) table (Callahan et al., 2016). The ITS1 paired-end sequences were trimmed using the Q2-ITSxpress QIIME2 plugin prior to the DADA2 step, with default parameters (Rivers et al., 2018). The taxonomic assignment of ASVs was performed using the Human Oral Microbiome Database (HOMD, version 15.1) for bacteria and the UNITE database (version 8.2; QIIME developer release) for fungi at 99% sequence similarity (Dewhirst et al., 2010; Agnello et al., 2017; Abarenkov et al., 2020b; de Jesus et al., 2020). Due to the presence of many fungal ASVs that were assigned only at kingdom level, further fungal ASV curation was performed with the R package LULU (Frøslev et al., 2017). The remaining ASVs assigned as Fungi at kingdom level only, or with unidentified phylum, were manually assessed using the program BLASTN in NCBI (Zhang et al., 2000). The ASVs with non-fungal BLASTN results were discarded and the remaining ASVs were repeatedly reassigned using different UNITE database threshold levels (Abarenkov et al., 2020a,b,c) and taxonomy classification methods (q2-feature-classifier classify-sklearn and classify-consensus-blast) in QIIME2, as described previously (Martinsen et al., 2021). The data were imported into R using the R package "qiime2R" (version 0.99.13) and additional filtering was performed using "phyloseq" (version 1.30.0) to remove singletons and samples with fewer than 1,000 reads (McMurdie and Holmes, 2013; Bisanz, 2018; Depner et al., 2020).
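The final filtering step can be illustrated with a minimal Python sketch. It is not the phyloseq code used in the study: the function name `filter_table`, the nested-dictionary layout, the reading of "singleton" as an ASV whose total count across all samples is 1, and the order of the two filters are all assumptions made for illustration.

```python
def filter_table(counts, min_reads=1000):
    """Drop singleton ASVs, then drop samples with fewer than min_reads reads.

    counts: dict mapping sample ID -> dict mapping ASV ID -> read count.
    Assumption: a singleton is an ASV observed exactly once in the whole table.
    """
    # Total count of each ASV across all samples.
    totals = {}
    for sample_counts in counts.values():
        for asv, n in sample_counts.items():
            totals[asv] = totals.get(asv, 0) + n

    keep_asvs = {asv for asv, n in totals.items() if n > 1}

    filtered = {}
    for sample, sample_counts in counts.items():
        kept = {a: n for a, n in sample_counts.items() if a in keep_asvs and n > 0}
        # Keep the sample only if its remaining depth reaches the threshold.
        if sum(kept.values()) >= min_reads:
            filtered[sample] = kept
    return filtered


# Hypothetical toy table: ASV "c" is a singleton, sample "s2" is too shallow.
toy = {"s1": {"a": 900, "b": 200, "c": 1}, "s2": {"a": 500, "b": 0}}
result = filter_table(toy)
```

In a paired design, a sample removed at this step would also require removal of its plaque or swab partner, which this sketch does not handle.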
The ASV counts were then normalized using the cumulative-sum scaling (CSS) approach from the R package "metagenomeSeq" version 1.28.2 (Paulson et al., 2013).
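The idea behind cumulative-sum scaling can be sketched as follows. This is a simplified illustration, not the metagenomeSeq implementation: the fixed quantile of 0.5, the scaling constant of 1,000, the inverse-CDF quantile rule, and the function name `css_normalize` are assumptions. Each sample is divided by the sum of its counts that fall at or below a chosen quantile of its nonzero counts, so that a few highly abundant taxa do not dominate the scaling factor.

```python
import math


def css_normalize(sample_counts, quantile=0.5, scale=1000):
    """Scale one sample by the cumulative sum of counts up to a quantile.

    sample_counts: list of raw counts for one sample (one entry per ASV).
    quantile and scale are illustrative defaults (assumptions).
    """
    nonzero = sorted(c for c in sample_counts if c > 0)
    if not nonzero:
        return [0.0 for _ in sample_counts]
    # Value at the chosen quantile of the nonzero counts (simple inverse-CDF rule).
    idx = max(0, math.ceil(quantile * len(nonzero)) - 1)
    threshold = nonzero[idx]
    # Scaling factor: sum of all counts at or below that quantile value.
    factor = sum(c for c in nonzero if c <= threshold)
    return [c * scale / factor for c in sample_counts]


# Nonzero counts are [2, 4, 10, 100]; the median-rule threshold is 4,
# so the scaling factor is 2 + 4 = 6.
normalized = css_normalize([0, 2, 4, 10, 100])
```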
The alpha diversity analyses (within-samples) were performed using the Chao1 and Shannon indices to estimate richness and diversity, respectively, using raw ASV count data from QIIME2 in "phyloseq". Pairwise comparisons of alpha diversity were done by the paired Wilcoxon signed rank test. Beta diversity measures were calculated on CSS normalized ASV data. This analysis was performed to compare the structure of the bacterial and fungal microbial communities between samples, using the permutational multivariate analysis of variance (PERMANOVA) test with 999 permutations in the R package "vegan" (adonis function; version 2.5.6) (Anderson, 2001). It was visualized using principal coordinates analysis (PCoA) with the Bray-Curtis dissimilarity index in the R package "ggplot2" (version 3.3.3) (Beals, 1984; Wickham, 2016).
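The three indices named above can be written out in a few lines of Python. This is a minimal sketch for illustration, not the phyloseq or vegan code; it assumes the natural logarithm for the Shannon index and the bias-corrected form of Chao1, neither of which is stated in the text.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of taxa seen exactly once and twice."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


h = shannon([10, 10])               # two equally abundant taxa: ln(2)
richness = chao1([5, 1, 1, 2, 0])   # 4 observed, F1 = 2, F2 = 1
d_same = bray_curtis([1, 2, 3], [1, 2, 3])
d_disjoint = bray_curtis([10, 0], [0, 10])
```

Bray-Curtis ranges from 0 (identical composition) to 1 (no shared taxa), which is why it is a natural input for PCoA and PERMANOVA.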
Differentially abundant species were identified using the DESeq2 negative binomial Wald test, controlling the false discovery rate (FDR) for multiple comparison, within "phyloseq" (Love et al., 2014). For this, the raw ASV counts were collapsed to the species level. For comparisons between dental plaque vs. oral swab, a paired DESeq2 analysis was performed. FDR adjusted P < 0.05 was considered significant.
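The text does not name the FDR procedure. A minimal sketch of the Benjamini-Hochberg adjustment, which is the default in DESeq2, is shown below under that assumption; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four species.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [p < 0.05 for p in adj]
```

In this toy example only the first species passes the adjusted P < 0.05 threshold, even though three raw p-values are below 0.05.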

Machine Learning Analysis
Machine learning methods were used to train multivariable classification models to identify the caries status (S-ECC or caries-free). To generate the machine learning models, taxonomic features were used in the form of ASV tables collapsed to species level. For the classification, we used the workflow provided in "SIAMCAT," a machine learning toolbox for metagenome analysis (Wirbel et al., 2019, 2021). The data were processed separately for fungi and bacteria, and sample-wise relative abundances of the quantitative microbiome profiles were used as input data to maintain uniformity. To process the data in "SIAMCAT," features with a prevalence of less than five percent across samples were removed and the remaining features were normalized by centered log-ratio (CLR) transformation. The data were then prepared for eightfold cross-validation with five repeats. After this, the models were trained using Lasso, Ridge, Elastic Net (Enet), and RandomForest classification methods in SIAMCAT, which uses the "mlr" package for machine learning based classification (Bischl et al., 2016). The models' cross-validation performance was evaluated using the area under the receiver operating characteristic (AUROC) value.
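The two preprocessing steps described above (prevalence filtering and CLR transformation) can be sketched in plain Python. This is not the SIAMCAT code; the function names, the dictionary layout, and the pseudocount of 1e-6 used to avoid taking the logarithm of zero are assumptions made for illustration.

```python
import math


def prevalence_filter(table, cutoff=0.05):
    """Keep features detected in at least `cutoff` of the samples.

    table: dict mapping feature name -> list of relative abundances,
    one value per sample.
    """
    kept = {}
    for feature, values in table.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if prevalence >= cutoff:
            kept[feature] = values
    return kept


def clr(sample, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's relative abundances.

    The pseudocount is an assumed value; it only prevents log(0).
    """
    logs = [math.log(v + pseudocount) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical table with 40 samples: "rare" is present in 1 sample (2.5%),
# "common" in 4 samples (10%).
table = {"rare": [0.1] + [0.0] * 39, "common": [0.1] * 4 + [0.0] * 36}
kept = prevalence_filter(table)
transformed = clr([0.2, 0.3, 0.5])
```

By construction, the CLR values of a sample sum to zero, which removes the unit-sum constraint of relative abundance data before model fitting.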
Frontiers in Microbiology | www.frontiersin.org
FIGURE 2 | Bacterial diversity of dental plaque and oral swab samples from children with S-ECC and those caries-free. (A) For alpha diversity (within-sample) the Shannon and Chao1 diversity and richness measures were calculated according to sample type in both caries-free and S-ECC groups. A significant difference between oral swab and dental plaque alpha diversity and richness was observed in both caries-free and S-ECC groups (P < 0.05, paired Wilcoxon test). (B) For beta (between-sample) diversity, Bray-Curtis distances were calculated, followed by principal coordinates analysis (PCoA). The plot shows the separation of samples according to sample type (pseudo-F = 40.4, R² = 0.2, P = 0.001, PERMANOVA accounting for the children's caries-status). The ellipses represent a 95% confidence level. S-ECC, severe early childhood caries.
To show the importance of the model features, the model feature weights were converted to relative weights and up to the top 20 features were selected, based on their median values, to generate a heatmap using the R package "ggplot2" (Wickham, 2016). For the machine-learning based classification of plaque and swab samples, a pairwise sample analysis was performed using a boosting conditional logistic regression from the R package "clogitboost," which takes the paired nature of the dental plaque and oral swab samples into account (Shi and Yin, 2015). The model was fitted using component-wise smoothing splines. The caries-free and S-ECC samples were each divided into training and test sets, using three-quarters of the data for training and the remainder for testing, such that the paired plaque and swab samples from a child were kept together in either the training or the test set. For feature (species) selection in the training dataset, we obtained the p-values from the differential abundance analysis described above. The top features selected by the p-values were used to train the classification models. Since we had only 30 independent samples in the training set, we considered only the top 5, 10, 15, 20, and 25 features to build the models. The models' performance was evaluated using AUROC. Each of the trained models was then tested on the test set. The training-test process was repeated for 30 iterations, and the classification performance for caries-free and S-ECC samples was compared using the average AUROC value across the 30 repeats.
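Two elements of this procedure lend themselves to a short sketch: splitting by child so that a plaque sample and its paired swab never end up on opposite sides of the split, and computing AUROC from predicted scores. The code below is an illustration in plain Python, not the clogitboost workflow; the function names, the child identifiers, and the use of a seeded shuffle are assumptions.

```python
import random


def split_by_child(child_ids, train_fraction=0.75, seed=0):
    """Split unique child IDs into training and test sets.

    Splitting on the child rather than the sample keeps each plaque/swab
    pair together in either the training or the test set.
    """
    ids = sorted(set(child_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return set(ids[:n_train]), set(ids[n_train:])


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative
    (ties count one half); labels are 1 for positive and 0 for negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical group of 40 children, each contributing a plaque and a swab sample.
children = [f"child{i:02d}" for i in range(40)] * 2
train_ids, test_ids = split_by_child(children)
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

With 40 children per group, a three-quarters split gives the 30 independent training units mentioned above. Repeating the split with different seeds and averaging the test AUROC mirrors the 30-iteration scheme.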

RESULTS
Eighty children who fit the study criteria were recruited and 160 samples (80 dental plaque and 80 oral swabs) were collected. Table 1 shows some characteristics of the study participants. Additional information about the participants has been recently published (de Jesus et al., 2020).
Bacterial alpha diversity (within samples) analysis showed a significant difference between oral swab and dental plaque alpha diversity (Shannon index, S-ECC: P = 0.0034; Caries-free: P = 0.015) and richness (Chao1 index, S-ECC: P < 0.001; Caries-free: P = 0.025) in both caries-free and S-ECC groups (Figure 2A). Bacterial beta (between-sample) diversity analysis showed a clear separation of samples according to sampling site, oral swab and dental plaque (pseudo-F = 42.71, R² = 0.2, P = 0.001, PERMANOVA accounting for the children's S-ECC status and the paired samples; Figure 2B). A significant difference in bacterial community was also observed between the S-ECC and caries-free groups (pseudo-F = 2.85, R² = 0.014, P = 0.001). Figure 3A shows the relative abundance of the top 20 bacterial species across the subgroups. The differential abundance analysis revealed numerous species that were overabundant in dental plaque or oral swab samples within the S-ECC and caries-free groups (Figures 3B,C, adjusted P < 0.05, DESeq2). Interestingly, many species were significantly more abundant in dental plaque or oral swab in both S-ECC and caries-free groups. For instance, Capnocytophaga sp. oral taxon 326 (S-ECC: −10.83 log2fold change; Caries-free: −4.72 log2fold change), Kingella sp. oral taxon 012 (S-ECC: −9.52 log2fold change; Caries-free: Granulicatella elegans (S-ECC: 3.26 log2fold change; Caries-free: 3.93 log2fold change), and Haemophilus parainfluenzae (S-ECC: 1.26 log2fold change; Caries-free: 1.39 log2fold change) were more abundant in oral swabs in both caries-free and S-ECC groups. In children with S-ECC, the well-known cariogenic bacterium Streptococcus mutans was more abundant in dental plaque samples (−3.45 log2fold change, adjusted P < 0.05).
children's oral swabs (adjusted P < 0.05, DESeq2). The differences in dental plaque microbial composition between children with S-ECC and those caries-free have been previously published (

Fungal Community Analysis
A total of 8,000,067 filtered ITS1 rRNA reads were obtained, with an average of 50,000.42 reads per sample (160 samples). The 622 ASVs were assigned to 63 genera and 59 species. After filtering, ten samples had low reads (<1,000) and were removed from the fungal analysis together with their respective oral swab or dental plaque pairs, resulting in a total sample size of 140. Differential abundance analysis showed that among the top 20 most abundant fungal taxa, within the S-ECC group, Stereum rugosum (−29.03 log2fold change), Malassezia restricta (−16.44 log2fold change) and others were more abundant in dental plaque. Within the oral swab samples, Candida dubliniensis (12.92 log2fold change), Candida tropicalis (24.99 log2fold change), and Malassezia restricta (24.14 log2fold change) were more abundant in children with S-ECC compared to caries-free controls (Table 2, adjusted P < 0.05, DESeq2). The results of the differential abundance analysis according to caries status in dental plaque (caries-free vs. S-ECC) have been published previously (de Jesus et al., 2020). The fungal alpha diversity analysis showed a significant difference in Chao1 richness between the sampling sites (P < 0.001, paired Wilcoxon test) in the caries-free group (Figure 4A). Fungal community (β-diversity) analysis also showed a significant difference between dental plaque and oral swab microbiomes (pseudo-F = 5.58, R² = 0.04, P = 0.001, PERMANOVA; Figure 4B). The fungal communities of samples from caries-free children and those with S-ECC also showed a significant difference (pseudo-F = 4.17, R² = 0.03, P = 0.001).

Machine Learning Analysis
We first evaluated the model performance using Lasso, Ridge, Elastic Net (Enet), and RandomForest methods to classify S-ECC vs. caries-free. Overall, the Ridge approach with default parameters provided the best classification accuracy while the other three methods provided similar AUROC values (data not shown). Hence, Ridge was the model of choice for further classification.
FIGURE 4 | (A) For alpha diversity (within-sample) the Shannon and Chao1 diversity and richness measures were calculated according to sample type in both caries-free and S-ECC groups. A significant difference in richness was observed between the sampling sites in the caries-free group (P < 0.001, Chao1 index, paired Wilcoxon test). (B) For beta (between-sample) diversity, Bray-Curtis distances were calculated, followed by principal coordinates analysis (PCoA, pseudo-F = 11.58, R² = 0.04, P = 0.001, PERMANOVA). The ellipses represent a 95% confidence level. S-ECC, severe early childhood caries.
To evaluate which sampling site, dental plaque or oral swabs, would provide a better classification model for S-ECC vs. caries-free, the samples were grouped according to sampling site. The AUROC values obtained by the Ridge model with bacterial species were 0.92 and 0.91 for dental plaque and oral swab samples, respectively (Figure 5A), while for fungal taxa the AUROC values were 0.85 and 0.835, respectively (Figure 5B). The median relative feature weights of the corresponding models and their ranks are shown in Figures 5C,D. Among the most important bacterial features for the S-ECC vs. caries-free classification model were Gemella morbillorum, Lautropia mirabilis, Actinomyces oral taxon 525 and Capnocytophaga oral taxon 336. For fungi, Mycosphaerella tassiana, Betamyces americae-meridionalis, Wickerhamiella sp. and Cyberlindnera jadinii were among the most important discriminatory species.
To evaluate whether dental plaque samples can be differentiated from oral swab samples based on their bacterial and fungal profiles, in both the caries-free and S-ECC groups, the samples were grouped according to caries status. The AUROC values were compared for the models built on the top 5, 10, 15, 20, and 25 species selected through differential abundance analysis in the training set. For bacteria, in caries-free samples, the maximum AUROC value was 0.80 using 10 species, while for S-ECC the maximum AUROC value was 0.73 with 25 species. For fungi, the maximum AUROC was obtained with 10 species in caries-free samples and 5 in S-ECC samples (Table 3). The performance of the paired analysis for different numbers of species is summarized in Table 3. Notably, in the site-based classification with bacteria, a small number of species provided better classification in caries-free samples, whereas for S-ECC samples a larger number of species was required to improve prediction. For fungi, the classification was better with a small number of species in both caries-free and S-ECC groups, which might be due to the low alpha diversity in the fungal samples.

DISCUSSION
In this study, first we confirmed that the bacterial and fungal community composition of dental plaque differed significantly from that obtained from oral swabs. Second, we investigated, using machine learning approaches, which sampling site would be the most appropriate to differentiate the oral microbial profile of children with S-ECC and those caries-free. Identifying the appropriate type of sample to be used is important to guide future caries association studies. Third, we evaluated whether it could be possible to predict the sampling site (dental plaque vs. oral swab) based on the microbial profile of the samples. Being able to determine the origin of the samples could be useful for the design of future microbiome studies. For instance, if researchers want to collect supragingival plaque, it would be useful to have a way of detecting if during sample collection the supragingival plaque got contaminated with subgingival plaque, as each of those should have unique microbial profiles.
TABLE 3 | The species column shows the number of species used in the classification and the mean AUROC values are provided with the standard deviation of 30 iterations of the training-test based prediction. The highest AUROC value of each group is bolded.
The oral microbiome is considered highly diverse compared to other body sites. Although dental plaque, saliva and the buccal mucosa are in close contact, they have diverse microbial communities. The Human Microbiome Project (HMP), for instance, compared the diversity of microbes among five major body areas of 242 healthy individuals and showed that supragingival plaque has higher bacterial alpha diversity compared to the oral mucosa, which agrees with the results reported in the present study (The Human Microbiome Project Consortium, 2012). Hall et al. identified a significant difference between the microbial communities of supragingival plaque, saliva, and tongue samples from healthy subjects, demonstrating the existence of site-specific oral microbiomes (Hall et al., 2017).
Interestingly, while dental plaque showed increased bacterial alpha diversity compared to oral swabs, the fungal alpha diversity showed the opposite pattern, with oral swabs displaying increased fungal alpha diversity. The higher fungal diversity observed in the oral swabs may be associated with more fungal DNA of transient colonizers acquired from the environment through mouth breathing and food intake (Xu and Dongari-Bagtzoglou, 2015; Diaz and Dongari-Bagtzoglou, 2021). Furthermore, most oral fungi are present at low biomass and may be difficult to detect in oral samples (Diaz and Dongari-Bagtzoglou, 2021). This low biomass may explain why the number of observed fungal ASVs was lower than that of bacteria.
Streptococcus was the most abundant bacterial genus in oral swabs, similar to what has been previously reported (Caselli et al., 2020). Neisseria, Haemophilus and Veillonella, found to be the most abundant in dental plaque or oral swab samples, have also been reported as highly abundant in different oral sites by previous studies (Huse et al., 2012; Caselli et al., 2020). Streptococcus, Fusobacterium, Gemella, and Veillonella have all been considered core OTUs in different oral sites (Huse et al., 2012; Hall et al., 2017). Here we showed site-specific differences in the abundance of certain species from these genera, with some being significantly more abundant in dental plaque compared to oral swabs or vice versa. Among children with S-ECC, the known cariogenic bacterium S. mutans was significantly enriched in dental plaque samples compared to oral swabs. It was also among the top 10 most important features for the classification of S-ECC vs. caries-free in both dental plaque and oral swab samples. Other caries-associated bacteria such as Leptotrichia spp. and Selenomonas spp. (Kalpana et al., 2020) were more abundant in dental plaque than in oral swab samples from children with S-ECC.
Fungal species from the genera Candida, Malassezia, Meyerozyma, and Trichosporon were among the most abundant in dental plaque and oral swabs, similar to what has been reported in other studies (Shelburne et al., 2015; Baraniya et al., 2020; Robinson et al., 2020; Diaz and Dongari-Bagtzoglou, 2021). The differential abundance analysis showed a significant difference in the abundance of C. dubliniensis and C. tropicalis between the oral swabs of caries-free children and those of children with S-ECC. These fungal species were also among the top 20 most important features for the classification of S-ECC vs. caries-free in oral swabs. Candida spp. are among the most abundant fungal species in the oral cavity and they are associated with different oral diseases (Peters et al., 2017; Diaz et al., 2019). C. dubliniensis has only recently been associated with dental caries in children (Al-Ahmad et al., 2016; de Jesus et al., 2020; O'Connell et al., 2020). Here we show that this fungus is not only highly abundant in the dental plaque of children with S-ECC, as previously reported, but is also enriched in the oral swabs obtained from children with S-ECC compared to those caries-free.
In recent years, machine learning has become a commonly applied approach to early childhood oral health research (Peng et al., 2021). One of the challenges in microbiome data analysis is that the differential analysis methods generally lack the information about predictability. Thus, we used machine learning methods to identify site-specific taxonomic features in dental plaque and oral swabs. The results suggested that both dental plaque and oral swab samples provide a good model for S-ECC vs. caries-free classification. They also suggest that it is possible to differentiate dental plaque from oral swab samples using their microbial profiles. However, site-based classification through fungal species was not optimum in caries-free samples. This could be due to the small number of fungal species that significantly differed in abundance between dental plaque and oral swabs, as observed in the differential abundance analysis.
From our classification results for caries status, it appears that the models using the microbial composition of dental plaque or oral swabs were both able to discriminate between caries-free and S-ECC samples. However, it is important to note that the species considered important for the classification of disease differ slightly between the sampling sites. Based on the results from other machine learning models (Lasso, Enet, and RandomForest), we also observed that the choice of the model does not significantly affect the outcome of the analysis (data not shown).
The limitations of this study include, but are not limited to, the lack of information about the socio-economic status of the participants and the convenience sampling used for recruitment, which means that during recruitment the groups were only matched by caries status. As many factors may influence the oral microbial composition, the results of this study may not be generalizable to other populations with different age groups and geographic locations. In this study, an additional enzymatic lysis step was used during DNA extraction from dental plaque samples to disrupt the dental plaque biofilm. Rosenbaum et al. compared the impact of different DNA extraction methods, including the QIAamp DNA Mini Kit (Qiagen) with and without an additional enzymatic lysis step, on the oral bacterial (16S rRNA) and fungal (ITS1 rRNA) microbiota. They showed that all tested DNA extraction methods were able to lyse Gram-positive bacterial species. They also reported no significant differences in bacterial and fungal diversity among DNA extraction methods (Rosenbaum et al., 2019). Other studies also found no significant effect of DNA extraction methods on the microbial composition of oral samples (Lim et al., 2017). Therefore, while we do not expect that the additional enzymatic lysis step significantly contributed to the differences observed between the dental plaque and oral swab microbiota, we cannot completely rule out possible bias associated with sample preparation in the analyses comparing dental plaque and oral swab microbiomes.
Currently, UNITE is the most commonly used database for taxonomic classification in mycobiome studies of different environments. However, there is increasing concern regarding the lack of taxonomic coverage in the available databases, which limits studies trying to characterize the human mycobiome (Nilsson, 2016). Here, a high proportion of fungal ASVs (37.14%) could not be classified to a meaningful taxonomic level beyond kingdom. As the reads passed the quality control process, the high number of unclassified ASVs could reflect a limitation of the database used. Therefore, the construction of a curated ITS database specific for the oral mycobiome, as exists for the oral bacteriome, is urgently needed. This is a cross-sectional study. Thus, based on our results it is not possible to determine when a significant oral microbial shift from a healthy to a diseased state occurs. Xu et al. performed a longitudinal study with a 1-year follow-up of caries-free 3-year-old children (Xu et al., 2018). The authors suggested that prior to any clinical sign of caries, there is a microbial shift that could potentially be used for the diagnosis and prevention of dental caries in young children. Therefore, future longitudinal studies aiming to further characterize the microbial shifts that precede the first clinical signs of dental caries are needed.
In summary, this study characterized the differences in microbial profiles of dental plaque and oral swab samples from children with S-ECC and those caries-free. Importantly, our machine learning models were able to predict the caries status (S-ECC vs. caries-free) and sampling site (dental plaque vs. oral swab) based on the microbial profile of the samples. In the future, when data from related studies distinguishing oral sampling sites using microbiome profiles become available, we will perform replication studies to validate our results.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, PRJNA555320, PRJNA714139.

ETHICS STATEMENT
This study protocol was approved by the University of Manitoba's Health Research Ethics Board (HREB # HS20961-H2017:250) and by the MHC, Winnipeg, MB, Canada. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
VCJ and PC conceived the study. VCJ, MK, PH, and PC contributed to the design, data analysis, interpretation, and writing of the manuscript. VCJ, BAM, and RJS contributed to data acquisition. KD and RJS contributed to the design, data interpretation, and writing of the manuscript. KD, PH, RJS, and PC contributed to funding acquisition. All authors contributed to the article and approved the submitted version.