Microbiotyping the Sinonasal Microbiome

This study offers a novel description of the sinonasal microbiome, through an unsupervised machine learning approach combining dimensionality reduction and clustering. We apply our method to the International Sinonasal Microbiome Study (ISMS) dataset of 410 sinus swab samples. We propose three main sinonasal “microbiotypes” or “states”: the first is Corynebacterium-dominated, the second is Staphylococcus-dominated, and the third dominated by the other core genera of the sinonasal microbiome (Streptococcus, Haemophilus, Moraxella, and Pseudomonas). The prevalence of the three microbiotypes studied did not differ between healthy and diseased sinuses, but differences in their distribution were evident based on geography. We also describe a potential reciprocal relationship between Corynebacterium species and Staphylococcus aureus, suggesting that a certain microbial equilibrium between various players is reached in the sinuses. We validate our approach by applying it to a separate 16S rRNA gene sequence dataset of 97 sinus swabs from a different patient cohort. Sinonasal microbiotyping may prove useful in reducing the complexity of describing sinonasal microbiota. It may drive future studies aimed at modeling microbial interactions in the sinuses and in doing so may facilitate the development of a tailored patient-specific approach to the treatment of sinus disease in the future.


INTRODUCTION
Microbes present in any environment almost never live in isolation, but enter in various types of ecological relationships with one another (Lidicker, 1979;Faust and Raes, 2012). The end result of these interactions is either a domination of a certain organism, or coexistence through the establishment of metabolic or territorial niches (Coyte et al., 2015;Bauer et al., 2018). In each of these cases, a stable community configuration or state is reached (Lewontin, 1969;May, 1974;Beisner et al., 2003). The stable states of the human microbiota has been postulated based on the findings of high inter-individual variability coupled with relatively low temporal variability, taken as evidence of resilience against perturbations (Costello et al., 2009;Caporaso et al., 2011;Lozupone et al., 2012). Perturbations in an otherwise stable microbiome could be linked to the concept of "dysbiosis" (Olesen and Alm, 2016), which remains a vague term that attempts to explain the contribution of an unhealthy microbiome to disease (Olesen and Alm, 2016). Examining stable microbial states as "clusters, " as opposed to the traditional analysis of the differential abundances of microbial taxa one at a time, could therefore provide another important ecological perspective in describing the microbiome and, through potential unraveling of common commensal-pathogen interactions (Brugger et al., 2016), exploring its relevance to health or disease.
High inter-individual variability represent one of the findings that has already been demonstrated in the microbiome inhabiting the paranasal sinuses (Biswas et al., 2015). This adds a significant challenge when we attempt to determine its role in Chronic Rhinosinusitis (CRS). CRS is a heterogenous, multifactorial inflammatory disease of the sinuses, with a complex and incompletely understood aetiopathogenesis (Fokkens et al., 2012). Naturally, the potential role of the sinonasal microbiome and its "dysbiosis" in CRS pathophysiology has recently gained increased interest. The nature of the microbial dysbiosis and its role in disease causation and progression however remains unclear, with conflicting findings from the small sinonasal microbiome studies published thus far (Paramasivan et al., 2020). This provided the impetus for us to conduct the first multinational, multicenter "International Sinonasal Microbiome Study (ISMS)" (Paramasivan et al., 2020). This study, the largest and most diverse of its kind to date, attempted to address many of the limitations of the smaller previous studies, by standardizing collection, processing and analysis of the samples. Furthermore, its large sample size and multinational recruitment, meant that it was more likely to capture geographical and center-based differences if present. A recent meta-analysis of published sinonasal 16S rRNA sequences revealed that the largest proportion of variance was attributed to differences between studies (Wagner Mackenzie et al., 2017), highlighting a role Abbreviations: AERD, aspirin-exacerbated respiratory disease; ANCOM, "Analysis of Compositions of Microbiomes" method; BLAST, "Basic Local Alignment Search Tool"; CRS, chronic rhinosinusitis; CRSsNP, chronic rhinosinusitis sine nasal polyps; CRSwNP, chronic rhinosinusitis with nasal polyps; ISMS, the International Sinonasal Microbiome Study; PCoA, principal coordinate analysis; PCs, principal components; QIIME 2, "Quantitative Insights Into Microbial Ecology 2" software; SparCC, "Sparse Correlations for Compositional data" algorithm.
for performing a large multi-center study that employed a unified methodology.
Contrary to the findings of previous small single-center studies, our international cohort showed no significant differences in alpha or beta diversity between the three groups of patients analyzed: healthy control patients without CRS and the two phenotypes of CRS patients, those with polyps (CRSwNP) and those without (CRSsNP). The study however revealed a potential grouping of samples as demonstrated on beta diversity exploratory analysis (Paramasivan et al., 2020). Accordingly, we hypothesized that the bacteriology of the sinuses could be categorized into various clusters of similar compositions. We inquired whether these potential groups would aid in describing the sinonasal microbial composition of patients or associate with clinical features. Similar attempts performed on gut microbiota in healthy individuals were termed enterotyping (Arumugam et al., 2011). The clinical relevance of gut enterotypes remain the topic of research, and sometimes controversy. A previous exploration of clusters of sinus microbiota in patients was performed by Cope et al. (2017) in which the authors reported four compositionally distinct sinonasal microbial community states; the largest group of patients were dominated by a continuum of Staphylococcaceae and Corynebacteriaceae (Cope et al., 2017).
In this manuscript, we attempt "microbiotyping" to explain interpatient heterogeneity of the bacterial communities in the paranasal sinuses, and we describe "sinonasal microbiotypes" across the first large, multi-center cohort of individuals with and without CRS. We then describe the composition of these microbiotypes, explore potential clinical associations, and validate microbiotyping on a separate sinus microbiome dataset.

The "International Sinonasal Microbiome Study (ISMS)" Dataset
We perform the primary analysis on the dataset obtained from the "International Sinonasal Microbiome Study (ISMS)" project (Paramasivan et al., 2020). In summary, this dataset is a multicenter 16S-amplicon dataset which includes endoscopicallyguided, guarded swabs collected from the sinuses (in particular the middle meatus/anterior ethmoid region) of 532 participants in 13 centers representing 5 continents. Details of sample collection, DNA extraction and sequencing methodologies are described in the original report (Paramasivan et al., 2020). The 16S gene region sequenced was the V3-V4 hypervariable region, utilizing primers (CCTAYGGGRBGCASCAG forward primer) and (GGACTACNNGGGTATCTAAT reverse primer) according to protocols at the sequencing facility (the Australian Genome Research Facility). Sequencing was done on the Illumina MiSeq platform (Illumina Inc., San Diego, CA) with 300-base-pairs paired-end Illumina chemistry.

Bioinformatics Pipeline
Details of the bioinformatic pipeline is detailed in the original report (Paramasivan et al., 2020). In summary, we utilized a QIIME 2-based pipeline (Quantitative Insights Into Microbial Ecology 2) . Forward and reverse fastq reads were joined (Zhang et al., 2014), quality-filtered (Bokulich et al., 2013), abundance-filtered (Wang et al., 2018), then denoised using deblur  through QIIME 2-based plugins. This yielded a final feature table of high-quality, high-resolution Amplicon Sequence Variants (ASVs). Taxonomy assignment and phylogenetic tree generation (Janssen et al., 2018) was done against the Greengenes (DeSantis et al., 2006) database; and taxonomy was assigned using the QIIME 2 BLAST assigner . A rarefaction minimum depth cut-off was chosen at 400 and this yielded 410 samples out of the original 532 for downstream analysis. The same pipeline was then applied on DataSet Two for purposes of validation of microbiotyping. We chose to reproduce exactly all the original pipeline steps on DataSet Two, despite being a completely separate dataset, to reduce bias.

Delineating the Microbiotypes of the Sinonasal Microbiome
Our approach was guided by the "enterotyping" method described by Arumugam et al. (2011) with adaptations. We constructed a sample distance matrix using the Jensen-Shannon distance (JSD) metric, as used in the original "enterotypes" paper (Arumugam et al., 2011). The Jensen-Shannon distances were calculated between samples in the genus-level-assigned table in a pairwise fashion using the JSD function in the R package "philentropy" with a log (log 10 ) base. Following this, Principal Coordinate analysis (PCoA) was done on the distance matrix for dimensionality reduction and visualization. Clustering was then performed using a standard K-means clustering algorithm, as implemented in the machine learning Python package scikit-learn (version 0.20.1) (Pedregosa et al., 2011) on the first two principal components (PCs) obtained from the PCoA, with the number of clusters (k) chosen at 3 based on visual inspection of the beta diversity PCoA plots. Average silhouette scores, as implemented in scikitlearn, for the range (k = 2-8) were calculated to assess clustering quality, and this revealed the highest silhouette scores: 0.61 and 0.6 for [k = 4] and [k = 3], respectively. The three resulting clusters were defined as the three sinonasal microbiotypes. For further exploration of the subgroups that constitute microbiotype 3, we used the hierarchical density-based clustering algorithm "hdbscan" (McInnes et al., 2017) on the fulldimensional feature table. Genera were projected onto the PCoA matrix using a biplot approach (Legendre and Legendre, 2012), as implemented in scikit-bio's function "pcoa_biplot." Genera were represented in the biplot figure as arrows, originating from the center of the plot pointing to the direction of the projected feature coordinates, and the lengths normalized as a percentage of the longest arrow. We utilized "Analysis of Compositions of Microbiomes (ANCOM)" (Mandal et al., 2015) for identifying differentially-abundant taxa. Taxa genus level and Staphylococcus species level co-occurrence/correlation analysis were done after taxonomy assignment using SparCC (Sparse Correlations for Compositional data) algorithm (Friedman and Alm, 2012), in the fast implementation in FastSpar (Watts et al., 2018).

Validating Microbiotypes on a Second Sinonasal Microbiome Dataset
To infer whether our classification could be generalizable to other sinonasal microbiome samples not included in this study, we sought to validate our microbiotyping approach on a separate, previously-unpublished, 16S dataset. This dataset includes sinonasal microbiome swabs collected from private and public patients attending the Otolaryngology Department (University of Adelaide) to have surgery done by the authors P-JW, AP or the Otorhinolaryngology Service at the Queen Elizabeth Hospital in Adelaide, South Australia. Similar to the main dataset, these included CRS patients who underwent endoscopic sinus surgery for this sinus disease, and non-CRS control patients who underwent other otolaryngological procedures, such as tonsillectomy, septoplasty, or skullbase tumor resection. Sample collection, and processing were done in a standardized fashion similar to that has been described in the ISMS main dataset, except that DNA extraction was carried out using the PowerLyzer Power-Soil DNA kit (MoBio Laboratories, Salona Beach, CA) as previously described (Chan et al., 2016), rather than the Qiagen DNeasy kit (Qiagen, Hilden, Germany). Similar to the ISMS samples, library preparation and 16S sequencing were done at the Australian Genome Research Facility, on the Illumina MiSeq platform (Illumina Inc., San Diego, CA, USA) with the 300-base-pairs paired-end chemistry. Libraries were generated by amplifying (341F−806R) primers against the V3-V4 hypervariable region of the 16S gene (CCTAYGGGRBGCASCAG forward primer; GGACTACNNGGGTATCTAAT reverse primer) (Yu et al., 2005). PCR was done using AmpliTaq Gold 360 master mix (Life Technologies, Mulgrave, Australia) following a two-stage PCR protocol (29 cycles for the first stage; and 8 cycles for the second, indexing stage). Sequencing was done over two MiSeq runs in January 2015. We termed this dataset in this manuscript "Dataset Two." This dataset comprises samples collected from 129 participants. Rarefaction at a cutoff of 400 reads was performed, to match what was performed for the main dataset, and samples with read number <400 were excluded; this yielded a final feature table containing 97 samples, representing 33 CRSsNP patients, 35 CRSwNP patients, and 29 controls.
We took two separate approaches to validation. The first approach is to replicate the previously-described unsupervised Kmeans microbiotyping methodology independently on samples in Dataset Two. We call this first approach the "unsupervised approach." The second approach is to use the K-means model that was fitted on the samples from the Main Dataset to predict labels (i.e., microbiotypes) of the samples in Dataset Two. As such, the Main Dataset is used as a "training dataset" in the language of machine learning. We called the second approach the "semi-supervised approach."

Basic Characteristics of the Study Cohort and Beta Diversity Plots
The main ISMS study cohort was described in our previous publication (Paramasivan et al., 2020). In brief, 410 samples were included in the analysis collected from 13 centers representing 5 continents. These samples are distributed along three diagnosis groups as follows: 99 CRSsNP patients, 172 CRSwNP patients, and 139 (non-CRS) controls. Beta diversity ordination plots (of weighted UniFrac and Jensen-Shannon distances) are shown in Figure 1. The plots do not reveal any distinct grouping by disease state or by center, but on visual inspection show a triangular arrangement suggesting that samples lie on a continuum between three distinct clusters, providing motivation for further analysis.

Composition of the Three Sinonasal Microbiotypes
We applied our microbiotyping approach through the unsupervised dimensionality reduction and clustering method described in the Methods. The principal components  and the taxonomic composition of the resulting "sinonasal microbiotypes" is found in Figures 2A,B, respectively.
The Abundance/Prevalence tables for the microbiotypes is demonstrated in Tables S1A-C.
We used a PCoA biplot to project features (genera) onto the PCoA matrix (Legendre and Legendre, 2012). The 5 topmost abundant genera were overlaid on the PCoA plot as arrows, originating from the center of the plot and pointing to the direction of the projected feature coordinates ( Figure 2B). Each arrow is indicated by the color of the genus according to the Legend in Figure 2B, and the length of each was normalized as a percentage of the longest arrow. The coloring of the samples in the PCoA scatter plot according to the microbiotype assignment is provided for additional illustration (Figure 2A). We note that the biplot arrows show a quasi-orthogonal arrangement between the key genera that constitute the microbiome.
The distributions of the relative adundances of Corynebacterium and Staphylococcus in all three microbiotypes were plotted in histograms ( Figure 2C). It was noted that in microbiotype 1, most samples have a high abundance of Corynebacteria (i.e., Corynebacteria dominate), while Staphylococci appeared to dominate in microbiotype 2 in most samples.

Dissection of "Sinonasal Microbiotype 3"
We observed that Microbiotype 3 included various genera that did not cluster into the major two microbiotypes. It was also evident that this microbiotype is more heterogeneous. Applying the K-Means algorithm we showed poor clustering on only the first two and three Principal Components, since this group included multiple signatures with various dominant organisms. Accordingly, we employed the hierarchical densitybased clustering algorithm "hdbscan" (McInnes et al., 2017) on the full-dimensional OTU table. One advantage of this algorithm is that it can estimate the number of clusters, without a priori specification by the user. This algorithm also has the ability to detect "outliers" that fail to cluster with the rest of the groups and detaches them into a separate "Miscellaneous/Other" group. We ran this algorithm on samples in Microbiotype 3 and this revealed four clusters, each dominated by one of the genera of Streptococcus (21 samples), Haemophilus (16 samples), Moraxella (9 samples), and Pseudomonas (7 samples), with a mean relative abundance ranging from 73.49 to 95.5%. The fifth cluster was the assigned "Miscellaneous/Other" group (18 samples). We term these "sub-microbiotypes": microbiotype 3S, 3H, 3M, 3P, and 3O, respectively ( Figure 2D).

Exploring Microbiotypes at the Species-Level Reveals Potential Antagonism Between Corynebacterium Species and Staphylococcus aureus
At present, species level assignment is limited by the current technology of 16S-surveys, the current state of microbial databases in general, and by our chosen short-read sequencing methodology. However, species level associations hold clinical significance for sinus health, since Staphylococcus aureus has been traditionally associated with biofilm formation and superantigen elaboration, both of which are associated with more severe sinus disease and poorer response to treatment. Furthermore, nasal carriage of methicillin-resistant Staphylococcus aureus (MRSA) is a global health concern with implications that extend far beyond the sinuses. Moreover, our new QIIME 2-based pipeline ) allows a higher "sub-OTU" resolution compared to older pipelines, offering an opportunity to resolve some taxa at species level when possible Thompson et al., 2017).
We explored taxonomy assignment at the species level, with a focus on Staphylococcal species. Staphylococci were assigned to either Staphylococcus aureus, Staphylococcus epidermidis, or unclassified Staphylococcus. We found that almost all of the assigned Staphylococcus aureus species were clustered in Microbiotype 2, forming 47.81% mean relative abundance of this Microbiotype, compared to 1.36 and 0.3% in Microbiotype 1 and Microbiotype 3, respectively ( Figure 2E). Differential abundance of both Staphylococcus aureus and epidermidis between the disease groups was confirmed as statistically significant using the ANCOM method.
In light of this finding, we hypothesized a reciprocal or antagonistic relationship between Corynebacterium sp. and Staphylococcus aureus and investigated this using the SparCC algorithm. This confirmed a significant negative correlation between Corynebacterium genus and the species Staphylococcus aureus (correlation coefficient = −0.339, p = 0.001). Interestingly, Staphylococcus epidermidis positively correlated with Corynebacterium (correlation coefficient = 0.271, p = 0.001). These results suggest that a benign or probiotic role is played by both Corynebacterium spp. and Staphylococcus epidermidis when interacting with Staphylococcus aureus. This should be viewed in the context of previous literature and in the context of the current limitations of 16S-sequencing, and is elaborated on in the discussion.

Prevalence and Distribution of the Microbiotypes in Different Diagnoses and Centers
Microbiotype 1 was assigned to 222 samples (54.1%), microbiotype 2 to 117 samples (28.5%), and microbiotype 3 to 71 samples (17.3%). The prevalence distribution of the sinonasal microbiotypes did not appear to significantly differ by the disease state of the sinuses (Figure 3). However, a Chi-Squared test on the contingency table by center showed significantly different distributions by center (FDR-corrected p < 0.001): there was a higher prevalence of microbiotype 2 in our European center (Amsterdam), and a higher prevalence of microbiotype 1 in Asian and Australasian centers, with a much lower prevalence of microbiotype 3 in Asia (Figure 3 and Table 1).

Associations of Microbiotypes With Clinical Variables
We then explored the distribution of the three microbiotypes among multiple clinical variables in Table 2. This shows no  significant difference for some variables including asthma, aspirin sensitivity, GORD, diabetes mellitus, and current smoking status (FDR-corrected p > 0.05; Chi-squared test). The cross tabulation however revealed a statistically significant association with "aspirin sensitivity" or aspirin-exacerbated respiratory disease (AERD) (p = 0.02), although this did not persist after correction for multiple comparisons (corrected p = 0.077). Patients who were aspirin-sensitive (or suffering from AERD) showed less prevalence of microbiotypes 1, 2, and a higher prevalence of microbiotype 3, compared to those who were not aspirin-sensitive.

Validation of Sinonasal Microbiotyping on a Separate Dataset
We validated our approach on a separate 16S dataset we called Dataset Two. As described in the Methods section, we validated this using an independent unsupervised approach and a semisupervised approach guided by the Main Dataset. The first unsupervised approach yielded three clusters similar to the microbiotypes described on the Main Dataset, with one cluster exhibiting high mean relative abundance of Corynebacteria, a second cluster exhibiting high mean relative abundance of Staphylococcus, and a third cluster with other dominant genera. Plotting the first two Principal Components ( Figure 4A) resulting from PCoA on the Jensen-Shannon distance matrix revealed the same triangular distribution of samples observed in Figure 1.
Prevalence of the microbiotypes in this dataset (using the unsupervised approach) was as follows: microbiotype 1 assigned 39.2% of samples, microbiotype 2 with 26.8% of samples, and microbiotype 3 with 34.0%.
The second semi-supervised approach yielded similar results (Figure 4; Supplementary Jupyter notebook), differing in the classification of only 3 samples (out of 97 samples; i.e., 3.09%; see Supplementary Jupyter Notebook). Two of these samples show Staphylococcus dominating the samples in combination with Haemophilus, with no overt dominance of one taxon over the other, making them more-or-less transitional samples between the signatures of microbiotypes 2 and 3. The third sample was dominated by Staphylococcus and Corynebacterium, making it a transitional sample between microbiotype 1 and microbiotype 2, with Staphylococcal species assigned to epidermidis, making this more appropriately assigned to microbiotype 1 (see Supplementary Jupyter Notebook).
These results validate the microbiotyping approach and suggest that our approach and dataset could be used to guide classification of sinonasal samples sequenced in future separate studies (Figure 4). Moreover, it points toward a potential clinical relevance of performing sinonasal microbiotyping.

DISCUSSION
We demonstrate that the microbiota of most sinus swab samples could be classified into distinct signatures or archetypes, which we have termed "sinonasal microbiotypes." We observed three main microbiotypes: the most prevalent being a Corynebacterium-dominated microbiotype (microbiotype 1), then a Staphylococcus-dominated microbiotype (microbiotype 2), and microbiotype 3 which includes samples dominated by Streptococcus, Haemophilus, Moraxella, Pseudomonas, and other genera.
As we have previously reported (Paramasivan et al., 2020), the sinus microbiota are dominated by the genera Corynebacterium and Staphylococcus (microbiotypes 1 and 2). A similar clustering approach to the sinus microbiome was applied by Cope et al., who utilized Dirichlet multinomial mixture models, and reported that most samples in their study were occupied by a continuum of Staphylococcaceae and Corynebacteriaceae (Cope et al., 2017). It appears that, regardless the statistical or clustering methodology utilized, it is most likely that the sinonasal microbiome consists of core organisms (Paramasivan et al., 2020) that potentially have distinct co-occurrence patterns.
Staphylococcus aureus has been perceived to be an important pathogen in sinus inflammatory disease. Staphylococcus aureus biofilms may act as a nidus for recurrent infections (Jervis- Drilling et al., 2014) and as a "nemesis" of otherwise-successful sinus surgery (Psaltis et al., 2008;Foreman and Wormald, 2010;Singhal et al., 2011). Staphylococcus aureus is also a producer of exotoxins, which in some cases can serve as superantigens, and these have been previously described as playing an important role in the pathogenesis of CRSwNP (Bachert et al., 2008). Pseudomonas aeruginosa biofilms are also virulent organisms that are difficult to eradicate from the sinuses, and have been associated with worse clinical outcomes (Bendouah et al., 2006). Both these organisms are important pathogens in the chronic mucociliary dysfunction exhibited in cystic fibrosis. However, methicillin-resistant Staphylococcus aureus (MRSA) is an important nasal colonizer that could asymptomatically colonize the nose. What determines the clinical course, between asymptomatic colonization vs. symptomatic pathogenicity, remains an interesting topic of research. In this study, we identified a potential reciprocal relationship between Staphylococcus aureus and Corynebacterium. Being aware of the challenges of compositional data analysis, we utilized for this purpose the specialized SparCC algorithm which infers correlations from compositional data (Friedman and Alm, 2012). This finding needs to be supported by future co-culture experiments, but suggests that Corynebacterium sp. may be a "cornerstone" of sinus microbial health. It is important to note that our bioinformatic methodology has been intentionally designed to utilize state-of-the-art software methods at every step of the analysis pipeline, in order to maximize the resolution of taxonomy assignment Bokulich et al., 2018;Bolyen et al., 2018). Nevertheless, our approach is still confined within the limitations of current 16S sequencing methodologies, and the confidence of assignment is reduced beyond the genus level. Our analysis pipeline could not delineate between different Corynebacteria at the species level and Staphylococcus aureus at the strain level. Hence functional difference between samples with same species remain to be determined using a functional metagenomics approach. A recent study suggest that by incorporating location information or "sample-level metadata" into specieslevel assignment accuracy could be improved (Kaehler et al., 2019). In our study, the differential relationships of both Staphylococcus aureus and epidermidis toward Corynebacteria (negative and positive associations, respectively) could be of clinical significance and is worthy of future investigation. We performed a post-hoc inspection of species-level assignment in Dataset Two, to investigate whether this finding will be reproducible in a separate dataset. This confirmed clustering of almost all Staphylococcus aureus species in microbiotype 2 (Supplementary Results in Jupyter Notebook).
The finding of a potential reciprocal relationship between Staphylococcus aureus and Corynebacterium spp. has to be placed in the context of similar previous findings from the literature. The competitive inhibition between Staphylococcus aureus and Corynbacteria were demonstrated in early studies in vitro (Parker and Simmons, 1959;Barrow, 1963). More recently, it has been shown that even these S. aureus strains that survive killing by Corynebacteria in vitro exhibit a decreased virulence profile (Ramsey et al., 2016;Hardy et al., 2019). In vivo, a negative correlation has been demonstrated between Staphylococcus aureus and various Corynbacterial species (including C. accolens and C. pseudodiphthericum) in the anterior nares in several studies (Uehara et al., 2000;Lina et al., 2003;Wos-Oxley et al., 2010;Johnson et al., 2015;Liu et al., 2015). Some interventional studies suggest a probiotic potential as Corynebacteria successfully reduced rates of Staphylococcal colonization when inoculated into the anterior nares (Uehara et al., 2000;Kiryukhina et al., 2013). Moreover, Johnson et al. (2015) showed that colonization of the anterior nares by Corynebacteria was associated with a lower prevalence of S. aureus related skin and soft tissue infections. In the paranasal sinuses, the previously-referenced study by Cope et al. (2017) is the first to demonstrate a reciprocal relationship between Staphylococcaceae and Corynebacteriaceae. In addition to competitive antagonism with S. aureus, the probiotic role of Corynebacteria includes its resistance to Respiratory Syncytial Virus and the pathologenic Streptococcus pneumoniae, demonstrated in an animal model (Kanmani et al., 2017). It also includes its contribution to the stability of the microbiome and the reduced incidence of Respiratory Tract Infections, where this has been demonstrated in children (Biesbroek et al., 2014;Bosch et al., 2017). On the other hand, the probiotic role of Staphylococcus epidermidis has been demonstrated in a murine nasal bacterial interaction model (Cleland et al., 2014).
The distribution of the sinonasal microbiotypes was found to be not significantly dis-similar amongst healthy controls and CRS patients. This result mirrored the findings of the traditional differential abundance approach undertaken in our first report (Paramasivan et al., 2020). There appeared to be no significant associations with other clinical variables such as asthma and aspirin-sensitivity after controlling for multiple comparisons ( Table 2). The distribution of the microbiotypes however differed according to center/location of collection (Figure 3). As such, we cannot conclude based on our study that microbiotypes could function independently as a disease biomarker. It could be the case that chronicity of inflammation -on its own-is not a determinant of a dysbiotic microbiome, but whether there is a clinically-evident "sinus infection" current at the time of sample collection. In this theory, stable chronic sinuses with no overt signs of acute or chronic infection, may remain similar to a "healthy sinus microbiome." Only when the sinuses are clinically infected (as evident on clinical symptoms and endoscopic findings), the microbiota become disrupted and the dysbiosis exaggerated. It is important to note that Streptococcus, Haemophilus, and Moraxella (represented here in microbiotype 3) have been traditionally implicated in acute infections of the upper respiratory tract including acute rhinosinusitis and acute otitis media. Patients with clinically obvious acute exacerbations were not included in the original dataset (Paramasivan et al., 2020). An alternative possibility is that with advancing sequencing technology, and with complementary methods such as shotgun metagenomics or metatranscriptomics, we could unravel the constitution and function of sinonasal microbiotypes at a higher resolution in the future, which might uncover some difference between healthy and diseased states.
Asia and Australasia showed an over-representation of microbiotype 1. Europe had a higher prevalence of microbiotype 2. Unfortunately, the study only included one European center (Amsterdam) so it is difficult to be certain whether this finding generalizes to other locations in Europe. The driving factors for these geographical differences could be multiple, including but not limited to clinical practices such as local antibiotic prescriptions for CRS and timing of recruitment of patients for sinus surgery, as discussed previously (Paramasivan et al., 2020). And as mentioned in our previous analysis, it is difficult to conclude a "causative role" for geography given only our data (Paramasivan et al., 2020).
We have adapted our methodology from the enterotyping approach taken by Arumugam et al. (2011) for classifying bacterial signatures of the gut microbiome. In their original manuscript, they described three different enterotypes in the gut dominated by Prevotella, Bacteroidetes, and Ruminococcus, respectively (Arumugam et al., 2011). Several papers have correlated gut enterotypes with various clinical variables (Wu et al., 2011;Vandeputte et al., 2016). Despite this, enterotyping as an approach to population stratification has not been without its controversies. Several authors have criticized the definition of distinct clusters, since it neglects intra-cluster variation and gradients between clusters (Jeffery et al., 2012;Koren et al., 2013;Knights et al., 2014;Costea et al., 2018). We provide answers to previous critique (Knights et al., 2014) to enterotyping as it applies to our study in Table S2. It is important to note these valid criticisms to any community typing approach. In our experiment, the clusters or types lie on a continuum, with some samples falling in the gradients between two, or perhaps even all three microbiotypes (see ordination plots). The histograms in Figure 2 also suggest this, but they do show most samples in each microbiotype feature a high relative abundance of a dominating genus in many samples. We investigated a simple dominance measure, the Berger-Parker alpha diversity index (Berger and Parker, 1970), in the combined datasets' 507 samples. The Berger-Parker index simply reports the relative abundance of the most dominant taxon in a sample. This found that only 24.9% of samples had a dominating taxon that only had a relative abundance of 50% or less. On the other hand, 51.9% of samples had the dominant taxon exhibiting a relative abundance of >70% of the sample (Supplementary Results in Jupyter Notebook; Figure S1). This shows that in most samples, there is one dominating organism. Based on these results, the microbiotyping approach is therefore proposed to reduce complexity about modeling bacterial interactions in the sinuses, and not to suggest that each type is a walled-off discrete cluster. Further investigations into the local substructures of each type will be required to further explore the roles and interactions of its constituent taxa. Another limitation of our description of microbiotypes is that they may as well-describe different community "states" rather than community "types, " since we do not have longitudinal data to describe how these clusters behave with the passage of time and treatments. Hence, we could not confirm whether these are stable, consistent communities across time. It may well be that intermediate samples lying between two or more microbiotypes are representing a transitional state. An important future avenue of research is to conduct a longitudinal study to investigate the temporal stability of these clusters.
We predict that ongoing sinonasal microbiome research and consequent large meta-analyses of microbiota studies, with novel meta-analytic tools and platforms  enabling such large-scale studies, will allow the refinement of these types and further clarify their clinical/microbiological utility. Our methodological approach to describe the microbiotypes is not exclusive, as alternative statistical or machine-learning approaches could be employed to investigate them. In light of this, we expect that international multi-center standardization and rationalization of the sinonasal microbiotypes would be possible in the future, similar to the recent proposed effort to standardize enterotyping of the gut microbiota by Costea et al. (2018).

CONCLUSION
We examined our International Sinonasal Microbiome Study 16S dataset through an approach modeled on human gut microbiome enterotyping and we found three major microbial community types or "microbiotypes" as clusters that lie on a continuum, based on an unsupervised machine learning approach that involved dimensionality reduction and clustering. Microbiotypes did not show an association with disease state or clinical variable, suggesting that they could not function as independent disease biomarkers. The description of these microbiotypes has also unveiled a potential reciprocal relationship between Staphylococcus aureus and Corynebacterium spp. in the sinuses that requires further investigation in future studies. The findings were validated on a separate previously unpublished sinus bacterial 16S gene dataset. Microbiotypes are therefore proposed to reduce the complexity of modeling bacterial interactions in the sinuses, and in this sense hold microbiological and clinical relevance.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Queen Elizabeth Hospital Human Research Ethics Committee (approval HREC/14/TQEHLMH/222). The patients/participants provided their written informed consent to participate in this study. The project was approved by the respective institutional human research ethics boards of all sample-collection centres. The details of all ethics applications are provided in Table S1 in the original ISMS publication (Paramasivan et al., 2020).

AUTHOR CONTRIBUTIONS
AB wrote primary and revised versions of manuscript and main data analysis. SP, AS, MD, and JC shared in analyzing the data. JC provided bioinformatics QIIME 2 pipeline design, data analysis supervision and critique. EC provided data analysis consultation. JC, EC, AP, SV, and P-JW provided critical review and edits of manuscript drafts. MA, BB, CCa, MC, RD, DD, CG, RH, PH, AL, RS, PT, MT, P-JW, and AP sample collection. CCo, MR, and SM processed samples for sequencing. AP conceived project idea, collaborations, and design. All authors read and approved the final manuscript.