Oropharyngeal microbiome profiled at admission is predictive of the need for respiratory support among COVID-19 patients

The oropharyngeal microbiome, the collective genomes of the community of microorganisms that colonizes the upper respiratory tract, is thought to influence the clinical course of infection by respiratory viruses, including Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of Coronavirus Disease 2019 (COVID-19). In this study, we examined the oropharyngeal microbiome of suspected COVID-19 patients presenting to the Emergency Department and an inpatient COVID-19 unit with symptoms of acute COVID-19. Of 115 initially enrolled patients, 50 had positive molecular testing for COVID-19 and had symptom duration of 14 days or less. These patients were analyzed further, as progression of disease could most likely be attributed to acute COVID-19 and less likely to a secondary process. Of these, 38 (76%) went on to require some form of supplemental oxygen support. To identify functional patterns associated with respiratory illness requiring respiratory support, we applied an interpretable random forest classification machine learning pipeline to shotgun metagenomic sequencing data and select clinical covariates. When combined with clinical factors, both species abundance-based and metabolic pathway abundance-based models were found to be highly predictive of the need for respiratory support (F1-score 0.857 for microbes and 0.821 for functional pathways). To determine biologically meaningful and highly predictive signals in the microbiome, we applied the Stable and Interpretable RUle Set (SIRUS) to the output of the models. This analysis revealed that low abundance of two commensal organisms, Prevotella salivae or Veillonella infantium (< 4.2% and < 1.7%, respectively), and a low abundance of a pathway associated with LPS biosynthesis (< 0.1%) were highly predictive of developing the need for acute respiratory support (82% and 91.4%, respectively).
These findings suggest that the composition of the oropharyngeal microbiome in COVID-19 patients may play a role in determining who will suffer from severe disease manifestations.

others present with only mild or no symptoms [2]. There are known clinical factors that are associated with risk of severe disease, such as age, diabetes, high blood pressure, and obesity [3], but predicting whether an individual patient will require hospitalization or respiratory support, or can recover safely at home, has important implications for healthcare resource utilization. Currently, clinical factors such as age, BMI, and medical comorbidities, in combination with initial vital sign measurements, need for oxygen support, and clinical laboratory testing, are used to predict clinical decompensation and the need for ICU level of care; even the best algorithms, however, perform with an accuracy of only 70-80% [4,5]. There are likely other individual factors that determine how a patient responds to COVID-19 and may play a role in determining disease manifestations, such as the need for respiratory support [6].

The oropharyngeal and nasopharyngeal microbiomes, the collections of organisms that colonize the human upper airway, have been hypothesized to influence the host immune responses to respiratory viral and bacterial infections [7]. Commensal bacterial species of the nasopharynx can modulate the immune response to

patients requiring respiratory support, and those that did had similar characteristics. The overall mean age of the final cohort was 68 (SD 15.24), 50% were female, and the majority of patients identified as Hispanic or Latino (76%) and white (64%). Within the acute COVID+ cohort (see Figure 1), 12 (24%) patients never required any respiratory support, 18 (36%) were treated with supplemental oxygen via nasal cannula, 3 (6%) were treated with supplemental oxygen via facemask, 6 (12%) were treated with positive pressure ventilation, and 11 (22%) were intubated.
There were 2 patients who died of COVID-19 but had Do Not Intubate (DNI) orders; accordingly, they were considered as having respiratory failure severe enough to be treated with intubation.

Features of the oropharyngeal microbiome are associated with need for respiratory support
We first directly compared abundances of microbiome features between COVID-19+ and COVID-19- patients utilizing the Wilcoxon Rank Sum test. When corrected for multiple comparisons, there were no bacterial species or metabolic pathway abundances that were significantly different between COVID-19+ and COVID-19- patients. We then trained random forest classification (RFC) models to determine which clinical and microbiome features (species and metabolic pathway abundances) were predictive of the need for respiratory support. We selected this model because previous work has demonstrated robust correlations between microbiome and clinical outcomes [12]. We chose this machine learning-based approach as it enables the use of non-normally distributed variables (species relative abundances) alongside a diverse set of other variables (Shannon's alpha diversity index, and numerical and categorical clinical covariates) as features in the same model, thus allowing us to predict clinical response from complex multi-modal data [13]. To evaluate the performance of our models, we computed the F1 score, the harmonic mean of precision and recall, which accounts for both prediction errors and the specific type of prediction error. Utilizing sample-level Shannon's alpha diversity index and clinical covariates, which included age, BMI, race, ethnicity, and selected medical comorbidities available at admission, the model performed well with a mean F1 score of 0.857 ± 0.000 (Figure 2A). A model trained only on measured bacterial abundances performed comparably with a mean F1 score of 0.837 ± 0.005.
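As a minimal illustration of the F1 score described above (a sketch, not the authors' code), the following computes precision, recall, and their harmonic mean from binary labels. The function name `f1_score`, the `positive` argument, and the toy labels are hypothetical and are not study data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = needed respiratory support, 0 = did not.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0]
toy_f1 = f1_score(toy_true, toy_pred)
print(toy_f1)
```

Because F1 does not use true negatives, its value depends on which outcome is treated as the positive class, so that choice should be stated explicitly when outcomes are imbalanced.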
A model including clinical covariates, select medical comorbidities, measured bacterial abundances, and sample-level Shannon's alpha diversity index led to a similar predictive performance, with a mean F1 score of 0.858 ± 0.009. These F1 scores indicate similar performance of clinical and microbial variables. Additional model statistics are included in Table S1. We examined the model that combined microbiome features and clinical covariates in more depth to compare directly how these factors were associated with the need for respiratory support.

The aggregated permuted variable importance [14] from the selected RFC model identified the relative abundance of Prevotella salivae as the most important predictor of the need for respiratory support (Figure 2B). Specifically, a decrease in P. salivae abundance was indicative of respiratory support need (Figure 2C). Notably, this organism is ranked higher than both patient age and BMI (Figure 2B), which are two clinical factors known to associate with severe COVID-19 [3]. Other factors that were predictive of the need for respiratory support include decreases in Shannon's alpha diversity and decreases in the relative abundances of Campylobacter concisus, Veillonella infantium, and Actinomyces species S6-Spd3 (Figure 2C).

To further explore connections between microbiome features and clinical covariates, we examined the association between the abundance of our 15 top-predicting microbes and clinical covariates using MaAsLin2. MaAsLin2 determines multivariable associations between clinical variables and microbiome data utilizing general linear models as opposed to a random forest [15]. This approach allows us to determine if specific microbiome predictors are associated with our clinical outcome of interest (need for O2 support) after explicitly controlling for the effect of possible confounding clinical covariates (i.e., age and BMI).
Furthermore, MaAsLin2 analysis can also be considered an independent validation of our findings using a different methodology. The need for respiratory support was identified as significantly associated with four of the fifteen microbes identified as important by the RFC, specifically P. salivae, Eubacterium brachy, Actinomyces sp. S6-Spd3, and Aggregatibacter sp. oral taxon 45 (Table 2). Age was found to be independently associated with the abundance of P. salivae and Neisseria sp. oral taxon 014. None of the top microbial predictors were found to associate with BMI. These results support the association between microbiome features and the need for respiratory support, as these features were found to be significantly associated with this outcome utilizing an approach that specifically controls for potential confounders such as patients' age and BMI.

Similar analysis was repeated on the samples profiled for the abundance of metabolic pathways using HUMAnN3 [16]. The relative abundance of specific bacterial metabolic pathways was also highly predictive of the need for respiratory support (mean F1 score 0.804 ± 0.009), and adding clinical covariates available at admission to the model resulted in a similar mean F1 score of 0.821 ± 0.004 (Figure 3A). Additional model statistics are included in Table S2. The metabolic pathways most important in predicting the need for respiratory support are decreased abundance of LPS biosynthesis (CMP-3-deoxy-D-manno-octulosonate and lipid IVA biosynthesis), mycolate biosynthesis, and trehalose degradation pathways and increased abundance of L-threonine, L-proline and inosine-5-phosphate pathways (Figure 3B,C). We examined the contribution of bacterial genera to two LPS biosynthetic pathways that were highly predictive of the need for respiratory support.
We observed that less of the CMP-3-deoxy-D-manno-octulosonate pathway originated from Prevotella and a large portion of this pathway originated from Pseudomonas in patients who required respiratory support (Supplementary Figure 1). A large contributor to the lipid IVA biosynthesis pathway in patients who required respiratory support was Aggregatibacter, a genus closely related to Haemophilus influenzae [17]. We similarly applied MaAsLin2 to the metabolic pathway predictors identified as important in our RFC. Seven of the top predictors identified also showed significant associations by MaAsLin2, with only one pathway (stearate biosynthesis) significantly associated with age as well. Notably, the relative abundance of the mycolic acid biosynthesis pathway was found to be a top predictor of the need for respiratory support and was significantly associated with the need for respiratory support by MaAsLin2.

We show that the abundance of several Gram-negative and Actinomyces species and metabolic pathways associated with LPS, mycolic acid, and amino acid biosynthesis within the oropharyngeal microbiome are associated with COVID-19 patients developing the need for respiratory support and thus COVID-19 severity. The top predictors from our RFC predictive model were confirmed using an independent analysis based on generalized linear models. When examining important factors associated with the need for respiratory support, we found that decreased abundances of P. salivae and an Actinomyces species were highly associated with the need for respiratory support in both analyses, suggesting the presence of these protective organisms is associated with COVID-19 patients not requiring respiratory support. A higher abundance of genes encoding the metabolic pathways for mycolate biosynthesis, L-alanine biosynthesis, stearate biosynthesis, folate
transformation, and genes associated with aerobic utilization of hexuronides were identified in both analyses as associated with the need for respiratory support, with LPS biosynthesis genes (CMP-3-deoxy-D-manno-octulosonate and lipid IVA biosynthesis) also found to be highly predictive in the RFC. These trends suggest that the most important microbiome factors in predicting the need for respiratory support are a higher abundance of some commonly detected oropharyngeal commensal bacteria and an increased abundance of pathways associated with bacterial product biosynthesis and aerobic respiration.

medRxiv preprint doi: https://doi.org/10.1101/2022.02.28.22271627; this version posted February 28, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Prevotella has generally been implicated in chronic inflammation [22] but is also part of the normal, healthy lung microbiome [23]. P. salivae has been shown in animal models to stimulate less inflammatory cytokine production and lead to less neutrophil chemotaxis than the Gram-negative respiratory pathogens Moraxella catarrhalis and Haemophilus influenzae [24]. It is hypothesized that a penta-acylated LPS produced by Prevotella [25] stimulates less innate-immune receptor activation than hexa-acylated LPS produced by Gram-negative
respiratory pathogens and Escherichia coli [22]. This may represent an adaptation that allows Prevotella to colonize the upper airway without causing disease.

Our metagenomic analysis found that the abundances of two LPS biosynthetic pathways, CMP-3-deoxy-D-manno-octulosonate and lipid IVA biosynthesis, are the top predictors of the need for respiratory support in the RFC. CMP-3-deoxy-D-manno-octulosonate is a critical metabolite in LPS biosynthesis [26], and lipid IVA is a precursor in the production of the lipid A core of LPS [27]. In our RFC model trained with metabolic pathways and clinical covariates, a higher abundance of these pathways appears protective, which initially seems counter-intuitive as LPS is known to generate substantial inflammation via innate immune system activation [28].

When we examined the contribution of bacterial genera to the CMP-3-deoxy-D-manno-octulosonate biosynthesis pathway, we observed that less of the pathway originated from Prevotella in patients who required respiratory support and a larger portion of this pathway originated from Pseudomonas, a known respiratory pathogen capable of producing highly inflammatory LPS [29]. A large contributor to the lipid IVA biosynthesis pathway was Aggregatibacter, a genus closely related to Haemophilus influenzae [17], which also produces highly inflammatory LPS [24]. A possible explanation for these findings may be related to the natural history of COVID-19 lung disease. Sequencing-based analysis of broncho-alveolar lavage fluid from patients hospitalized with COVID-19 lung disease has shown the presence of oropharyngeal flora, which are hypothesized to enter the lungs by aspiration [30].
The translocation to the lungs of oropharyngeal organisms that produce more inflammatory LPS may potentiate inflammation during COVID-19 lung disease and lead to the need for respiratory support. Our findings support the hypothesis that a higher abundance of Prevotella and other species producing weakly immunogenic LPS corresponds to a decreased abundance of species producing more inflammatory LPS. If aspiration and translocation occur during COVID-19, the presence of organisms that produce less inflammatory LPS may limit inflammation in the lungs of COVID-19 patients.

Actinomyces and Mycolic Acid Biosynthetic Pathway
A lower abundance of several Actinomyces species was found to be predictive of the need for respiratory support in our RFC, and an Actinomyces species was found to be associated with the outcome via MaAsLin2.
the human body and environment [31,32]. Clinically, they are usually associated with slowly progressing infections of the head, neck, chest and pelvis [32]. They are likely a component of a healthy oropharyngeal microbiome; in a study of the oropharyngeal microbiome among healthy adults, higher Actinomyces abundance was associated with decreased systemic inflammation [33]. They are also capable of biosynthesis of a wide variety of biologically active compounds, including mycolic acid [34]. A lower abundance of the pathway for mycolic acid biosynthesis was a top predictor of the need for respiratory support in our RFC model and was also associated with the outcome by MaAsLin2. Actinomyces is the only genus found to affect COVID-19 in this study that is hypothesized to be capable of mycolic acid production. An anti-inflammatory effect, possibly via mycolic acid biosynthesis, may be why a higher abundance of these organisms and this metabolic pathway is predictive of not requiring respiratory support.

The Potential Protective Effect of Commensals
The predominant effect that we observed was that a decrease in the abundance of several commensal organisms and an increased abundance of bacterial product synthesis pathways of the oropharyngeal microbiome are the primary predictors of the need for respiratory support in COVID-19. The finding that the bacteria of the oropharyngeal microbiome are potentially protective against severe COVID-19 fits with observational data about the treatment of COVID-19 patients with antibiotics. These studies suggest that treatment of COVID-19 with antibiotics does not reduce mortality and that secondary bacterial infection is uncommon [35,36]. Our findings run counter to the hypothesis that the oropharynx is primarily a source of opportunistic pathogens that gain access to the lungs during the course of COVID-19 [30].

If the predominant effect were that harmful or pathogenic bacteria in the oropharyngeal microbiome contribute to severe COVID-19, one might expect treatment with antibiotics to be beneficial. Our findings are more consistent with the results of animal-model experiments with influenza, which suggest that treatment with antibiotics is potentially harmful due to its effect on beneficial commensal organisms. In mice challenged with influenza that had normal upper airway microbiomes, macrophages activated genes associated with anti-viral activity such as interferon-gamma, while those that were treated with antibiotics failed to activate these pathways and had more severe lung disease [9]. In another study, antibiotic treatment prior to influenza challenge impaired dendritic cell priming and migration to draining lymph nodes, which ultimately led to impaired development of T-cell mediated adaptive immunity [37]. In COVID-19, the oropharyngeal microbiome may play a similar role, aiding the development of an effective anti-viral response that limits severe disease manifestations. In this context, the microbiome was demonstrated to be critical to an effective immune response to viral infection [8,9].

Strengths and Limitations
Our strengths include our enrollment of patients within the Emergency Department during acute presentation of the disease, prospective data collection, use of metagenomic sequencing, and use of two independent analysis techniques to verify our results. The enrollment and collection of samples within the Emergency Department allowed us to sample the microbiome of patients early in the disease course, before medical intervention. We excluded any patients with self-reported symptoms longer than 14 days at the time of collection to focus our analysis on the acute phase of COVID-19. Our characterization of the oropharyngeal microbiome shows us features that can be predictive of disease course and potentially a target for therapeutics. In addition, the use of metagenomic sequencing for microbiome characterization has enabled us to determine which bacterial metabolic pathways could potentially affect disease course, as opposed to just the genus-level information provided by 16S rRNA sequencing. Although some microbiome features were also associated with age by MaAsLin2, these represent independent associations and would have been corrected for when determining associations with the need for respiratory support.

Conclusions
We demonstrate a relationship between disease manifestations of COVID-19 and the oropharyngeal microbiome. Specifically, the decreased abundance of some organisms, primarily P. salivae, is predictive of patients requiring respiratory support. We show that the presence of metabolic pathways for bacterial products such as LPS and mycolic acid is also predictive of not requiring respiratory support, implying that the presence of bacteria producing these products has a positive impact on disease course. Together, these findings suggest that the presence of beneficial commensal bacteria in the upper airway has the potential to prevent or mitigate pulmonary manifestations of COVID-19. Thus, our study underscores that the interaction between the oropharyngeal microbiome and respiratory viruses such as SARS-CoV-2 could potentially be harnessed for diagnostic and therapeutic purposes.

testing. Enrollment and sample collection took place from April 2020 through March 2021; this occurred before vaccines were widely available, and no subjects had been vaccinated against COVID-19. Enrolled patients were followed prospectively through the Electronic Medical Record (EMR). We collected information on disease outcomes of COVID-19 for their initial visit, including need for respiratory support, the results of clinical laboratory testing, and mortality, via the EMR. The Institutional Review Board at the University of Massachusetts Medical School approved this study (protocol # H00020145).

Intubate (DNI) order but went on to die of COVID-19 symptoms, we considered that patient as having respiratory failure severe enough to require intubation and classified the sample as being from a patient who was intubated. Patients were considered as having in-hospital mortality from COVID-19 if this was listed as a cause of death on hospital death records.

Sequence Processing and Analysis: Shotgun metagenomic reads were first trimmed and quality filtered to remove sequencing adapters and host contamination using Trimmomatic [39] and Bowtie2 [40], respectively, as part of the KneadData pipeline version 0.7.2 (https://huttenhower.sph.harvard.edu/kneaddata/). As in our

Microbiome-clinical factors modeling: To determine the association between bacterial species abundance and COVID-19 diagnosis, we performed a non-parametric Wilcoxon Rank Sum test for species with at least 5% prevalence and a minimal average relative abundance of 0.
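The screening step described above can be sketched as follows. This is an illustrative, standard-library-only sketch rather than the authors' pipeline: the function names and toy abundances are hypothetical, the rank-sum p-value uses a tie-corrected normal approximation, and the Benjamini-Hochberg procedure is an assumed choice of multiple-comparison correction (the text does not state which correction was used). Only the 5% prevalence threshold comes from the text; no minimum-abundance filter is applied because that threshold is not recoverable here.

```python
import math
from statistics import NormalDist


def passes_prevalence(abundances, min_prevalence=0.05):
    """True if the species is detected (abundance > 0) in enough samples."""
    detected = sum(1 for a in abundances if a > 0)
    return detected / len(abundances) >= min_prevalence


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all values tied: no evidence of a difference
        return 1.0
    z = (w - mean) / math.sqrt(var)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy relative abundances (hypothetical, not study data).
positive = {"species_a": [0.1, 0.0, 0.3, 0.2], "species_b": [0.0, 0.0, 0.0, 0.0]}
negative = {"species_a": [0.4, 0.5, 0.2, 0.6], "species_b": [0.0, 0.0, 0.0, 0.0]}

kept = [s for s in positive if passes_prevalence(positive[s] + negative[s])]
pvals = [rank_sum_p(positive[s], negative[s]) for s in kept]
for species, q in zip(kept, benjamini_hochberg(pvals)):
    print(species, round(q, 3))
```

With small groups such as the toy data, an exact rank-sum test would be preferable to the normal approximation; the approximation is used here only to keep the sketch dependency-free.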
