miR-1207-5p Can Contribute to Dysregulation of Inflammatory Response in COVID-19 via Targeting SARS-CoV-2 RNA

The present study focuses on the role of human miRNAs in SARS-CoV-2 infection. An extensive analysis of human miRNA binding sites on the viral genome led to the identification of miR-1207-5p as potential regulator of the viral Spike protein. It is known that exogenous RNA can compete for miRNA targets of endogenous mRNAs leading to their overexpression. Our results suggest that SARS-CoV-2 virus can act as an exogenous competing RNA, facilitating the over-expression of its endogenous targets. Transcriptomic analysis of human alveolar and bronchial epithelial cells confirmed that the CSF1 gene, a known target of miR-1207-5p, is over-expressed following SARS-CoV-2 infection. CSF1 enhances macrophage recruitment and activation and its overexpression may contribute to the acute inflammatory response observed in severe COVID-19. In summary, our results indicate that dysregulation of miR-1207-5p-target genes during SARS-CoV-2 infection may contribute to uncontrolled inflammation in most severe COVID-19 cases.

INTRODUCTION COVID-19 is the first worldwide pandemic in a globalized world. The short time since the outbreak is the reason why many aspects of the molecular interactions of SARS-CoV-2 in the human host are still unknown, especially its mechanisms at transcriptional level. The present study aims to unravel the role of human miRNAs in SARS-CoV-2 infection. miRNAs are short non-coding RNA molecules with a post-transcriptional regulatory function (Bartel, 2004). They bind complementary sequences in mRNA molecules, with the role of inhibiting the translation of their mRNA targets into proteins (Bartel, 2009). Host endogenous miRNA activity in viral propagation has been previously studied and many complex virus-specific mechanisms have been identified, although the precise role of miRNAs in viral infections is not yet fully understood (Bruscella et al., 2017). In this paper, we present the results of an extensive predictive analysis to identify human lung-specific miRNAs that may bind the SARS-CoV-2 RNA. Then, we considered the already experimentally validated miRNA interactions with endogenous genes to identify the host's miRNA regulatory sub-network affected by SARS-CoV-2 infection, looking at the virus as a competing RNA (Sumazin et al., 2011). We finally evaluated the impact of such interactions on the expression profile of genes targeted by the identified miRNAs in human airway epithelial cells infected with SARS-CoV-2. Specifically, we identified miR-1207-5p as a possible regulator of the S protein in SARS-CoV-2 RNA. As so, we suggest that the viral RNA competes with the CSF1 mRNA, a known target of miR-1207-5p , leading to CSF1 overexpression. To support our hypothesis, several published transcriptional datasets were evaluated. The finding that the CSF1 gene is over-expressed in lung epithelial cells infected with SARS-CoV-2 supported our hypothesis. CSF1 controls the production, differentiation and function of macrophages and its overexpression may contribute to the acute inflammatory response observed in severe COVID-19.

Transcriptomics Datasets and Analysis
Normal lung tissue expression profiles have been downloaded from TissueAtlas (Ludwig et al., 2016). Raw miRNA expression data from 18 lung control tissues were normalized with quantile normalization and the average expression level for each miRNA was computed. We used the average expression profile computed from all the 18 control tissues to identify the top 100 expressed miRNAs in normal lung tissue. Table S1 summarizes the list of selected miRNAs and their average expression level in lung control tissues.
A wide collection of already available transcriptomics datasets with gene expression profiles after SARS-CoV-2 infection has been assembled from literature, as summarized in Table S2. When available, we considered the differential expression analysis results obtained by the authors. Otherwise, we preprocessed and analyzed the gene expression profiles as specified in column "Data analysis" of Table S2. When raw count RNAseq data was available, we used the DESeq2 (Love et al., 2014) R pipeline to compare infected vs. not infected samples, and the Benjamini-Hochberg procedure (Benjamini, 1995) to compute adjusted p-values. The univariate threshold of statistical significance was set at 5%.

SARS-CoV-2 Sequences
The RefSeq sequence NC_045512 was used as reference to predict the binding sites of human miRNAs on the viral RNA. A total of 15881 worldwide viral complete genomes was downloaded-updated to September 7th, 2020-from the Severe acute respiratory syndrome coronavirus 2 data hub of NCBI Virus database, by filtering for taxid = "2697049" and Nucleotide Completeness = "complete". Stability of particular viral genome regions was assessed by searching the exact match of the region in all the viral available genomes. To assess the statistical significance of the stability of each binding site, we associated a p-value with the number (m bs ) of viral sequences that showed a mutation in the region of the binding site. Such a p-value was calculated as the frequency with which a number of mutations larger or equal to m bs was observed in all of the other regions with the same length of the binding site in the involved mRNA.

miRNA Target Prediction
Mature miRNA sequences were downloaded from miRbase, version 22. We used four miRNA target prediction tools to assess whether an RNA sequence is predicted to be a target of a miRNA: miRanda (Enright et al., 2003), PITA (Kertesz et al., 2007), Targetscan (Agarwal et al., 2015), and ComiR (Coronnello et al., 2012;Coronnello and Benos, 2013;Bertolazzi et al., 2020). miRanda script was used with -score 0 and -energy 0 settings. PITA and Targetscan scripts were used with default settings. ComiR was used to compute the ComiR score associated with the targets of each single miRNAs. For each miRNA we identified as highly predicted targets the genes that passed all the following conditions: -miRanda binding energy, lower than -20; -PITA DDE, lower than -15 -TargetScan Binding Site, 8mer or 7mer -ComiR score, greater than 0.85 We used the localization of the binding sites predicted by PITA, miRanda and Targetscan to further restrict the set of targets by considering only the binding sites predicted by all the three algorithms. The resulting targets are named as highly predicted targets.
Experimentally validated miRNA targets were downloaded from miRTarBase, where only the validation methods with strong evidence (i.e. Reporter assays, RT-qPCR, and Westernblot based experiments) have been considered.

Five Human Lung-Specific miRNAs are Predicted to Target SARS-CoV-2 Viral Genome
Aiming to unravel the role of endogenous miRNA expressed in the human lung with respect to SARS-CoV-2 virus, we focused our analysis on the 100 most expressed miRNAs in normal lung (Ludwig et al., 2016), identified as described in Methods. We identified potential targets of these 100 miRNAs on SARS-CoV-2 RNA sequence (NCBI reference viral sequence NC_045512), using four miRNA target prediction tools (Enright et al., 2003;Kertesz et al., 2007;Coronnello and Benos, 2013;Agarwal et al., 2015) (see Methods). Only 15 miRNAs were predicted to target the viral RNA by all the four algorithms ( Figure 1A). Among the predicted miRNA:viral RNA interacting pairs (specific target locations), six were identified by all four algorithms ( Figures  1B, C).
The six sites were targeted by 5 miRNAs: miR-6089, miR-6821-5p, miR-103a-3p, miR-4763-3p, and miR-1207-5p. miR-4763-3p and miR-1207-5p miRNAs belong to the same miRNA family, sharing the same seed sequence (ggcaggg). In our analysis, we predict that they have a common binding site in the viral sequence, located in the region coding for the Spike (S) glycoprotein. Spike is a structural protein that allows Sars-Cov-2 to enter host cells by interacting with membrane receptors (Masters, 2006). Human miRNAs miR-6089, miR-6821-5p, and miR-4763-3p have their binding sites in the ORF1ab gene, specifically hitting the regions coding for Nsp10, formerly known as growth-factor-like protein (GFL), Nsp12, an RNA-dependent RNA polymerase, and Nsp13_ZBD gene, a helicase ( Figure 1C). The three mentioned nonstructural proteins are crucial in coronavirus replication, being part of a complex of 16 non-structural proteins entailed for viral RNA replication and transcription (Masters, 2006;Bouvet et al., 2014). miR-103a-3p binding site is located in the Nucleocapsid (N) protein coding region. N proteins are structural proteins, that play key roles during the packaging of the viral RNA genome (Masters, 2006). Whether the enhancement of the host's miRNAs regulatory machinery could inhibit the replication process or the production of the structural viral proteins, and as a consequence the virus diffusion through the host, is a hypothesis that needs to be experimentally validated and requires further investigation.

Stability of Predicted miRNA Binding Sites on SARS-CoV-2 RNA
The worldwide spread of COVID-19 infection exposes the viral genome to a high risk of mutation. For this reason, we checked the binding sites' sequence stability across the 15881 SARS-CoV-2 genomes annotated from all over the world in the NCBI virus database. To analyze such a stability, for each one of the six selected binding sites, we counted the number of viral sequences that presented a mutation. Results are reported in the third column of Figure 1B. We found that the binding regions are highly stable, which implies the consequent stability of bindingsite predictions across the currently circulating viruses. In addition, we compared the occurrences of mutations in each binding site with the occurrences in any other region of the same length in the involved viral coding RNA, as described in the Methods section. The obtained p-values (see Figure 1B) indicate that the stability of all the six binding sites does not show a significant deviation from the one of the whole mRNA in which they are located, respectively.
Host mRNAs Competing With SARS-CoV-2 RNA are Overexpressed in Lung Epithelial Cells The viral sequence, once expressed, can interact with the host's miRNA regulatory machine by sequestering the selected miRNAs. Therefore, viral RNA may act as a miRNA sponge, with the same mechanism of competing endogenous RNA (Sumazin et al., 2011). Among the five selected miRNAs, two have been previously studied in detail. miR-103a-3p activity has been widely studied in different tissues, i.e. gastric and colorectal cancer or liver (Liao and Lönnerdal, 2010;Martello et al., 2010;Annibali et al., 2012;Chen et al., 2012;Yu et al., 2012;Zhang et al., 2012;Geng et al., 2014;Liang et al., 2015;Zhang et al., 2015;Asiaee et al., 2019), and a number of its targets has been validated. miR-1207-5p expression is high in the cytoplasmic fraction of human normal lung tissue while being reduced in cancer . miR-1207-5p has been first characterized as negative regulator of epithelial-to-mesenchymal transition (EMT) as it inhibits the expression of a number of genes involved in this process, including Snail, Smad2, Smad3 and Vimentin Qin et al., 2016). In addition to its role in EMT, miR-1207-5p plays an important role in shaping the inflammatory milieu. In this respect, CSF1 (colony-stimulating factor 1, also known as macrophage colony-stimulating factor, M-CSF) has been reported as one of its direct targets . In order to test whether infection of lung epithelial cells with SARS-CoV-2 cell infection affects the gene expression levels of the endogenous miRNA target genes, we used a recent dataset of gene expression profiles of human lung-derived cells infected with SARS-CoV-2 (Blanco-Melo et al., 2020). Authors examined the behavior of wild type adenocarcinomic human alveolar basal epithelial (A549) and airway epithelial (Calu3) cell lines. A549 cells show a low expression of ACE2 receptor, hence a limited coronavirus infection rate. Thus, the authors also analyzed A549 cells transfected with a vector expressing ACE2 (A549+ACE2). We used this dataset to analyze the transcriptional profiling of the experimentally validated targets of miR-1207-5p and miR-103a-3p. Figure 2A presents  We expect that endogenous direct targets will increase their expression level following SARS-CoV-2 infection since viral RNA will compete with the endogenous RNA. Some of the analyzed targets behave as expected, especially the ones that are in the range of 1,000-2,000 reads per million (rpm), including CREB1, CSF1, PTEN, and DICER1. Consistent with the known A549 limited infection rate, the expression of these genes is enhanced in ACE2-expressing A549 cells, and even more in Calu-3 cells that are highly permissive to SARS-CoV-2 replication. These findings support our hypothesis that the viral RNA may act as a competing RNA for a selection of host miRNAs leading to the increase of the expression level of their endogenous targets. Highly expressed targets, for instance ADAM10, are not upregulated as expected. This is probably due to the fact that these genes might be modulated by other highly expressed miRNAs not sequestered by the virus. Alternatively, the sponge effect that we are hypothesizing is not effective when the mRNA is highly expressed. Figure 2B presents the complexity of the miRNA-target network known up to now. Here we map all the experimentally validated interactions among the list of direct targets of miR-1207-5p and miR-103a-3p, and 45 of the 100 most highly expressed miRNAs in healthy lungs, that show at least one interaction. For instance, we observe that ADAM10, one of the targets of miR-103a-3p, is also regulated by miR-451a, the most highly expressed miRNA in lung. The presence of this regulator might be the reason why the expression of ADAM10 is not affected by the presence of the virus.
Binding of miR-1207-5p to SARS-CoV-2 RNA May Lead to Over-Expression of EMT-Related Genes and CSF1 miR-1207-5p has been first characterized as negative regulator of EMT by controlling the expression of several genes including SMAD2, SMAD3, SMAD7, CLASP1, ZEB1, and SNAIL1 Qin et al., 2016). EMT processes favor fibrotic events. Of interest, current data suggest that pulmonary fibrosis after COVID-19 recovery could be substantial (Cabrera-Benitez et al., 2014;Merad and Martin, 2020;Spagnolo et al., 2020). Therefore, we tested the hypothesis that SARS-CoV-2 infection in bronchial epithelial cells may have an impact on the expression of these genes by reducing the availability of miR-1207-5p. Figure 3 shows the results of the differential expression analysis for the genes involved in EMT that have been reported to be regulated by miR-1207-5p. The increase in their expression levels appears evident when cells are infected with SARS-CoV-2 virus, therefore supporting our hypothesis.
We further expanded our analysis by evaluating the impact of SARS-CoV-2 infection on the expression of CSF1. As reported in Figure 2, CSF1 is one of the host gene targets most upregulated following viral infection. CSF1 is a predicted target of 3 out of 5 of the miRNAs targeting the virus sequence: miR-4763-3p, miR-1207-5p and miR-6089. It is also an experimentally validated target of miR-1207-5p . The only other known miRNA CSF1 regulator, among the 100 highly expressed miRNA in the lung, is miR-130a-3p, which is expressed at lower level than miR-1207-5p. CSF1 regulates the survival, proliferation, differentiation, and chemotaxis of tissue macrophages and dendritic cells (DC) that play a key role in innate immune responses. In the human lung, CSF1 can be released by airway epithelial cells in the airspace and its local concentration contributes to control the recruitment and activation of DC and macrophages (Louis et al., 2015;Moon et al., 2018;Turianováet al., 2019).
To further validate our hypothesis that the CSF1 mRNA is over-expressed after SARS-CoV-2 infection, we analyzed several recently published datasets as reported in Table S2. To this purpose, different types of experimental designs and platforms were taken into consideration. When available, we referred to the differential expression analysis performed by the authors. Specifically, we considered transcriptomics data analysis of infected vs. healthy samples from human lung biopsies as reported in (Vishnubalaji et al., 2020), bronchoalveolar lavage fluid (BALF) in (Zhou et al., 2020), peripheral blood mononuclear cells (PBMC) and BALF in (Xiong et al., 2020), and whole blood in (Ong et al., 2020). We also analyzed the single cell RNAseq data from whole blood reported in ref. (Wilk et al., 2020), infected NHBE cells in (Ravindra et al., 2020), and infected Calu3 cells in (Emanuel et al., 2020). Data sets obtained by analyzing human samples were not useful to confirm our hypothesis. This can be due to several reasons. More specifically, the high-variability among patients, the cell heterogeneity of reported biological samples (such as bronchioalveolar lavage fluids and lung biopsies) with different efficiency of viral transfection and the low sample size make it really difficult to unravel fine regulatory mechanisms of virus-host interaction. On the contrary, when dataset derived from bronchial epithelial cells (both primary cells and cell lines) were analyzed, significant upregulation of CSF1 was observed therefore confirming our hypothesis. For example, Wyler et al. (Emanuel et al., 2020) performed gene expression profiles of SARS-CoV-2 infected Calu3 cell line. Overexpression of CSF1 in Sars-CoV-2 infected versus mock treated cells confirmed our hypothesis. Furthermore, in Ravindra et al. (2020) the authors performed single-cell RNA sequencing of human bronchial epithelial cells grown in air-liquid interface and infected with SARS-CoV-2. When looking at ciliated cells, the expression of CSF1 significantly increased in infected compared to mock cells. Of note, the expression of CSF1 was significantly higher in ciliated infected cells compared to bystander cells that remained uninfected in samples challenged with SARS-CoV-2. These findings suggest that viral replication inside the cells is required in order for CSF-1 to be over-expressed therefore supporting that a direct interaction between viral RNA and host miRNAs is required to alter the expression of CSF1 during infection.

DISCUSSION
In 10-20% of the cases, SARS-CoV-2 infections may progress to interstitial pneumonia and acute respiratory distress syndrome (ARDS) especially in patients with older age and comorbidities. Clinical features of severe COVID-19 as well as their systemic cytokine profile suggest the occurrence of macrophage activation syndrome (MAS) (McGonagle et al., 2020;Merad and Martin, 2020). High rates of viral replication have been listed among the factors that may drive severe lung pathology during infection by contributing to enhanced host cell cytolysis and production of inflammatory cytokines and chemokines by infected epithelial cells (Merad and Martin, 2020;Wen et al., 2020;Gardinassi et al., 2020). We propose that the high concentration of viral RNA in the cell may sequestrate miR-1207-5p therefore contributing to CSF1 release leading to enhanced macrophage recruitment and activation. In fact, increased release of CSF1 may represent a predisposing factor for MAS and cytokine storm secondary to viral infection (Akashi et al., 1994;Maruyama and Inokuma, 2010). Consistently, it has been recently reported that T-cell derived CSF-1, acting via intercellular crosstalk, may be associated with cytokine storm in COVID-19 (Wen et al., 2020). In our proposed model, infected bronchial epithelial cells may be a source of CSF-1 contributing to local and systemic inflammatory profiles. In addition, reduced availability of miR-1207-5p may also promote EMT events therefore favoring fibrosis (Cabrera-Benitez et al., 2014;Merad and Martin, 2020;Spagnolo et al., 2020). Although further experimental validation will be required to confirm direct interaction between miR-1207-5p and the SARS-CoV-2 genome, our proposed model has been confirmed using several published datasets. Results herein reported strongly suggest that upregulation of CSF1 due to interaction of miR-1207-5p with viral genome may occur when lung epithelial cells are infected with a high viral load. A limitation of the current study is the lack of data regarding protein levels and release. To address this issue, we carefully looked for published proteomics data in COVID19 literature, but, so far, no information about CSF1 protein levels has been published and therefore further studies will be carried out to address this point. Nevertheless, transcriptional and posttranscriptional control of mRNA levels represent a key regulatory step for most inflammatory mediators during infection. In this respect, the discovery of novel potential mechanisms that contribute to modulate the mRNA levels of a specific inflammatory mediator in the context of SARS-Cov-2 infection may represent a step forward toward a better understanding of virus-host interaction molecular mechanisms. A wide analysis of the SARS-CoV-2 transcriptome (Kim et al., 2020) revealed the presence of several non-canonical subgenomic RNAs. They consist in discontinuous transcriptions of the viral sequence, where the 5' leader region is fused to a nonconventional part of the genome. As a result, the obtained RNA contains only a portion of the viral mRNAs. It is tempting to speculate that they may play a role as competing RNA. Specifically, miR-1207-5p related binding site is located in the far downstream region of the viral gene Spike. As a consequence, almost all of the sub-genomic RNA sequences with the fusion occurring in the region of the Spike gene contain the miR-1207-5p binding site. Although, these sub-genomic RNA sequences do not have the coding potential to yield the S protein, they could still act as miRNA sponges.
To conclude, our results suggest that the miR-1207-5p family may interact with SARS-CoV-2 viral genome leading to deregulation of CSF-1, which may enhance inflammatory responses in COVID-19 patients, and promoting EMT, which can contribute to pulmonary fibrosis, a possible sequela of COVID-19. Further experimental validation will be conducted to confirm molecular mechanisms of host-virus interaction and to investigate their involvement in disease progression.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
Conceptualization, MT, CCi, and CCo. Methodology, MT and CCo. Formal analysis, GB, MT, and CCo. Investigation, CCo. Writing-original draft, CCo. Writing-review and editing, GB, PB, CCi, MT, and CCo. Supervision, MT and CCo. All authors contributed to the article and approved the submitted version.

FUNDING
The present work has been funded by Regione Siciliana, Assessorato delle Attività Produttive, Azione 1.1.5 del PO FESR Sicilia 2014/2020, Project n. 086202000366 -"OBIND", CUP G29J18000700007 to MT and CC and by the National Institutes of Health (NIH) Grants U01HL137159 and R01LM012087 to PVB.

ACKNOWLEDGMENTS
We all wish to thank dr. Ravindra and his collaborators for sharing with us the single-cell RNA data related to manuscript doi.org/10.1101/2020.05.06.081695.