Original Research ARTICLE
Designing a Network Proximity-Based Drug Repurposing Strategy for COVID-19
- 1National Research Council (CNR), Institute for Applied Computing (IAC), Rome, Italy
- 2National Research Council (CNR), Institute of Translational Pharmacology (IFT), Rome, Italy
The ongoing COVID-19 pandemic still requires fast and effective efforts from all fronts, including epidemiology, clinical practice, molecular medicine, and pharmacology. A comprehensive molecular framework of the disease is needed to better understand its pathological mechanisms, and to design successful treatments able to slow down and stop the impressive pace of the outbreak and harsh clinical symptomatology, possibly via the use of readily available, off-the-shelf drugs. This work engages in providing a wider picture of the human molecular landscape of the SARS-CoV-2 infection via a network medicine approach as the ground for a drug repurposing strategy. Grounding on prior knowledge such as experimentally validated host proteins known to be viral interactors, tissue-specific gene expression data, and using network analysis techniques such as network propagation and connectivity significance, the host molecular reaction network to the viral invasion is explored and exploited to infer and prioritize candidate target genes, and finally to propose drugs to be repurposed for the treatment of COVID-19. Ranks of potential target genes have been obtained for coherent groups of tissues/organs, potential and distinct sites of interaction between the virus and the organism. The normalization and the aggregation of the different scores allowed to define a preliminary, restricted list of genes candidates as pharmacological targets for drug repurposing, with the aim of contrasting different phases of the virus infection and viral replication cycle.
The worldwide ongoing COVID-19 pandemic outnumbers 23.9M confirmed cases and a death toll above 819,000 (∼3.4% global case fatality rate), at the time of writing1 (Dong et al., 2020). Worse, in several densely populated countries, especially those in the South of the world, it is still difficult to forecast when a significant slowing down of the pace of the new infections will occur, and if, when and with what intensity a new global wave will arise. The ultimate goal in fighting a pandemic is to completely stop the spread, but slowing it down is also crucial, to mitigate otherwise devastating effects on health and socioeconomic systems on a local and global scale. Thus, it is necessary to interfere by every possible means with the natural, deadly flow of the outbreak, in order to reduce and flatten the epidemic curve and relieve the pressure on hospitals capacity (Qualls et al., 2017; Anderson et al., 2020).
In this perspective, aside all already implemented epidemiological, clinical and immunological measures and efforts, a deployment, via drug repurposing, of the vast, existing and potentially effective pharmacological arsenal is timely and needed, witnessed by the numerous ongoing clinical trials on several off-the shelf drugs (source: DrugBank)2. This work is committed to aid in the fight against the health consequences of the COVID-19 pandemic by providing a data-driven, viable drug repurposing approach.
In this study, we give account of the complexity of the molecular interactions and processes underlying the SARS-CoV-2 host response, and provide an integrated molecular picture to be exploited for a drug repurposing strategy. Such a picture includes the charting of the protein interaction map involving host genes that in the current state of knowledge have been observed to interact with SARS-CoV-2 viral proteins, and/or are considered critical in the host infection processes, also considering previous knowledge related to other relevant Coronaviruses. In the wider context of network medicine (Bauer-Mehren et al., 2011; Silverman and Loscalzo, 2017), the protein-protein interaction (PPI) framework provides a widely assessed and effective heuristic approach for the identification of disease genes (Taylor et al., 2009; Gustafsson et al., 2014; Tieri et al., 2019; Silverman et al., 2020). The complexity of the organism’s response to the viral invasion is mirrored by the wide variability of the clinical symptoms observed in patients, ranging from asymptomatic infections to extremely critical conditions, up to the death of the patient in around 3.4% of cases worldwide (see text footnote 1). With this study we therefore intended to expand the molecular landscape of the host proteins observed to directly interact with viral proteins (Gordon et al., 2020) to include actors who could be neglected when focusing only on the direct interactors set, and that could potentially prove to be important pharmacological targets to engage in order to propose an effective drug repurposing strategy aimed to improve the clinical outcome of the disease.
Materials and Methods
The workflow of our approach has been sketched in Figure 1. Here we briefly describe each step of the method, providing detailed explanations in forthcoming subsections. We started by collecting updated human PPI data (Figure 1A) -from which a network of 18,618 human proteins and 424,076 binary interactions has been built- and SARS-CoV-2/Coronavirus/human PPI data, constituted by a set of 500 human genes potentially involved in the COVID-19 disease (see section “COVID-19 Associated Host Genes, Protein-Protein Interaction Data and Interactomes Reconstruction”). On such data, a network medicine approach has been applied by using connectivity significance and network diffusion algorithms in order to provide a COVID-19 “proximity” or “involvement” gene ranking (Figure 1B, details in section “Connectivity Significance” and “Network Diffusion”).
Figure 1. Scheme of the workflow adopted: starting from available human interactome data and the set of COVID-19 experimentally associated genes (A), a network proximity approach (based on connectivity significance and heat diffusion) has been carried out to select genes that are proximal to the initial set of COVID seed genes (B). Filtering via gene expression in specific tissues and association to the most common COVID-19 symptoms and phenotypes (C) allowed the design of the proposed drug repurposing strategy (D).
The top 1,000 genes in the proximity ranking added to the original 500 Sars-CoV-2 related genes gives the final dataset of 1,500 mostly involved proteins in the COVID-19 disease. In order to further refine the selected list of genes, their gene expression levels in COVID-19-relevant tissues have been investigated (Figure 1C). The human tissues mostly involved in the COVID-19 infection have been identified and divided into five groups (see Figure 2 and section “Gene Expression Data”). The genes that are not expressed in those tissues have been excluded. The remaining genes, for each tissue group, are ranked based on the most common COVID-19 symptoms. The rankings have been provided through VarElect functional filtering, whose details have been discussed in section “Functional Analysis,” and they have been aggregated (see section “Rank Aggregation”), so that a restricted ranked list has been considered. Finally (Figure 1D), the proposed drug repositioning strategy was designed and implemented via dedicated drug-gene interaction information (see section “Design of Drug Repositioning Strategy via Drug-Gene Interaction Data”).
Figure 2. Graphical sketch showing the selected five groups of organs/tissues representative of potential sites of interaction between SARS-CoV-2 and the organism. Group 1: respiratory tract tissues; Group 2: organs of the digestive system; Group 3: blood cells; Group 4: filtering organs; Group 5: brain areas. Group-specific gene expression data have been retrieved by the Human Protein Atlas web portal (www.proteinatlas.org).
COVID-19 Associated Host Genes, Protein-Protein Interaction Data and Interactomes Reconstruction
Protein-protein interaction data for interactome reconstruction have been retrieved from the BioGRID (Oughtred et al., 2019), one of the most comprehensive interaction repositories with freely provided data compiled through manual curation efforts, currently containing more than 1.7 million protein and genetic interactions from major model organism species, including Homo sapiens. The repository provides both the whole human-only interactome, as well as, in the effort to provide valuable data to fight the pandemic, the SARS-CoV-2/human protein interaction dataset, derived from several sources as described on the dedicated BioGRID webpage3. For this study, the latest version available at the time of the analysis of the human interactome, and of COVID-19-associated host genes, i.e., version 3.5.186 (.tab2 and .tab3 format types) have been used. The dataset includes 338 human proteins interacting with SARS-CoV-2 [i.e., the genes identified by the seminal work of Gordon and colleagues (Gordon et al., 2020)], 47 human proteins considered critical for the virus host entry and response, and further 115 proteins experimentally observed to interact with other, SARS-relevant Coronaviruses, finally totaling 500 involved human genes (Supplementary Table S1). The reason for including the last 115 genes is found in the fact that it is known that there is marked similarity and a close relationship between SARS-CoV-2 and SARS-CoVs or SARS-like bat CoVs (Wu A. et al., 2020), similarities that could play a relevant role when comparing the host tropism and transmission features of the SARS-CoV-2 and SARS-CoV and that are thus worthy of investigation. Besides these considerations, and despite the efforts in experimental PPI mapping, it is also known that the number of missing interactions greatly exceeds the number of experimentally detected interactions (Kovács et al., 2019). In this perspective, these further viral-human interactions related to other Coronaviruses provided by BioGRID in the same dataset represent a very significant information from the heuristic point of view, partly due to structural similarities.
The whole human interactome has been gathered from BioGRID data as well (BIOGRID-ORGANISM-3.5.186.tab3.zip), and the largest connected component (LCC) has been extracted to undergo network analysis, consisting of 18,618 genes and 424,076 unique pairwise interactions among them (Supplementary File S1).
The concept of connectivity significance, originally proposed by Ghiassian et al. (2015), has been used to uncover genes associated with a particular path phenotype, via the observation that proteins associated to specific diseases show peculiar patterns of interaction among each other, patterns that in turn help in the identification of neighborhoods not previously associated to the disease. An efficient algorithm (a.k.a. DIAMOnD, DIseAse MOdule Detection) to compute this measure is publicly available (Ghiassian et al., 2015), and it has been used to rank the genes in the interactome showing the highest connectivity significance with respect to the 500 COVID-19-associated seed genes (in Supplementary Table S2 were reported the first 200 ranked genes).
Network diffusion (or network propagation) is a methodology able to identify those genes which are proximal to a starting list of seed genes by using network topology (and optionally other features). In network medicine it can be used to identify genes and genetic modules that underlie human diseases (Mosca et al., 2014; Cowen et al., 2017; Sumathipala et al., 2019) or to identify causal paths linking mutations to expression regulators, or to discover significantly mutated subnetworks in cancer (Vandin et al., 2011; Paull et al., 2013). The methodology exploits the concept of heat diffusion, i.e., how the heat distribution spreads over time in a medium, here consisting of the PPI network, as it flows from nodes where it is higher toward nodes where it is lower according to the diffusion coefficient and their mutual connections. In practice, starting with an arbitrary subset of seed nodes (e.g., genes associated with a disease), a diffusion algorithm is applied to the initial values assigned to the seed nodes that propagate through the network according to its topology. Fixing a stopping time for the diffusion algorithm, the final distribution of the propagated values generates a proximity ranking that can be used to identify a subset of genes that are closely associated to the selected seed genes. The Cytoscape network analysis platform (Shannon et al., 2003), version 3.7, and the Cytoscape-embedded function “Diffuse,” based on a heat diffusion algorithm, have been used for the analysis (Carlin et al., 2017). The diffusion algorithm has been run considering as seed genes the 500 COVID-19-associated human genes with initial heat hs(0) = 1; non-seed genes have been set with initial heat hns(0) = 0. The heat diffusion has been observed at the following times t: 0.002, 0.005, 0.01, 0.02, 0.05 (arbitrary algorithm diffusion time units; Supplementary Table S3), and the quantities of heat in non-seed genes hns(t) have been computed. The appropriate time has been identified by considering, for each time t, the intersection of the most significant genes obtained via the DIAMOnD algorithm and the most relevant genes in the diffusion process, i.e., the ones with highest hns(t) values, and selecting the time showing the largest overlap, that turned out to be t = 0.005. More in detail, we considered the overlap of the top 200 genes obtained via the DIAMOnD algorithm with the top 1000 genes obtained via the heat diffusion algorithm at each stopping time. The combination of the two methods, the heat diffusion that favors genes well-connected to the seed genes or with high degrees, and the DIAMOnD that privileged those genes that are well-connected to the set of the seed genes, generates a proximity ranking of topologically well-connected genes to the COVID-19-associated genes. Moreover, since the overlap in the intersection is about 50%, the number of genes that is surely well-connected to the seed genes is very significant.
Rank aggregation deals with the aggregation of several lists of preferences obtained from different methodologies. It is very useful in all those situations in which preferences can be set according to several features, none of them prevailing on the others. This is actually our case with different lists of best genes associated with the different preferences, none being preferred over the others. Many methods have been proposed in literature to aggregate rank, they are mainly divided into three groups, namely heuristic algorithms, methods based on Markov chains and stochastic optimization methods, see Lin (2010) for a detailed overview. The most suitable method in this particular situation turned out to be a stochastic optimization method. Namely, a new ranking is obtained through an optimization problem whose objective is to minimize the distance between the new ranking and all the others. This approach usually considers two distances, the L1, also known in the rank aggregation literature as Spearman’s distance, and the Kendall distance. The main difference between these two measures is that the first one considers the distances between the different scores of the genes in the different lists of preferences, while the second one takes into account the partial order of the ranking counting the number of pairwise discordance between two lists of preferences. The optimization has been carried out using the L1 distance over the list of preference obtained from the VarElect tool detailed in section “Functional Analysis.”
Gene Expression Data
Human tissue-specific gene expression data have been retrieved by the Human Protein Atlas web portal4 (Uhlén et al., 2015). The Tissue Atlas includes information about the expression profiles of human genes on mRNA and protein level. The protein data covers 15,313 genes (78%) for which there are antibodies available. The mRNA expression data are derived from RNA-seq of 37 different healthy individuals. Genes expressed in 5 organs and tissue groups, representative of potential sites of interaction between SARS-CoV-2 and the organism, were first selected (Table 1 and Figure 2), based on up-to-date information5. Indeed, it is actually recognized that, beside its impact on the respiratory system, SARS-CoV-2 induces multi-organ dysfunctions (Bal et al., 2020; Wu T. et al., 2020) indicating a potential virus-host interaction extended to several organs/systems. Respiratory tract tissues (lungs, tongue, tonsils, and olfactory epithelium) were included in group 1. In group 2, organs and tissues of the digestive system (stomach, esophagus, colon, duodenum, small intestine and rectum) were included. Groups 1 and 2 are therefore representative of the highest probability of virus-host interaction, affecting the epithelial cells (Cong and Ren, 2014). All blood cells were included in group 3. In group 4 the filtering organs and tissues (spleen, liver, lymph nodes, and kidney) were included. Finally, all brain areas for which RNA expression data were available in Protein Atlas (Amygdala, Basal Ganglia, Cerebellum, Cerebral Cortex, Hippocampus, Hypothalamus, Midbrain, Olfactory region, Pons and Medulla, and Thalamus) were included in group 5. The need to include tissues belonging to the nervous system in the analysis derives from the emerging evidence of a specific involvement of the latter in the development of symptoms currently named Neuro-COVID (Ahmad and Rathore, 2020; Helms et al., 2020; Mao et al., 2020). For each group, genes with an expression level <2 (for details about normalized RNA expression data see “Normalization of transcriptomics data” section in the Protein Atlas web portal)6 in all tissues/organs belonging to each group were excluded from the analysis. In each of the groups, the 1,500 mostly involved proteins in the COVID-19 disease were selected according to their expression level and used for the functional analysis through the VarElect tool (Stelzer et al., 2016), see section “Functional Analysis” for details.
Table 1. Groups of tissue/organs matched with disease phenotypes (A) and number of target genes selected as potential candidates for drug repurposing by VarElect aggregate and single group ranks (B).
We took advantage from the VarElect tool, a comprehensive phenotype-dependent gene prioritizer, based on the widely used GeneCards, which helps in identifying causal gene-phenotype associations with extensive evidence (Stelzer et al., 2016). The sets of COVID-host interacting genes, selected for each group of tissue/organs, were matched with disease phenotypes (symptoms or disease manifestations) that were considered peculiar to each group of organs/tissues (Table 1). Accordingly, for group 1 the phenotype query: “fever” OR “cough” OR “pneumonia” OR “dyspnea” OR “pain” OR “hemoptysis” OR “sore throat” OR “chills” OR “inflammation” was used (Supplementary Table S5). Group 2 was analyzed for the phenotype query: “fever” OR “diarrhea” OR “pain” OR “nausea” OR “vomiting” OR “inflammation” (Supplementary Table S6). Group 3 was analyzed for the phenotypes: “fever” OR “chills” OR “inflammation” OR “hemorrhagic” (Supplementary Table S7). The phenotype query for group 4 was: “fever” OR “cough” OR “diarrhea” OR “pain” OR “nausea” OR “vomiting” OR “chills” OR “inflammation” OR “hemorrhagic” (Supplementary Table S8). For group 5 (brain tissues) 2 sets of disease-related phenotypes were used in separate VarElect analyses, accounting for the reported neurological symptoms related to the central nervous system (group 5a, phenotype query: “dizziness” OR “headache” OR “consciousness” OR “encephalopathy” OR “encephalitis” OR “seizures” OR “stroke” OR “delirium,” Supplementary Table S9) or the peripheral nervous system (group 5b, phenotype query: “ageusia” OR “dysgeusia” OR “hypogeusia” OR “anosmia” OR “hyposmia” OR “myalgia” OR “myelitis” OR “pain” OR “Guillain-Barre,” Supplementary Table S10) (Ahmad and Rathore, 2020; Lahiri and Ardila, 2020; Tsivgoulis et al., 2020).
The VarElect scores related to each tissue group have been normalized so that they can be compared and used in a rank aggregation procedure. In particular, we considered two rank aggregation, the first by aggregating groups 1,2, and 4 (G124) and the second by aggregating groups 1,2,3,4, and 5 (G12345).
The VarElect analysis on single and aggregate groups allowed the selection of 260 (arbitrary cutoff, subject to extension in forthcoming analysis) gene targets potential candidates for drug repurposing (complete lists in Supplementary Tables S11–S15, selection of the first 15 genes for each aggregate or single group ranks in Tables 2–6). In particular, 101 genes were selected for the aggregate rank G124 (Supplementary Table S11), 99 genes were selected for the aggregate rank G12345 (Supplementary Table S12), and 20 genes were selected for each of the single analysis performed on group 3 (G3 blood cells, Supplementary Table S13), group 5a (G5a brain, VarElect analysis related to the central nervous system, Supplementary Table S14), and group 5b (G5b brain, VarElect analysis related to the peripheral nervous system, Supplementary Table S15).
Table 2. VarElect aggregated score obtained analyzing groups 1, 2, and 4 (G124; 15 top-ranking genes).
Table 3. VarElect aggregated score obtained analyzing groups 1, 2, 3, 4, and 5 (G12345; 15 top-ranking genes).
Table 5. VarElect aggregated score obtained analyzing group 5 related to the central nervous system (G5a; 15 top-ranking genes).
Table 6. VarElect aggregated score obtained analyzing group 5 related to the peripheral nervous system (G5b; 15 top-ranking genes).
Selected genes from aggregate ranks G124, G12345, and from single ranks G3, G5a, and G5b and subjected to the evaluation about the development of anti-COVID-19 pharmacology based on the repositioning of drugs already on the market (see section “Drug Repurposing Strategy”).
Design of Drug Repositioning Strategy via Drug-Gene Interaction Data
The DrugBank repository7 (Wishart et al., 2018) was manually queried for the selection of drugs on the basis of their possible interference with the direct or indirect virus-host interaction. The criteria applied for selecting a restricted list of gene targets and the corresponding drugs were: (a) the highest place occupied in the aggregate VarElect ranks G124 (Table 2 and Supplementary Table S11) and G12345 (Table 3 and Supplementary Table S12) and in the single ranks G3, G5a, and G5b (Tables 4–6 and Supplementary Tables S13–S15); (b) the main cellular location of the target protein, selected on the basis of the possible virus-host interaction during cell entry (plasma membrane), RNA duplication (cytosol), RNA translation (endoplasmic reticulum), viral protein maturation and virus assembly (Golgi apparatus) and virus secretion (secretory vesicles); (c) the existence of approved drugs as suitable candidates for repositioning.
Selection of Targets for Drug Repurposing
The application of the methodology detailed in section “Materials and Methods” leads to the selection of 260 target genes being potential candidates for drug repurposing. It turned out that out of these 260 genes, only 14 of them were ranked once (CDH1, CHEK2, TOP1, ADRB2, BIRC3, PRKAR1A, IKBKG, NEU1, CHUK, BSG, XPO1, WWOX, LDHA, and HSPA1A), while all of the others were repeated in two or more different ranks, with a total of 130 genes represented over 260 total entries in the pooled ranks. As for the main cellular locations, the majority of virus potential interactors were associated with cell nucleus (51), with less gene products located on plasma membrane and cytosol (15 each), Golgi/endoplasmic reticulum (12), vesicles (11), and mitochondria (7). The molecular function most represented was “enzyme” (46), while 16 activators/transcription factors, 9 membrane-bound receptors, 10 secreted proteins, 27 DNA-binding, 7 RNA-binding, 6 chaperones, 8 repressors were detected. Of note, 37 of the 130 unique gene targets were indicated in the Protein Atlas database as generic Virus-Host interactors, while 8 genes codify for proteins with antiviral activities. Finally, the analysis of “protein class” fields in Supplementary Tables S11–S15, revealed that 65 out of 130 genes were previously identified as non-COVID-19 specific potential drug targets, yet subjected to evaluation or approved by Regulatory Agencies (FDA and EMA).
Drug Repurposing Strategy
Following our extensive, multi-level analysis, we identified high ranking genes that may be potential pharmacological targets, fulfilling the requirements for a fast and safe drug repositioning strategy (Table 7). Most of them are enzymes (kinases, phosphatases, etc., i.e., AKT1, CDK4, LYN, MAPK14, TBK1, CHEK2, ATM, LRRK2, CHUK, SRC, MTOR, and MAPK1) belonging to various downstream signaling pathways, or involved in essential cell physiological processes, such as DNA replication, RNA synthesis and translation (i.e., RANBP2, SMARCA4, FUS, XPO1, DDX58, CAVIN1, and IFIH1), protein processing (i.e., PLAT, CASP8, PSEN1, APP, CASP3, XIAP, SERPING1, and TNFAIP3) energy consumption (i.e., GLA, NEU1) metabolic pathways (i.e., LDHA, WWOX, HMOX1, SDHB, SDHA, HADHA, ACADM, SOD1, GAPDH, PLOD1, and NOS2). Few proteins belongs to the class of secreted factors (i.e., PLAT, FBN1, TGFB1, SPP1, TNF, SERPING1, APOE, C4A, FN1, and TNFRSF1A). Some of the identified targets, based on their cellular compartmentalization, most probably may be activated/repressed in the process of virus entry and replication or viral proteins post-translational processing (i.e., HMOX1, APP, LYN, XIAP, SOD1, HIF1A, EGFR, ICAM1, FAS, CD209, CDH1, SRC, FLNA, DDX58, MAPT, CTNNB1, ERBB2, ADRB2, GAPDH, HSPB1, and CAVIN1), or may interact with virus proteins during the last phase of virus secretion through secretory vesicles (i.e., DNM1L, LDHA, NKX2-1, SLC6A4, CAVIN1, APOE, TNFAIP3, COMT, NEU1, BSG, SCARB1, MTOR, SQSTM1, NTRK1, and SPTAN1). A list of 18 potentially effective pharmacological targets with associated approved drugs, is presented in Table 7. Such genes have been selected prioritizing the existence of an already approved, safe and effective pharmacology. Then, gene candidates that were not considered as directly involved in virus-host protein interactions were discarded, i.e., those located in cell nucleus structures or those involved in essential, redundant and/or non-targetable cell metabolic/physiologic processes. Finally, all potential (and strong) candidates already under clinical investigation as potential drug targets for COVID-19 pandemics (i.e., TNF, highest in more than one aggregate rank in the VarElect analysis) were also excluded. The resulting list encompass plasma membrane receptors (i.e., EGFR, ERRB2, FGFR1, among others), proteins mainly localized in the Golgi and endoplasmic reticulum (CALR, APP, LYN, and COMT), Cytosol (LDHA, MTOR), vesicles (TBK1, COMT, APP, LDHA, and MTOR). Some of the proposed genes are potentially targeted by the same or similar drugs (as evidenced in the “potential alternative targets” fields in Table 6 drugs). Moreover, some of the proposed drugs are potentially effective on pharmacological targets already identified as potential drug targets or under investigation in ongoing clinical trials on COVID-19 patients (i.e., VEGFA, C1QA, C1QB, and C1QC)8. All of the selected genes were relatively high in their aggregated ranks (see Tables 2–6).
Table 7. Summary of drugs potentially relevant for COVID-19 chosen via the data-driven drug repositioning strategy.
In this work, we identified and prioritized a number of target genes involved in different ways in the host SARS-CoV-2 invasion and response via a network proximity-based procedure. Subsets of such target genes were subsequently identified in different organs and systems of the human organism, with the aim of isolating and classifying, in functionally coherent tissue/organ groups (respiratory and digestive epithelia, blood, filter/excretory tissues, and nervous system), the mostly suited target genes for the development of a pharmacology based on the repositioning of drugs already on the market. For each group of tissues, relevant target classifications have been established, on the basis of the potentially associated pathological phenotypes, previously described as characterizing the COVID-19 disease (Adhikari et al., 2020). The highest target genes in the individual tissue ranking were then grouped to reach the selection of 130 unique targets, 90% of which were significant in two or more of the tissues considered. Finally, by analyzing each relevant target, a pharmacological proposal has been defined for 18 target genes and expected to interfere with the virus-host interaction in the various infectious phases and the viral replication cycle.
Computationally based approach has been already considered for drug repurposing: for example Zhou et al. (2020) prioritize sixteen potential repurposable drugs against SARS-CoV-2 using a network proximity analysis. In particular, the authors mapped the drug-target network into a selected COVID-19 host interactome to search for cellular target; (Cheng et al., 2020) proposed a combination of anti-inflammatory and antiviral therapeutics using a network based approach in which proximity measure quantifies the relationship between COVID-19 disease modules and drug targets in the Human PPIs network. Our computationally driven approach revealed that it is possible to hypothesize unequivocal and functional pharmacological interventions to counteract the development of symptoms affecting various organs and systems. This consideration arises from the evidence that some of the pharmacological targets identified (i.e., EGFR, ERBB2, APP, ICAM1, and FAS), may be important to prevent the interaction of the virus with the cell surface in different target organs. However, it is also necessary to conceive pharmacological strategies based on the combination of different drugs, able to counter, by targeting different players of the virus-host interaction, the various stages through which the infection develops at the cellular level (virus entry, replication, viral protein processing, and release of new virus). Finally, the association of therapies interfering with virus-host interaction with strategies aimed at bringing back under control the inflammatory phenomena, with which the body fights the infection and which have often proved fatal (Astuti and Ysrafil, 2020), is deserved.
Computational criteria and methods brought to the definition of COVID-19 proximal target genes. Then, biological criteria lead to select the relevant interactions, potential targets for drug repurposing, associated with different stages of viral infection and the development of the constellation of symptoms already described in COVID-19 patients (Adhikari et al., 2020; Ahmad and Rathore, 2020). Virus-host interactions may stand as physical interactions between viral and human proteins or as indirect interactions based on the triggering, after virus challenge, of the complex network of metabolic processes characterizing eukaryotic cells. In the analysis presented in this work, in addition to the classifications of relevant target genes, their cellular localization was also taken into consideration, with the aim of hypothesizing possible specific interactions for the individual compartments of the cell, in which the viral proteins could relate with human ones. Based on such rationale, plasma membrane-bound proteins have been considered as alternative interactors for virus entry. Cytoplasm-located proteins may conceivably interact with the virus during its replication phase, while endoplasmic reticulum and Golgi proteins could interact with the viral M protein and the viral proteins post-translational processing (Astuti and Ysrafil, 2020). Finally, vesicles-associated interactors have been hypothesized to play a role in the virus secretion.
It is known that the receptor-binding domains on the SARS-CoV-2 S protein bind with high affinity to human ACE2 (Wrapp et al., 2020), an interaction accounting for virus entry in the host cell and for its transmissibility. The analysis of COVID-19 extended interactome indicates several membrane bound gene/proteins (i.e., ICAM1, EGFR, ERBB2, APP, ADR2, FAS, CDH1, and MAPT), whose activity and/or expression could be affected by SARS-CoV-2 challenge. Evidence for alternative interaction of virus S protein with receptors other than ACE2 have been not only already suggested by computational analysis (Milanetti et al., 2020), but also demonstrated in vitro (Ulrich and Pillat, 2020). Furthermore, some of the selected proteins could also account for additional host interactions, not necessarily related with the transmission of disease. RNA-binding proteins present in the cytosol, part of the extended interactome and with a high position in the VarElect ranks (i.e., RANBP2, XPO1, and CDKN2A), could reasonably participate in the replication and translation phases of the viral RNA. Similarly, proteins associated with the endoplasmic reticulum and Golgi membranes (i.e., CALR, COMT, CAV1, and PTCH1) could be involved in the translation processes of the viral RNA and in the subsequent protein processing. Lastly, it is worth mentioning the interactions foreseen by computational analysis with secreted proteins. Among the most important are those with TNF, which plays a central role in the cytokine storm that characterizes the most severe phase of the disease, and which already constitutes a drug target challenged in intensive care units worldwide.
There are actually dozens of drug targets tested for COVID-19 in more than 1200 clinical trials worldwide, as reported in the DrugBank repository (see text footnote 8). Among these, only TNF has been identified by our analysis as being part of the COVID-19 host target genes. Recently, a list of more than 300 possible target genes has been experimentally observed to interact with Sars-CoV-2 proteins and thus considered for the development of anti-COVID, repositioning-based therapies (Gordon et al., 2020), of which only 11 (NEU1, SCARB1, TBK1, COMT, HMOX1, FBN1, GLA, ACADM, DNMT1, PLAT, and TOR1A) are shared with those predicted through the methodology applied in the present work after a data-driven prioritization. In addition, despite the apparent abundance of potential pharmacological targets proposed through data analysis, relatively few of these lend themselves to being used in drug repositioning strategies. The final data of the present work, summarized in the Table 6, indicate that among the potential first 130 targets identified, because at the top positions in the ranks of potential efficacy elaborated through our methodology, only 18 preliminary appear as suitable candidates for drug repositioning. The reasons lie in the lack, for most of the ranked genes, of pharmacologically active drugs already approved by the Regulatory Agencies, or in the impossibility of developing, for many of them participating in essential processes in cellular physiology, a pharmacological approach that modifies their activity, or, finally, in the difficulty of using drugs with a significant impact on physiology or with a high risk of inducing side effects in patients already deeply debilitated by SARS-CoV-2 infection.
Conclusion and Perspectives
The pandemic caused by SARS-CoV-2 represents an open and unresolved challenge for the global health system. The need to identify drugs that demonstrate efficacy in countering both the mechanisms of interaction of SARS-CoV-2 with host cells and to control the devastating inflammatory phenomena that characterize the late stages of viral infection, requires increasingly urgent answers. The biomedical research approach based on the repurposing of already approved drugs seems to be one of the most viable strategies in this struggle. This work, via a data-driven network-based procedure, provides a viable and alternative drug repurposing strategy to be considered for clinical trial. The proposed approach has been conceived to support the comprehension of the molecular landscape of COVID-19 as well as the identification of genes that are not immediately associated to SARS-CoV-2 invasion, or not taken into consideration in respect to the host defense regulation and dynamics, and may thus suggest new directions for further studies and analyses. We leave open the possibility of extending our preliminary analysis by increasing the number of genes present in the currently proposed COVID-19 proximal target genes and/or by extending the selection of potential target genes identified through functional analysis to a greater number than the current one. Under the computational point of view further approaches could be considered, for instance several network topological measures and/or a combination of them could be considered to select COVID-19 proximal candidate target genes and to investigate whether/how changes in the drugs proposal occur.
Data Availability Statement
All datasets presented in this study are included in the article/Supplementary Material.
PT conceived the study. All authors collected the data, ran the analysis, wrote the manuscript, and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was partially supported by the Horizon 2020 H2020-ICT-2018-2 project iPC – individualized Paediatric Cure (grant no. 826121) to PT. We wish to thank the CNR-IAC Digital Biology Unit (https://www.iac.cnr.it/~dbu/), CNR-IAC colleagues, and Dr. R. Mazzucco, MD (AUSL Ferrara) for useful comments, discussions, and encouragement during the lockdown months.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2020.545089/full#supplementary-material
Supplementary File 1 | Cytoscape session file (.cys) with human interactome data.
Supplementary Table 1 | List of the 500 COVID-19 associated host genes derived from the BioGRID dataset (BIOGRID-CORONAVIRUS-3.5.186.tab3.zip).
Supplementary Table 2 | Output of the DIAMOnD algorithm: first 1000 genes ranked for connectivity significance starting from the 500 COVID-19 associated seed genes.
Supplementary Table 3 | Output of the network propagation/heat diffusion algorithm run with 5 different diffusion times starting from the 500 COVID-19 associated seed genes.
Supplementary Table 4 | Aggregated chart of the first 1500 genes ranked for COVID-19 proximity.
Supplementary Tables 5–10 | VarElect scores obtained for genes in the COVID-19 extended interactome expressed in different groups of tissues/organs (reported in each Table) and selected for their association with disease phenotypes (reported in each Table) specifics for each tissue/organ.
Supplementary Tables 11–15 | Aggregated and normalized VarElect scores for the first 100 (G124 and G12345) and 20 (G3, G5a, and G5b) genes in the VarElect analysis depicted in Supplementary Tables 5–10.
- ^ Source: COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU), available at https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6, retrieved August 26th, 2020.
- ^Source: https://www.drugbank.ca/Covid-19, retrieved July 6th, 2020.
- ^ https://wiki.thebiogrid.org/doku.php/covid
- ^ www.proteinatlas.org
- ^ Wadman M. et al., “How does coronavirus kill? Clinicians trace a ferocious rampage through the body, from brain to toes”. doi: 10.1126/science.abc3208
- ^ https://www.proteinatlas.org/about/assays+annotation#rna
- ^ https://www.drugbank.ca/
- ^ https://www.drugbank.ca/covid-19#drug-targets
Adhikari, S. P., Sha, M., Wu, Y.-J., and Mao, Y.-P. (2020). Epidemiology, Causes, Clinical Manifestation and Diagnosis, Prevention and Control of Coronavirus Disease (COVID-19) during the Early Outbreak Period: A Scoping Review. Infect. Dis. Povert. 9:29.
Anderson, R. M., Heesterbeek, H., Klinkenberg, D., and Hollingsworth, T. D. (2020). How Will Country-Based Mitigation Measures Influence the Course of the COVID-19 Epidemic? Lancet 395, 931–934. doi: 10.1016/s0140-6736(20)30567-5
Astuti, I., and Ysrafil. (2020). Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2): An Overview of Viral Structure and Host Response. Diab. Metabol. Syndr. 14, 407–412. doi: 10.1016/j.dsx.2020.04.020
Bal, A., Agrawal, R., Vaideeswar, P., Arava, S., and Jain, A. (2020). COVID-19: An up-to-Date Review - from Morphology to Pathogenesis. Indian J. Pathol. Microbiol. 63, 358–366. doi: 10.4103/ijpm.ijpm_779_20
Bauer-Mehren, A., Bundschus, M., Rautschka, M., Mayer, M. A., Sanz, F., and Furlong, L. I. (2011). Gene-Disease Network Analysis Reveals Functional Modules in Mendelian, Complex and Environmental Diseases. PLoS One 6:e20284. doi: 10.1371/journal.pone.0020284
Cheng, F., Rao, S., and Mehra, R. (2020). COVID-19 Treatment: Combining Anti-Inflammatory and Antiviral Therapeutics Using a Network-Based Approach. Clevel. Clin. J. Med. 2020:32606050. doi: 10.3949/ccjm.87a.ccc037
Ghiassian, S. D., Menche, J., and Barabási, A.-L. (2015). A DIseAse MOdule Detection (DIAMOnD) Algorithm Derived from a Systematic Analysis of Connectivity Patterns of Disease Proteins in the Human Interactome. PLoS Comput. Biol. 11:e1004120. doi: 10.1371/journal.pcbi.1004120
Gordon, D. E., Jang, G. M., Bouhaddou, M., Xu, J., and Obernier, K. (2020). A SARS-CoV-2 Protein Interaction Map Reveals Targets for Drug Repurposing. Nature 583(7816), 459–468. doi: 10.1038/s41586-020-22862289
Mao, L., Jin, H., Wang, M., Hu, Y., Chen, S., and He, Q. (2020). Neurologic Manifestations of Hospitalized Patients With Coronavirus Disease 2019 in Wuhan, China. JAMA Neurol. 77(6), 683–690. doi: 10.1001/jamaneurol.2020.1127
Mosca, E., Alfieri, R., and Milanesi, L. (2014). Diffusion of Information throughout the Host Interactome Reveals Gene Expression Variations in Network Proximity to Target Proteins of Hepatitis C Virus. PLoS One 9:e113660. doi: 10.1371/journal.pone.0113660
Paull, E. O., Carlin, D. E., Niepel, M., Sorger, P. K., Haussler, D., and Stuart, J. M. (2013). Discovering Causal Pathways Linking Genomic Events to Transcriptional States Using Tied Diffusion Through Interacting Events (TieDIE). Bioinformatics 29, 2757–2764. doi: 10.1093/bioinformatics/btt471
Qualls, N., Levitt, A., Kanade, N., Wright-Jegede, N., Dopson, S., Biggerstaff, M., et al. (2017). Community Mitigation Guidelines to Prevent Pandemic Influenza — United States, 2017. MMWR. Recommen. Rep. 66(1), 1–34. doi: 10.15585/mmwr.rr6601a1
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genom. Res. 13, 2498–2504. doi: 10.1101/gr.1239303
Silverman, E. K., Schmidt, H. W., Anastasiadou, E., Altucci, L., Angelini, M., Badimon, L., et al. (2020). Molecular Networks in Network Medicine: Development and Applications. New Jersey: Wiley, e1489.
Stelzer, G., Plaschkes, I., Oz-Levi, D., Alkelai, A., Olender, T., Zimmerman, S., et al. (2016). VarElect: The Phenotype-Based Variation Prioritizer of the GeneCards Suite. BMC Genomics 17(Suppl. 2):444. doi: 10.1186/s12864-016-2722-2
Sumathipala, M., Maiorino, E., Weiss, S. T., and Sharma, A. (2019). Network Diffusion Approach to Predict LncRNA Disease Associations Using Multi-Type Biological Networks: LION. Front. Physiol. 10:888. doi: 10.3389/fphys.2019.00888
Taylor, I. W., Linding, R., Warde-Farley, D., Liu, Y., Pesquita, C., Faria, D., et al. (2009). Dynamic Modularity in Protein Interaction Networks Predicts Breast Cancer Outcome. Nat. Biotechnol. 27, 199–204. doi: 10.1038/nbt.1522
Tieri, P., Farina, L., Petti, M., Astolfi, L., Paci, P., and Castiglione, F. (2019). Network Inference and Reconstruction in Bioinformatics. Encyclo. Bioinform. Comput. Biol. 2, 805–813. doi: 10.1016/b978-0-12-809633-8.20290-2
Tsivgoulis, G., Palaiodimou, L., Katsanos, A. H., Caso, V., Köhrmann, M., Molina, C., et al. (2020). Neurological Manifestations and Implications of COVID-19 Pandemic. Ther. Adv. Neurol. Dis. 13:1756286420932036.
Ulrich, H., and Pillat, M. M. (2020). CD147 as a Target for COVID-19 Treatment: Suggested Effects of Azithromycin and Stem Cell Engagement. Stem Cell Rev. Rep. 16, 434–440. doi: 10.1007/s12015-020-09976-7
Vandin, F., Upfal, E., and Raphael, B. J. (2011). Algorithms for Detecting Significantly Mutated Pathways in Cancer. J. Comput. Biol. J. Comput. Mole. Cell Biol. 18, 507–522. doi: 10.1089/cmb.2010.0265
Wu, A., Peng, Y., Huang, B., Ding, X., Wang, X., Niu, P., et al. (2020). Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Micr. 27, 325–328. doi: 10.1016/j.chom.2020.02.001
Wu, T., Zuo, Z., Kang, S., Jiang, L., Luo, X., Xia, Z., et al. (2020). Multi-Organ Dysfunction in Patients with COVID-19: A Systematic Review and Meta-Analysis. Aging Dis. 11, 874–894. doi: 10.14336/ad.2020.0520
Keywords: COVID-19, network medicine, drug repurposing, network-based, pharmacologic (drug) therapy
Citation: Stolfi P, Manni L, Soligo M, Vergni D and Tieri P (2020) Designing a Network Proximity-Based Drug Repurposing Strategy for COVID-19. Front. Cell Dev. Biol. 8:545089. doi: 10.3389/fcell.2020.545089
Received: 24 March 2020; Accepted: 07 September 2020;
Published: 06 October 2020.
Edited by:Yasuko Tsunetsugu Yokota, Tokyo University of Technology, Japan
Reviewed by:Noah Lucas Weisleder, The Ohio State University, United States
Teiichiro Shiino, National Institute of Infectious Diseases (NIID), Japan
Copyright © 2020 Stolfi, Manni, Soligo, Vergni and Tieri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Paolo Tieri, firstname.lastname@example.org