Designing a Network Proximity-Based Drug Repurposing Strategy for COVID-19

The ongoing COVID-19 pandemic still requires fast and effective efforts on all fronts, including epidemiology, clinical practice, molecular medicine, and pharmacology. A comprehensive molecular framework of the disease is needed to better understand its pathological mechanisms and to design successful treatments able to slow down and stop the rapid pace of the outbreak and its harsh clinical symptomatology, possibly via the use of readily available, off-the-shelf drugs. This work aims to provide a wider picture of the human molecular landscape of the SARS-CoV-2 infection via a network medicine approach, as the ground for a drug repurposing strategy. Building on prior knowledge, such as experimentally validated host proteins known to be viral interactors and tissue-specific gene expression data, and using network analysis techniques such as network propagation and connectivity significance, the host molecular reaction network to the viral invasion is explored and exploited to infer and prioritize candidate target genes, and finally to propose drugs to be repurposed for the treatment of COVID-19. Ranks of potential target genes have been obtained for coherent groups of tissues/organs, which are potential and distinct sites of interaction between the virus and the organism. The normalization and aggregation of the different scores made it possible to define a preliminary, restricted list of candidate genes as pharmacological targets for drug repurposing, with the aim of counteracting different phases of the virus infection and viral replication cycle.


INTRODUCTION
The ongoing worldwide COVID-19 pandemic has exceeded 23.9M confirmed cases, with a death toll above 819,000 (∼3.4% global case fatality rate) at the time of writing 1 (Dong et al., 2020). Worse, in several densely populated countries, especially those in the South of the world, it is still difficult to forecast when the pace of new infections will slow down significantly, and if, when and with what intensity a new global wave will arise. The ultimate goal in fighting a pandemic is to completely stop the spread, but slowing it down is also crucial to mitigate otherwise devastating effects on health and socioeconomic systems on a local and global scale. Thus, it is necessary to interfere by every possible means with the natural, deadly flow of the outbreak, in order to reduce and flatten the epidemic curve and relieve the pressure on hospital capacity (Qualls et al., 2017; Anderson et al., 2020).
In this perspective, aside from all the epidemiological, clinical and immunological measures and efforts already implemented, deploying the vast, existing and potentially effective pharmacological arsenal via drug repurposing is timely and needed, as witnessed by the numerous ongoing clinical trials on several off-the-shelf drugs (source: DrugBank) 2. This work aims to aid the fight against the health consequences of the COVID-19 pandemic by providing a data-driven, viable drug repurposing approach.
In this study, we account for the complexity of the molecular interactions and processes underlying the host response to SARS-CoV-2, and provide an integrated molecular picture to be exploited for a drug repurposing strategy. Such a picture includes the charting of the protein interaction map involving host genes that, in the current state of knowledge, have been observed to interact with SARS-CoV-2 viral proteins and/or are considered critical in the host infection processes, also considering previous knowledge related to other relevant Coronaviruses. In the wider context of network medicine (Bauer-Mehren et al., 2011; Silverman and Loscalzo, 2017), the protein-protein interaction (PPI) framework provides a widely assessed and effective heuristic approach for the identification of disease genes (Taylor et al., 2009; Gustafsson et al., 2014; Tieri et al., 2019; Silverman et al., 2020). The complexity of the organism's response to the viral invasion is mirrored by the wide variability of the clinical symptoms observed in patients, ranging from asymptomatic infections to extremely critical conditions, up to the death of the patient in around 3.4% of cases worldwide (see text footnote 1). With this study we therefore intended to expand the molecular landscape of the host proteins observed to directly interact with viral proteins (Gordon et al., 2020) to include actors that could be neglected when focusing only on the set of direct interactors, and that could prove to be important pharmacological targets for an effective drug repurposing strategy aimed at improving the clinical outcome of the disease.
2 Source: https://www.drugbank.ca/Covid-19, retrieved July 6th, 2020.
FIGURE 1 | Scheme of the workflow adopted: starting from available human interactome data and the set of COVID-19 experimentally associated genes (A), a network proximity approach (based on connectivity significance and heat diffusion) has been carried out to select genes that are proximal to the initial set of COVID seed genes (B). Filtering via gene expression in specific tissues and association to the most common COVID-19 symptoms and phenotypes (C) allowed the design of the proposed drug repurposing strategy (D).

MATERIALS AND METHODS
The workflow of our approach has been sketched in Figure 1.
Here we briefly describe each step of the method; detailed explanations are provided in the following subsections. We started by collecting updated human PPI data (Figure 1A), from which a network of 18,618 human proteins and 424,076 binary interactions has been built, and SARS-CoV-2/Coronavirus/human PPI data, consisting of a set of 500 human genes potentially involved in the COVID-19 disease (see section "COVID-19 Associated Host Genes, Protein-Protein Interaction Data and Interactomes Reconstruction"). On such data, a network medicine approach has been applied by using connectivity significance and network diffusion algorithms in order to provide a COVID-19 "proximity" or "involvement" gene ranking (Figure 1B, details in sections "Connectivity Significance" and "Network Diffusion").
The top 1,000 genes in the proximity ranking, added to the original 500 SARS-CoV-2 related genes, give the final dataset of the 1,500 proteins most involved in the COVID-19 disease. In order to further refine the selected list of genes, their gene expression levels in COVID-19-relevant tissues have been investigated (Figure 1C). The human tissues most involved in the COVID-19 infection have been identified and divided into five groups (see Figure 2 and section "Gene Expression Data"). The genes that are not expressed in those tissues have been excluded. The remaining genes, for each tissue group, are ranked based on the most common COVID-19 symptoms. The rankings have been obtained through VarElect functional filtering, detailed in section "Functional Analysis," and they have been aggregated (see section "Rank Aggregation") to obtain a restricted ranked list. Finally (Figure 1D), the proposed drug repositioning strategy was designed and implemented via dedicated drug-gene interaction information (see section "Design of Drug Repositioning Strategy via Drug-Gene Interaction Data").
FIGURE 2 | Graphical sketch showing the selected five groups of organs/tissues representative of potential sites of interaction between SARS-CoV-2 and the organism. Group 1: respiratory tract tissues; Group 2: organs of the digestive system; Group 3: blood cells; Group 4: filtering organs; Group 5: brain areas. Group-specific gene expression data have been retrieved from the Human Protein Atlas web portal (www.proteinatlas.org).

COVID-19 Associated Host Genes, Protein-Protein Interaction Data and Interactomes Reconstruction
Protein-protein interaction data for interactome reconstruction have been retrieved from BioGRID (Oughtred et al., 2019), one of the most comprehensive interaction repositories, with freely provided data compiled through manual curation efforts and currently containing more than 1.7 million protein and genetic interactions from major model organism species, including Homo sapiens. The repository provides both the whole human-only interactome and, in the effort to provide valuable data to fight the pandemic, the SARS-CoV-2/human protein interaction dataset, derived from several sources as described on the dedicated BioGRID webpage 3. For this study, the latest version of the human interactome and of the COVID-19-associated host genes available at the time of the analysis, i.e., version 3.5.186 (.tab2 and .tab3 format types), has been used. The dataset includes 338 human proteins interacting with SARS-CoV-2 [i.e., the genes identified by the seminal work of Gordon and colleagues (Gordon et al., 2020)], 47 human proteins considered critical for virus entry into the host and for the host response, and a further 115 proteins experimentally observed to interact with other, SARS-relevant Coronaviruses, for a total of 500 involved human genes (Supplementary Table S1). The last 115 genes were included because of the known marked similarity and close relationship between SARS-CoV-2 and SARS-CoVs or SARS-like bat CoVs, similarities that could play a relevant role when comparing the host tropism and transmission features of SARS-CoV-2 and SARS-CoV and that are thus worthy of investigation. Besides these considerations, and despite the efforts in experimental PPI mapping, it is also known that the number of missing interactions greatly exceeds the number of experimentally detected interactions (Kovács et al., 2019).
In this perspective, these further viral-human interactions related to other Coronaviruses, provided by BioGRID in the same dataset, represent highly valuable information from the heuristic point of view, partly due to structural similarities.
The whole human interactome has been gathered from BioGRID data as well, and its largest connected component (LCC), consisting of 18,618 genes and 424,076 unique pairwise interactions among them (Supplementary File S1), has been extracted to undergo network analysis.
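As an illustration of the LCC extraction step, the following minimal sketch finds the largest connected component of an undirected interaction list using a breadth-first search. It is not the code used in the study: the function name `largest_connected_component`, the toy edge list and the choice to discard self-interactions and collapse duplicate pairs are assumptions made here for illustration.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return (nodes, unique edges) of the largest connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions are ignored
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component

    # Unique pairwise interactions: unordered pairs with duplicates collapsed.
    lcc_edges = {
        tuple(sorted((a, b)))
        for a, b in edges
        if a != b and a in best and b in best
    }
    return best, lcc_edges


# Toy edge list with placeholder node names, one duplicate and one self-loop.
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "C"), ("D", "E")]
lcc_nodes, lcc_edges = largest_connected_component(toy_edges)
```

On the toy input, the component {A, B, C} is retained with its two unique interactions, while the isolated pair D-E is dropped.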

Network Diffusion
Network diffusion (or network propagation) is a methodology able to identify those genes which are proximal to a starting list of seed genes by using network topology (and optionally other features). In network medicine it can be used to identify genes and genetic modules that underlie human diseases (Mosca et al., 2014; Cowen et al., 2017; Sumathipala et al., 2019), to identify causal paths linking mutations to expression regulators, or to discover significantly mutated subnetworks in cancer (Vandin et al., 2011; Paull et al., 2013). The methodology exploits the concept of heat diffusion, i.e., how the heat distribution spreads over time in a medium, here consisting of the PPI network, as it flows from nodes where it is higher toward nodes where it is lower according to the diffusion coefficient and their mutual connections. In practice, starting with an arbitrary subset of seed nodes (e.g., genes associated with a disease), a diffusion algorithm is applied to the initial values assigned to the seed nodes, which propagate through the network according to its topology. Fixing a stopping time for the diffusion algorithm, the final distribution of the propagated values generates a proximity ranking that can be used to identify a subset of genes that are closely associated with the selected seed genes. The Cytoscape network analysis platform (Shannon et al., 2003), version 3.7, and the Cytoscape-embedded function "Diffuse," based on a heat diffusion algorithm, have been used for the analysis (Carlin et al., 2017). The diffusion algorithm has been run considering as seed genes the 500 COVID-19-associated human genes with initial heat h_s(0) = 1; non-seed genes have been set with initial heat h_ns(0) = 0.
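The heat diffusion idea described above can be sketched in a few lines. The example below is an illustrative approximation, not the Cytoscape "Diffuse" implementation: it assumes that the heat at time t is h(t) = exp(-tL) h(0), with L = D - A the unnormalized graph Laplacian, and it evaluates the exponential with a truncated Taylor series, which is adequate only for small toy graphs and small t. The function names, the toy graph and the parameter `n_terms` are hypothetical.

```python
def laplacian_apply(adjacency, vector):
    """Compute L v for L = D - A, with the graph given as {node: set(neighbours)}."""
    return {
        node: len(neigh) * vector[node] - sum(vector[n] for n in neigh)
        for node, neigh in adjacency.items()
    }


def heat_diffusion(adjacency, seeds, t, n_terms=30):
    """Approximate h(t) = exp(-t L) h(0) with a truncated Taylor series."""
    # Initial heat: 1 on seed nodes, 0 elsewhere, as in the text.
    heat0 = {node: (1.0 if node in seeds else 0.0) for node in adjacency}
    result = dict(heat0)
    term = dict(heat0)
    for k in range(1, n_terms + 1):
        # term_k = (-t L) term_{k-1} / k, i.e., the k-th Taylor term.
        lap = laplacian_apply(adjacency, term)
        term = {node: -t * lap[node] / k for node in adjacency}
        for node in adjacency:
            result[node] += term[node]
    return result


# Toy undirected graph with placeholder node names; "S" is the only seed.
toy_graph = {
    "S": {"A", "C"},
    "A": {"S", "B"},
    "B": {"A"},
    "C": {"S"},
}
heat = heat_diffusion(toy_graph, seeds={"S"}, t=0.05)

# Proximity ranking of non-seed nodes by decreasing heat.
ranking = sorted((n for n in toy_graph if n != "S"), key=lambda n: -heat[n])
```

In this toy case the direct neighbours of the seed receive most of the transferred heat, and the node two steps away receives the least, which is the behavior the proximity ranking relies on. For a network of the size used in the study, a sparse matrix exponential routine would be needed instead of a Taylor series.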
The heat diffusion has been observed at the following times t: 0.002, 0.005, 0.01, 0.02, 0.05 (arbitrary algorithm diffusion time units; Supplementary Table S3), and the quantities of heat in non-seed genes h_ns(t) have been computed. The appropriate time has been identified by considering, for each time t, the intersection of the most significant genes obtained via the DIAMOnD algorithm and the most relevant genes in the diffusion process, i.e., the ones with the highest h_ns(t) values, and selecting the time showing the largest overlap, which turned out to be t = 0.005. In more detail, we considered the overlap of the top 200 genes obtained via the DIAMOnD algorithm with the top 1,000 genes obtained via the heat diffusion algorithm at each stopping time. The combination of the two methods, the heat diffusion that favors genes well connected to the seed genes or with high degrees, and DIAMOnD that favors genes whose connections to the set of seed genes are significant, generates a proximity ranking of genes that are topologically well connected to the COVID-19-associated genes. Moreover, since the overlap in the intersection is about 50%, a substantial number of genes are identified as well connected to the seed genes by both methods.
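The selection of the stopping time by maximal overlap can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names `top_k` and `best_stopping_time`, the tie-breaking rules and all toy heat values are invented for the example. In the study the cutoffs were 200 (DIAMOnD) and 1,000 (heat diffusion); the toy uses 2 and 3.

```python
def top_k(scores, k, exclude=()):
    """Return the k highest-scoring keys, skipping any key in `exclude`."""
    candidates = [g for g in scores if g not in exclude]
    # Sort by decreasing score; ties are broken alphabetically (assumption).
    candidates.sort(key=lambda g: (-scores[g], g))
    return candidates[:k]


def best_stopping_time(heat_by_time, diamond_ranking, seeds, k_diamond, k_heat):
    """Pick the diffusion time whose top non-seed genes best overlap DIAMOnD's top genes."""
    reference = set(diamond_ranking[:k_diamond])
    overlaps = {}
    for t, heat in heat_by_time.items():
        overlaps[t] = len(reference & set(top_k(heat, k_heat, exclude=seeds)))
    # Largest overlap wins; ties go to the smaller time (assumption).
    best_t = max(overlaps, key=lambda t: (overlaps[t], -t))
    return best_t, overlaps


# Toy data with placeholder gene names and invented heat values.
toy_heat = {
    0.002: {"S": 0.90, "G1": 0.05, "G2": 0.01, "G3": 0.03, "G4": 0.02, "G5": 0.00},
    0.005: {"S": 0.78, "G1": 0.08, "G2": 0.06, "G3": 0.05, "G4": 0.02, "G5": 0.01},
}
toy_diamond = ["G1", "G2", "G4", "G3", "G5"]
chosen_t, overlap_sizes = best_stopping_time(
    toy_heat, toy_diamond, seeds={"S"}, k_diamond=2, k_heat=3
)
```

Here the later time point places both reference genes among the top heat-ranked genes, so it is the one selected.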

Rank Aggregation
Rank aggregation deals with the aggregation of several lists of preferences obtained from different methodologies. It is very useful in all those situations in which preferences can be set according to several features, none of them prevailing over the others. This is our case, with different lists of best genes associated with the different preferences, none being preferred over the others. Many methods have been proposed in the literature to aggregate ranks. They are mainly divided into three groups, namely heuristic algorithms, methods based on Markov chains, and stochastic optimization methods; see Lin (2010) for a detailed overview. The most suitable method in this particular situation turned out to be a stochastic optimization method. Namely, a new ranking is obtained by solving an optimization problem whose objective is to minimize the distance between the new ranking and all the others. This approach usually considers two distances: the L1 distance, also known in the rank aggregation literature as Spearman's distance, and the Kendall distance. The main difference between these two measures is that the first one considers the distances between the different scores of the genes in the different lists of preferences, while the second one takes into account the partial order of the ranking by counting the number of pairwise discordances between two lists of preferences. The optimization has been carried out using the L1 distance over the lists of preferences obtained from the VarElect tool detailed in section "Functional Analysis."
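The two distances and the optimization objective can be made concrete with a small sketch. The example below is not the stochastic optimization procedure used in the study: it assumes unweighted, position-based versions of the two distances, assumes that all lists rank the same items, and replaces the stochastic search with an exhaustive search over permutations, which is feasible only for very short lists. All function names and the toy rankings are hypothetical.

```python
from itertools import permutations


def footrule_distance(rank_a, rank_b):
    """L1 (Spearman footrule) distance: sum of absolute position differences."""
    pos_b = {gene: i for i, gene in enumerate(rank_b)}
    return sum(abs(i - pos_b[gene]) for i, gene in enumerate(rank_a))


def kendall_distance(rank_a, rank_b):
    """Kendall distance: number of gene pairs ordered differently in the two lists."""
    pos_b = {gene: i for i, gene in enumerate(rank_b)}
    discordant = 0
    for i in range(len(rank_a)):
        for j in range(i + 1, len(rank_a)):
            if pos_b[rank_a[i]] > pos_b[rank_a[j]]:
                discordant += 1
    return discordant


def aggregate(rankings, distance=footrule_distance):
    """Return the ranking minimizing the summed distance to all input rankings."""
    genes = sorted(rankings[0])
    return min(
        (list(p) for p in permutations(genes)),
        key=lambda candidate: sum(distance(candidate, r) for r in rankings),
    )


# Toy rankings over placeholder items (e.g., one list per tissue group).
toy_rankings = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
consensus = aggregate(toy_rankings)
d_l1 = footrule_distance(toy_rankings[0], toy_rankings[1])
d_kendall = kendall_distance(toy_rankings[0], toy_rankings[1])
```

For the first two toy lists, which differ by one adjacent swap, the L1 distance is 2 and the Kendall distance is 1. The consensus places each item at its median position across the three lists.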

Gene Expression Data
Human tissue-specific gene expression data have been retrieved from the Human Protein Atlas web portal 4 (Uhlén et al., 2015). The Tissue Atlas includes information about the expression profiles of human genes at the mRNA and protein level. The protein data cover 15,313 genes (78%) for which there are antibodies available. The mRNA expression data are derived from RNA-seq of 37 different healthy individuals. Genes expressed in 5 organ and tissue groups, representative of potential sites of interaction between SARS-CoV-2 and the organism, were first selected (Table 1 and Figure 2), based on up-to-date information 5. Indeed, it is now recognized that, besides its impact on the respiratory system, SARS-CoV-2 induces multiorgan dysfunctions (Bal et al., 2020; Wu T. et al., 2020), indicating a potential virus-host interaction extended to several organs/systems. Respiratory tract tissues (lungs, tongue, tonsils, and olfactory epithelium) were included in group 1. In group 2, organs and tissues of the digestive system (stomach, esophagus, colon, duodenum, small intestine and rectum) were included. Groups 1 and 2 are therefore representative of the highest probability of virus-host interaction, affecting the epithelial cells (Cong and Ren, 2014). All blood cells were included in group 3. In group 4 the filtering organs and tissues (spleen, liver, lymph nodes, and kidney) were included. Finally, all brain areas for which RNA expression data were available in Protein Atlas (Amygdala, Basal Ganglia, Cerebellum, Cerebral Cortex, Hippocampus, Hypothalamus, Midbrain, Olfactory region, Pons and Medulla, and Thalamus) were included in group 5.
Frontiers in Cell and Developmental Biology | www.frontiersin.org
The need to include tissues belonging to the nervous system in the analysis derives from the emerging evidence of a specific involvement of the latter in the development of symptoms currently named Neuro-COVID (Ahmad and Rathore, 2020; Helms et al., 2020; Mao et al., 2020). For each group, genes with an expression level <2 in all tissues/organs belonging to that group were excluded from the analysis (for details about normalized RNA expression data see the "Normalization of transcriptomics data" section in the Protein Atlas web portal) 6. In each group, the 1,500 proteins most involved in the COVID-19 disease were filtered according to their expression level and then used for the functional analysis through the VarElect tool (Stelzer et al., 2016); see section "Functional Analysis" for details.
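The expression filter described above amounts to keeping a gene for a given group when its normalized expression reaches the threshold in at least one tissue of that group. The following minimal sketch illustrates this rule. The function name, the data layout, the treatment of missing tissue values as 0 and the toy expression values are assumptions made for illustration only.

```python
def expressed_in_group(expression, group_tissues, threshold=2.0):
    """Keep genes whose expression is >= threshold in at least one tissue of the group."""
    kept = []
    for gene, by_tissue in expression.items():
        # Assumption: a tissue with no recorded value counts as not expressed.
        values = [by_tissue.get(tissue, 0.0) for tissue in group_tissues]
        if any(value >= threshold for value in values):
            kept.append(gene)
    return kept


# Toy normalized expression values for placeholder genes (invented numbers).
toy_expression = {
    "GENE_A": {"lung": 5.1, "tonsil": 0.3},
    "GENE_B": {"lung": 1.2, "tonsil": 1.9},
    "GENE_C": {"lung": 0.0, "tonsil": 2.0},
}
group1_genes = expressed_in_group(toy_expression, ["lung", "tonsil"])
```

In the toy data, GENE_B is excluded because its expression is below 2 in every tissue of the group, while GENE_A and GENE_C are retained.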

Functional Analysis
We took advantage of the VarElect tool, a comprehensive phenotype-dependent gene prioritizer based on the widely used GeneCards, which helps in identifying causal gene-phenotype associations with extensive evidence (Stelzer et al., 2016). The sets of COVID-host interacting genes, selected for each group of tissues/organs, were matched with disease phenotypes (symptoms or disease manifestations) that were considered peculiar to each group of organs/tissues (Table 1). Accordingly, for group 1 the phenotype query: "fever" OR "cough" OR "pneumonia" OR "dyspnea" OR "pain" OR "hemoptysis" OR "sore throat" OR "chills" OR "inflammation" was used (Supplementary Table S5). Group 2 was analyzed for the phenotype query: "fever" OR "diarrhea" OR "pain" OR "nausea" OR "vomiting" OR "inflammation" (Supplementary Table S6).
The VarElect analysis on single and aggregate groups allowed the selection of 260 gene targets (arbitrary cutoff, subject to extension in forthcoming analyses) as potential candidates for drug repurposing (complete lists in Supplementary Tables S11-S15; the first 15 genes for each aggregate or single group rank are shown in Tables 2-6). In particular, 101 genes were selected for the aggregate rank G124 (Supplementary Table S11), 99 genes were selected for the aggregate rank G12345 (Supplementary Table S12), and 20 genes were selected for each of the single analyses performed on group 3 (G3 blood cells, Supplementary Table S13), group 5a (G5a brain, VarElect analysis related to the central nervous system, Supplementary Table S14), and group 5b (G5b brain, VarElect analysis related to the peripheral nervous system, Supplementary Table S15).
Selected genes from the aggregate ranks G124 and G12345 and from the single ranks G3, G5a, and G5b were then evaluated for the development of an anti-COVID-19 pharmacology based on the repositioning of drugs already on the market (see section "Drug Repurposing Strategy").

Design of Drug Repositioning Strategy via Drug-Gene Interaction Data
The DrugBank repository 7 (Wishart et al., 2018) was manually queried for the selection of drugs on the basis of their possible interference with the direct or indirect virus-host interaction. The criteria applied for selecting a restricted list of gene targets and the corresponding drugs were: (a) the highest place occupied in the aggregate VarElect ranks G124 (Table 2 and Supplementary Table S11) and G12345 (Table 3 and Supplementary Table S12) and in the single ranks G3, G5a, and G5b (Tables 4-6 and Supplementary Tables S13-S15); (b) the main cellular location of the target protein, selected on the basis of the possible virus-host interaction during cell entry (plasma membrane), RNA duplication (cytosol), RNA translation (endoplasmic reticulum), viral protein maturation and virus assembly (Golgi apparatus) and virus secretion

Selection of Targets for Drug Repurposing
The application of the methodology detailed in section "Materials and Methods" leads to the selection of 260 target genes as potential candidates for drug repurposing. Of these 260 entries, only 14 genes were ranked once (CDH1, CHEK2, TOP1, ADRB2, BIRC3, PRKAR1A, IKBKG, NEU1, CHUK, BSG, XPO1, WWOX, LDHA, and HSPA1A), while all of the others were repeated in two or more different ranks, for a total of 130 unique genes across the 260 entries in the pooled ranks. As for the main cellular locations, the majority of potential virus interactors were associated with the cell nucleus (51), with fewer gene products located in the plasma membrane and cytosol (15 each), Golgi/endoplasmic reticulum (12), vesicles (11), and mitochondria (7). The most represented molecular function was "enzyme" (46), while 16 activators/transcription factors, 9 membrane-bound receptors, 10 secreted proteins, 27 DNA-binding proteins, 7 RNA-binding proteins, 6 chaperones, and 8 repressors were also detected. Of note, 37 of the 130 unique gene targets were indicated in the Protein Atlas database as generic virus-host interactors, while 8 genes encode proteins with antiviral activities. Finally, the analysis of the "protein class" fields in Supplementary Tables S11-S15 revealed that 65 out of 130 genes were previously identified as non-COVID-19-specific potential drug targets, already under evaluation or approved by regulatory agencies (FDA and EMA).
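The bookkeeping behind these counts (total entries, unique genes, and genes ranked only once across the pooled ranks) can be reproduced with a simple tally. The sketch below uses placeholder gene names and an invented toy membership, not the actual rank lists, and assumes that each gene appears at most once within a single rank.

```python
from collections import Counter


def summarize_pooled_ranks(ranks):
    """Count total entries, unique genes, and genes appearing in only one rank."""
    counts = Counter(gene for genes in ranks.values() for gene in genes)
    ranked_once = sorted(gene for gene, n in counts.items() if n == 1)
    return {
        "entries": sum(counts.values()),
        "unique": len(counts),
        "ranked_once": ranked_once,
    }


# Toy pooled ranks with placeholder gene names (membership is invented).
toy_ranks = {
    "G124": ["GENE_1", "GENE_2", "GENE_3"],
    "G12345": ["GENE_1", "GENE_2", "GENE_4"],
    "G3": ["GENE_1", "GENE_5"],
}
summary = summarize_pooled_ranks(toy_ranks)
```

Applied to the five rank lists in Supplementary Tables S11-S15, the same tally would be expected to return the figures reported in the text (260 entries, 130 unique genes, 14 genes ranked once).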

Drug Repurposing Strategy
Following our extensive, multi-level analysis, we identified high-ranking genes that may be potential pharmacological targets, fulfilling the requirements for a fast and safe drug repositioning strategy (Table 7). Such genes have been selected by prioritizing the existence of an already approved, safe and effective pharmacology. Then, gene candidates that were not considered as directly involved in virus-host protein interactions were discarded, i.e., those located in cell nucleus structures or those involved in essential, redundant and/or non-targetable cell metabolic/physiologic processes. Finally, all potential (and strong) candidates already under clinical investigation as potential drug targets for COVID-19 (i.e., TNF, highest in more than one aggregate rank in the VarElect analysis) were also excluded. The resulting list encompasses plasma membrane receptors (e.g., EGFR, ERBB2, and FGFR1, among others) and proteins mainly localized in the Golgi and endoplasmic reticulum (CALR, APP, LYN, and COMT), in the cytosol (LDHA and MTOR), and in vesicles (TBK1, COMT, APP, LDHA, and MTOR). Some of the proposed genes are potentially targeted by the same or similar drugs (as evidenced in the "potential alternative targets" fields in Table 6). Moreover, some of the proposed drugs are potentially effective on pharmacological targets already identified as potential drug targets or under investigation in ongoing clinical trials on COVID-19 patients (i.e., VEGFA, C1QA, C1QB, and C1QC) 8. All of the selected genes were relatively high in their aggregated ranks (see Tables 2-6).

DISCUSSION
In this work, we identified and prioritized a number of target genes involved in different ways in the SARS-CoV-2 invasion of the host and in the host response, via a network proximity-based procedure. Subsets of such target genes were subsequently identified in different organs and systems of the human organism, with the aim of isolating and classifying, in functionally coherent tissue/organ groups (respiratory and digestive epithelia, blood, filter/excretory tissues, and nervous system), the most suitable target genes for the development of a pharmacology based on the repositioning of drugs already on the market. For each group of tissues, relevant target classifications have been established on the basis of the potentially associated pathological phenotypes, previously described as characterizing the COVID-19 disease (Adhikari et al., 2020). The highest-ranking target genes in the individual tissue rankings were then grouped to reach the selection of 130 unique targets, 90% of which were significant in two or more of the tissues considered. Finally, by analyzing each relevant target, a pharmacological proposal has been defined for 18 target genes, which is expected to interfere with the virus-host interaction in the various infectious phases and in the viral replication cycle.
Computational approaches have already been considered for drug repurposing: for example, Zhou et al. (2020) prioritized sixteen potentially repurposable drugs against SARS-CoV-2 using a network proximity analysis. In particular, the authors mapped the drug-target network onto a selected COVID-19 host interactome to search for cellular targets; a combination of anti-inflammatory and antiviral therapeutics has also been proposed using a network-based approach in which a proximity measure quantifies the relationship between COVID-19 disease modules and drug targets in the human PPI network. Our computationally driven approach revealed that it is possible to hypothesize unequivocal and functional pharmacological interventions to counteract the development of symptoms affecting various organs and systems. This consideration arises from the evidence that some of the pharmacological targets identified (i.e., EGFR, ERBB2, APP, ICAM1, and FAS) may be important to prevent the interaction of the virus with the cell surface in different target organs. However, it is also necessary to conceive pharmacological strategies based on the combination of different drugs, able to counter, by targeting different players of the virus-host interaction, the various stages through which the infection develops at the cellular level (virus entry, replication, viral protein processing, and release of new virus). Finally, the association of therapies interfering with the virus-host interaction with strategies aimed at bringing back under control the inflammatory phenomena, with which the body fights the infection and which have often proved fatal (Astuti and Ysrafil, 2020), deserves consideration.
8 https://www.drugbank.ca/covid-19#drug-targets
Computational criteria and methods led to the definition of COVID-19 proximal target genes. Then, biological criteria led to the selection of the relevant interactions, potential targets for drug repurposing, associated with different stages of viral infection and with the development of the constellation of symptoms already described in COVID-19 patients (Adhikari et al., 2020; Ahmad and Rathore, 2020). Virus-host interactions may occur as physical interactions between viral and human proteins or as indirect interactions based on the triggering, after virus challenge, of the complex network of metabolic processes characterizing eukaryotic cells. In the analysis presented in this work, in addition to the classifications of relevant target genes, their cellular localization was also taken into consideration, with the aim of hypothesizing possible specific interactions for the individual compartments of the cell in which the viral proteins could interact with human ones. Based on such rationale, plasma membrane-bound proteins have been considered as alternative interactors for virus entry. Cytoplasm-located proteins may conceivably interact with the virus during its replication phase, while endoplasmic reticulum and Golgi proteins could interact with the viral M protein and with the post-translational processing of viral proteins (Astuti and Ysrafil, 2020). Finally, vesicle-associated interactors have been hypothesized to play a role in virus secretion.
It is known that the receptor-binding domains on the SARS-CoV-2 S protein bind with high affinity to human ACE2 (Wrapp et al., 2020), an interaction accounting for virus entry in the host cell and for its transmissibility. The analysis of the COVID-19 extended interactome indicates several membrane-bound genes/proteins (i.e., ICAM1, EGFR, ERBB2, APP, ADRB2, FAS, CDH1, and MAPT) whose activity and/or expression could be affected by SARS-CoV-2 challenge. Alternative interactions of the virus S protein with receptors other than ACE2 have not only been suggested by computational analysis (Milanetti et al., 2020), but also demonstrated in vitro (Ulrich and Pillat, 2020). Furthermore, some of the selected proteins could also account for additional host interactions, not necessarily related to the transmission of the disease. RNA-binding proteins present in the cytosol, part of the extended interactome and with a high position in the VarElect ranks (i.e., RANBP2, XPO1, and CDKN2A), could reasonably participate in the replication and translation phases of the viral RNA. Similarly, proteins associated with the endoplasmic reticulum and Golgi membranes (i.e., CALR, COMT, CAV1, and PTCH1) could be involved in the translation processes of the viral RNA and in the subsequent protein processing. Lastly, it is worth mentioning the interactions with secreted proteins predicted by the computational analysis. Among the most important are those with TNF, which plays a central role in the cytokine storm that characterizes the most severe phase of the disease, and which is already a drug target addressed in intensive care units worldwide.
There are currently dozens of drug targets being tested for COVID-19 in more than 1,200 clinical trials worldwide, as reported in the DrugBank repository (see text footnote 8). Among these, only TNF has been identified by our analysis as being part of the COVID-19 host target genes. Recently, a list of more than 300 possible target genes has been experimentally observed to interact with SARS-CoV-2 proteins and thus considered for the development of anti-COVID, repositioning-based therapies (Gordon et al., 2020), of which only 11 (NEU1, SCARB1, TBK1, COMT, HMOX1, FBN1, GLA, ACADM, DNMT1, PLAT, and TOR1A) are shared with those predicted through the methodology applied in the present work after a data-driven prioritization. In addition, despite the apparent abundance of potential pharmacological targets proposed through data analysis, relatively few of these lend themselves to being used in drug repositioning strategies. The final data of the present work, summarized in Table 6, indicate that among the first 130 potential targets identified, i.e., those at the top positions in the ranks of potential efficacy produced by our methodology, only 18 preliminarily appear as suitable candidates for drug repositioning. The reasons lie in the lack, for most of the ranked genes, of pharmacologically active drugs already approved by the regulatory agencies; in the impossibility of developing, for the many genes participating in essential processes of cellular physiology, a pharmacological approach that modifies their activity; or, finally, in the difficulty of using drugs with a significant impact on physiology or with a high risk of inducing side effects in patients already deeply debilitated by SARS-CoV-2 infection.

CONCLUSION AND PERSPECTIVES
The pandemic caused by SARS-CoV-2 represents an open and unresolved challenge for the global health system. The need to identify drugs that are effective both in countering the mechanisms of interaction of SARS-CoV-2 with host cells and in controlling the devastating inflammatory phenomena that characterize the late stages of viral infection requires increasingly urgent answers. The biomedical research approach based on the repurposing of already approved drugs seems to be one of the most viable strategies in this struggle. This work, via a data-driven network-based procedure, provides a viable and alternative drug repurposing strategy to be considered for clinical trials. The proposed approach has been conceived to support the comprehension of the molecular landscape of COVID-19 as well as the identification of genes that are not immediately associated with SARS-CoV-2 invasion, or not taken into consideration with respect to the host defense regulation and dynamics, and may thus suggest new directions for further studies and analyses. We leave open the possibility of extending our preliminary analysis by increasing the number of genes in the currently proposed set of COVID-19 proximal target genes and/or by extending the selection of potential target genes identified through functional analysis to a greater number than the current one. From the computational point of view, further approaches could be considered: for instance, several network topological measures and/or a combination of them could be used to select COVID-19 proximal candidate target genes and to investigate whether and how the proposed drugs change.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
PT conceived the study. All authors collected the data, ran the analysis, wrote the manuscript, and approved the submitted version.
Supplementary Table 2 | Output of the DIAMOnD algorithm: first 1000 genes ranked for connectivity significance starting from the 500 COVID-19 associated seed genes.
Supplementary Table 3 | Output of the network propagation/heat diffusion algorithm run with 5 different diffusion times starting from the 500 COVID-19 associated seed genes.
Supplementary Table 4 | Aggregated chart of the first 1500 genes ranked for COVID-19 proximity.
Supplementary Tables 5-10 | VarElect scores obtained for genes in the COVID-19 extended interactome expressed in different groups of tissues/organs (reported in each Table) and selected for their association with disease phenotypes (reported in each Table) specific to each tissue/organ.