A Path-Based Analysis of Infected Cell Line and COVID-19 Patient Transcriptome Reveals Novel Potential Targets and Drugs Against SARS-CoV-2

Most transcriptomic studies of SARS-CoV-2 infection have focused on differentially expressed genes, which do not necessarily reveal the genes mediating the transcriptomic changes. In contrast, exploiting curated biological network, our PathExt tool identifies central genes from the differentially active paths mediating global transcriptomic response. Here we apply PathExt to multiple cell line infection models of SARS-CoV-2 and other viruses, as well as to COVID-19 patient-derived PBMCs. The central genes mediating SARS-CoV-2 response in cell lines were uniquely enriched for ATP metabolic process, G1/S transition, leukocyte activation and migration. In contrast, PBMC response reveals dysregulated cell-cycle processes. In PBMC, the most frequently central genes are associated with COVID-19 severity. Importantly, relative to differential genes, PathExt-identified genes show greater concordance with several benchmark anti-COVID-19 target gene sets. We propose six novel anti-SARS-CoV-2 targets ADCY2, ADSL, OCRL, TIAM1, PBK, and BUB1, and potential drugs targeting these genes, such as Bemcentinib, Phthalocyanine, and Conivaptan.


INTRODUCTION
COVID-19, a serious respiratory disease caused by the zoonotic virus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2, or SC2 for short), has emerged as a global pandemic leading to~315 million infections and 5 million deaths (data till date January 10 th , 2022, as per WHO dashboard). Despite a huge body of research investigating the SC2 biology (1), host-virus mechanisms (2), potential drug targets and their inhibitors (3), we are far from a complete understanding of mechanisms underlying the varied COVID-19 symptoms, and the search for an effective therapy continues. As there are limited treatment options available for SC2, several drugs which are prescribed to treat infections by other viruses such as Middle East respiratory syndrome (MERS), severe acute respiratory syndrome (SARS), human immunodeficiency virus (HIV), etc.
have been tried on COVID-19 patients (4). Remdesivir, a viral RNA dependent RNA polymerase inhibitor used in the treatment of Ebola virus, was the first FDA-approved drug for the treatment of COVID-19 (5); however, it has not been broadly effective. More recently approved Paxlovid was found to be effective against multiple SC2 variants including Omicron (6). While several effective vaccines are now in general use, there continues to be an urgent need for more effective therapeutic options for infected patients, especially considering the highly varied, and sometimes long-term effects as well as side effects of the current therapies.
Viruses hijack the host system for their own survival and proliferation, especially by exploiting and manipulating host transcriptional machinery and gene regulation (7). A better understanding of the host transcriptomic response to the viral infection is thus widely recognized as critical in designing better therapeutics strategies (4). Most previous studies investigating infection-induced transcriptomic changes in the host tissues and immune cells focus on genes that are differentially expressed (DEGs) upon infection and perform a series of downstream analyses to decipher the underlying mechanisms (8). One issue with the DEG-centric approach is that certain genes are known to be differentially expressed in a wide variety of contexts and represent generic transcriptional response and are not specific to SC2 infection (9). Moreover, it is now well recognized that differential expression ought to be interpreted in the context of genetic networks and pathways, and indeed some of the works have investigated SC2 transcriptomic response data from a network perspective (10). These approaches nevertheless rely on gene-level differential expression as the lynchpin for the downstream networkassisted analyses. An alternative approach -PathExt -that we have recently shown to be superior to DEG-centric approaches (11), instead integrates transcriptomic data with curated gene networks and instead of identifying differentially expressed genes, identifies differentially active paths in the integrated network, and then identifies the central genes mediating the differential activities of the most perturbed paths. This alternative approach is based on the recognition that (i) gene expression is noisy and DEGs can therefore lead to false positives, and (ii) key regulatory genes that mediate global transcriptomic changes and thus present a potent target may themselves not be differentially regulated and will thus be missed by DEG-centric approaches.
Here, using PathExt, we comprehensively analyze and compare transcriptomic response to SC2 and other respiratory viruses (SARS-CoV-1, MERS, Influenza, RSV, and HPIV3) in multiple cell lines (A549, A549-ACE2, Calu3, Vero, MRC5 and NHBE), as well as in COVID-19 patient-derived peripheral blood mononuclear cells (PBMCs). While PathExt identifies largely distinct sets of central genes across cell lines and viruses, these genes nevertheless converge on common processes such as cytokine signaling, cell cycle, metabolism, etc.; however, we observe a much greater similarity in response across cell lines for the same virus than across viruses in the same cell line. We assess the complementarity and unique advantages of using PathExt compared to the conventional DEG-based approach and find that PathExt genes capture experimentally identified anti-SC2 targets more accurately than DEGs, while also providing an overall greater enrichment of key biological processes. In PBMC data, we find that PathExt-identified central genes are associated with patient severity. Finally, we propose novel anti-SC2 therapeutic targets and their potential inhibitors that are either FDA-approved or currently in clinical trials.

Overview of the Approach
PathExt is our recently published tool; here we provide a brief intuitive overview of the approach. The aim of PathExt is to identify differentially active paths, in a prior knowledge-based gene network, while comparing transcriptomic data in two conditions, and shortlist genes among those paths which might be critically mediating the observed phenotypic change. As noted above, in contrast to conventional DEG-centered approaches, such mediating genes may not be differentially expressed themselves. The PathExt workflow is illustrated in Figure 1A. PathExt starts by integrating a knowledge-based curated gene network with samplespecific omics data from the conditions of interest; we employed a curated network -HPPIN -which integrates physical, regulatory, and metabolic interactions between genes or proteins (12). In each sample separately, nodes and edges of the network are weighted such that interactions involving differentially expressed genes are preferentially traversed by a shortest path algorithm. This integration is done in two ways, to emphasize either differential activation or repression. Shortest paths whose weights are statistically significant (based on permutation) can then be interpreted as statistically significant differentially active (or repressed) paths. Such paths are then cast as a sub-network referred to as the TopNet, either activated or repressed, based on the weighting scheme. PathExt then identifies central genes in each TopNet based on ripple centrality (13), which captures genes that can reach a large part of the TopNet along highly active (or repressed) paths. TopNet and the central genes are identified independently for each transcriptomic sample.
Here, we apply PathExt to analyze how the impact of SC2 infection (as well as other respiratory viruses) can vary in different cell lines, as well as in patient derived PBMCs ( Figure 1B). We first compare SC2 infection across cell lines, and with other viral infections in the same cell lines. In PBMC data, we identify associations between PathExt results and patient demographics. We then show a high concordance between the results from both cell line and PBMC data, and benchmarks such as genes previously shown to be affected by SC2 infection as well as experimentally screened drugs. Based on this, we use the output of PathExt to propose novel drug targets as well as drugs against them.
(A549-ACE2), and Calu3, Vero cell line derived from African monkey kidney, and primary bronchial epithelium cell line -NHBE (14,15). In each case, we applied PathExt to identify the top 100 central genes each in the activated and repressed TopNets (Supplementary Table S1); no repressed TopNet was detected in NHBE. We found that the top 100 genes were largely disjoint across cell lines (Figures 2A, B) with one exception, where the top 100 genes from the activated TopNets in Calu3 and Vero cell lines shared 79 common genes; indicating a high cell-type specificity in response to SC2 infection. A list of the top 100 upregulated and downregulated DEGs is provided in the Supplementary Table S2.
We then performed functional enrichment on the top 100 genes identified from each dataset using PANTHER (16), and consolidated (Methods) the enriched biological process terms using REVIGO (17). In case of activated TopNets ( Figure 2C), as expected, Calu3 and Vero cells show similar upregulated pathways such as I-kappaB kinase/NF-kappaB signaling, B A FIGURE 1 | Study Workflow. (A) Our PathExt tool accepts as input a curated gene network and gene expression data, to output two weighted sub-networks -an activated and a repressed TopNet comprising activated and repressed paths respectively, and a list of central genes in each TopNet. Activated genes are shown here in shades of red, and repressed genes are in shades of blue. PathExt integrates the inputs such that edges connecting genes with substantial change in expression are preferentially traversed by a shortest paths algorithm (Methods), illustrated here using wider arrows. Shortest paths which are statistically significant (permutation based) now represent differentially active (or repressed) paths and make up the TopNets in which PathExt identifies central genes based on ripple centrality. (B) We apply PathExt to analyze RNa-seq data from SARS-CoV-2 infection in both cell lines and patient PBMC data. PathExt outputs from the cell line data are used to compare cross-cell-line variation in SC2 infection response, and within-cell-line variation in response to other viruses. In patient PBMC data, we identify associations between PathExt results and demographics. We then validate all the results against benchmarks and use the TopNets and central genes to propose novel drug targets, as well as novel drugs for known targets. regulation of transcription factor (TF) activity, cytokine mediated signaling pathway, and response to stimulus. These processes are well supported by previous studies (18)(19)(20). Regulation of immune signaling pathways is another process prominently observed in COVID-19, especially production of various cytokines and chemokines (21,22). Likewise, hyperactivation of NF-kappaB signaling pathway post SARS-CoV-2 infection has been observed (23) and NF-kappaB has been recognized as a potential pharmacological target to treat COVID-19 (24). Several host TF binding sites (TFBS) are present in SARS-CoV-2 genome which the virus exploits for its replication (25). Calu3, Vero, and A549 were enriched for inflammatory response, specifically, tyrosine kinase signaling, recapitulating previously established links between SARS-CoV-2 and tyrosine kinase signaling (26). In A549 cells with high ACE2 expression, the strongest enrichment was seen for the posttranslational protein modification process. Although inflammatory and immune response pathways were not statistically enriched among the top 100 genes in A549-ACE2 cell line, key immune response genes were among the top 100 central genes, e.g., STAT3, STAT5A, NFKB1, NFKBIA, etc.
Post-translational modification (PTM) of viral proteins by the host machinery is well documented, such as glycosylation and palmitoylation of Spike protein, N or O-linked glycosylation of membrane protein, phosphorylation and ADP-ribosylation of nucleocapsid protein (27). Likewise, deimination of SARS-COV-2 proteins is catalyzed by peptidylarginine deiminases (PADs) which leads to activation of neutrophils extracellular traps (NETs) in an ACE2-dependent manner (28). These findings suggest that a larger viral load, facilitated by high ACE2 expression, is likely to induce greater PTM. Gordon et al. (29) created a PPI network where 332 human proteins were interacting with 27 viral proteins. They showed that high viral load inside the host cells leads to increased human protein phosphorylation and host kinome modulation.
Other enriched processes such as metabolism and cell differentiation also have known roles in viral infections. These results suggest that the host response to viral infection may  depend on the infection and replication rates in various cell lines. For instance, the differential response in A549 may be because A549 cells are relatively less permissive to SARS-CoV-2 replication owing to lower concentration of ACE2 receptor protein required for viral entry (30), while Calu3 and Vero cell lines exhibit a much higher expression of ACE2. This difference in infection rate leads to differences in the host immune responses as shown previously (15). Key genes in NHBE (bronchial epithelium cell line) were strongly associated with small molecule and nucleoside phosphate metabolic processes, known to be central for viral replication and survival (31). A complete list of all the upregulated enriched terms as well as the parent-child relations identified for various SARS-CoV-2 infected cell lines is provided in Supplementary Table S3.
In addition to biological process, we also looked at the molecular function associated with the top 100 central genes present in the Activated TopNets across these cell lines (Supplementary Table S4). In the case of A549, we saw the enrichment of molecular functions associated with DNA-binding transcription factor activity, calcium-dependent protein kinase C activity, protein binding, etc. All these processes have been shown to be important for SC2 infection. For instance, the complex of viral S protein and host ACE2 receptor binds to calcium ions (EF-hand domain) and shows protein kinase activity (32). After binding, the spike protein is further cleaved by TMPRSS2, a transmembrane protein serine 2 leading to further downstream signaling processes (33). As observed with biological processes, molecular functions for Vero and Calu3 cell lines were very similar. They were enriched for the functions such as protein kinase activity, TNF receptor superfamily binding, cytokine receptor binding, etc. These findings are supported by previous published literatures (34). Lastly, for NHBE cell line, we saw molecular functions such as small molecule binding, carbohydrate derivative binding, etc. as shown in previous work (35). In stark contrast, for top 100 DEGs, only one statistically significant molecular function was enriched across all cell lines, namely, "sequence specific DNA binding function" in the Vero cell line. The enrichment of regulatory and signaling molecules among the PathExt genes underscores the rationale that PathExt attempts to identify key genes mediating the transcriptional response. Figure 2D summarizes the pathways enriched among the top 100 central genes in the repressed TopNets. A549 cell lines were highly enriched in cell cycle, DNA replication, and DNA damage response; interestingly, the enrichment was much greater in A549 with ACE2. These observations recapitulate the established biology of viral infection. For instance, genes mediating DNA damage response -ATM and ATM/Rad3-related (ATR)were among the top 100 central genes. Key genes in Vero cells are enriched for mitotic spindle organization, ERBB signaling, EGFR signaling, protein phosphorylation, phagocytosis, etc., all previously known to modulate SARS-CoV-2 infection response (36). Lastly, in Calu3 cells, we saw enrichment of several lipid metabolism pathways. Lipids are one of the major components of viral structure and play an important role in its entry and replication inside the host cell. Several studies have shown the potential role of lipids and lipid induced metabolic changes in coronavirus infection (37,38).
Other enriched pathways include leukocyte migration and phagocytosis. The downregulated enriched biological terms identified for various SC2 infected cell lines is provided in Supplementary Table S5.
We also performed molecular function analysis for the top 100 central genes in repressed TopNets (Supplementary Table S6). For A549 cell line, functions such as catalytic activity and singlestrand DNA binding, were enriched. These processes are associated with SC2 attachment, infection and downstream signaling events as shown before (39). Likewise, for A549 cell line with high ACE2 expression, "single-stranded DNA binding" was enriched. Genes associated with this function (e.g., ATM and ATR) are essential for response to DNA damage and repair, DNA metabolism, and maintaining genomic stability (40). In case of Calu3 cell lines, enriched functions included kinase activity, purine ribonuclease triphosphate binding, signaling receptor binding, etc. These functions play an important role in viral entry, membrane trafficking, signaling, etc. (34).
We investigated the commonality between the activated and repressed TopNets for the cell lines. We first confirmed that the top 100 central genes of the activated and repressed TopNets of the same cell line were disjoint, with a few exceptions: out of 100 genes, 1 is common in Calu3 cell line, 2 in A549, 3 in A549 with high ACE2 expression, and 5 in the Vero cell line, suggesting dual activation and repressive roles of certain key genes; for instance, ATM, a protein kinase which plays key roles in many processes such as cell cycle progression, cell metabolism and growth, oxidative stress and chromatin remodeling, and is upregulated as well as downregulated in different cancers (41). Indeed, ATM is known to regulate these pathways in COVID-19, as shown in previous study (42). Despite disjoint central genes between activated and repressed TopNets, the enriched pathways among the central genes show greater commonality, such as protein phosphorylation, suggesting that these key processes may be involved in both activation and repression of different downstream processes.

Comparison of SARS-CoV-2 Infection With Other Viral Infections
Next, we compared the host transcriptional response to SC2 infection with those for other respiratory viruses in Calu3, Vero, MRC5, NHBE and A549 cell lines where there was data for an additional respiratory virus (RSV, IAV, HPIV3, SARS-CoV-1 and MERS). Top 100 central genes from each viral TopNet (activated & repressed) were used for comparison. As shown in Figures 3A, B, while there are small but significant overlaps in most cases, the responses could be considered largely virus specific.
Amongst activated TopNets, the SC2 response shares at least 10 central genes with responses to influenza virus, RSV, SARS-CoV-1 and MERS virus in different cell lines. SC2 shared 29 genes with MERS in the MRC5 cell line. Influenza virus in NHBE (bronchial epithelium) lacking NS1 shared 27 genes with SC2infected Calu3 (Lung epithelium) and 25 genes with SC2infected Vero cell line. The commonly shared genes between SARS-CoV-2 and most of the other viruses included MAPK1, IL7, LYN, STAT3, TRAF6, NFKB1. These genes are associated with pathways such as regulation of MAPK, response to cytokine, positive regulation of cell differentiation, and innate and adaptive immune response, all of which are supported by previous experimental work (15). These overlaps among the top 100 genes are significant and suggest shared responses across viruses. In repressed TopNets, a similar overlap trend was seen. SC2 shares at least 10 genes with all the viruses in different cell lines except MERS. Among the top 100 central genes in the repressed TopNet, SC2 shares 13 common genes with RSV and HPIV3 in the A549 cell line. Some of the commonly shared genes between SC2 and other viruses included E2F1, BRCA1, RAD21, CDK, DDAH2. These genes are associated with pathways such as cell cycle process, apoptotic process, cellular nitrogen compound metabolic process, DNA repair, etc., again revealing known shared responses across viruses (15,43).
Next, we assessed the similarity in responses between SC2 and other viruses at the pathway level, using PANTHER and REVIGO. Given the enriched terms in the TopNets for all the viruses, we computed the semantic similarity using the GOSemSim package in R (44). For the activated TopNet, SC2 shares high similarity with nearly all the viruses considered in the study ( Figure 3C). Highest similarity of 0.97 is observed between the processes enriched in SC2-infected A549 with high ACE2 expression, and Influenza-infected NHBE. Activated TopNets across viruses share cytokine signaling, NF-kappaB signaling, inflammatory response, DNA binding transcription factor activity, and protein phosphorylation, all of which are well established host responses to respiratory viral infections. The processes which were uniquely enriched for SC2 infection include ATP metabolic process, G1/S transition of mitotic cell cycle, post-translational protein modification, and carbohydrate derivative metabolic process. Some recent published studies also account for these processes associated with SC2 infection (42,45). A complete list of enriched biological pathways in activated TopNets for SC2 and other viruses is provided in the Supplementary Table S3.
Compared to activated TopNets, repressed TopNets exhibited greater divergence in terms of enriched processes ( Figure 3D), but consistent with activated TopNets, we observed a greater similarity across different cell lines infected by SARS-CoV-2. Key enriched processes across viruses include DNA replication, telomere organization, DNA metabolic process, cellular response to DNA damage stimulus, regulation of MAPK cascade, negative regulation of apoptotic process, regulation of cell migration, nuclear division, regulation of cell cycle, etc., all of which have literature support (46). However, the unique processes enriched with SC2 infection include activation of phospholipase C activity, cardiomyocyte differentiation, leukocyte activation and migration,

PathExt Provides Unique Insights Compared With DEG Analyses
Next, to assess advantage or complementarity of PathExt relative to DEG-centric approach, we compared the genes and pathways identified by PathExt with DEGs and their enriched pathways. Recall that PathExt identifies central genes that potentially mediate differential expression of other genes but may not be differentially expressed themselves. To test this, we selected the SC2-infected Calu3 cell line data and analyzed the log-fold change in expression of the top 100 central genes identified by PathExt TopNets and DEGs. As shown in Figure 4A, log-fold change of the genes present in the PathExt TopNets are much smaller than those of the DEGs, and therefore would systematically go undetected in a differential expression analysis. Similar results were seen for other SC2 infected cell lines as well (Supplementary Figure S1).
Next, in each SC2-infected cell line, top 100 upregulated genes were compared with the top 100 central genes in activated TopNet, and top 100 downregulated genes were compared with the top 100 central genes in repressed TopNet. As shown in Figures 4B, C, PathExt genes and DEGs were largely distinct. Given minimal overlap between PathExt central genes and DEGs, we compared them in terms of enriched processes. Remarkably, in stark contrast with PathExt central genes, the 100 most downregulated DEGs do not reveal any pathway enrichment in any of the cell lines, and the upregulated DEGs showed pathway enrichment only in Calu3 and Vero cell lines (Supplementary Table S7); in Vero cell line only one process was enriched and is therefore not discussed. As expected, pathways enriched among upregulated DEGs in Calu3 showed highest similarity with those in activated TopNet in Calu3 ( Figure 4D), despite very little overlap in terms of genes. In Calu3, while the common pathways among PathExt and DEGs (Supplementary Figure S2) includes response to cytokine production, and immune system process, several pathways were uniquely revealed by PathExt, including regulation of DNA binding transcription factor activity, I-kappaB kinase/ NF-kappaB signaling, regulation of cell death, and cellular response to organic substances. These pathways are wellassociated with the COVID-19 infection as shown in multiple studies (21,25), highlighting the relative advantage of PathExt over the conventional DEG approach.  Genes PBK, CDC6, and BUB1, were found to be the most frequent central genes among the activated TopNets occurring in 56%, 53%, and 48% of the 96 samples, respectively. Likewise, in the repressed TopNets, IL6 was the most frequent, occurring in 49% of the samples, followed by POMC in 48% of the 100 samples; a complete list of top genes is provided in Supplementary Table S8.
As shown in Figure 5A, pathways enriched among the frequent central genes in activated TopNets are predominantly related to cell cycle, which is an expected response to infection by the host immune system (48), as well as potentially linking SC2 infection with cancer (49). Previous studies have observed that cancer patients are more vulnerable to SC2 infection leading to adverse outcomes, likely due to compromised immune system (50). Some of the pathways such as uncontrolled production of cytokines, type-I interferon ( Table S10).
In contrast, pathways enriched among the frequent central genes in the repressed TopNets included response to cytokine, regulation of cytokine production, regulation of defense response and icosanoid metabolic process ( Figure 5B). Suppression of regulatory mechanisms that keep cytokine production in check is consistent with observed cytokine storms in COVID-19 patients and ensuing damage of organs such as the liver leading to downregulation of various metabolic processes (51). Likewise, SC2 can suppress the host immune defense by downregulating the T-cell function (52), cytokine production and cell-cell adhesion (51). Complete list of enriched terms associated with the top 100 most frequent central gene is provided in the Supplementary Table S11. Molecular functions associated with top 100 central genes include receptor ligand activity, heme binding, chemokine binding, etc. Once again, these are well known functions which occur post SC2 infection. However, downregulated DEGs were found to be enriched only for retinol binding function. Complete list of molecular function associated with repressed TopNet in PBMC is provided in Supplementary  Table S12.
The cellular response in COVID-19 patient PBMC likely reflects immune response to systemic infection by SC2. However, there is some evidence that SC2 can infect immune cells as well (53)(54)(55), and therefore, the cellular response in PBMC could in part reflect endogenous response by the infected PBMC cells. To distinguish between these two possibilities, we identified the enriched pathways separately for the PBMC central genes shared with those in other cell lines (reflecting endogenous and has started testing SC2negative may still exhibit cold symptoms due to decreased expression of adaptive immune related genes, especially those related to T and B cells and HLA molecules, making them susceptible to secondary infections. They also showed that the suppression of the adaptive immune system could be due to dysregulated host response and not because of immune checkpoint molecules such as PD-1, PD-L1, CTLA4, etc. (58). Overall, PathExt reveals immune cell response to SC2 infection and discriminates, to some extent, the potential cell-endogenous responses by infected PBMC and response to systemic infection. While it is possible that unique central genes in PBMC could simply reflect the inter-cell type heterogeneity in response to infection, we note that the fraction of unique genes in PBMCs are far greater than those in other cell types and we speculate that some of the unique gene genes and pathways in PBMCs may reflect systemic immune response. All enriched GO processes for activated and repressed TopNets in patient PBMCs common with cell lines are provided in Supplementary Table S13 whereas enriched GO processes for activated and repressed TopNets unique to patient PBMCs are provided in Supplementary Table S14.
Next, we investigated the association between PathExtidentified most frequent central genes in PBMCs and the available demographics and clinical features of the patientsage (<= 60 years or >60 years), sex (male vs female), and severity (ICU vs non-ICU). We found that most frequent genes in activated TopNets were more frequently observed in severe cases ( Figure 6A). We did not see a direct association between frequent central genes and sex or age. Frequency and distribution of the PathExt identified genes in patients with severe illness is consistent with previous findings (47). To further probe into association of frequent central genes with severity, we identified genes that were uniquely central in either ICU or non-ICU patients, with a minimum frequency of 5 (Supplementary  Table S15). Central genes in ICU patients included genes like CDC25C, LOX, TGFB3, CCNF, etc. and were enriched for mitotic cell cycle phase transition and regulation of cell death. Central genes in non-ICU patients included genes such as CCR10, IRF6, CCL4, etc. and were enriched for cytokine-mediated signaling pathway, cell communication, positive regulation of ERK1 and ERK2 cascade (Supplementary Table S16). B A FIGURE 6 | Demographic features analysis. Mann Whitney Test was performed to check statistical significance between top 100 central genes and various demographic features (age, sex and severity). In case of PathExt identified top genes, "Severity" was found to be the only statistically group among activated TopNet genes (A). However, no group was statistically significant in case of repressed TopNet (B).

Agrawal et al. COVID-19 Infection in Cell-Line and PBMCs
Frontiers in Immunology | www.frontiersin.org July 2022 | Volume 13 | Article 918817 The analysis above identifies frequent central genes in ICU and non-ICU patients. We further aimed to assess whether the TopNet neighbors of the central genes are similar across samples, which would indicate a homogeneity in the perturbed paths mediated by a central gene. Toward this, we quantified, for each central gene independently, all sample-pair overlap (quantified by Jaccard Index or JI) between the sample-specific TopNet neighbors of the central gene. We did this separately for ICU and non-ICU patients; Supplementary Figure S3 shows the distribution of JIs for the central genes in ICU and non-ICU patients separately, suggesting a greater homogeneity of response mediated by the central genes in the ICU patients.
We did not notice any association between frequent genes in the repressed TopNets and demographic features ( Figure 6B). For DEGs, top central genes were more associated with male in activated TopNets and with severe cases in repressed TopNets (Supplementary Figure S4) As a point of comparison, we also identified the upregulated and downregulated DEGs from the COVID-19 patients. As above, we selected the top 100 upregulated and downregulated genes in each patient sample and then the top 100 most frequent upregulated and downregulated genes across all 100 patients. First, we note that while frequent central genes in activated TopNets exhibit slightly lower fold changes compared to frequent DEGs, the frequent central genes in repressed TopNets exhibit far lower fold changes compared to DEGs (Supplementary Figure S5).
Next, we analyzed the commonality among the PathExt and DEGs top 100 genes. We found 28 genes to be common among the activated TopNets and upregulated DEGs and 8 were common among the repressed TopNets and downregulated DEGs. Next, we analyzed the similarity at the pathway level. While many processes enriched among frequent central genes were also enriched among upregulated DEGs (e.g., cell cycle) (Supplementary Figure S6), the PathExt central genes uniquely revealed cellular response to DNA damage stimulus, regulation of transferase activity, regulation of fibroblast proliferation, establishment of chromosome localization, etc. However, the frequent downregulated DEGs did not identify any significant enriched process, underscoring the relative advantage of the PathExt approach.

PathExt Reveals Previously Ascertained Anti-SC2 Target Genes Far Better Than DEGs
Next, we assessed the extent to which the PathExt-identified genes and DEGs recapitulate previously proposed anti-SC2 target genes based on experimental screens. Toward this, we compared the top 100 central genes identified in SC2-infected cell lines and patient PBMC data against previously published benchmark datasets of anti-SC2 targets. In total, we compiled 11 gene sets from our study: 6 for activated TopNets (5 for the cell lines and 1 for the PBMC cohort) and 5 for repressed TopNets (4 for the cells lines and 1 for the PBMC cohort); while for each cell line we considered the top central genes, for PBMC, we selected the 100 most frequent central genes across patients. We compiled 9 benchmark genes sets from previously published reports which includes CRISPR gene-knockout studies, viral-host proteinprotein interactions (PPI) and targets associated with experimentally screened drugs in various cell lines and animal models (Methods). Next, we assessed overlap between our 11 gene sets and the 9 different benchmark datasets using Fisher's Exact test, resulting in 99 tests. In 18 of the 99 comparisons (expectation is~5 at p-value threshold of 0.05) PathExt gene sets significantly overlapped with the benchmark gene sets ( Figure 7A). In sharp contrast, analogous sets of DEGs significantly overlapped with gold sets in only 5 cases, as expected by random chance (Figure 7B), again underscoring the relative advantage of PathExt.

Identifying Novel Potential Anti-SC2 Targets and Drugs
Next, to identify novel drug targets against SC2, we removed the already known targets (Methods) against SC2 and other viruses (considered in this study) from the TopNets (activated & repressed). Based on the frequency of these unique genes in the TopNets of various SC2 infected cell lines, we proposed novel anti-SC2 targets namely ADCY2, ADSL (mediating activated TopNet), and OCRL, and TIAM1 (mediating repressed TopNet). Similar analysis performed for the PBMC data reveals that genes like PBK and BUB1 can be potential new targets. Even though the inter-cell type overlap at the level of key genes is not high, the overlap at the pathway level is much greater (Figures 2C, D). Insofar targeting the key genes may impact the specific pathways, the impact of targeting these genes may nevertheless be broad. Complete list of the potential targets observed in both cell line and PBMC is provided in the Supplementary Table S17. The gold standard dataset used to ascertain novel targets was created in March 2021. Instead of using an updated dataset, we decided to further assess the accuracy of our proposed novel targets in a prospective manner: a quick survey of the literature published since March 2021 validated some of the targets that we have identified, for example ESRRA (59), PTGDR2 (60), EGFR (59), etc. This serves as a prospective validation of the targets proposed by PathExt. Finally, we performed virtual screening (Methods) to identify potential inhibitors against the proposed novel targets in our study. We propose the top 5 potential drug molecules for each target (cell line and PBMC) in Table 1.
All the drugs listed in Table 1 are either FDA approved or are currently undergoing clinical trial against various diseases, e.g., Nilotinib to treat chronic myeloid leukemia (61), Tirilazad against acute ischemic stroke (62), etc. Extensive drug repurposing has been carried out recently to treat SC2 infection and corroborate our proposal. Nilotinib has been shown to be effective against covid-19 (63); Conivaptan have been shown to be effective against covid-19 based on in silico studies, where conivaptan targets viral nonstructural protein 9 (Nsp9) (64). Likewise, Phthalocyanine has been shown to be effective against preventing covid-19 in randomized trials (65). Bemcentinib has been shown to be effective against SC2 (66) and is being tested (trial NCT04890509) for efficacy in hospitalized covid-19 patients. Bemcentinib was designed for targeting AXL, a tyrosine kinase which signals via PI3K (67); however, our analysis identifies it as a lead molecule against ADSL and PBK. We use GeneMANIA software (68) to probe potential relationship between ADSL or PBK with AXL. As shown in Figure 8A, there is a physical interaction between AXL and PBK via PIK3R2. Likewise, interaction can be seen among AXL and ADSL in Figure 8B, suggesting that Bemcentinib's effect may be mediated by multiple genes within a closely linked gene module. Our computationally generated hypotheses however need experimental validations through knockout or induction studies.

DISCUSSION
We applied PathExt to investigate the response to SC2 infection in different cell lines (lung epithelium, bronchial epithelium and kidney cells), as well as in patient derived PBMCs. We also compared the response to SC2 with those for other respiratory viruses. Some of the key processes enriched among the PathExt-identified genes involve the immune system, cytokine signaling, metabolism, DNA replication and transcription, lipid mediated processes, etc. Our analysis revealed both similarity and dissimilarity in response to SC2 across different cell lines, reflecting different strategies evolved in different cell types likely governed by their specific regulatory networks. For instance, as A549 cell line, with low baseline ACE2 expression, has a lower viral load, its response, in terms of the enriched pathways among central TopNet genes, were different in comparison to Vero and Calu3 cell lines. Several previous studies have shown the links between viral load dynamics and disease severity. Viral load affects the host gene expression and downstream response pathways (55). Blanco et al. have shown that only 0.1% viral reads were detected post SC2 infection in A549, whereas in Calu3 cell line, 15% of the reads were detected. They also observed different host transcriptional landscape in different cell lines based on virus replication rate post SC2 infection (15). Comparing the response to SC2 with  those of other respiratory viruses (Influenza, SARS1, MERS, HPIV3 and RSV), we noted that the central TopNet genes are largely virus specific. However, the broad biological processes enriched among these genes exhibit a much greater degree of similarity across viruses, suggesting a virus-specific host response network that nevertheless affects common phenotypic response. Some common processes enriched among all the viruses include cytokine signaling, defense response, cell cycle regulation, inflammatory response, etc. However, processes like ATP metabolic process, carbohydrate derivative metabolic process, cardiomyocyte differentiation, phagocytosis, leukocyte activation and migration, etc. were found to be specific to SC2 response. Though activated and repressed TopNets shared very few central genes, they shared several processes enriched among the central genes, including cell cycle, phosphorylation, metabolism, etc., suggesting that these biological processes likely mediate both upregulation and downregulation of various genes in the global transcriptomic response to SC2 infection.
A previous study by Thair et al. (69) has reported the comparison of differentially expressed genes in SARS-CoV-2 infection with other viral infection models. There are key similarities between their findings and ours in terms of enriched processes. For instance, both our studies identified cytokine signaling, inflammatory response, DNA binding transcription factor activity, etc. However, PathExt uniquely identified ATP metabolic process, carbohydrate derivative metabolic process, cardiomyocyte differentiation, etc. to be active in SARS-CoV-2 infection.
Conventional DEG-centered approaches can miss the genes which are not significantly differentially expressed but are nevertheless responsible for mediating, for instance, based on their post-translational modification state, the global transcriptional response in a given condition. PathExt addresses this limitation by focusing on genes that critically mediate significant path-level perturbations in the network. Thus, as expected, many of the central genes identified by PathExt exhibit much lower expression fold change relative to DEGs. Interestingly, some genes are identified as central in both activated as well as repressed TopNets (a feature unique to PathExt), consistent with pleiotropic function of genes. Similarly in patient PBMC dataset, while the central genes both in activated TopNet as well as in upregulated DEGs were enriched for cell cycle and defense response, the downregulated DEGs did not show any enrichment while key genes in repressed TopNet were enriched for cytokine production. The pathways enriched among central genes in PBMC TopNets likely reflect both the innate endogenous response of immune cells to viral infection as well as response to systemic infection. However, a vast majority of central genes identified in PBMCs are specific to PBMCs, suggesting that transcriptional changes in PBMCs are largely in response to systemic infection. Demographic features were not statistically significant except "severity" category in Activated TopNet of PathExt. Central genes unique to ICU patients were associated with cell cycle processes whereas unique genes associated with non-ICU patients were enriched for cytokine mediated processes. Central genes in ICU patients exhibited a much greater homogeneity in their TopNet neighborhood across patients compared with central genes in non-ICU patients, suggesting that disease severity may be mediated by a more conserved genes and processes. Importantly, PathExt was substantially more effective in identifying potential anti-SC2 drug targets. PathExt-identified targets were enriched in 18% of the gold B A FIGURE 8 | Drug-target association. Based on our virtual screening study, we identified, Bemcentinib as a potential inhibitor for PBK and ADSL. Bemcentinib is a well-known drug against AXL, and we saw that this gene (AXL) is connected to our proposed target PBK (A) and ADSL (B) suggesting that the drug effect may be mediated by multiple genes within a closely linked gene module.
set comparisons, compared with only 5% (NULL expectation) for DEGs.
Previous studies have noted that genes with variable expression are more likely to be detected as differentially expressed in multiple contexts and do not reveal context-specific functional responses (9). Compared to reliance only on the differential expression, by exploiting the knowledge-based gene networks, and focusing on identifying key genes associated with significantly perturbed paths, PathExt represents a complementary, and in important ways, more effective approach. While previous works have exploited protein networks to infer transcriptomic perturbations, they have still relied on significantly differentially expressed genes and interpreted them in the context of the network (70,71). We have previously demonstrated (11), superiority of PathExt over such integrative approaches that nevertheless rely on significant differential expression.
In summary, our work (1) further establishes the utility of PathExt, (2) provides a comparative analysis of key genes potentially driving the global transcriptomic response to SC2 and other respiratory viruses and across multiple cell lines and patient PBMCs, (3) identifies target genes validated in previously published benchmark anti-COVID-19 target gene datasets, far better than DEGs, (4) proposes novel targets against COVID-19, and (5) proposes FDA approved drugs or drugs in clinical trials, against the novel targets.

Data Collection and Processing
We collected 12 datasets from 2 studies, one published (15) and another unpublished (72), which include pre-and post-infection transcriptome in 5 different cell lines (NHBE, A549, Calu3, Vero, MRC5), each infected by one or more of the 6 viruses (SARS-CoV-2, SARS-CoV-1, MERS, RSV, HPIV3, and Influenza virus). In addition, we also obtained PBMC transcriptomes from 100 COVID-19 patients and 26 non-infected individuals as controls (47). In total, we had 13 datasets from 3 different studies, details of which are provided in Table 2. For each dataset (except Vero and PBMCs), we downloaded the raw reads using prefetch (73); most datasets were single end reads except Vero for which paired end reads were provided. Files were split using fastq-dump command (73), trimmed using Trim Galore-0.6.6 (74) at default parameters and the reads mapped to the human transcriptomic index version hg38 using SALMON v.1.12 (75). We proceeded only with those samples for which at least 60% of the reads were mapped. In the case of Vero cell line and patient PBMC data, we directly downloaded the TPM (transcripts per million) values provided from the GEO.

Gene Expression Normalization and Node Weight Computation
For every cell line, mean gene expression was computed for the treated and control samples. Genes with TPM value 0 or greater than 100 were removed. For patient PBMC data, each case was analyzed individually and for the control, we took the mean expression of 26 control samples. Data was further filtered by removing the genes whose expression value was either 0 or above 100 in at least 50% of the PBMC case samples. Lastly, we also excluded the genes from the cell lines and PBMC data, which were not present in our human protein-protein interaction network. After the above filters, there were a total of 7,740 genes for cell lines (except Vero), 6,852 for Vero cell line. In case of patient PBMC datasets, the main dataset comprises 12,417 genes (Supplementary Table S18). All data was quantile normalized as done in previous study (76). Lastly, the normalized data was used to compute node weights which was then used to compute differentially expressed paths using PathExt software. Node (gene) weights to identify activated TopNet were computed as follows. While log(fold change) is reasonable choice for node weight, given the dependence of the magnitude of log(fold change) on the expression value, we instead computed the expected log(fold change) for a given control expression and then used the difference of observed and expected log(fold change) as the node weight. To compute the expected absolute log(fold change), we regressed absolute log(fold change) values across all genes against the gene expression in control samples using Loess fit, implemented in R (77). For activated TopNets the fold change was computed in cases relative to control, and for repressed TopNets the fold change was computed in control relative to cases. Computed node weights for Activated and Repressed TopNets in SC2 infected cell line (other  Supplementary Table S22. Given the node weights across samples, the PathExt tool computes the significant paths in two steps: (1) Top 0.1% shortest paths are selected and (2) Statistical significance of those selected paths is estimated based on data randomization and multiple testing corrected q-value threshold. We selected those percentiles and qvalues which provide at least 300 nodes for a given TopNet. Once the TopNets were generated, we computed the ripple centrality score for each gene in the TopNet and top 100 central genes were selected for the further analysis.

GO Enrichment Analysis
Enriched pathways in a given TopNet (activated and repressed) were analyzed using the top 100 central genes using PANTHER software. Customized reference was used as a background during the enrichment analysis, where we considered only those genes which were used for the TopNet creation in this study. The reference was different for the Vero cell line, cell lines other than Vero and the patient PBMC data. 'GO biological process complete' was selected as Annotation Data Set, 'Fisher's Exact' as Test Type, and 'Calculate False Discovery Type' as Correction method. As there could be redundant terms present in the result, we removed them by performing parent-child relation study using REVIGO software. The GO term and its corresponding FDR value was provided as an input with the resulting list option to be 'Medium (0.7)'. Also, in the 'Advanced options', we selected 'Yes' in the remove obsolete GO terms, 'Homo sapiens' as the working species, and the 'SimRel' as semantic similarity measures. Finally, the circular plot representing the parent-child GO terms was created using the CirGo software (78). This tool requires the output of the REVIGO as an input. 'GOSemSim' package was used to compute the semantic similarity among the processes enriched in various cell lines, among different viruses and the patient PBMC data. In GOSemSim package, to calculate the similarities between two GO terms, we used Wang method which uses the topology of GO DAG graph structures to compute the semantic similarities. To combine two sets of GO terms, we utilized 'rcmax' method which considers the average of maximum similarity on each row and column on the matrix consisting of the similarities among two sets of GO terms. Heatmap plots for the similarity analysis are visualized using ggplot2 R package (79).

Comparing Most Central Genes Across Cell Lines and Viruses
Upset plot was generated by providing the list of genes of all the samples in a csv file format as an input to the server. Intervene server (80) at default parameters was used for generating the upset plots. For gene enrichment heatmap, we computed the Observed/Expected score. 'pheatmap' package in R was used for generating the heatmaps (81).

Identification of Novel Drug Targets and Their Potential Inhibitors
First, we identified known drug targets of SC2 (till March 2021) from various published reports which includes CRISPR gene-knockout studies, viral-host protein-protein interactions (PPI) and experimentally screened drugs in various cell lines and animal models. See Supplementary Table S23, for the references used for preparing this gold standard benchmarking datasets. These studies reported the host genes important for the viral replication and function. CRISPR-Cas studies provided the important therapeutic targets based on the gene knockdown studies and their downstream effects, whereas protein-protein interaction studies shed light on the targets physically interacting with the viral proteins. In vitro and in vivo studies in the cell lines and animal models provided the information about the genes which could be potentially targeted by different drugs. We excluded all these targets from our list of predicted targets. To provide specific recommendations for SC2, we also excluded the targets which were present in the TopNets (Activated & Repressed) for any other virus such as Influenza, HPIV3, RSV, SC1 and MERS. For other viruses, we removed all the genes present in the TopNets for the respective viral response in various cell lines. This left us with the targets unique to SC2 and we then prioritized the remaining central TopNet genes based on their frequency across cell line samples as novel potential drug targets. Similar approach was followed for the PBMC data.
Next, to propose potential inhibitors against our proposed novel targets, we performed a virtual screening process using AutoDock Vina software (82). The 3D structures of the targets were downloaded from the RCSB-PDB (83) and further refined using Open Babel software (84). Active site information of the protein molecules was computed using P2RANK software (85). Next, a drug library was created for the virtual screening process, where we considered only those drug molecules which are either FDA approved or are under clinical trials; SMILES formatted files of these drugs were downloaded from the ZINC database (86) and were further converted to MOL2 file format using openbabel. The ligand and the receptor files were prepared in the 'pdbqt' file format required by vina for the docking purpose. The center and the grid size of the receptor molecules was computed using UCSF Chimera software (87), based on the P2RANK software active site result. Lastly, based on the AutoDock Vina affinity score and Root Mean Square Deviation (RMSD) value, we proposed the drug molecules which could potentially inhibit the function of these new targets.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
PA created the dataset. PA and NS developed the code for the analysis. PA, NS and SH analyzed the results. PA, GO, and NS created the tables and figures. PA and SH wrote the manuscript. SH supervised the study. All authors contributed to the article and approved the submitted version.  Supplementary Figure 4 | Demographic features analysis. Mann Whitney Test was performed to check statistical significance between top 100 differentially expressed genes and various demographic features (age, sex and severity). In case of upregulated DEGs, "Gender" was found to be the only statistically group among activated TopNet genes (A) and in case of downregulated DEGs, "Severity" was found to be the only statistically group among repressed TopNet genes (B).