Screening Potential Drugs for COVID-19 Based on Bound Nuclear Norm Regularization

The novel coronavirus pneumonia COVID-19 infected by SARS-CoV-2 has attracted worldwide attention. It is urgent to find effective therapeutic strategies for stopping COVID-19. In this study, a Bounded Nuclear Norm Regularization (BNNR) method is developed to predict anti-SARS-CoV-2 drug candidates. First, three virus-drug association datasets are compiled. Second, a heterogeneous virus-drug network is constructed. Third, complete genomic sequences and Gaussian association profiles are integrated to compute virus similarities; chemical structures and Gaussian association profiles are integrated to calculate drug similarities. Fourth, a BNNR model based on kernel similarity (VDA-GBNNR) is proposed to predict possible anti-SARS-CoV-2 drugs. VDA-GBNNR is compared with four existing advanced methods under fivefold cross-validation. The results show that VDA-GBNNR computes better AUCs of 0.8965, 0.8562, and 0.8803 on the three datasets, respectively. There are 6 anti-SARS-CoV-2 drugs overlapping in any two datasets, that is, remdesivir, favipiravir, ribavirin, mycophenolic acid, niclosamide, and mizoribine. Molecular dockings are conducted for the 6 small molecules and the junction of SARS-CoV-2 spike protein and human angiotensin-converting enzyme 2. In particular, niclosamide and mizoribine show higher binding energy of −8.06 and −7.06 kcal/mol with the junction, respectively. G496 and K353 may be potential key residues between anti-SARS-CoV-2 drugs and the interface junction. We hope that the predicted results can contribute to the treatment of COVID-19.


INTRODUCTION
Novel coronavirus pneumonia COVID-19, erupted in Wuhan, Hubei, China, has become a global public health challenge (Nittari et al., 2020). By July 26, 2021, it has caused 192,284,207 confirmed cases and4,128,152 deaths (WHO, 2021). Although the COVID-19 vaccine has been researched and developed in many countries and regions, it still fails to avoid the risk of infection. Therefore, it is an urgent task to design effective drugs for the COVID-19 treatment (Khan et al., 2020a).
COVID-19 is caused by SARS-CoV-2 infection. SARS-CoV-2, like most coronaviruses, is a positive single stranded virus with unique coronal protein spikes (Khan et al., 2020b). It invades human body through SARS-CoV-2 Spike (S) protein binding with the surface of host angiotensin-converting enzyme 2 (ACE2) (Morse et al., 2020). Based on the homology between SARS-CoV-2 and other RNA viruses (such as SARS-CoV and MERS-CoV), we can investigate RNA virus-related FDA-approved drugs to find possible chemical agents for preventing  Computational methods for identifying potential antiviral drugs against COVID-19 contain structure-based methods and network-based methods. Structure-based methods are a pivotal implement based on computer-aided drug design and structural molecular biology. The type of methods aims at predicting binding sites between chemical agents and target proteins and thus elucidating basic biochemical processes (McConkey et al., 2002). Lan et al. (2020) determined the crystal structure of receptor-binding domain (RBD) in which the S protein binds to ACE2. Li et al. (2021) screened 21 antiviral, antifungal and anticancer compounds to identify possible SARS-CoV-2 papain inhibitors based on silicon molecular docking. Elfiky, 2020a utilized sequence analysis and molecular docking to construct an anti-COVID-19 RNA-dependent RNA polymerase (RdRp) prediction model. Panda et al. (2020) used molecular docking technique to implement virtual screening among SARS-CoV-2 protein, main protease, and RDB/ACE2 complex and FDAapproved antiviral drugs. Maurya et al. (2020) screened possible antiviral natural products against the S protein and its cellular receptor from Ayurveda through molecular docking. Choudhary et al. (2020) utilized a structure-based virtual screening technique to find possible inhibitors for SARS-CoV-2 entering cells. Wang et al. (2020a) investigated the development of structure-based methods and emphasized the limitations and further works of anti-SARS-CoV-2 drug research. Gahlawat et al. (2020) exploited structure-based virtual screening technique to investigate the inhibition effects of major proteases in coronavirus.
Network-based methods have been broadly applied to anti-SARS-CoV-2 drug screening. For example, Peng et al. (2020) built one virus-drug (VDA) association dataset and employed a regularized least square classifier to explore the therapeutic clues of COVID-19 by combining drug chemical structures, virus complete genome sequences, bipartite local model and neighborhood association information. Zhou L. et al. (2020) exploited a KATZ algorithm (VDA-KATZ) to predict candidate drugs for the SARS-CoV-2 prevention on the VDA dataset. Peng et al. (2021) continued to construct two VDA datasets and developed a random walk with restart method (VDA-RWR) to prioritize drugs related to COVID-19. Zhou Y. et al. (2020) designed a formidable network-based method to reposition the existing chemical agents and quickly screened latent drug combinations for COVID-19. Taz et al. (2021) identified the infectious responses between SARS-CoV-2 and idiopathic pulmonary fibrosis-infected lung cells based on protein-protein interaction network. Du et al. (2021) probed a network-based virus-host interaction prediction method and considered its application on SARS-CoV-2. Messina et al. (2020) studied pathogenesis of SARS-CoV-2 infection to discover the etiopathogenesis of COVID-19 by analyzing virus-host interactome. Ahmed (2020) found that Vitamin D may inhibit SARS-CoV-2 infection based on a network analysis approach. Stolfi et al. (2020) used a broader molecular map to reveal potential therapeutic strategy for COVID-19.
Computational methods effectively prioritize potential drugs for the SARS-CoV-2 infection. In this work, we propose a Virus-Drug Association (VDA) prediction algorithm, VDA-GBNNR, to discover potential chemical agents against COVID-19 based on virus similarity, drug similarity, VDA network, Gaussian Association Profile Kernel (GAPK), and Bounded Nuclear Norm Regularization (BNNR). VDA-GBNNR is compared with three existing VDA prediction methods, that is, VDA-RLSBN (Peng et al., 2020), VDA-KATZ (Zhou L. et al., 2020), VDA-RWR (Peng et al., 2021) and one network-based microRNAanticancer drug association prediction model SMiR-NBI (Li et al., 2016) on three VDA datasets. The experimental results show that VDA-GBNNR computes the best AUC, accuracy, sensitivity, and specificity. In addition, the inferred top six antiviral drugs against SARS-CoV-2, remdesivir, favipiravir, ribavirin, mycophenolic acid, niclosamide, and mizoribine come together in any two datasets. Molecular dockings between the six compounds and the junction of SARS-CoV-2 S protein and human ACE2 are implemented to calculate molecular binding energies and identify binding sites between them. Niclosamide and mizoribine are found to have the strongest binding energy of −8.06 and −7.06 kcal/mol with the junction, respectively.

MATERIALS AND METHODS
In this study, inspired by the works provided by ; Chen et al. (2018b), Yang et al. (2019), and Liu et al. (2020) we develop a VDA prediction framework (VDA-GBNNR) to screen underlying drugs for inhibiting COVID-19. First, virus similarity and drug similarity are calculated based on virus complete genomic sequences, drug chemical structures, and Gaussian Association Profiles (AP). Second, a BNNR model is developed to complete unknown associations between viruses and drugs. Finally, the predicted top anti-SARS-CoV-2 drugs are docked with the junction of the S protein bound with ACE2. The overall workflow is shown in Figure 1.

Datasets
Three VDA datasets are obtained from Peng et al. (2021). Each dataset contains virus similarity matrix, drug similarity matrix, and VDA matrix. In each dataset, virus complete genomic sequences were downloaded from the NCBI database (Coordinators, 2018), and MAFFT (Katoh et al., 2019) (a multisequence alignment tool) was utilized to compute virus sequence similarity matrix W vv . Drug chemical structures were obtained from DrugBank (Wishart et al., 2018) and RDKit (an opensource chemical information software) was used to calculate drug chemical structure similarity matrix W dd . VDA matrix W vd is achieved by searching the DrugBank, NCBI, and PubMed (Motschall and Falck-Ytter, 2005) databases. In W vd , W ij = 1 if virus v i interacts with drug d j ; otherwise, W ij = 0. Table 1 shows the details of three VDA datasets.

GAPK Similarity
For a given virus v i , the Gaussian association profile AP(v i ) is defined as the ith row of a VDA matrix W vd to describe its association information with all drugs. GAPK similarity between two viruses [i.e., (v i , v j )] is calculated by Eq. (1).
where γ v represents normalized kernel bandwidth based on bandwidth parameter γ v , and m is the number of viruses. For a given drug d i , its Gaussian association profile AP(d i ) is defined as the ith column of a VDA matrix W vd to describe its association information with all viruses. GAPK similarity between two drugs [i.e., (d i , d j )] is computed by Eq. (2): where γ d indicates normalized kernel bandwidth based on bandwidth parameter γ d , and n is the number of drugs.

Similarity Integration
Complete genomic sequence similarity W vv , chemical structure similarity W dd , and GAPK similarity (G V and G D ) are integrated to compute the final virus similarity matrix S V (Eq. 3) and drug similarity S D (Eq. 4). The parameter w is introduced to measure the importance between biological similarity and GAPK similarity.

Heterogeneous Network Construction
A heterogeneous virus-drug network is constructed by integrating virus similarity network, drug similarity network and VDA network. The edge between two viruses/drugs is weighted according to their similarity. The heterogeneous network can be represented as a bipartite graph G(V, D, E), where E (G) = {e ij } ⊆ V × D, e ij represents the edge between the virus v i and the drug d j , V and D represent virus set and drug set, respectively. The adjacency matrix of the heterogeneous network is defined as Eq. (5).
where W vd denotes known VDA matrix, M dd and M vv represent the adjacency matrices of drug similarity network and virus similarity network, respectively. Hence, the adjacency matrix can be rewritten as Eq. (6).

VDA-GBNNR Model
In three VDA datasets, known VDAs in the matrix W vd account for about 10.26, 8.72, and 5.90% among all possible virus-drug pairs, respectively. That is, the majority of virus-drug pairs are unlabeled and need to be completed. Therefore, we aim to complete unknown elements through a bounded nuclear norm regularization model. The rank of a matrix describes information redundancy, and lower rank denotes less information redundance. Indeed, VDA prediction can be represented as a low-rank matrix completion problem. Therefore, we built the following model to complete the missing association information in a VDA matrix by Eq. (7): where A is a matrix after completion, rank(·) indicates the rank of a matrix, M ∈ R (m+n) × (m+n) is a given VDA matrix, is the set of index pairs (i, j) which contains all known VDAs in M, and P is the projection operator on .
The solution of rank (A) in Eq. (7) is a non-convex problem. Based on the nuclear norm optimization provided by Candes and Recht (2013), the model Eq. (7) can be solved by Eq. (9): where ||A|| * denotes the nuclear norm of A and can be obtained by summating all singular values in A. The elements in virus similarity matrix W vv and drug similarity matrix W dd are between 0 and 1, and the elements in VDA matrix W vd are either 1 or 0. Therefore, the predicted association scores for unknown virus-drug pairs are expected   The bold values represent the best performance among five VDA prediction methods.
to be between 0 and 1. The value closer to 1 denotes bigger probability that a virus and a drug pair is linked, and vice versa. However, the elements in Eq. (9) may be any real value in (−∞, +∞). To ensure that the predicted results are within the interval of [0, 1], a bounded constraint is added to Eq. (9). In addition, there may exist data noise when evaluating virus similarities and drug similarities. To solve this problem, we build a matrix completion model with noise tolerance based on rank minimization by Eq. (10): where ||·|| F denotes Frobenius norm and ∈ indicates the noise level.
Since the noise level is unknown, it is difficult to choose the most appropriate parameters in Eq. (10). Therefore, a soft regularization term is introduced to tolerate unknown noise and reduce computational complexity. Thus, a bound nuclear norm regularization model (VDA-GBNNR) is developed to screen possible associations between viruses and drugs by Eq. (11): where α is a parameter used to balance the nuclear norm and the error term, and each element in A satisfies 0 ≤ A ij ≤ 1. Through introducing an auxiliary matrix W, Eq. (11) can be optimized using alternating direction method of multipliers defined by Eq. (12).
where the initial term A 1 = P (M). Consequently, the augmented Lagrange function can be defined by Eq. (13).
where B denotes the Lagrange multiplier and β > 0 indicates the penalty parameter. At each iteration, VDA-GBNNR alternatively calculates W k+1 , A k+1 and B k+1 by fixing other two terms. The specific solutions about W k+1 , A k+1 and B k+1 were provided by Yang et al. (2019). VDA-GBNNR can update VDA matrix W * vd by completing the missing elements in W vd .

Evaluation Metrics
In this study, sensitivity, specificity, accuracy, and AUC are used to evaluate the performance of our proposed VDA-GBNNR method. Accuracy denotes the proportion of correctly inferred positive and negative VDAs to all positive and negative VDAs. Sensitivity denotes the ratio of correctly predicted positive VDAs to all positive VDAs. Specificity represents the rate of correctly identified negative VDAs to all negative VDAs. The details are defined by Eqs. (14)

Experimental Settings and Parameter Selection
In the experiment, we conduct fivefold cross validation for 10 times to evaluate the performance of VDA-GBNNR. Eighty percent of elements in the VDA matrix W vd are randomly selected as the training set and the remaining is used the testing set. Parameters α, β, w, and γ are set in the range of [0.1, 1, 10, 100], [0.1, 1, 10, 100], [0, 0.1, 0.2, ..., 1], and [0.5, 1, 1.5, ..., 3], respectively. The optimal parameter combination is obtained by grid search. Table 2 shows parameter combinations when the top 10 AUCs are confirmed based fivefold cross validation for 10 times. Table 3 shows the optimal parameter settings for VDA-KATZ, VDA-RLSBN, VDA-RWR, and VDA-GBNNR based on grid search. The four methods obtain the best performance when parameters are set the corresponding values provided by Table 3. In the SMiR-NBI method, there is no parameter to set.

Performance Comparison With Other Methods
To evaluate the performance of VDA-GBNNR, we compare it with four classical association prediction methods based on fivefold cross validation, that is, SMiR-NBI, VDA-RLSBN, VDA-KATZ, and VDA-RWR. SMiR-NBI prioritized miRNAs as possible biomarkers to depict their responses to anticancer drug therapy on a heterogeneous drugs-miRNA network. VDA-KATZ, VDA-RLSBN, and VDA-RWR are the newest three VDA prediction algorithms. The experiments are implemented for 100 times and the average performance is taken as the final results. Table 4 gives sensitivities, specificities, accuracies, and AUCs of the five VDA identification models on the three VDA datasets.

Performance of BNNR With Gaussian Kernel or Not
In this section, we investigate the effect of GAPK on the prediction performance. In the BNNR model (VDA-BNNR) without GAPK, the adjacent matrix M = W dd W T vd W vd W vv is used to represent the heterogeneous virus-drug network where W vv and W dd denote virus complete genomic sequence similarity and drug chemical structure similarity, respectively. The comparison results are illustrated in Figure 3. From Figure 3, we can observe that VDA-GBNNR improves the prediction performance compared to VDA-BNNR.

Case Study
After confirming the prediction performance of VDA-GBNNR, we further discover potential available drugs applied to the treatment of COVID-19. Small molecules with the top 10 association scores with SARS-CoV-2 are shown in Tables 5-7. In addition, we search the recent documents and find that all the inferred top 10 chemical agents have been reported by COVID-19-related publications in the three datasets. In particular, remdesivir, favipiravir, ribavirin, mycophenolic acid, niclosamide and mizoribine come together in any two datasets. Remdesivir is an intravenous nucleotide prodrug bound with viral RdRp and can inhibit viral replication through premature termination of RNA transcription (Amirian and Levy, 2020). It has been validated to be able to inhibit the replication of SARS-CoV and MERS-CoV (Sheahan et al., 2017). The drug has obtained an emergency use authorization to treat the patients infected by SARS-CoV-2 from the Food and Drug Administration (FDA) (Moirangthem and Surbala, 2020).
Favipiravir is a guanosine analog targeting RdRp and blocking the rhinoviruses replication (Kocayigit et al., 2021). The drug is effective against a large-scale grippe virus types and subtypes (Furuta et al., 2017). Two recent open-label experiments discovered its therapeutic effective on COVID-19 (Cai et al., 2020;Prakash et al., 2020). It has been also applied to the treatment of COVID-19 by the Japanese government (Hoang and Anh, 2020), and exhibited hopeful results in clinical researches in Russia and China. More importantly, its anti-SARS-CoV-2 experiments are conducting in the United States, the United Kingdom and India (Joshi et al., 2020).
Ribavirin is an antiviral drug against hepatitis C virus and other RNA viruses . It can inhibit viral RNA synthesis and hander normal viral replication by binding to viral RNA (Kim et al., 2019). It combines closely with RdRp and is a powerful antiviral drug against SARS-CoV-2 (Elfiky, 2020b). Clinical trials about the treatment of ribavirin on the patient with COVID-19 have been conducted (Zarandi et al., 2021).
Mycophenolic acid is an antibiotic extracted from penicillium species. Mycophenolic acid can block the production of guanosine monophosphate by inhibiting inosine monophosphate dehydrogenase. Mycophenolic acid is also an immunosuppressive drug with a strong anti-proliferation effect (Kim et al., 2019). Studies suggest that mycophenolic acid has a potential inhibitory effect on the enzyme reproduced by SARS-CoV-2 (Muhseen et al., 2021).
Mizoribine is an imidazole nucleoside antibiotic isolated from bacillus brucellosis (Mizuno et al., 1974). Mizoribine lacks antimicrobial activity, however, it has powerful immunosuppressive activity and has been used in clinic after kidney transplantation (Tajima et al., 1984). It may be used as a potentially beneficial drug for hypertensive patients infected by COVID-19 (Jakovac, 2020).

Molecular Docking
Inspired by molecular docking provided by Peng et al. (2021), we further investigate the binding energy between the predicted six anti-SARS-CoV-2 drugs and the junction of the S protein-ACE2 interface by molecular docking. The chemical structures of the overlapping six small molecules are achieved from DrugBank in the PDB format. The PDB file is then converted to the PDBQT format by AutoDock4 (Morris et al., 2009). The structure of the S protein bound with ACE2 is downloaded from the RCSB Protein Data Bank (Burley et al., 2017), and the PDBID number is 6M0J. The predicted drugs are then regarded as ligands, and the junction of the S protein-ACE2 interface is regarded as receptors. Table 8 illustrates molecular docking results including molecular binding energies and binding sites. It can be observed that the six drugs have higher molecular docking energies with the junction of the S protein-ACE2 interface. More importantly, the key residues between the six small molecules and the interface junction are Q493 and K68 for remdesivir, G496 and K353 for favipiravir, G496, Q493, R403, and K353 for ribavirin, G496, F390, and K353 for mycophenolic acid, E37 for niclosamide, and N439, Q506, N330, and Q325 for mizoribine. In addition, the results suggest that G496 and K353 may be potential key residues between anti-SARS-CoV-2 drugs and the interface junction. Figure 4 demonstrates the dockings between remdesivir, favipiravir, ribavirin, mycophenolic acid, niclosamide and mizoribine and the junction of the S protein-ACE2 interface. In the figure, cyan indicates the S protein, green represents human ACE2, and the circles in each subgraph denotes key residues.

DISCUSSION
With the rapid spread of SARS-CoV-2, it is vital to screen specific drugs for patients infected by COVID-19. Although vaccines have been launched, it is well known that the effect of vaccines for SARS-CoV-2 is mainly prevention, rather than treatment. After vaccination, it cannot completely ensure that people will not be infected by SARS-CoV-2. Therefore, it is an urgent task to find possible clues of treatment for patients with the infection of COVID-19. Furthermore, the research and development of a new drug need consume a vast of time and resource. Hence, it may be a more appropriate strategy to screen anti-SARS-CoV-2 drug candidates from existing FDAapproved drugs.
In this manuscript, we arrange three VDA datasets including VDA matrix, virus complete genomic sequence similarity matrix, and drug chemical structure similarity matrix. First, virus GAPK similarity and drug GAPK similarity are computed. Second, the final similarity is obtained by integrating biological similarity and GAPK similarity. Third, a bounded nuclear norm regularization model is developed to predict anti-SARS-CoV-2 drug candidates. Finally, molecular docking is applied to measure the binding capabilities between the predicted anti-SARS-CoV-2 drugs and the junction of the spike protein-ACE2 interface. Although datasets used in this work is relatively small, we used three VDA datasets to evaluate the performance of VDA-GBNNR. More importantly, antiviral drugs against COVID-19 screened by the proposed VDA-GBNNR method come together in at least two VDA dataset instead of one dataset.
VDA-GBNNR can obtain the best prediction performance. It may mainly be the following advantages. First, GAPK similarity can effectively depict the association similarity between two viruses (or drugs). Second, the proposed bound nuclear norm regularization model can reduce the overfitting problem. Finally, range constraint makes all the predicted association scores can be within a predefined range.

CONCLUSION
In this study, we integrate the heterogeneous virus-drug network and design a VDA prediction model based on bounded nuclear norm regularization to explore potential anti-SARS-CoV-2 drugs. Experimental results show that VDA-GBNNR is an effective VDA identification method. The six FDA-approved drug candidates are found to be potential antiviral drugs against SARS-CoV-2. We hope that the inferred drugs can contribute to the inhibition of COVID-19.
In the future, first, we will integrate various data resource and build larger dataset. Second, we will consider different computational models (Gaur and Chaturvedi, 2019;Liu et al., 2019;Gutiérrez-Cárdenas and Wang, 2021), for example, matrix decomposition (Chen et al., 2018a), bidirectional label propagation , network distance analysis , internal confidence-based collaborative filtering recommendation (Wang et al., 2020b), sparse subspace learning with Laplacian regularization  to search possible associations between viruses and drugs. Third, we will try to use deep learning methods to predict drugs for COVID-19 Alakus and Turkoglu, 2021;Kang et al., 2021). Finally, we will also investigate the relationship between antimicrobial compounds and COVID-19.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.