Identifying Effective Antiviral Drugs Against SARS-CoV-2 by Drug Repositioning Through Virus-Drug Association Prediction

A new coronavirus called SARS-CoV-2 is rapidly spreading around the world. A total of 16,558,289 infected cases with 656,093 deaths had been reported by July 29th, 2020, and it is urgent to identify effective antiviral treatment. In this study, potential antiviral drugs against SARS-CoV-2 were identified by drug repositioning through Virus-Drug Association (VDA) prediction. 96 VDAs between 11 types of viruses similar to SARS-CoV-2 and 78 small molecular drugs were extracted and a novel VDA identification model (VDA-RLSBN) was developed to find potential VDAs related to SARS-CoV-2. The model integrated the complete genome sequences of the viruses, the chemical structures of drugs, a regularized least squares (RLS) classifier, a bipartite local model, and the neighbor association information. Compared with five state-of-the-art association prediction methods, VDA-RLSBN obtained the best AUC of 0.9085 and AUPR of 0.6630. Ribavirin was predicted to be the best small molecular drug, with a molecular binding energy of −6.39 kcal/mol with human angiotensin-converting enzyme 2 (ACE2), followed by remdesivir (−7.4 kcal/mol), mycophenolic acid (−5.35 kcal/mol), and chloroquine (−6.29 kcal/mol). Ribavirin, remdesivir, and chloroquine have been under clinical trials or supported by recent works. In addition, for the first time, our results suggested that several antiviral drugs, such as FK506, with molecular binding energies of −11.06 and −10.1 kcal/mol with ACE2 and the spike protein, respectively, could potentially be used against SARS-CoV-2, although this remains to be validated. Drug repositioning through virus–drug association prediction can effectively find potential antiviral drugs against SARS-CoV-2.


INTRODUCTION
In December 2019, a novel coronavirus, called SARS-CoV-2 by the World Health Organization (WHO) and first found in Wuhan, China, began spreading rapidly around the world (Kaiser et al., 2020; Sanche et al., 2020). The SARS-CoV-2 outbreak was declared a global public health emergency by WHO, and a total of 16,558,289 cases had been confirmed, with 656,093 deaths, throughout the world by July 29th, 2020 (World Health Organization [WHO], 2020). SARS-CoV-2 causes a severe acute respiratory syndrome named COVID-19, and no specific vaccine or antiviral drug against SARS-CoV-2 has been found at present (Lu, 2020; Wang et al., 2020c). Therefore, finding a specific antiviral drug as soon as possible is urgent to stop the spread of SARS-CoV-2 (Lu, 2020; Zhang et al., 2020a).
However, designing a new drug to treat COVID-19 in a short time is almost impossible (Zhang et al., 2020a). One of the best strategies is drug repositioning (Chen et al., 2012, 2016; Peng et al., 2017a; Beck et al., 2020). Repositioning starts from already commercialized drugs and infers new uses for them, for example from their known undesired effects. This strategy can thus greatly shorten the time required to obtain an antiviral drug against SARS-CoV-2.
Although little is known about SARS-CoV-2, its complete genome sequence is strongly homologous to that of SARS-CoV (Huang et al., 2020; Morse et al., 2020). Therefore, in this study, to prioritize available FDA-approved antiviral drugs against SARS-CoV-2 for further clinical trials, 11 well-studied viruses similar to SARS-CoV-2 were selected and 96 virus-drug associations (VDAs) with these 11 viruses were integrated. A regularized least squares (RLS) classifier, a bipartite local model (BLM), and neighbor association information were combined in our new algorithm, named VDA-RLSBN, to find novel VDAs for a new virus (especially SARS-CoV-2) or a new drug. The results showed that ribavirin, remdesivir, and chloroquine may be antiviral drugs against SARS-CoV-2.
Molecular docking techniques investigate the behavior of small molecular drugs in the binding site of a target protein.
As more target protein structures are confirmed experimentally, molecular docking approaches are widely applied to drug design. AutoDock (Goodsell et al., 1996; Ruyck et al., 2016) is an available software package used to identify the bound conformations of a small molecular drug on a macromolecular target. The AutoDock affinity scoring function ranks the candidate poses based on the sum of the van der Waals and electrostatic energies. We conducted molecular docking between the predicted top 10 antiviral drugs against SARS-CoV-2 and two target proteins: the spike protein of SARS-CoV-2 and the human angiotensin-converting enzyme 2 (ACE2) molecule (Wang et al., 2020a). The molecular binding energies of the above three drugs with ACE2 are −6.39 kcal/mol for ribavirin, −7.4 kcal/mol for remdesivir, and −6.29 kcal/mol for chloroquine. These three small molecules have been under clinical trial or supported by recent publications. In addition, we found that FK506 shows molecular binding energies of −10.1 kcal/mol and −11.06 kcal/mol with these two targets, which suggests that FK506 may be applied to stop COVID-19, although there is no report about its association with SARS-CoV-2.

Problem Formalization
Bleakley and Yamanishi (2009) represented a drug-target interaction network as a bipartite graph and developed a BLM-based method to predict possible drug-target interactions. The proposed method first inferred targets of a given FDA-approved drug and drugs targeting a known protein and then combined these two independent predictions. The results demonstrated the excellent performance of BLM. Similar to the drug-target interaction network, the VDA network can also be taken as a bipartite graph. Results in this study are thus presented to evaluate the prediction performance in each of the following four cases for a given putative virus-drug pair:
• The virus with at least one known drug and the drug with at least one known virus.
• The virus with at least one known drug and the drug without any known virus (new drug).
• The virus without any known drug (new virus) and the drug with at least one known virus.
• New virus and new drug.
Based on these four cases, we represent a VDA network as a bipartite graph, and the predicted VDA matrix $Y^{pre}_{n \times m}$ can thus be written as

$$Y^{pre} = \begin{bmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{bmatrix} \tag{1}$$

where $\tilde{n} = n - n_{cv}$ is the number of new viruses (for example, SARS-CoV-2), and $\tilde{m} = m - m_{cv}$ is the number of new drugs. $Y_1$ represents VDAs from $n_{cv}$ existing viruses and $m_{cv}$ existing drugs, $Y_2$ represents VDAs from $n_{cv}$ existing viruses and $\tilde{m}$ new drugs, $Y_3$ denotes VDAs from $\tilde{n}$ new viruses and $m_{cv}$ existing drugs, and $Y_4$ denotes VDAs from $\tilde{n}$ new viruses and $\tilde{m}$ new drugs. Our aims are to identify potential VDAs in the subnetwork $Y_1$ as well as in $Y_2$, $Y_3$, and $Y_4$. Figure 1 shows the flowchart of VDA-RLSBN.
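The four-block partition described above can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions: the function name `partition_vda`, the boolean masks, and the toy matrix are hypothetical and not part of the original method description. A virus or drug is treated as existing when it has at least one known association.

```python
import numpy as np

def partition_vda(Y, known_virus, known_drug):
    """Split an association matrix into the four blocks Y1..Y4.

    Y           : (n, m) array, rows are viruses, columns are drugs
    known_virus : boolean mask of length n, True for existing viruses
    known_drug  : boolean mask of length m, True for existing drugs
    """
    Y = np.asarray(Y)
    kv = np.asarray(known_virus, dtype=bool)
    kd = np.asarray(known_drug, dtype=bool)
    Y1 = Y[np.ix_(kv, kd)]     # existing viruses x existing drugs
    Y2 = Y[np.ix_(kv, ~kd)]    # existing viruses x new drugs
    Y3 = Y[np.ix_(~kv, kd)]    # new viruses x existing drugs
    Y4 = Y[np.ix_(~kv, ~kd)]   # new viruses x new drugs
    return Y1, Y2, Y3, Y4

# Toy example (hypothetical data): 3 viruses x 4 drugs.
Y_toy = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
known_virus = Y_toy.sum(axis=1) > 0   # the third virus has no known drug
known_drug = Y_toy.sum(axis=0) > 0    # the last two drugs have no known virus
Y1, Y2, Y3, Y4 = partition_vda(Y_toy, known_virus, known_drug)
```

In this toy case the third virus plays the role of a new virus such as SARS-CoV-2, so its candidate associations fall in the blocks `Y3` and `Y4`.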

Regularized Least Square
To infer possible VDA candidates, we develop an RLS-based VDA identification model (VDA-RLS) to compute the association profile $\hat{y}$ for each virus-drug pair:

$$\hat{y} = K (K + \sigma I)^{-1} y \tag{2}$$

where $K$ represents the kernel matrix, $y$ denotes the original association profile, $I$ is the identity matrix, and $\sigma$ is a regularization parameter.
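As an illustration of the RLS prediction, the sketch below evaluates the standard closed form $\hat{y} = K(K + \sigma I)^{-1} y$ with NumPy. The closed form is assumed here as the usual kernel RLS solution; the function name `rls_predict` and the toy kernel are hypothetical, and the value of $\sigma$ is arbitrary.

```python
import numpy as np

def rls_predict(K, y, sigma=1.0):
    """Return K (K + sigma I)^{-1} y without forming the inverse explicitly."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    n = K.shape[0]
    # Solving the linear system is more stable than inverting (K + sigma I).
    return K @ np.linalg.solve(K + sigma * np.eye(n), y)

# Toy kernel (hypothetical): with an identity kernel, y_hat = y / (1 + sigma).
K_toy = np.eye(3)
y_toy = np.array([1.0, 0.0, 1.0])
y_hat = rls_predict(K_toy, y_toy, sigma=1.0)
```

A larger `sigma` shrinks the predicted profile toward zero, while a small `sigma` reproduces the original profile more closely.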
To compute the VDA matrix $Y_1$ from the $n_{cv}$ existing viruses and $m_{cv}$ existing drugs, we consider the ensemble of independent virus-based and drug-based predictions with RLS. The solution of $Y_1$ can thus be divided into the following four steps:
Step 1 For a given virus $v_i$ with at least one known association, its new association profile $\hat{y}_{v_i}$ can be computed from its original association profile $y_{v_i}$ and the kernel matrix $K_v$ based on the RLS classifier:

$$\hat{y}_{v_i} = K_v (K_v + \sigma I)^{-1} y_{v_i} \tag{3}$$

where $K_v = (\ldots)/2$, and $y_{v_i}$ represents the $i$th row of $Y^{ori}$. We can compute the virus-based VDA matrix $Y_v$ by Eq. (4).
Step 2 For a given drug $d_j$ with at least one known association, its new association profile $\hat{y}_{d_j}$ can be computed from its original association profile $y_{d_j}$ and the kernel matrix $K_d$ based on the RLS classifier:

$$\hat{y}_{d_j} = K_d (K_d + \sigma I)^{-1} y_{d_j} \tag{5}$$

Step 3 Integrate $Y_v$, with elements $y^{v}_{ij}$, and $Y_d$, with elements $y^{d}_{ij}$, to compute the predicted VDA matrix $Y^{RLS}$ based on RLS by Eq. (6).
Step 4 Obtain $Y_1$ by Eq. (7).
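The four steps can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the names `rls_scores` and `blm_rls` are hypothetical, the RLS closed form $K(K + \sigma I)^{-1}Y$ is assumed, and combining the two predictions by a simple average is an assumption made only for illustration, since the combination rule of Eq. (6) is not reproduced here.

```python
import numpy as np

def rls_scores(K, Y, sigma=1.0):
    """Apply K (K + sigma I)^{-1} to every column of Y."""
    K = np.asarray(K, dtype=float)
    return K @ np.linalg.solve(K + sigma * np.eye(K.shape[0]), Y)

def blm_rls(K_v, K_d, Y, sigma=1.0):
    """Combine a virus-kernel and a drug-kernel RLS prediction.

    K_v : (n, n) virus kernel, K_d : (m, m) drug kernel,
    Y   : (n, m) known association matrix.
    """
    Y = np.asarray(Y, dtype=float)
    Y_v = rls_scores(K_v, Y, sigma)        # prediction using the virus kernel
    Y_d = rls_scores(K_d, Y.T, sigma).T    # prediction using the drug kernel
    # ASSUMPTION: a plain average stands in for the paper's combination rule.
    return 0.5 * (Y_v + Y_d)

# Toy example (hypothetical data): identity kernels, 2 viruses x 3 drugs.
Y_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
Y_rls = blm_rls(np.eye(2), np.eye(3), Y_toy, sigma=1.0)
```

With identity kernels and `sigma=1.0`, both predictions equal `Y_toy / 2`, which makes the toy output easy to check by hand.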

Regularized Least Square With Neighbor Association Information
We can identify novel VDAs between existing viruses and existing drugs, or known/new viruses and new/existing drugs based on RLS and BLM. However, VDA-RLS was not able to predict associations between new viruses and new drugs. To solve this problem, we developed a VDA prediction model (VDA-RLSBN) by integrating neighbor association information into the RLS model. Based on the "guilt-by-association" method, similar viruses/drugs tend to associate with similar drugs/viruses, so the association profile of an unknown virus could be possibly found by its neighbors' association information. Viruses highly similar to a new virus can be considered as its neighbors. Since the new virus has no associated drugs (i.e., its current association profile is a vector with all the elements of 0), complete genome sequence similarity of viruses is applied to define its neighbors.
For a new virus $v_i$, its association weight with a drug $d_j$ can be computed from its neighbors' associations with $d_j$, and its association profile $a^{j}_{v_i}$ is defined as Eq. (8):

$$a^{j}_{v_i} = \sum_{k} S^{v}_{ik}\, y^{ori}_{kj} \tag{8}$$

where $S^{v}_{ik}$ is the complete genome sequence similarity between two viruses $v_i$ and $v_k$, and the sum runs over the neighbors $v_k$ of $v_i$. $a^{j}_{v_i} > 0$ when the $j$th drug $d_j$ has known associations, i.e., $y^{ori}_{kj} > 0$ for at least one $k$, and $a^{j}_{v_i} = 0$ when the $j$th drug $d_j$ is new, i.e., $y^{ori}_{kj} = 0$ for all $k$. $a^{j}_{v_i}$ is normalized to the range [0, 1] by Eq. (9). Also, an independent virus-based association profile $y_{v_i}$ for a virus-drug pair can be represented as Eq. (10). Similarly, for a new drug $d_j$, its association profile $y_{d_j}$ for the same virus-drug pair can be represented as Eq. (11), where $a_{d_j}$ denotes the neighbor association profile of $d_j$. The final VDA network can be represented as

$$Y^{VDA\text{-}RLSBN} = \begin{bmatrix} Y^{VDA\text{-}RLSBN}_1 & Y^{VDA\text{-}RLSBN}_2 \\ Y^{VDA\text{-}RLSBN}_3 & Y^{VDA\text{-}RLSBN}_4 \end{bmatrix} \tag{12}$$

where $Y^{VDA\text{-}RLSBN}$ can be obtained by Eqs (10), (11), and (6). Specifically, the VDA matrix related to SARS-CoV-2 can be obtained from $Y^{VDA\text{-}RLSBN}_3$. Finally, we used AutoDock to analyze the druggability of the predicted top 10 chemical agents and their binding activities with two target proteins, the SARS-CoV-2 spike protein and ACE2.
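The neighbor association profile of a new virus can be sketched as a similarity-weighted sum of its neighbors' known associations. The snippet below is a hedged illustration: the function name `neighbor_profile` and the toy values are hypothetical, and dividing by the maximum is only one possible way to scale the profile into [0, 1]; the normalization actually used in Eq. (9) is not reproduced here.

```python
import numpy as np

def neighbor_profile(sim_row, Y_known):
    """Similarity-weighted association profile for a new virus.

    sim_row : (n_known,) genome-sequence similarities to the known viruses
    Y_known : (n_known, m) known associations of those viruses
    """
    sim_row = np.asarray(sim_row, dtype=float)
    Y_known = np.asarray(Y_known, dtype=float)
    a = sim_row @ Y_known                  # a_j = sum_k S_ik * y_kj
    peak = a.max()
    # ASSUMPTION: scale by the maximum so that values lie in [0, 1];
    # drugs with no known association keep a weight of exactly 0.
    return a / peak if peak > 0 else a

# Toy example (hypothetical data): two known viruses, three drugs.
sim_toy = np.array([0.8, 0.2])
Y_known_toy = np.array([[1, 0, 0],
                        [0, 1, 0]])
a_new = neighbor_profile(sim_toy, Y_known_toy)
```

The third drug has no known virus in the toy data, so its weight stays at zero, matching the case of a new drug described in the text.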

Evaluation Metrics and Experimental Settings
In this section, we performed extensive experiments to evaluate our proposed VDA-RLSBN method. We compared VDA-RLSBN with five state-of-the-art machine learning-based models, including LRLSHMDA (Wang et al., 2017), SMiR-NBI (Li et al., 2016), CMF (Zheng et al., 2013), NetLapRLS (Xia et al., 2010), and WNN-GIP (Laarhoven and Marchiori, 2013). Sensitivity, specificity, accuracy, AUC, and AUPR are widely applied to evaluate various machine learning-based models. In this study, we used these five metrics to measure the performance of the five state-of-the-art models and VDA-RLSBN. Accuracy denotes the ratio of correctly predicted VDAs to all VDAs. Sensitivity denotes the ratio of correctly predicted positive VDAs to all positive VDAs. Specificity is the ratio of correctly predicted negative VDAs to all negative VDAs. AUC is the area under the ROC curve. The ROC curve is plotted as the true positive rate [TPR, i.e., Eq. (13)] against the false positive rate (FPR), where TPR represents the ratio of correctly predicted positive VDAs to all positive VDAs and FPR represents the ratio of mistakenly predicted positive VDAs to all negative VDAs. AUPR is the area under the PR curve. The PR curve is plotted from precision and recall. Precision represents the ratio of correctly predicted positive VDAs to all predicted positive VDAs, and recall represents the ratio of correctly predicted positive VDAs to all positive VDAs.

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{13}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

where TP, FP, TN, and FN represent true positive, false positive, true negative, and false negative, respectively. Generally, a larger AUC/AUPR value denotes better performance. We used five-fold cross validation to evaluate our proposed VDA-RLSBN method. In each round, 80% of the VDAs in the known VDA network were used as the training set and the remaining 20% as the test set. The experiments were repeated 100 times, and the final performance was averaged over the 100 runs. In each round, a virus/drug is new if all of its associated drugs/viruses are selected into the test set.
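The metrics defined above can be computed from confusion counts, as in the following sketch. The function names `confusion_metrics` and `roc_auc`, the decision threshold of 0.5, and the toy labels and scores are hypothetical choices made for illustration. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair), which equals the area under the ROC curve.

```python
import numpy as np

def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, accuracy and precision at one threshold.

    Assumes both classes are present and at least one pair is predicted positive.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return {
        "sensitivity": tp / (tp + fn),          # same as TPR and recall
        "specificity": tn / (tn + fp),          # equals 1 - FPR
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
    }

def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    y_true = np.asarray(y_true).astype(bool)
    s = np.asarray(y_score, dtype=float)
    diff = s[y_true][:, None] - s[~y_true][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy example (hypothetical labels and scores for five virus-drug pairs).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
metrics = confusion_metrics(labels, scores)
auc = roc_auc(labels, scores)
```

In the toy data one positive pair is missed and one negative pair is predicted as positive, so sensitivity and precision are both 0.5, while five of the six positive/negative pairs are ranked correctly.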

Comparison With Five State-of-the-Art Methods
The performance of our proposed VDA-RLSBN and these five machine learning-based models is shown in Table 2. The best performance in each row is shown in bold in Table 2. LRLSHMDA (Wang et al., 2017), NetLapRLS (Xia et al., 2010), WNN-GIP (Laarhoven and Marchiori, 2013), and VDA-RLSBN are RLS-based methods. LRLSHMDA (Wang et al., 2017) used Laplacian RLS to tackle microbe-disease association prediction, NetLapRLS (Xia et al., 2010) extended the standard Laplacian RLS incorporating drug-target network, and WNN-GIP (Laarhoven and Marchiori, 2013) integrated a simple weighted nearest neighbor method and Gaussian kernels into RLS. SMiR-NBI (Li et al., 2016) constructed a heterogeneous network connecting genes, drugs, and miRNAs and then combined a network-based inference algorithm to characterize the responses of anticancer drugs. CMF (Zheng et al., 2013) was a collaborative matrix factorization-based drug-target interaction prediction method.
The results showed that VDA-RLSBN outperformed LRLSHMDA, SMiR-NBI, CMF, and WNN-GIP on all five evaluation metrics. Although the specificity of VDA-RLSBN is slightly lower than that of NetLapRLS, its AUC and AUPR are significantly higher. Since AUC and AUPR are more important evaluation metrics than the other three measurements, VDA-RLSBN, with the highest AUC and AUPR, is considered better at finding potential VDAs of novel viruses. Among the six VDA prediction methods, LRLSHMDA, NetLapRLS, WNN-GIP, and VDA-RLSBN are RLS-based, and VDA-RLSBN obtained better performance than the other three. Although the other RLS-based prediction methods perform well, they cannot predict the relationship between new drug candidates and new candidate targets. A virus/drug with no known drug/virus is a new virus/drug. Since there are many new viruses/drugs, VDA-RLSBN learns label information from neighbors and uses it to train the model and make predictions, which explains its better performance compared with the other RLS-based methods. The results suggest that RLS combined with neighbor association information can better identify new VDAs.

Case Study
The prediction performance of the proposed VDA-RLSBN method was confirmed in the last section. To find potential antiviral drugs against SARS-CoV-2, small molecular drugs were ranked by their association scores with SARS-CoV-2, and the top 10 drugs with the highest scores are listed in Table 3. Among the predicted top 10 VDAs, 4 VDAs are reported by related literature; that is, 40% of the predicted small molecular drugs are confirmed to be possible antiviral drugs against SARS-CoV-2.
Ribavirin is inferred to be the best small molecular drug against SARS-CoV-2. It is a broad-spectrum antiviral drug that can inhibit the replication of respiratory syncytial virus (Laarhoven and Marchiori, 2013). For example, it has been applied to prevent respiratory syncytial virus infection in lung transplant recipients (Hayden and Whitley, 2020) and specifically used to treat SARS-CoV and MERS-CoV (Permpalung et al., 2019). Similar to SARS-CoV and MERS-CoV, SARS-CoV-2 is a respiratory syndrome betacoronavirus and may cause serious respiratory diseases. A few studies (Li and De, 2020; Wang et al., 2020b) have reported that ribavirin may have an inhibitory effect on SARS-CoV-2. More importantly, remdesivir and chloroquine are inferred to be other effective antiviral drugs. Wang et al. (2020b) reported that remdesivir and chloroquine can effectively inhibit SARS-CoV-2, and they have been used at the clinical stage. These results suggest that ribavirin, remdesivir, and chloroquine may be applied to the treatment of COVID-19.

Molecular Docking
We conducted molecular docking between the predicted top 10 small molecules and the SARS-CoV-2 spike protein/ACE2. The chemical structures of these small molecular drugs were downloaded from the DrugBank database. The structure of the virus spike protein was obtained based on homologous modeling from Zhang Lab (2020). The structure of ACE2 was downloaded from the RCSB Protein Data Bank (Helen et al., 2000) (ID:6MJ0). AutoDock used the genetic algorithm as its search algorithm, and the entire protein was selected as the grid box. The molecular binding energies between the predicted top 10 small molecules and these two target proteins are described in Table 4. The results show that the predicted top 10 drugs have high molecular binding activities with the spike protein and/or ACE2. For example, ribavirin, which is predicted to be the most likely drug against SARS-CoV-2, has a molecular binding energy of −6.39 kcal/mol with ACE2. In addition, remdesivir, mycophenolic acid, and chloroquine are predicted to have high association scores with SARS-CoV-2. These three small molecular drugs showed binding energies of −7.4, −5.35, and −6.29 kcal/mol with ACE2, respectively. More importantly, ribavirin, remdesivir, and chloroquine have been used for the treatment of SARS, whose causative virus, SARS-CoV, has about 79% sequence identity with SARS-CoV-2. Therefore, the potential use of these three small molecules as a treatment for COVID-19 may be under investigation. Interestingly, FK506 is an immunosuppressive drug mainly used to decrease the activity of the immune system after organ transplant. The molecular docking results show that FK506 has strong molecular binding energies of −11.06 and −10.1 kcal/mol with ACE2 and the spike protein, respectively, although it has a slightly lower rank among the drugs predicted against SARS-CoV-2 by VDA-RLSBN. Figures 2, 3 show the docking results for four small molecules (ribavirin, remdesivir, chloroquine, and FK506) and the two target proteins.
The subfigure in each circle denotes the residues at the binding site of the SARS-CoV-2 spike protein/ACE2 and their corresponding orientations. For example, the amino acids L387, L368, P565, and V209 were inferred to be the key residues for ribavirin binding to the SARS-CoV-2 spike protein/ACE2 while L828, L849, W1212, N163, and N194 were inferred as the key residues for FK506 binding to the SARS-CoV-2 spike protein/ACE2.
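To make the comparison of docking results concrete, the short sketch below sorts the ACE2 binding energies quoted in the text. In docking, a more negative binding energy indicates a more favorable predicted binding, so the list is sorted in ascending order. The variable names are hypothetical; the numerical values are those reported above.

```python
# ACE2 binding energies in kcal/mol, as reported in the text.
ace2_energy = {
    "ribavirin": -6.39,
    "remdesivir": -7.4,
    "mycophenolic acid": -5.35,
    "chloroquine": -6.29,
    "FK506": -11.06,
}

# More negative means stronger predicted binding, so sort ascending.
ranked = sorted(ace2_energy.items(), key=lambda item: item[1])

for name, energy in ranked:
    print(f"{name:18s} {energy:7.2f} kcal/mol")
```

This ordering reflects docking energies only; it is separate from the association-score ranking produced by VDA-RLSBN, in which ribavirin ranks first.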

DISCUSSION
With the spread of SARS-CoV-2 around the world, the incidence rate is rapidly increasing, and the lack of effective treatment options has made it a public health threat. Therefore, various strategies are being explored. Drug repositioning, which aims to find new therapeutic uses for existing FDA-approved drugs, provides a far more rapid route to the clinic than novel drug design.
In the proposed VDA-RLSBN method, we predicted VDA candidates based on RLS and BLM. However, SARS-CoV-2 is a new coronavirus and has no associated drugs verified by biomedical experiments, so potential VDAs related to the virus cannot be found by RLS and BLM alone. Therefore, we used the association information of other RNA viruses similar to SARS-CoV-2 and the similarities between SARS-CoV-2 and these viruses. The originality of our proposed method lies in predicting possible antiviral drugs against SARS-CoV-2 by drug repositioning through virus-drug association identification. More importantly, we integrated neighbor association information into RLS to find associated chemical agents for the new virus. The experimental results showed the merits of the VDA-RLSBN model. The higher AUC and AUPR indicated that the predicted antiviral drugs against SARS-CoV-2 are likely to be effective for preventing the rapid transmission of COVID-19.
VDA-RLSBN obtains superior performance in terms of AUC, AUPR, accuracy, and sensitivity. This observation may be attributed to the following two features. First, VDA-RLSBN divides new VDA prediction into four cases based on BLM, a state-of-the-art method applied in various association prediction areas. Second, and more importantly, neighbor association information can help to identify possible antiviral drugs against new viruses (for example, SARS-CoV-2).
The proposed VDA-RLSBN approach is also helpful in designing and interpreting pharmacological experiments. The method can be further applied to select potential antiviral drugs against other new viruses, for example, infectious bronchitis virus.

CONCLUSION
In this study, we considered the clues of treatment from SARS-CoV, MERS-CoV, and other diseases caused by single-stranded RNA viruses and developed a VDA prediction method based on RLS, BLM, and neighbor association information. VDA-RLSBN inferred commercially available small molecular drugs that could be applied as experimental therapy options against SARS-CoV-2. We conducted molecular docking between four of the predicted chemical compounds (ribavirin, remdesivir, chloroquine, and FK506) and two target proteins, the spike protein and ACE2. The results show that ribavirin, remdesivir, and chloroquine have better molecular binding activities with ACE2 and may be the best small molecular drugs against SARS-CoV-2. In addition, we found that several antiviral drugs, such as FK506, could be used to combat COVID-19. Notably, the 4 predicted drugs ranked 1, 2, 4, and 6 have been supported by recent works. We hope that our predicted small molecules may be helpful in preventing the transmission of SARS-CoV-2.
In the future, we will develop ensemble frameworks (Peng et al., 2020) and positive-unlabeled learning methods (Lan et al., 2016a; Peng et al., 2017b) to further improve the prediction performance. More importantly, we will enlarge the existing dataset. We will also integrate various biological data, including long noncoding RNA (Lan et al., 2017; Zhao et al., 2018; Liu et al., 2020) and disease symptom information (Lan et al., 2016b).

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
LP and XT contributed equally to this work. LP, XT, JY, and LZ designed the VDA-RLSBN method. XT and MK ran VDA-RLSBN. XT wrote the original manuscript. LP, TL, and JY revised the original draft. LS conducted molecular docking for the predicted results. LP, GT, JY, and LZ discussed the proposed method and gave further research. All authors read and approved the final manuscript.