Inhibition of SARS-CoV-2 3CL Mpro by Natural and Synthetic Inhibitors: Potential Implication for Vaccine Production Against COVID-19

COVID-19 has caused a pandemic and has spread to nearly every continent. Researchers all over the world are trying to produce an effective vaccine against the virus; however, no specific treatment for COVID-19 has been discovered so far. The current work describes an inhibition study of the SARS-CoV-2 main proteinase (3CL Mpro) with natural and synthetic inhibitors: 2S albumin and flocculating protein from Moringa oleifera (M. oleifera), and Suramin. Molecular docking was carried out using AutoDock 4.0, HADDOCK2.4, PatchDock, ParDOCK, and FireDock. The global binding energies of Suramin, 2S albumin, and flocculating protein were −41.96, −9.12, and −14.78 kJ/mol, respectively. The docking analysis indicates that all three inhibitors bind at the junction of domains II and III. The catalytic function of 3CL Mpro depends on its dimeric form, and the flexibility of domain III is considered important for this dimerization. Our study showed that all three inhibitors reduce this flexibility and restrict the motion of domain III. The decrease in flexibility of domain III was further supported by molecular dynamics simulation, which indicates that the temperature B-factor of the enzyme decreases when the inhibitors bind to it. This study will further explore the possibility of producing an effective treatment against COVID-19.


INTRODUCTION
A new virus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified in patients in China in December 2019 (Kotta et al., 2020). It spread quickly throughout the country and the world and infected millions of people (Kneller et al., 2020). As of November 2020, 55.6 million people have been diagnosed with this virus, of whom 35.86 million have recovered and 1.34 million have died (Johns Hopkins University). The disease caused by SARS-CoV-2 is termed COVID-19 (Hussin et al., 2020), a short name given to this disease by the World Health Organization (WHO, 2020).
One way to prevent COVID-19 from spreading among people is to keep a distance of 1.5-2 m, as recommended by the WHO (Kotta et al., 2020), although recent studies have suggested that the virus can travel more than 2 m in the air (Setti et al., 2020; van Doremalen et al., 2020). Lockdowns have been widely used around the world to achieve this social distancing, and they have worked well, for example in China (Wu et al., 2020).
The genomics and proteomics of SARS-CoV-2 have been described in the literature (Vandelli et al., 2020). The genome of this new virus is composed of single-stranded ribonucleic acid (RNA) and displays high sequence identity to other betacoronaviruses such as SARS-CoV and MERS-CoV (Middle East respiratory syndrome coronavirus) (Cascella et al., 2020). These viruses use a specific protein named the spike (S) protein to adhere specifically to angiotensin-converting enzyme 2 (ACE2) on the host cell (Park et al., 2019; Turoňová et al., 2020). Besides the spike glycoprotein, SARS-CoV-2 contains proteins such as 3CL Mpro [also called the main proteinase (Mpro)] and RNA-dependent RNA polymerase (RdRp) (Jeong et al., 2020).
The life cycle of SARS-CoV-2 begins when the virus infects the host cell through the interaction of the S protein with angiotensin-converting enzyme 2 (ACE2) (V'kovski et al., 2020). The S protein has two subunits, S1 and S2: S1 attaches to the N-terminal region of ACE2, and S2 assists in the binding of the protein to the host membrane. This results in the binding of the virus to the membrane of the host cell. Consequently, the membrane of the host cell is disrupted and endocytosis takes place (V'kovski et al., 2020). The furin proteinase and transmembrane serine proteinase 2 of the host cell cleave the S protein at the S1/S2 boundary (V'kovski et al., 2020), which allows transmembrane serine proteinase 2-dependent entry into the host cell (Belouzard et al., 2009; Hoffmann et al., 2020; Walls et al., 2020). The polycistronic RNA of the virus is then released into the cytoplasm. A −1 ribosomal frameshift, which occurs near the 3′-end of ORF1a, allows the replicase gene to be translated into either replicase polyprotein pp1a or pp1ab (∼750 kDa, nsp1-16). Autoproteolytic cleavage of these polyproteins by two ORF1a-encoded proteinase domains yields 16 non-structural proteins (nsps) (Brierley et al., 1989; Herold et al., 1993; Thiel et al., 2001, 2003; Harcourt et al., 2004; Prentice et al., 2004; Ziebuhr, 2004). The two proteinases that carry out these proteolytic cleavages are the main proteinase Mpro (3CL Mpro) and the papain-like proteinase (PLpro) (Hegyi and Ziebuhr, 2002). The polyprotein pp1ab is cleaved by Mpro (Ziebuhr et al., 2000; Hegyi and Ziebuhr, 2002). Replication (production of the entire genome) and transcription (synthesis of subgenomic mRNAs) are mediated by a cytoplasmic enzyme complex termed the replicase-transcriptase complex (Gorbalenya et al., 2006; Pasternak et al., 2006; Sawicki et al., 2007). The key proteins (structural and accessory) are translated from these transcripts; new virions are subsequently assembled and released from the cell (V'kovski et al., 2020).
Two important proteins in the life cycle of SARS-CoV-2 are the S protein and 3CL Mpro (Kneller et al., 2020; V'kovski et al., 2020). As discussed earlier, the S protein helps the virus bind to the host cell and facilitates its entry (Duan et al., 2020), while 3CL Mpro, the main proteinase, assists in the processing of the polyproteins (Kneller et al., 2020). Owing to the central roles of these two proteins, researchers all over the world are targeting them to find a new treatment for COVID-19 (Kotta et al., 2020). Taking this into consideration, the current work was designed to test the efficacy of natural and synthetic inhibitors (2S albumin and flocculating protein of Moringa oleifera, and Suramin) against 3CL Mpro and to identify a potential new treatment for this pandemic disease.

Atomic Structure of SARS-CoV-2 3CL Mpro and Ligands
The atomic coordinates of SARS-CoV-2 3CL Mpro, Suramin, and 2S albumin were retrieved from the Protein Data Bank (PDB), with PDB IDs 6WQF (Kneller et al., 2020), 6CE2 (ligand SVR) (Salvador et al., 2018), and 5DOM (Ullah et al., 2015), respectively. The structure of the flocculating protein was obtained as a homology model using SWISS-MODEL (Waterhouse et al., 2018). The three-dimensional atomic structure of 2S albumin from M. oleifera was used as the template (74% sequence identity).
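For readers who wish to fetch the same entries, the following minimal sketch builds download URLs for the three PDB IDs listed above. It assumes the public RCSB file-server URL pattern (https://files.rcsb.org/download/<ID>.pdb); the helper name `pdb_download_url` and the dictionary `PDB_ENTRIES` are illustrative and are not part of the original workflow.

```python
# Illustrative only: the URL pattern below is an assumption about the public
# RCSB file server; the names here are hypothetical, not from the study.
PDB_ENTRIES = {
    "3CL Mpro": "6WQF",
    "Suramin (ligand SVR)": "6CE2",
    "2S albumin": "5DOM",
}


def pdb_download_url(pdb_id: str) -> str:
    """Return the assumed RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


if __name__ == "__main__":
    for name, pdb_id in PDB_ENTRIES.items():
        print(f"{name}: {pdb_download_url(pdb_id)}")
```

The function only formats a URL; downloading and validating the files is left to whichever tool the reader prefers.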

Protein and Ligand Preparation for Docking
The ligands and crystallographic water molecules were removed from the protein, and H-atoms were added. The ionization states of the ligand atoms were kept as given in the database. The ligand geometry was optimized using the AM1 method (Dewar et al., 1985). The partial charges of the ligands were calculated by the AM1-BCC method (Jakalian et al., 2002). The atom types, bond angles, dihedrals, and van der Waals parameters for the ligands were assigned using the general AMBER force field (GAFF) (Wang et al., 2004).
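The first preparation step (removing bound ligands and crystallographic waters) can be illustrated with a short sketch. It assumes a standard PDB-format text file in which waters and small-molecule ligands are stored as HETATM records; the function name `strip_waters_and_ligands` is hypothetical, and the sketch does not add hydrogens or assign charges, which the study performed with dedicated tools.

```python
def strip_waters_and_ligands(pdb_text: str) -> str:
    """Keep polymer ATOM records (plus TER/END) and drop every HETATM record.

    Assumption: waters (HOH) and bound ligands appear as HETATM records.
    Modified residues stored as HETATM would be dropped as well, so this
    is a sketch rather than a general-purpose cleaner.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # columns 1-6 hold the record name
        if record in {"ATOM", "TER", "END"}:
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Made-up coordinates, used only to show the filtering behaviour.
    sample = "\n".join([
        "ATOM      1  N   SER A   1      -2.000   4.000 -17.000  1.00 20.00           N",
        "HETATM 2400  O   HOH A 401      10.000  10.000  10.000  1.00 30.00           O",
        "END",
    ])
    print(strip_waters_and_ligands(sample))
```

Filtering on the record name rather than on residue names keeps the sketch independent of any particular ligand code.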

Protein and Ligands Binding Interactions
The interactions (hydrogen bonds and hydrophobic contacts) between 3CL Mpro and the ligands were determined using LigPlot (Wallace et al., 1995) from the PDBsum web server (Laskowski et al., 2018).

Molecular Dynamics Simulation
The MDMoby and MDweb programs (Hospital et al., 2012), GROMACS (Berendsen et al., 1995), and AMBER16 (Case et al., 2005; Maier et al., 2015) were used for molecular dynamics simulation as described previously (Ullah et al., 2019; Ullah and Masood, 2020). All-atom protein interactions were described using the FF14SB force field (Darden et al., 1993). The online server H++ (Anandakrishnan et al., 2012) was used to determine the protonation states of the amino acid side chains at pH 7.0. The system was neutralized with Cl− ions. The simulation system was minimized in order to remove clashes in atomic positions and structural errors (bond lengths and bond angles). This minimization consisted of a 500-step steepest descent (SD) minimization, followed by a 2 ns position-restrained MD simulation with NVT and NPT ensembles separately (Zhang et al., 2013). Subsequently, the system was placed in a rectangular box of TIP3P water extending a minimum of 20 Å from any protein atom. The system was heated gradually from 0 to 350 K over 250 ps at constant atom number and volume. The protein was restrained with a force constant of 10 kcal/mol·Å². Equilibration was carried out for 500 ps in a constant atom number, pressure, and temperature (NPT) ensemble. The production simulation was run for 100 ns with a 4-fs time step. The pressure was kept at 1 atm using the Nosé-Hoover Langevin piston algorithm (Tu et al., 1995), and the temperature was kept at 300 K using Langevin coupling (Washio et al., 2018). Long-range electrostatic interactions were calculated using the particle-mesh Ewald (PME) method (Darden et al., 1993), with a cutoff distance of 10 Å for van der Waals interactions.
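As a quick arithmetic check on the protocol above, the number of integration steps follows from the simulated time divided by the time step. The sketch below does this for the 100 ns production run at 4 fs; the helper `n_steps` is hypothetical, and only the production run is computed because the text does not state the time step used during heating or equilibration.

```python
FS_PER_NS = 1_000_000  # femtoseconds per nanosecond


def n_steps(duration_fs: int, timestep_fs: int) -> int:
    """Number of MD integration steps needed to cover duration_fs."""
    if timestep_fs <= 0:
        raise ValueError("time step must be positive")
    if duration_fs % timestep_fs:
        raise ValueError("duration is not a whole number of time steps")
    return duration_fs // timestep_fs


# Production run as stated in the text: 100 ns with a 4-fs time step.
production_steps = n_steps(100 * FS_PER_NS, 4)

if __name__ == "__main__":
    print(f"production steps: {production_steps:,}")  # 25,000,000
```

Working in integer femtoseconds avoids floating-point rounding when the duration and the time step differ by many orders of magnitude.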

Surface Charge Determination and Visualization
The protein and ligands were prepared for surface charge distribution using PDB2PQR (Dolinsky et al., 2007), and the charges were visualized using the APBS Tools plugin in PyMOL (DeLano, 2000).

RESULTS AND DISCUSSION
The Overall Structure of SARS-CoV-2 3CL Mpro
The three-dimensional structure of SARS-CoV-2 3CL Mpro has been described by Kneller et al. (2020) with PDB ID: 6M03. The structure is composed of 306 amino acid residues, which fold into three distinct domains, named domains I, II, and III (Figure 1A). Domain I is composed of amino acid residues Phe8-Tyr101 and has four α-helices and seven β-strands. Domain II (amino acid residues Lys102-Pro184) comprises seven β-strands only, whereas domain III (amino acid residues Thr201-Val303) contains five α-helices only. The enzyme active site is situated at the junction of domains I and II and comprises the amino acid residues His41 and Cys145 (Figure 1B), which form a catalytic dyad (Cys145-His41) instead of the triad (His57-Asp102-Ser195) found in classical serine proteinases (Ullah et al., 2018). A catalytic water molecule is also bound to His41 and helps in the catalytic process of this enzyme (Figure 1B). The enzyme is active in the dimeric state, and the flexibility of domain III is required for its dimerization (Kneller et al., 2020).
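The domain boundaries given above can be expressed as a small lookup, which makes it easy to check where a residue of interest sits. This is an illustrative sketch: the names `DOMAINS` and `domain_of` are hypothetical, and residues outside the three stated ranges (the termini and the region between domains II and III) are reported as unassigned because the text does not assign them.

```python
from typing import Optional

# Residue ranges (inclusive) for the three domains as stated in the text.
DOMAINS = {
    "I": (8, 101),
    "II": (102, 184),
    "III": (201, 303),
}


def domain_of(residue_number: int) -> Optional[str]:
    """Return the domain label for a residue number, or None if unassigned."""
    for name, (start, end) in DOMAINS.items():
        if start <= residue_number <= end:
            return name
    return None


if __name__ == "__main__":
    # The catalytic dyad spans domains I and II, consistent with an active
    # site located at the junction of these two domains.
    for label, number in [("His41", 41), ("Cys145", 145)]:
        print(label, "-> domain", domain_of(number))
```

The same lookup can be applied to any list of contact residues to see which domains a ligand bridges.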

Interaction Between SARS-CoV-2 3CL Mpro and Suramin
The binding energy calculated for the interaction between SARS-CoV-2 3CL Mpro and Suramin was ∼ −42 kcal/mol (Table 1). All the other energy terms are listed in Tables 1, 2. The Suramin binding site lies between two domains (domains II and III) of SARS-CoV-2 3CL Mpro (Figures 2A-C). The amino acid residues of SARS-CoV-2 3CL Mpro that interact with Suramin include Lys102, Pro108, Gln110, Asp155, Glu240, and His246 (Figures 1D,E). The Kdeep results indicate that both the equilibrium dissociation constant (expressed as pKd) and the Gibbs free energy (ΔG) are large (Table 3), which further supports the binding between SARS-CoV-2 3CL Mpro and the three ligands (Suramin, 2S albumin, and flocculating protein) (Figure 3 and Supplementary Table 1).
[Table caption: Global binding energy, attractive VdW (van der Waals forces), repulsive VdW, ACE (atomic contact energy), and HB (contribution of the hydrogen bonds to global energy) for the interaction between SARS-CoV-2 3CL Mpro and Suramin, 2S albumin, and flocculating protein.]
Suramin is a drug that is used to treat African sleeping sickness and river blindness (Lima et al., 2009). Suramin has been shown to inhibit human α-thrombin (Lima et al., 2009), snake venom phospholipases A2 (Salvador et al., 2018), snake venom serine proteinases (Ullah et al., 2018), severe fever with thrombocytopenia syndrome virus nucleocapsid protein (Jiao et al., 2013), murine norovirus RNA-dependent RNA polymerase (Mastrangelo et al., 2012), and Leishmania mexicana pyruvate kinase (Morgan et al., 2011). In most of these cases, Suramin binds toward the C-terminal of the protein and restricts the motion of the C-terminal (Lima et al., 2009; Ullah et al., 2018). In the current study, Suramin binds toward the N-terminal of SARS-CoV-2 3CL Mpro (Figures 2A-E).
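The two affinity quantities mentioned above for Kdeep, pKd and ΔG, are linked by the standard thermodynamic relation ΔG = −RT ln(10) · pKd, where pKd = −log10(Kd). The sketch below applies this relation and also converts between kcal/mol and kJ/mol, since both units appear in this article. The temperature of 298.15 K is an assumption (the text does not state the value used), and the function names are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)
KJ_PER_KCAL = 4.184             # thermochemical calorie


def delta_g_from_pkd(pkd: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from pKd = -log10(Kd).

    Uses dG = -R * T * ln(10) * pKd; the default temperature is an
    assumption, not a value taken from the study.
    """
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(10) * pkd


def kcal_to_kj(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol to kJ/mol."""
    return energy_kcal_per_mol * KJ_PER_KCAL


if __name__ == "__main__":
    # Each pKd unit contributes roughly -1.36 kcal/mol at 298.15 K.
    print(round(delta_g_from_pkd(1.0), 3))
    print(round(kcal_to_kj(-1.0), 3))
```

A larger pKd therefore corresponds to a more negative ΔG, which is why the two quantities move together in the affinity predictions.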

Interaction Between SARS-CoV-2 3CL Mpro, 2S Albumin and Flocculating Protein
The binding energies of 2S albumin and flocculating protein with SARS-CoV-2 3CL Mpro were ∼ −9.12 and ∼ −15 kJ/mol, respectively (Table 1). The other energy terms obtained from docking are given in Tables 1, 2. The amino acid residues involved in these interactions include S139, T139, G302, Q299 (SARS-CoV-2 3CL Mpro), R143, Q97 (2S albumin), and Q15 and Q38 (flocculating protein). The interactions between SARS-CoV-2 3CL Mpro and both 2S albumin and flocculating protein are largely electrostatic (Figures 3A-E and 4A-E). In both cases, the ligand binding site lies between two domains (domains II and III) of SARS-CoV-2 3CL Mpro. The LigPlot analysis shows a total of three hydrogen bonds between SARS-CoV-2 3CL Mpro and each of 2S albumin and flocculating protein, while the numbers of hydrophobic interactions were 130 and 152 for 2S albumin and flocculating protein, respectively (Supplementary Figures 4, 5).
Frontiers in Molecular Biosciences | www.frontiersin.org

Molecular Dynamics Simulation Analysis for SARS-CoV-2 3CL Mpro Alone and With the Ligands
The MD simulation analysis indicates that the flexibility of SARS-CoV-2 3CL Mpro decreases when the ligands bind to it (Supplementary Figure 1). With Suramin as the inhibitor, the fluctuation increases slightly (temperature B-factor increases from 14 to 16) (Supplementary Figure 1 and Figures 2A-D), whereas with 2S albumin and flocculating protein the fluctuation decreases (temperature B-factor decreases from 12 to 10) (Supplementary Figure 1, Figures 3A-D and 4A-D). The RMSD vs. time graph indicates that the interaction between SARS-CoV-2 3CL Mpro and the three ligands was stable throughout the simulation (Supplementary Figure 6). Suramin can form 1-5 hydrogen bonds, while both 2S albumin and flocculating protein can form 2-5 hydrogen bonds according to the 100 ns MD simulation analysis (Supplementary Figure 7).
The flexibility analysis from PyMOL also indicates that all the ligands decrease the flexibility of SARS-CoV-2 3CL Mpro upon binding (Supplementary Figures 2A-D).
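Temperature B-factors and atomic fluctuations from MD are commonly related through the isotropic relation B = (8π²/3) · RMSF², where RMSF is the root-mean-square fluctuation of an atom about its mean position. The sketch below implements this conversion in both directions. It assumes B-factors in Å² and RMSF in Å; the article does not state the units of the reported B-factor values or how they were derived, so this is a general illustration rather than a description of the authors' analysis, and the function names are hypothetical.

```python
import math

# Prefactor of the isotropic relation B = (8 * pi^2 / 3) * RMSF^2.
_B_PREFACTOR = 8.0 * math.pi ** 2 / 3.0


def b_factor_from_rmsf(rmsf_angstrom: float) -> float:
    """Isotropic B-factor (A^2) from an RMSF value (A)."""
    return _B_PREFACTOR * rmsf_angstrom ** 2


def rmsf_from_b_factor(b_factor: float) -> float:
    """RMSF (A) corresponding to an isotropic B-factor (A^2)."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / _B_PREFACTOR)


if __name__ == "__main__":
    # An RMSF of 1 A corresponds to a B-factor of about 26.32 A^2.
    print(round(b_factor_from_rmsf(1.0), 2))
```

Because the relation is quadratic, a modest change in B-factor corresponds to an even smaller relative change in the underlying fluctuation amplitude.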

CONCLUSION
• The inhibition of 3CL Mpro by natural inhibitors (2S albumin and flocculating protein from M. oleifera) and a synthetic inhibitor (Suramin) was demonstrated in this study.
• The interactions between 3CL Mpro and the inhibitors are largely electrostatic and involve amino acid residues from both partners.
• All three inhibitors bind between domains II and III (3CL Mpro amino acid residues Lys102, Pro108, Gln110, Asp155, Glu240, and His246 with Suramin, and S139, T139, G302, Q299 with 2S albumin and flocculating protein). These interactions restrict the movement of domain III, which is important for dimerization and hence for the function of SARS-CoV-2 3CL Mpro.
• Here we propose that these inhibitors will inhibit 3CL Mpro by preventing this enzyme from dimerizing.
• The current study may support the development of a new vaccine against COVID-19.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
AU designed the project and reviewed the manuscript. KU drafted and proofread the manuscript, and did English language corrections in the manuscript revision stage. Both authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
The