Modeling of noncovalent inhibitors of the papain-like protease (PLpro) from SARS-CoV-2 considering the protein flexibility by using molecular dynamics and cross-docking

The papain-like protease (PLpro) found in coronaviruses that can be transmitted from animals to humans is a critical target in respiratory diseases linked to Severe Acute Respiratory Syndrome (SARS-CoV). Researchers have proposed designing PLpro inhibitors. In this study, a set of 89 compounds, including recently reported 2-phenylthiophenes with nanomolar inhibitory potency, were investigated as PLpro noncovalent inhibitors using advanced molecular modeling techniques. To develop the work with these inhibitors, multiple structures of the SARS-CoV-2 PLpro binding site were generated using a molecular sampling method. These structures were then clustered to select a group that represents the flexibility of the site. Subsequently, models of the protein-ligand complexes were created for the set of inhibitors within the chosen conformations. The quality of the complex models was assessed using LigRMSD software to verify similarities in the orientations of the congeneric series and interaction fingerprints to determine the recurrence of chemical interactions. With the multiple models constructed, a protocol was established to choose one per ligand, optimizing the correlation between the calculated docking energy values and the biological activities while incorporating the effect of the binding site’s flexibility. A strong correlation (R2 = 0.922) was found when employing this flexible docking protocol.


Introduction
At the end of 2019, the world was surprised with the appearance of pneumonia cases in Wuhan, China, caused by a new virus originating from bats (Hao et al., 2022).The causative virus, named SARS-CoV-2, spread rapidly worldwide, marking the onset of the first pandemic of the 21st century (Van Vo et al., 2021).It was determined that SARS-CoV-2 is a species of coronavirus from animals, similar to others that had previously caused similar diseases at the beginning of the century, serving as precursors and highlighting the profile posed by these types of viruses (Rota et al., 2003;Zhang et al., 2005;de Wit et al., 2016;Chafekar and Fielding, 2018).Consequently, researchers investigated viral infection mechanisms to identify treatment options for individuals affected by zoonotic coronaviruses (CoVs).These efforts yielded valuable insights, leading to the identification of molecular targets that are currently under investigation in the quest to develop specific drugs targeting CoVs.
When someone is infected with CoVs, certain proteins are produced that play important roles in the virus's infection process.Two of these proteins, the proteases 3CLpro and PLpro, have been identified as responsible for processing large polyproteins encoded by the viral RNA genome (Lv et al., 2021;Yan and Wu, 2021;Hu et al., 2022).The functions of these two proteases have been identified; very conclusive information explains that PLpro plays a crucial role in coronavirus replication.It is involved in important biochemical processes such as deubiquitination and deISGylation of proteins in the host cells, which are vital for viral pathogenesis (Vere et al., 2022).Additionally, PLpro's enzymatic activity antagonizes the host's antiviral immune response, working in conjunction with viral protein processing (Shin et al., 2020).
PLpro has four distinct domains: the palm, the thumb, the fingers, and a terminal domain that resembles ubiquitin.The binding site of PLpro contains a catalytic triad composed of a cysteine, a histidine, and an aspartate (Cys111-His272-Asp286 in the SARS-CoV-2 PLpro), and integrates residues from the palm and thumb domains (Ratia et al., 2006).It also has specific subsites that can be occupied by the substrate RLRGG, which corresponds to the C-terminus of ubiquitin.The binding site can adopt both closed and open conformations, which are influenced by structural changes in the flexible blocking loop 2 (BL2).These changes modulate the recognition of different substrates (Henderson et al., 2020).
As PLpro is a cysteine protease with essential roles in the attack of coronaviruses (CoVs) on humans, it has become an important biological target (Brian Chia and Pheng Lim, 2023).Both covalent and non-covalent inhibitors have emerged for PLpro, many of which were reported between 2008 and 2014 (Ratia et al., 2008;Ghosh et al., 2009;Ghosh et al., 2010;Báez-Santos et al., 2014a), prior to the onset of the COVID-19 pandemic.These inhibitors were designed to target the PLpro identified in SARS-CoV-1.Among these inhibitors, the compound GRL-0617 emerged as a notable candidate due to its ability to inhibit viral replication in vitro of SARS-CoV-1 and SARS-CoV-2 viruses.Building upon this set of compounds and after the start of the COVID-19 pandemic, Shen and colleagues synthesized a new series of compounds derived from GRL-0617 with the intention of inhibiting SARS-CoV-2 PLpro (Shen et al., 2022).
The binding site of SARS-CoV-2 PLpro lacks sufficient space to accommodate ligands near the catalytic cysteine.To address this, GRL-0617 and its derivatives bind to the S 3 and S 4 subsites, causing an increase in space and a rearrangement of chemical groups, as confirmed by the opening of the BL2 loop.Crystallographic structures have been reported, enabling a comparison of these subsites in both the apo form and in the presence of some of these ligands.This helps identify which residues contribute to establishing a chemical match between specific ligands and the enzyme.This information serves as a valuable starting point for studying the family of 89 inhibitors synthesized and evaluated by Shen et al. (Shen et al., 2022) with the goal of finding a model that explains the greater activity of some compounds compared to others.
In a recent study, a computational analysis was conducted, successfully establishing a correlation between the activity of 67 non-covalent SARS-CoV-1 PLpro inhibitors and their energies, determined by docking while considering the flexibility of the binding sites (Castillo-Campos et al., 2023).Applying this methodology to the new inhibitory compounds of the latest PLpro variant will enable us to comprehend the intricacies involved in incorporating a larger number of compounds with greater structural variability.Our efforts have been focused on suggesting that the inclusion of significant flexibility in the binding site is crucial for developing models capable of predicting novel PLpro inhibitors.

Preparation of the studied ligands
The study by Shen et al. (Shen et al., 2022) provided information on a total of 89 chemical structures derived from GRL0617, along with their IC 50 values.This reference was used to gather the data.Table 1 contains the chemical structures of each compound.The chemical structures were visualized using the Maestro Molecular Editor (version 12.8.117,Schrödinger LLC, New York, NY, United States, 2021) and processed using the LigPrep module within Maestro.The protonation states of the compounds were estimated using Epik (Shelley et al., 2007) at a physiological pH of 7.
PDB structures were prepared for docking calculations using the basic functionalities of Protein Preparation Wizard module (Schrödinger LLC, New York, NY, United States, 2021).This module removed crystallographic waters but retained the Zn 2+ , determined the neutrality or possible charge of protonatable groups, and performed a steepest descent restrained minimizations of the protein models using OPLS3 force field (Harder et al., 2016).During the minimization, heavy atoms were restrained with a harmonic potential of 25 kcal mol −1 Å −2 , hydrogens were left unrestrained, and heavy atoms were converged to RMSD = 0.3 Å.

Docking calculations
We applied docking to obtain ligand-PLpro models with Glide (Friesner et al., 2004).The studied compounds were docked within  Frontiers in Molecular Biosciences frontiersin.orgthe spaces delimited between the residues at subsites S 3 -S 4 of the PLpro structure with code 7LLF.A 20 × 20 × 20 Å 3 grid with a grid spacing of 0.5 Å were defined.We utilized both the Glide standard (SP) and extra precision (XP) modes.Both modes together were employed to get high-quality solutions, although priority was given to the more rigorous solutions in XP mode (Friesner et al., 2006).
The method was employed with parameters similar to those used in our previous study of naphthalene-derived inhibitors (Castillo-Campos et al., 2023).In the process of selecting the optimal solution for each ligand, we considered the ten best poses based on the most negative docking scoring energy values, coupled with a thorough comparison between these poses and those observed in the reference PDBs.The aforementioned comparison involved calculating RMSD values, as detailed in the following section, to evaluate consensus among binding modes of multiple docked ligands.Consequently, the best pose was determined by identifying the one with the most negative energy value and a reasonable orientation in line with the comparison between poses.

LigRMSD
When similar compounds are docked, we expect the way they bind to be similar to the binding modes of the compounds already co-crystallized in PLpro structures.Consequently, the ligands are also expected to have the same orientation when compared to each other.To confirm the similarity between orientations, we employ the LigRMSD method (Velázquez-Libera et al., 2020).LigRMSD calculates RMSD values by comparing two ligands that may be identical 'or similar' (non-identical).LigRMSD identifies common graphs (maximum common substructure) between the two ligands and compares the coordinates of the equal graphs, enabling the comparison of non-identical ligands.The match between graphs is defined using the values "%Ref" and "%Mol".%Ref quantifies the percentage of overlapping structural elements between a docked compound and a selected reference, relative to the total number of atoms in that reference.On the other hand, %Mol quantifies the percentage of overlapping structural elements between a docked compound and a selected reference, relative to the total number of atoms in the docked compound.These values obtained from the LigRMSD server represent the maximum similarity between the compounds being compared, with high values of "%Ref" and "% Mol" associated with high similarity.If the two compared ligands are identical (%Ref = %Mol = 100), we perform the typical RMSD calculation to compare the coordinates between identical ligands.If the ligands are not identical, only the coordinates of the common graphs are compared, and in that case, %Ref and/or %Mol <100.Low values of %Ref and/or %Mol indicate that the comparison made is not appropriate, as it may involve comparing small portions between the two ligands.
Low RMSD values indicate that the orientations obtained for the ligands are similar, and high values of %Ref and %Mol indicate that the comparisons made were appropriate.LigRMSD operates in two modes: strict and flexible.In strict mode, the graphs must be composed of the same heavy atoms, while in flexible mode, atoms can be different in the graph (e.g., C can be changed for N).For example, when comparing benzene with pyridine, they are considered identical using the flexible mode.
Using LigRMSD, we compared the docking poses of the compounds XR8-24, XR8-65, XR8-69, XR8-83, and XR8-89 with their respective PDB structures 7LBS, 7LOS, 7LLZ, 7LLF, and 7LBR.Following that, we compared the docking poses of all the compounds with the docking poses obtained for GRL0617 and ZN-3-80.We selected these compounds as references because they possess specific characteristics compatible with different subsets of the entire dataset.

Interaction fingerprint (IFP)
Protein-ligand interaction fingerprints (IFPs) are simplified binary representations of the 3D structures of protein-ligand complexes (Deng et al., 2004).They encode whether specific interactions occur between the amino acids in the protein's binding pocket and the ligand.The IFPs utilize a onedimensional format to indicate the presence or absence of these interactions, providing a concise representation of the complex's interaction patterns.IFPs were determined between the docked poses of ligands and residues in the SARS-CoV-2 PLpro binding site.Maestro's Interaction Fingerprint panel was employed to generate them.The interaction matrix was constructed by including hydrophobic (H), polar (P), aromatic (Ar), hydrogen bond (HB) acceptor (A), HB donor (D), and charged (Ch) groups.An interaction was considered to occur when a PLpro residue was within a maximum cutoff distance of 4.0 Å from the ligand's heavy atoms.

Gaussian accelerated molecular dynamics (GaMD) and correlation analysis
To ensure a diverse sampling of the SARS-CoV-2 PLpro binding site, molecular dynamics (MD) simulations were conducted.MD simulations are recognized because they can complement and enrich the information obtained from X-ray crystallography for proteins by considering solvation and obtaining velocity and positions of atoms over time (Amadei et al., 1993;Karplus and McCammon, 2002;Bakan et al., 2011;Rodríguez et al., 2011).In this way, they contribute to the understanding of the physics that allows the existence of the protein structure and to the understanding of its biological function.MD simulations were performed with ligands present at the binding site to keep the site open and allow for the inclusion of other ligands in subsequent cross-docking calculations.The structures with PDB IDs 7LBR and 7LLF were chosen as starting points (these PDB structures have better resolutions than their analogues).The compound XR8-89 with the highest binding affinity was kept in the binding site of both protein structures (obtained with docking as indicated in section 2.3).Both MD simulations were performed using the same ligand to ensure that the site was sampled under conditions as similar as possible.XR8-89, which is one of the largest ligands, was incorporated to ensure that each part of the site remained open for subsequent docking calculations.
Prior to the simulations, protein structures were processed with the Protein Preparation Wizard from Schrödinger LLC.The protein was immersed in a truncated octahedron of TIP3P waters, ensuring a separation of 12 Å or more from the box's surface.The system was subjected to a steepest descent minimization procedure comprising 10,000 steps.For the equilibration, a first round involved heating the system to 310K over 1 ns using an isothermal-isovolumetric (NVT) assembly.After this, an 80-ns isothermal-isobaric (NPT) equilibration at 310K and 1 atm was done.
The pmemd.cuda implementation of Amber20 was used to apply Gaussian accelerated molecular dynamics (GaMD) (Miao and McCammon, 2017).In particular, we employed LiGaMD (Miao et al., 2020), designed for protein-ligand complexes.We conducted 60-ns MD simulations (with the same settings as the equilibration), with only the last 50 ns considered as the production phase.No atoms were restrained during all the MD simulations.The MD simulations are brief as its aim is to obtain a conformational variation in the active site, which is ensured by using the GaMD method.The resulting trajectories were analyzed using the VMD (Humphrey et al., 1996) and CPPTRAJ (Roe and Cheatham, 2013).
To capture a greater range of conformational diversity, the trajectories obtained for both systems were clustered using the K-means algorithm.An internal script utilizing the scikit-learn library (Varoquaux et al., 2015) was employed for this purpose.The clustering process considered six geometrical descriptors to define the clusters, including various distances between specific atoms within the PLpro binding site.These descriptors are: (a) RMSD value of Gln269, (b) distance between the more proximal carboxylate O of Asp164 and the amide N of Gln269, (c) distance between the hydroxyl O of Tyr268 and the amide N of Gln269, (d) distance between the amine N of Lys157 and the side chain O of Gln269, (e) distance between the backbone O of Asn267 and the N of Cys270, and (f) distance between the hydroxyl O of Tyr264 and the backbone O of Asn267.This process resulted in a dendrogram or "cluster tree," where each leaf represented a single cluster and the root represented the largest cluster containing all sampled states.
Representative protein structures in the GaMD trajectories were identified with the clustering.Each of them was used as the protein in molecular docking calculations for each ligand, resulting in the generation of a set of poses for each ligand, each assigned with a docking energy value.These docking calculations were performed using the same parameters described in Section 2.3.Subsequently, we applied an in-house method programmed in Python (Muñoz-Gutierrez et al., 2016) to select, through a genetic algorithm (GA), the set of poses that best fits a maximum correlation between the docking energies and the experimental IC 50 activities, converted to pIC 50 values (pIC 50 = -logIC 50 with IC 50 in M).In the GA search, random combinations were generated in each generation.The population underwent several genetic operations to produce the next generations, including one-point crossover and single-point mutation.The crossover probability was set to 0.6, and the mutation rate was set to 0.1.Elitism was also applied to prevent the loss of good combinations from one generation to the next, where the top 30% of combinations were retained.
Using this protocol, we obtained a set of models for the 89 inhibitor-PLpro complexes, allowing the analysis of conformational changes in the protein binding site necessary to incorporate ligands with interaction energies adjusted to their difference in experimental activity.

Docking results
We performed docking calculations to investigate how the noncovalent inhibitors under study interact with the SARS-CoV-2 PLpro.The docking scoring energies can be found in the Supplementary Table S2.Our results revealed that all the ligands in the series share a common binding mode (Figure 1).Specifically, all the ligands preserve the hydrogen bonds (HBs) between the amide of the central scaffold and the residues Asp164 (side chain) and Gln269 (backbone NH), and position the R 2 substituent at the S 4 subsite and the R substituent (or large RR substituents) at the S 3 subsite (the substituents are denoted as in Table 1).The naphthalen-1-yl substituents at R 2 are oriented in two ways: most are oriented closer to Pro248 (referred to as the op2 orientation in the manuscript), while some are oriented closer to Pro247 (referred to as the op1 orientation).The compounds from series XR8, ZN-3-74, and ZN-3-80 oriented the 3-(thiophenyl)phenyl (and 3-(1H-pyrrolyl)phenyl in XR8-9) closer to Pro248 (op2 orientation).It was previously observed in crystallographic structures that the S 4 subsite of PLpro is wide, and it was noted that substituents could adopt the two indicated orientations (Rut et al., 2020;Patchett et al., 2021).Regarding the S 3 substituents, the vast majority of compounds oriented the R 3 substituents near Leu162, while the R substituents were positioned close to the side chains of Asp167 and Asp164.The interactions described above align with the interactions previously reported in crystals containing compounds from series XR8 (Shen et al., 2022).
The comparison of the co-crystallized inhibitors with their respective docking poses yielded satisfactory results.For the comparison between the docking poses of compounds XR8-24, XR8-65, XR8-69, XR8-83, and XR8-89 with the poses in their respective crystals 7LBS, 7LOS, 7LLZ, 7LLF, and 7LBR (by aligning all the crystallographic structures with 7LLF), the RMSD values were 2.28, 2.47, 1.90, 2.56, and 1.75 Å, respectively.Comparisons between the docked structures and their references (as seen in Supplementary Figure S1) indicate that, although there are RMSD values greater than 2 Å, the orientations were adequate in all cases.
We also used the LigRMSD server (Velázquez-Libera et al., 2020) to assess the similarity in orientations among the 89 inhibitors.We used two references for the comparison, GRL0617 and ZN-3-80.Lower RMSD values indicate greater similarity between the compared poses.The results of these comparisons can be found in the Supplementary Table S3.Based on the %Ref and %Mol values, it was determined that the GRL0617 structure served as a more suitable reference for comparing the compounds of series DY2, XDY2, YF4, ZN-2, and ZN-3.On the other hand, ZN-3-80 is a more suitable reference for comparing the compounds of series XR8.The compounds ZN-3-45, ZN-3-59, ZN-3-67, ZN-3-71, ZN-3-74, and ZN-3-79 are exceptions, as they exhibit lower degree of similarity with both references, although they are closer in structure to ZN-3-80.
Compounds of series DY2, DY-3, ZN-2, and the majority of compounds from series ZN-3 exhibited values of %Ref (a measure of similarity to the reference) higher than 82.6% when GRL0617 is used as a reference.The RMSD values obtained were below 1.0 Å, with only five compounds (DY2-139, DY2-144, ZN-2-183, ZN-three to three, and  showing RMSD values between 2.28 and 2.77 Å.In these five compounds, the R 2 substituents (naphthalene in DY2-144, ZN-2-183, ZN-three to three, and ZN-3-33 and (R)-1-phenylethyl in DY2-139) oriented opposite to the naphthalene group in GRL0617, but their main scaffolds were correctly oriented.On the other hand, compounds of series XR8 exhibited values of %Ref higher than 82% when ZN-3-80 is used as a reference.The RMSD values obtained were between 0.30 and 2.39 Å. RMSD values around 2 Å were attributed to the diverse orientations of the R and R 2 substituents within the subsites, providing ample space for various interaction possibilities.However, the main scaffold maintained a consistent orientation across all compounds (keeping the HBs with the residues Asp164 and Gln269).Notably, a distinct difference was observed in compounds XR8-83 and ZN-3-79, where the methyl group in R 3 deviated from its position adjacent to Leu162.Therefore, from visual inspection and LigRMSD calculations, it is possible to conclude that the docking poses of all the ligands were aligned with the co-crystallized compounds and between them.
IFPs were conducted to gain a deeper insight into the interactions between the docked ligands and PLpro.This analysis allows annotating the recurrent chemical interactions observed between the studied inhibitors and the protease binding site.In Figures 2A,B, it is possible to find the IFPs for the 89 compounds docked in the PLpro crystal with the code 7LLF.The polar interactions that connect the central amide group of the compounds with residues Asp164 and Gln269 are of great importance, as reflected in the IFPs.Asp164 is involved in polar interactions with all docked poses, occurring 100% of the time.It also acts as an HB acceptor, with a frequency of over 80%, reflecting its propensity to form HBs with the central amide NH group.The residue Gln269, located in the BL2 loop, exhibits polar contacts and serves as an HB donor in 100% of cases.Hydrophobic and aromatic interactions with residues Tyr264, Tyr268, and Tyr273 are consistently observed in all the docked structures.These residues collectively form an aromatic enclosure that contributes to the attraction and stabilization of the studied inhibitors.Specifically, Tyr268 plays a crucial role in closing the BL2 loop to adopt the closed conformation of the binding site (Báez-Santos et al., 2014b).
Other notable IFPs patterns are described as follows.Leu162 and Gly163 interacted with all ligands, while leucine's side chain engaged in hydrophobic interactions with approximately 95% of the cases.Thr301 also contributed in all the cases with polar interactions.The residues Pro247 and Pro248 promote the occurrence of hydrophobic contacts at the interface between the protein and the ligand.Pro248 consistently exhibits hydrophobic contacts in 100% of cases, while Pro247 shows hydrophobic contributions with approximately 20% of occurrence.Glu167 participated in polar and charged contacts in approximately 72% of cases.Finally, Lys157 exhibited polar and charged interactions in around 10% of the docked structures.

Cross-docking results
To enhance the exploration of different chemical interactions and configurations within the binding site of the SARS-CoV-2 PLpro enzyme when forming complexes with the studied set of inhibitors, we conducted GaMD simulations following the procedures detailed in the Materials and Methods section.Specifically, two simulations were conducted on the solvated PDB structures with codes 7LBR and 7LLF in complex with XR8-89.We assessed the stability of these GaMD trajectories by measuring the RMSD of the positions of the backbone atoms of PLpro over time.The root mean square fluctuation (RMSF) values remained relatively consistent throughout the production simulations for all systems, as shown in Supplementary Figure S2.
Applying the clustering analysis to the GaMD simulations (as described in the Materials and Methods section), a set of twenty representative PLpro conformations was obtained, demonstrating structural diversity, and denoted as 7llf_0-7llf_9 and 7lbr_0-7lbr_ 9 in this manuscript.These conformations were useful for our subsequent cross-docking analysis, during which we docked the 89 compounds into these twenty PLpro structures, each exhibiting diverse binding site conformations.During the cross-docking process, we generated twenty different poses for each ligand.To validate the reliability of these poses, we employed LigRMSD (Velázquez-Libera et al., 2020) to ensure the presence of plausible solutions.Following this verification step, we identified representative complexes of PLpro and inhibitors for each ligand.This selection was carried out using a custom Python script developed in-house (Muñoz-Gutierrez et al., 2016).The script was designed to optimize the correlations between calculated docking energies and experimental PLpro inhibitory activities.It effectively identified a set of representative PLproinhibitor complexes that exhibited the strongest correlation between docking scores and experimental inhibitory activities.The energies associated with these complexes are provided in Supplementary Table S2.
The optimized and reference correlations are presented in Figure 3. Notably, the correlation for the docking experiments conducted with structure with code 7LLF (used as reference) was found to be weak (R 2 = 0.081; Figure 3A).This outcome aligns with expectations and is consistent with the known limitations of current docking scoring functions.It is well-known that scoring functions have demonstrated competence in docking and screening evaluations but may not excel in evaluating scoring power, which involves establishing a robust linear relationship between predicted and experimentally determined activities (Warren et al., 2006;Damm-Ganamet et al., 2013;Xu et al., 2015).To address this challenge, one strategy is to introduce flexibility into the protein binding site (Lexa and Carlson, 2012).In our approach, we harnessed various conformational states obtained through GaMD simulations, enabling flexibility within the binding site.As illustrated in Figure 3B, our method substantially improved the correlation between predicted and experimental pIC 50 values, yielding an R 2 value of 0.922.
The strong correlation observed signifies the effectiveness of our proposed protocol in elucidating the relationship between the molecular structure and activity.Within the dataset of twenty PLpro conformations, our script successfully identified fourteen protein conformations that combined different orientations of the residues in the binding site.These specific conformations are listed in Table 2, along with compounds docked into each PLpro conformation, crucial for generating the structure-activity relationship model with the highest R 2 value.
Our approach involved the application of GaMD simulations and clustering to generate diverse conformational states within the binding site of SARS-CoV-2 PLpro.The MD and clustering protocol generated variations in the binding site, enhancing its flexibility and producing new structural conformations that resulted in different PLpro configurations for subsequent cross-docking calculations.Since we obtained an excellent correlation between activities and docking energy values through the inclusion of these variations in the active site, it is worthwhile to examine what those variations were.Through this analysis, it was identified that the residues that had the most significant conformational changes were Arg166, Glu167, Met208 and Asn267.
Figure 4 shows a visual inspection of these residues across the fourteen conformations present in the model that optimizes the structure-activity correlation.For Arg166, multiple orientations of the side chain guanidino group were observed (Figure 4A).In the majority of structures (7lbr_2, 7lbr_9, 7llf_1, 7llf_2, 7llf_3, 7llf_5, and 7llf_8), this group is directed towards the region where the ligand binds (depicted as green sticks).Conversely, in 7llf_0 and 7llf_9, it is positioned closer to Met206 (shown in orange sticks), while in 7lbr_1 and 7lbr_5, it is nearer to Asp164 (represented by yellow sticks).In 7llf_7, 7lbr_0, and 7llf_6, this group assumes a position equidistant between Met206 and Asp164.Glu167, next to Arg166, exhibited nearly consistent orientation across all structures (Figure 4B).However, there are two exceptions: in the 7lbr_ 2 structure, the side chain carboxylate group is directed toward Arg166 (depicted as green sticks), while in 7lbr_5, this group is oriented towards Lys157 (represented by orange sticks).Met208 also exhibits a relatively consistent orientation across structures between Arg166 and Pro247 (Figure 4C).However, there are two exceptions, in both structures 7llf_3 and 7lbr_5 the side chain methylthio group moves away from Pro247 (depicted as orange and purple sticks, respectively).Notably, structure 7lbr_5 (depicted in purple sticks) presents a notably different orientation compared to the rest.Finally, the most mobile residue within the BL2 loop appeared to be Asn267 (Figure 4D).The side chain of this residue did not exhibit significant differences in its orientations across structures.However, in 7llf_7, this residue showed the most notable difference from the remaining structures (illustrated by orange sticks), where its side chain moved away from the consecutive residue Tyr268.Additionally, in 7llf_9, Asn267 displayed a somewhat distinct orientation (depicted by cyan sticks), shifting its side chain away from both Tyr268 and the ligand binding site.
The conformational variations of the mentioned residues and those of others with less mobility, but with contributions to the flexibility of the binding site (the analyzes of K157 and Y268 are included in the Supplementary Figure S3) result in differences in the size and electrostatics of the active site.These differences significantly influence the variation in docking results and their energetic evaluation.
The validation of compound interactions within the SARS-CoV-2 PLpro binding site for protein-ligand complexes within the most highly correlated model was confirmed through IFPs.Earlier, the IFP analysis of docking-derived complexes highlighted key residues.These crucial residues would persist in the complexes obtained via cross-docking.IFPs for the selected complexes involving crossdocked compounds are reported in Figures 2C,D.When these IFPs are compared with those previously obtained considering a single structure (Figures 2A,B), it is possible to see that the interactions that characterize this series of complexes are maintained.It demonstrates the consistent chemical interactions between compounds and the PLpro binding site within the model showing the highest correlation.The significant interactions identified with residues Leu162, Gly163, Asp164, Glu167, Pro248, Tyr264, Tyr268, Gln269, Tyr273, and Thr301 were also evident in the IFPs within Figures 2C,D.
It is pertinent to compare the model obtained in this work for SARS-CoV-2 PLpro inhibitors with the model recently reported for naphthalene derivatives as SARS-CoV-1 PLpro inhibitors (Castillo-Campos et al., 2023).Both works were approached with the same methodology, with the only difference being that, in the current study, we decided to use shorter GaMD simulations to demonstrate the feasibility of applying the method with a brief sampling, which is particularly suitable when correlating a large number of compounds.Although the inhibitors studied in the previous work bind to the same binding site, their activities were evaluated against SARS-CoV-1 instead SARS-CoV-2 PLpro, making it impossible to unite all the compounds into a single model.The previous work was conducted for a smaller number of compounds, and their activities were generally lower.In that study, more than half of the compounds had IC 50 activities >7 μM, while in the current work, only approximately one-third of the compounds have an activity higher than this value.Therefore, the data modeled in this work are more numerous and contain a greater number of compounds with favorable activities against PLpro.It should also be considered that the compounds in the current work exhibit more variability in the substituents placed on the S 4 subsite, which may have influenced the greater number of PLpro conformations identified.
It is interesting to analyze whether we obtained conformations of SARS-CoV-2 PLpro prone to binding active ligands and others prone to binding inactive ligands.The former would be the best options for carrying out a virtual screening that detects new compounds, while the latter would be less recommended.
Reviewing this detail in Table 2, it was observed that the 7llf_2, 7lbr_2, and 7llf_7 conformations bound six of the ten most active inhibitors (two each).On the other hand, the 7llf_3 conformation bound five of the ten least active inhibitors, while 7llf_5 two others.Based on these observations, it could be suggested that 7llf_3 (characterized by having M208 somewhat displaced, Figure 4C) would be the least recommended conformation for identifying new active compounds.

Conclusion
This study investigated a series of 89 noncovalent inhibitors of the PLpro enzyme from SARS-CoV-2.We utilized a flexible molecular docking protocol to analyze the structural features of the inhibitors and their interactions with the PLpro binding site.By incorporating protein flexibility and performing extensive docking simulations, we established a high correlation between the energy values obtained from the simulations and the experimental pIC 50 values of the inhibitors.These findings contribute to developing potential drugs targeting PLpro in respiratory diseases caused by SARS-CoVs.
The research comprised of four steps: i) Initially, the structures of the protein-ligand complexes were determined through a rigid docking approach; this involved predicting how the flexible ligands bind to a rigid protein, ii) To explore the flexibility of the PLpro binding site, multiple protein conformations were generated using a short GaMD simulation; this allowed for a more comprehensive sampling of the different ways the binding site can adopt various shapes, iii) A cross-docking experiment was conducted, which involved evaluating the interactions between the 89 compounds and selected PLpro conformations obtained in the previous step; this enabled to examine of how each compound interacts with the protein in various binding modes, iv) From the cross-docking results, the protein-ligand complexes that have the highest correlation between the docking energies (the scoring energy values obtained from the docking simulations) and the experimental activities were selected; these selected complexes served as representative models, illustrating the most accurate relationship between the predicted docking energies and the experimental activities of the compounds.
As a result of this study, a collection of complexes was identified where the ligands interacted with a flexible binding site of PLpro.The methodology proposed in this research demonstrated its effectiveness, as evidenced by a correlation value of R 2 = 0.922 obtained in the final step mentioned earlier.The inclusion of protein flexibility through the GaMD sampling of PDB structures played a pivotal role in achieving the intended objective.By considering the adaptability of the binding site, the limitations of a rigid docking approach, which fails to account for significant conformational changes in the protein upon ligand binding, were effectively overcome.Consequently, integrating flexibility in the binding site analysis presented a more rational and accurate approach.
In summary, the strategy employed in this article provides a robust framework for investigating PLpro ligands using computational tools.The approach reflects a potential conformational selection methodology, acknowledging the dynamic nature of ligand-protein interactions.Conducting a comprehensive structural investigation into the properties of SARS-CoV-2 PLpro inhibitors contributes positively to the field of research dedicated to the design and computational evaluation of more potent candidates targeting this protease.

FIGURE 1
FIGURE 1 Docked structures within the SARS-CoV-2 PLpro binding site.Docked ligands are represented by green sticks.Relevant residues at protein subsites are represented by gray sticks.Hydrogens are omitted for clarity.

FIGURE 2
FIGURE 2 IFPs depicting the interactions between the docked compounds and SARS-CoV-2 PLpro.(A, B) Interactions of compounds with residues within the PLpro crystal identified as 7LLF.(C, D) Interactions observed in the selected complexes involving cross-docked compounds and SARS-CoV-2 PLpro conformations that exhibit the highest correlation.The graphs on the left side (A, C) illustrate the percentage of occurrence of contacts [C], interactions with the residue's backbone [B], and interactions with the residue's side chain [S].The graphs on the right side (B, D) portray the occurrence percentages of diverse chemical interactions: contacts [C], polar [P], hydrophobic [H], HBs where the residue serves as an acceptor [A], HBs where the residue serves as a donor [D], aromatic [Ar], and electrostatic interactions involving charged groups [Ch].

FIGURE 3
FIGURE 3Regression plots of the docking scoring energies versus experimental activities (-pIC50) for the docking experiments performed in the SARS-CoV-2 PLpro structure with code 7LLF (A), and for the cross-docking protocol (B).

FIGURE 4
FIGURE 4 RMSD values (in Å) of the residues (A) Arg166, (B) Glu167, (C) Met208, and (D) Asn267, for the SARS-CoV-2 PLpro structures utilized in cross-docking experiments involved in the structure-activity relationship model that exhibited the highest R 2 .RMSD <1.0 Å are represented in gray, RMSD ≥1.0 Å and <2.0 Å are represented in blue, RMSD ≥2.0 Å and <3.0 Å are represented in yellow, and RMSD ≥3.0 Å are represented in red.Conformations of each residue are represented to the left, colored differently according to their orientations.

TABLE 2
PLpro structures utilized in cross-docking experiments and the SARS-CoV-2 PLpro inhibitors involved in the structure-activity relationship model that exhibited the highest R 2 .