In silico Druggability Assessment of the NUDIX Hydrolase Protein Family as a Workflow for Target Prioritization

Computational chemistry has now been widely accepted as a useful tool for shortening lead times in early drug discovery. When selecting new potential drug targets, it is important to assess the likelihood of finding suitable starting points for lead generation before pursuing costly high-throughput screening campaigns. By exploiting available high-resolution crystal structures, an in silico druggability assessment can facilitate the decision of whether, and in cases where several protein family members exist, which of these to pursue experimentally. Many of the algorithms and software suites commonly applied for in silico druggability assessment are complex, technically challenging and not always user-friendly. Here we applied the intuitive open access servers of DoGSite, FTMap and CryptoSite to comprehensively predict ligand binding pockets, druggability scores and conformationally active regions of the NUDIX protein family. In parallel we analyzed potential ligand binding sites, their druggability and pocket parameter using Schrödinger's SiteMap. Then an in silico docking cascade of a subset of the ZINC FragNow library using the Glide docking program was performed to assess identified pockets for large-scale small-molecule binding. Subsequently, this initial dual ranking of druggable sites within the NUDIX protein family was benchmarked against experimental hit rates obtained both in-house and by others from traditional biochemical and fragment screening campaigns. The observed correlation suggests that the presented user-friendly workflow of a dual parallel in silico druggability assessment is applicable as a standalone method for decision on target prioritization and exclusion in future screening campaigns.


INTRODUCTION
The nucleoside diphosphates attached to sequence-x (NUDIX) hydrolase protein family was recently comprehensively and exhaustively reviewed by Carreras-Puigvert et al. (2017) NUDIX proteins possess a conserved sequence, called the NUDIX box, i.e., Gx 5 Ex 5 [UA]xREx 2 EExGU), which differs little between individual members which are otherwise of low sequence similarity.
Structural and domain analysis revealed three major groups and one outlier, NUDT22, mostly based on their already reported activity against substrate classes such as diphosphoinositol polyphosphates (Caffrey et al., 1999(Caffrey et al., , 2000 and NADH diphosphates (Abdelraheim et al., 2003). Subsequently, a systematical screening against a large set of substrates was performed and painted a rather promiscuous picture of the NUDIX hydrolases, indicating backup functionality or redundancy. Consequently, a global expression analysis was performed and showed a clear dependency on tissue of origin and the corresponding cancer tissue. Interestingly, NUDT1, NUDT5, and NUDT14 amongst others were present in a cluster of highly expressed proteins, confirming a potential role in cancer as reported earlier (Choi et al., 2011;Gad et al., 2014;Huber et al., 2014;Wright et al., 2016). Importantly, when evaluated for epistasis, it became apparent that several NUDIX members sustain relations as measured in cell viability and cell cycle perturbations and that these interactions are more important for cancerous cells. With this overview in structure, expression, substrate specificity and relation, the NUDIX protein family members gained considerable attention as potential drug targets. The original interest in pharmacological modulation of NUDIX members was sparked by the notion that NUDT1 is overexpressed in several cancer cell types, while its role in healthy cells can largely be compensated for as evidenced by the normal life-span of knock-out mice (Tsuzuki et al., 2001). Besides GTP and dGTP, NUDT1 hydrolyzes several oxidatively damaged DNA nucleotides including 8-oxo-dGTP and 2-OH-dATP, thus preventing their incorporation into DNA, which FIGURE 1 | Published NUDIX inhibitors: TH588 was developed as a first in class NUDT1 inhibitor at Science for Life Laboratory and Karolinska Institutet (Gad et al., 2014); (S)-Crizotinib is a potent NUDT1 inhibitor and the enantiomer of (R)-Crizotinib (Huber et al., 2014), a clinically applied tyrosine kinase inhibitor; optimized by Astra-Zeneca; AZ-15, AZ-21 and AZ-24 are distinct chemotype inhibitors targeting NUDT1 (Kettle et al., 2016); BAY-707 (Ellermann et al., 2017) was discovered as a NUTD1 inhibitor by Sprint Bioscience; IACS-4759 (Petrocchi et al., 2016) is a NUDT1 inhibitor developed by MD Anderson; MI-743 is a selective inhibitor of NUDT1 in gastric cancers (Zhou et al., 2019); Compound 5 was reported by Gilead and inhibits NUDT1 (Farand et al., 2020); TH5427 was synthesized as a lead compound against NUDT5 (Page et al., 2018); NUDT7-COV-1 is a covalent inhibitor generated by electrophile screening and fragment combination (Resnick et al., 2019). otherwise would lead to DNA damage and ultimately cell death. This led to the hypothesis that increased expression of NUDT1, and hence improved sanitization capacity of oxidatively damaged DNA bases from the nucleotide pool, would enable cancer cells to cope with the increased oxidative stress they are exposed to compared with healthy cells. Gad and coworkers published TH588 (Figure 1) as the first small-molecule NUDT1 inhibitor with efficacy in mouse xenograft models (Gad et al., 2014), although subsequent potent and selective NUDT1 inhibitors disclosed by AstraZeneca, MD Anderson, Gilead and Sprint Bioscience/Bayer failed to reproduce these findings with regards to cytotoxicity (Figure 1) (Kettle et al., 2016;Petrocchi et al., 2016;Ellermann et al., 2017;Farand et al., 2020). The validity of NUDT1 as an anticancer target has thus been questioned and is still under debate (Warpman Berglund et al., 2016;Samaranayake et al., 2017). Regardless, these studies served to demonstrate significant amenability to small-molecule inhibition of NUDT1, justifying the question as to how this translates to other members of the NUDIX family.
Besides NUDT1, a series of potent, drug-like NUDT5 inhibitors have been described by Page and coworkers (Page et al., 2018). The lead compound TH5427 (Figure 1) was shown to block progestin-dependent, PAR-derived nuclear ATP synthesis and subsequent chromatin remodeling, gene regulation and proliferation in breast cancer cells, suggesting that targeting NUDT5 may represent a novel therapeutic approach for breast cancer treatment. Most recently, the covalent NUDT7 inhibitor NUDT7-COV-1 was developed by employing electrophilic fragment screening and a fragment combination approach (Figure 1) (Resnick et al., 2019). To the best of our knowledge no potent inhibitors for any of the other NUDIX hydrolase members have been disclosed to date, although there are public data on hit rates for selected family members on the Structural Genomics Consortium homepage 1, 2 .
One aspect not addressed in the recent comprehensive review (Carreras-Puigvert et al., 2017) is an assessment of the potential druggability of the different NUDIX family members, i.e., their amenability to be modulated by drug-like small molecules. With the recent dawn of PROTACs, synthetic neoantigens and biologics, but also established targeting strategies like allosteric modulation or active site inhibition, several scenarios of how to target a protein may be exploited. With that in mind, druggability as such is no longer restricted to active site inhibition of a protein by a small molecule with an optimized small-molecule druglike profile. Both orthosteric or catalytic sites and secondary, allosteric sites, may be equally interesting to be targeted for the development of small-molecule chemical probes and potential drug candidates. As high-resolution crystal structures of 18 out of the 22 human NUDIX hydrolases are now available, a familywide in silico druggability assessment for available sites is feasible.
Here we use several open-access binding site analysis methods, i.e., DoGSite (Volkamer et al., 2012) 3 , CryptoSite (Cimermancic et al., 2016) 4 , and FTMap (Kozakov et al., 2015) 5 , as well as the commercial SiteMap and in silico fragment screening of a fragment library using Glide to probe the NUDIX hydrolase protein structures for potential small-molecule binding sites and assess their druggability and suitability for a prospective drug discovery campaign. This established in silico prioritization workflow within the NUDIX family is further supported by results obtained from biochemical screens employing the malachite green assay (Baykov et al., 1988) as well as differential scanning fluorimetry (DSF) (Niesen et al., 2007) fragment screens for some of the family members. This correlation with own experimental results and those published previously highlights the benefit of this comparably low-cost computational assessment workflow prior to applying experimental screening methods for the rapid evaluation of target druggability.

Protein Preparation and Validation
Available crystal structures of human NUDIX hydrolases with the highest resolution were imported into Maestro (Schrödinger Suite 2019-1, Schrödinger, LLC, New York, NY, 2019.) The structures were then prepared using the Protein Preparation Wizard as implemented in the Schrödinger Suite. Briefly, raw PDB structures were processed by automatically assigning bond orders, adding hydrogens, creating zero-order bonds to metals, converting selenomethionine to methionine, adding missing side-chains, creating possible disulfide bridges, deleting waters beyond 5.0 Å of hetero groups (if present), and generating hetero protonation states at pH 7.0. Residues with alternate positions were locked in the conformations with the highest average occupancy. Small ligands and metal ions originating from crystallization buffer were removed. The hydrogen bonding networks were optimized automatically, by sampling water orientations and optimization of hydroxyls, Asn, Gln, and His residue states using ProtAssign. Any remaining water molecules were subsequently removed. A restrained minimization was then performed using the OPLS3e force field, until an RMSD convergence of 0.30 Å was reached for the heavy atoms. Finally, the minimized NUDIX structures were aligned to the structure of NUDT1 (3Q93) with respect to the backbone atoms of the A chain.

DoGSite
The protein structures as prepared above were exported as PDB files, uploaded to the DoGSite server and assessed for binding sites and their corresponding DrugScores according to the published protocol (Volkamer et al., 2012). Pocket Size and DrugScores were extracted for all identified sites and annotated to pocket numbers.

FTMap
All prepared PDB files were uploaded to the FTMap server and interrogated for number of probes per cluster found according to the published protocols (Kozakov et al., 2015;Vajda et al., 2018).

CryptoSite
All prepared PDB files were uploaded to CryptoSite server and assessed for amino acid flexibility according to the published protocol (Cimermancic et al., 2016). Amino acid residues exceeding a Cryptic Site Score of 0.10 were extracted.

SiteMap
Prepared protein structures were submitted to SiteMap analyses as implemented in Schrödinger Suite 2019-1. The 5 top-ranked potential binding sites were identified. At least 15 site points per reported sites were required. The more restricted definition of hydrophobicity together with a standard grid (0.7 Å) were used. Site maps at 4 Å or more from the nearest site points were cropped. Clustering of the SiteMap parameters was performed using the heatmaply library in R 6 . The SiteMap parameters were transformed using "percentize, " and average linking was used for clustering.
Virtual Fragment Screening 1) Fragment subset selection: a subset of the ZINC Frags Now set (Irwin et al., 2012) was created by applying a number of filters implemented in a Knime workflow (Knime 3.5.2, Berthold et al., 2008). Foremost, only fragments available from a list of 19 preferred suppliers, composed by a team of experienced medicinal chemists were considered. These were then filtered using a cascade of structural filters, including REOS (Walters and Murcko, 2002), PAINS (Baell and Holloway, 2010) and a set of in-house filters (ScrapFilter) compiled over the years. Lipinski-type descriptors (SlogP, TPSA, AMW, NumLipinskiHBA, NumLipinskiHBD, NumRotatableBonds, NumHeavyAtoms, NumRings, NumAromaticRings) were then calculated using the RDKit Descriptor Calculation node. An additional descriptor HetRatio was then calculated as the ratio of NumLipinskiHBA and NumHeavyAtoms, and fragments with HetRatio <0.2 or >0.5 were filtered out. Finally, remaining outliers were removed by applying Gaussian Z-score normalization on the descriptor space, and then filtering out fragments with descriptor values deviating more than 3 units from the mean. The entire filtering cascade reduced the original input file of 704,041 structures as downloaded from ZINC to 205,891 fragments (Supplementary Data Sheet 1). 2) Ligand preparation: the selected fragment subset was then prepared for docking using LigPrep (Schrödinger): the OPLS3e force field was used for minimizations; possible ionization states at pH 7.0 ± 2.0 were generated using Epik (Shelley et al., 2007;Greenwood et al., 2010), metal binding states were added, and tautomers were generated; specified chiralities were retained and at most 4 stereoisomers were generated per structure. This yielded 345,044 structures for docking. 3) Receptor grid generation: Glide docking grids (Friesner et al., , 2006Halgren et al., 2004) were generated for each target protein by focusing the grid box on the center of the site with the highest Dscore as determined by SiteMap (Halgren, 2007(Halgren, , 2009. The size of the box enclosing the grid was set to 16 Å. No constraints, rotatable groups or excluded volumes were defined. 4) Virtual screening: The Virtual Screening Workflow as implemented in Schrödinger Suite was used for docking, scoring, and ranking of the top-1,000 fragments against the sites with the highest Dscore as determined by SiteMap. The workflow comprised a cascade of docking steps with increased accuracy (Glide HTVS → SP → XP), where the top-10% ranked ligands are passed on to the next step. After Glide XP docking the top-1,000 ranked fragments were retained for druggability assessment based on their combined docking scores.

Biochemical Screening
Small-molecule screening of NUDT2, NUDT15, and NUDT16 at a compound concentration of 10 µM was conducted using coupled enzymatic assays as already described for NUDT1 (Gad et al., 2014) and NUDT5 (Page et al., 2018). In brief this involved the purification of recombinant proteins following overexpression in E. coli and subsequent validation of coupled enzymatic assays based on cognate substrates for each of these [Ap4A for NUDT2, dGTP for NUDT15 and ADP for NUDT16 (Trésaugues et al., 2015)] 2 . The assays for NUDT2 and NUDT15 were based on enzymatic release of inorganic pyrophosphate and subsequent degradation to two molecules of inorganic phosphate in the presence of excess inorganic pyrophosphatase. Levels of inorganic phosphate are measured using an established procedure for such measurements in 384-well format in our lab (see e.g., Gad et al., 2014;Page et al., 2018). The screening of NUDT16 was based on enzymatic processing of ADP to release one molecule of inorganic phosphate, such that the coupled enzyme was not needed in this assay. All assays were optimized to allow their application at close to the K m of each substrate and with an incubation time chosen to ensure consumption of <30% of substrate and near linearity of assay signal increase with time.
Slightly different screening sets have been applied for the family members, with only a smaller subset of 5,500 compounds in common. All screens conducted at Chemical Biology Consortium Sweden have 16 each of negative (DMSO only -0% inhibition) and positive controls (no enzyme or inhibitor at concentration that gives 100% inhibition). These are located in columns 23 and 24 of the 384-well plates and they are used to normalize the response in each well-containing library compounds to a % inhibition value. Hit limits are defined based on the average plus three standard deviations of the response for all library compounds and hit rates are provided as the percentage of library compounds above this limit. The malachite green assay has been extensively used for screening purposes in our lab as it is associated with low interference rates, as evidenced by the lack of common hits appearing in screens of NUDT1 (Gad et al., 2014), NUDT5 (Page et al., 2018), dCTPase (Llona-Minguez et al., 2016), dUTPase and ITPase besides the herein reported NUDIX proteins (Supplementary Material -Screens using malachite green).

DSF Fragment Screening
NUDT1, NUDT2, NUDT5, and NUDT15 druggability was further experimentally assessed through fragment screening by DSF. Different fragments sets were screened over time, reflecting history and development of the available fragment sets. The initial fragment library comprised 450 fragments selected from the Chemical Biology Consortium Sweden reagent store at the Karolinska Institutet, and this set was screened against NUDT1 and NUDT5. Over time this library was complemented with sets of nucleobase analogs acquired from the NCI Developmental Therapeutics Program, which was grown to a subset of 200 compounds. This set, together with the 450-member library, thus totaling 650 fragments, was screened against NUDT2. Subsequently the 450-member library was complemented with 550 additional fragments from the Chemical Biology Consortium Sweden reagent store in order to generate a more diverse generic fragment library of 1,000 compounds. This second version together with the 200 nucleobase analogs acquired from NCI was screened against NUDT15. The proteins were expressed and purified as previously reported (Carreras-Puigvert et al., 2017). Fragment screening by DSF was essentially performed as described in detail by Niesen et al. (2007) All fragments were screened at a final concentration of 500 µM. Positive controls for each target were used at 100 µM. Assay buffer was composed of 100 mM Tris Acetate, 40 mM NaCl, and 10 mM Mg Acetate. Sypro Orange (S6650, Molecular Probes, 5000x) was used as the fluorescent dye. Native melting points of the proteins under the assay conditions were 50.0, 50.0, 76.0, and 57.0 • C for NUDT1, NUDT2, NUDT5, and NUDT15, respectively. Screening was performed in 96-well Q-PCR plates using a BioRad 96CFX real-time PCR detection system with temperature increments of 1.0 • C. More details of the assay conditions for each target are provided in the Supplementary Material -Fragment screen conditions.

Automated Arm -Step 1: DoGSite and FTMap Predict Druggable Catalytic Sites and Potentially Druggable Secondary Sites
We started by compiling a list of available high-resolution crystal structures of human NUDIX proteins ( Table 1). Due to the systematic work of the Structural Genomics Consortium, the majority of structures were solved with high sequence coverage (Supplementary Material -SiteMap secondary sites) and are often available together with screening data 1 . PDBs were imported to Maestro and prepared as described in the Method part. To enable application in the automated workflow, the prepared proteins were exported as new PDB files (Figure 2). In a first step, these files were uploaded to the DoGSite server. DoGSite is a web-based open-access algorithm that interrogates rigid protein structures for binding hotspots, including druggability prediction (Volkamer et al., 2012). Initially, a grid covering the protein identifies grid points that overlap with protein atoms. Application of a difference of Gaussian (DoG) filter then screens for preferred binding spots of sphere-like objects. Combination of several hotspots creates subpockets, which, if neighboring, are merged into a pocket. Several geometric and physico-chemical properties are automatically calculated for the predicted pockets and subpockets. A machine learning model trained on a set of known druggable proteins is then used to predict the druggability of the pockets, expressed as DrugScore. Reported as a factor between 0 and 1.0 a DrugScore over 0.5 and closer to 1.0 corresponds to good druggability.
Application of this algorithm to NUDIX crystal structures identified between two and ten pockets with a wide range of DrugScores (Figure 3 and Supplementary Material -DogSite). Between one and four bindings sites were judged druggable by the DoGSite algorithm. For some of the NUDIX hydrolases the natural substrates and their binding sites are yet to be deciphered. In addition, with the broad targeting possibilities provided by PROTACs (An and Fu, 2018) or allosteric inhibitors (Wenthur et al., 2014;Aretz et al., 2018), it is not necessarily required to target a catalytic pocket to convey a desired phenotype. Thus, the single highest-ranking site of each NUDIX structure, often corresponding to the known substrate binding site, was used to calculate a NUDIX druggability score. With an average druggability score of 0.80, the NUDIX family of proteins qualify as good predicted drug targets. As a positive control and validated target when it comes to chemical amenability, NUDT1 (3Q93) reaches a similar score of 0.81. The protein tyrosine phosphatase 1B (PTP1B) was included into the assessment (2HNP) as this is generally known to be a challenging target for classical drug discovery approaches. PTP1B, like other tyrosine phosphatases, contains a relatively polar substrate pocket which can accommodate phosphate isosteres. In the last two decades, small molecules targeting this pocket have been shown to fail eliciting sufficient effects in vivo (Zhang and Zhang, 2007;Krishnan et al., 2018). Instead a non-classical approach of allosteric inhibition is currently under evaluation in clinical trials (Mullard, 2018). When interrogated with DoGSite, PTP1B (2HNP) scores 0.72 only by combination of two subpockets through a narrow channel. An interesting observation is that all NUDIX members, except NUDT4 (0.72, 5LTU) and NUDT18 (0.63, 3GG6), individually score a high DrugScore around 0.80. Furthermore, it can be observed that several members, including NUDT6 (3H95, 0.78), NUDT7 (5T3P, 0.77), NUDT9 (1Q33, 0.82), NUDT17 (5LF8, 0.79), NUDT12 (6SCX, 0.85), and NUDT22 (5LF9, 0.81, Figure 4), are predicted to possess a second high-ranking pocket. These sites may increase the potential for pharmacological targeting of the corresponding proteins, for instance by masking a protein-protein interaction or a cofactor binding site. Identification of a second high-ranking pocket remote from the catalytic site, however, may also point toward an artifact in the crystal structure due to the construct used for expression or lack of electron density. For a comparison of resolved and expressed sequences please refer to Supplementary Material -SiteMap secondary sites. Thus, when inspected for their location, it became apparent that secondary sites can be distinguished as either neighboring to the top-ranked site or being located more remotely.
The druggability of the identified pockets can be further assessed using FTMap (Kozakov et al., 2015;Yueh et al., 2019). FTMap interrogates the protein surface for contributions FIGURE 2 | In silico druggability prioritization workflow -High-resolution crystal structures and comprehensive commercial space compound databases are freely available. PDBs are prepared by protein preparation wizard (Thorsell et al., 2009), inspected for structural flaws and limitations and exported as new structure files. In a first step, the structures are assessed for binding sites and site druggability by algorithms using rigid proteins or allowing for flexible behavior, i.e. DoGSite (Volkamer et al., 2012) and CryptoSite (Cimermancic et al., 2016). To eliminate artifacts due to protein construct choice, limitations of crystal structure resolution or co-crystallization of additional protein copies, all identified sites are counter-screened by FTMap (Kozakov et al., 2015). In a second and parallel step, the same structures are initially assessed for binding sites, druggability and pocket parameter using SiteMap (Halgren, 2007(Halgren, , 2009. A separate Knime (Berthold et al., 2008) workflow for the elimination of promiscuous functionalities is followed by Ligand Preparation which builds an applicable set of small molecules including a number of tautomers and stereoisomers. In a three-step cascade this set is then docked (Friesner et al., , 2006Halgren et al., 2004) against the highest-ranking site as identified by SiteMap. The median docking score of the top-1,000 fragments is used to assess druggability based on commercial fragment space. In a final step, prioritization of targets passing both parallel screening schemes may be performed based on published experimental screening data or own future screening efforts during translation to the lab.
to ligand-free energy. Small organic molecules, reflecting the complexity of potential active substances, are scored using a detailed energy function. Some regions bind several clusters of probes and thus identify as a binding hotspot. Earlier, this orthogonal method was applied on pockets identified by CryptoSite (Vajda et al., 2018), where high druggability would correspond to an FTMap cluster populating these sites and containing at least 16 probes. When similarly examined for the number of bound probes, all highest-ranking sites of each NUDIX protein except for NUDT4 (5LTU) and NUDT18 (3GG6), reached more than 16 probes confirming the good druggability of the expected active sites of the enzyme family (Figure 3 and Supplementary Material -FTMap). NUDT4 (5LTU) and NUDT18 (3GG6), which showed a lower DrugScore before, failed to contain more than 16 probes and are the only family members with a lower druggability assessment based on DoGSite and FTMap. Assessment of PTP1B (2HNP) returned all FTMap probe clusters to be located in the smaller of the two sites predicted by DoGSite (DrugScore 0.38). When evaluated with FTMap, secondary sites of NUDT6 (3H95), NUDT7 (5T3P), NUDT9 (1Q33), NUDT17 (5LF8), and NUDT22 (5LF9, Figure 4) neighboring the highest-ranking site tend to harbor more probes than those sites found remotely. All remote secondary sites, i.e., NUDT9 (1Q33), NUDT17 (5LF8) and NUDT 12 (6SCX), fail to incorporate the required 16 probes. Of those located much closer to the highest-ranking pocket, only NUDT7 (5T3P) fails to accommodate 16 or more probes underscoring the potential use in pharmacological targeting additionally to the neighboring highest-ranking pocket.

Automated Arm -Step 2: CryptoSite and FTMap Confirm Druggable Active Binding Pockets With High Flexibility
Druggability predictions using DoGSite are based on rigid protein structures, not allowing for flexibility typically induced by larger natural substrates or specifically designed small molecules (Michel et al., 2019). Another aspect is the potential existence of allosteric sites. Typically, a crystal structure of a compound bound to the allosteric site or comprehensive protein dynamics calculations based on several distinct crystal structures are required for their discovery. The CryptoSite algorithm however, can give first insights in whether an already identified active site or a shallow pocket allows for high single amino acid flexibility (Cimermancic et al., 2016). Networks of these flexible cryptic sites could indicate concerted movements of the protein, possibly forming an allosteric site or conformational changes relevant for substrate binding and protein function. Cryptic scores returned by the algorithm above 0.10 and higher consider a site as cryptic and thus flexible.
When interrogated with CryptoSite, NUDIX hydrolases showed an increased number of cryptic sites around the highestranking site as identified before by DoGSite, indicating an extended and flexible three-dimensional network of amino acid residues (Figure 3 and Supplementary Material -CryptoSite). Between 11 and 86 and on average 40 residues scored higher than 0.10 (NUDT1, 42; PTP1B, 33). The highest scoring residues reached values between 0.26 and 0.52 and on average 0.40 (NUDT1, 0.49; PTP1B, 0.33). NUDT10 failed to form a cryptic network while NUDT6, NUDT7 and NUDT22 (Figure 4) possessed a second cluster of cryptic sites overlapping with the second highest-ranking sites as identified by DoGSite. Except for NUDT4 (5LTU) and NUDT18 (3GG6), all cryptic networks of the protein family members were populated by more than 16 probes in FTMap (Figure 3 and Supplementary Material -FTMap).
The result of this initial druggability assessment suggest that NUDIX hydrolases are on average good drug targets with regard to their expected or known active sites. Further, only a few members of the family possess a second druggable site as based on DoGSite and FTMap analyses, and even fewer exhibit conformational flexible sites remote from the identified active site.

User Arm -Step 1: SiteMap Binding Site Prediction and Druggability Assessment
In a second parallel approach we assessed druggability using SiteMap and a Glide-based virtual screening workflow applied to a KNIME filtered fragment library (Figure 2). SiteMap, an application to identify binding pockets and predict druggability, is implemented in the Schrödinger small-molecule modeling suite. Binding pockets identified on the protein surface are given a score, the Dscore, which is based on pocket parameters such as size, exposure to solvent, enclosure by protein, ratio of hydrogen bond donors and acceptors and importantly hydrophilicity, hydrophobicity and a determined ratio thereof. This druggability score favors proteins with a higher hydrophobic/hydrophilic ratio and thus allows for an early assessment of pocket polarity as required for binding of small-molecule drugs. Typical Dscores for druggable protein pockets are above 1.108 while Dscores below 0.871 suggest a difficult to drug protein (Halgren, 2007(Halgren, , 2009). In addition, comparing individual pocket parameters allows for FIGURE 4 | Druggable sites of NUDT22 as identified in the automated and the user arm: (A) Rigid assessment with DoGSite identifies a second highly druggable site (orange, DrugScore 0.81) in close proximity to the highest-ranking site (yellow, DrugScore 0.82); (B) Flexible assessment with CryptoSite predicts 52 amino acid residues around both pockets as part of an extended 3D network. Highlighted amino acid residues possess cryptic score over 0.2; (C) FTMap confirms all cluster hotspots and at least 16 probes within 5 Å radius of either of the binding sites (site 1: pink; site 2: light blue) or high cryptic value sites. (D) SiteMap combines the two sites identified by DogSite into a single large binding pocket with an evenly distribution of hydrophobic (yellow) and hydrophilic (red and purple) patches; (E) cascade docking of the ZINC fragment library shows a preference for the active site, while the second druggable site is only engaged by members of one chemotype among the top 1000 fragments. The assessment highlights NUDT22 comprising two adjacent druggable sites which in a prospective drug discovery campaign could be targeted separately or in combination.
a detailed picture of druggability and for specific assessment of proteins with similar Dscores and/or sequence.
When SiteMap was applied on the NUDIX hydrolases, the obtained Dscores of the highest-ranking sites were between 0.51 and 1.11 with an average of 0.88 (Figure 5A and Supplementary Material -SiteMap). Interestingly, except for NUDT4, all identified highest-ranking sites were in overlapping regions or even identical with sites identified with DoGSite FIGURE 5 | Predictive value of in silico assessment and docking for in vitro screening hit rates and suitability of fragment screens for chemical probe generation: (A) in silico druggability assessment of pockets identified by SiteMap correlates well with the observed ZINC Fragments Now (ZFN) median docking scores of the highest ranking 1000 fragments (R = −0.717); (B) when translated to in vitro to either DSF or X-Ray screens using fragment libraries with overlapping chemical space, a similar correlation can be observed, highlighting the suitability of a purely in silico druggability workflow as a standalone method (R = −0.826); (C) DSF fragment screens reported here yielded strong stabilizing fragment hits that are structural subunits of reported ligands for NUDT1 (Gad et al., 2014;Huber et al., 2014).   04, Figure 3). Due to the chosen cut-off distance to merge identified pockets (5 Å), SiteMap identified large extended pockets which included several subpockets. Furthermore, as NUDT5, NUDT12, and NUDT15 are functional homodimers, these have two high-ranking pockets. Of these, NUDT12 and NUDT15 contain a third druggable site. NUDT7on the other hand possesses a second high-ranking pocket (Dscore 0.82).
Clustering of the highest-scoring SiteMap pockets using the primary SiteMap parameters shows a clear separation of druggable versus undruggable NUDIX members and allows for comparison of members which are (dis)similar in terms of their active site properties rather than based on sequence (dis)similarities (Figure 6). Full-length sequence identity is generally low among the NUDIX family members (see Supplementary Material, Percentage Identity), with the exception of NUDT3, NUDT4, NUDT10, and NUDT11. The former 3 being deemed challenging targets and NUDT11 was not evaluated due to lack of structural data. General selectivity issues are thus not anticipated when targeting a specific NUDIX family member. On the other hand, several members have some degree of overlap in their substrate specificity, e.g. NUDT1, 15 and 18 as a subgroup, and NUDT5, NUDT9, NUDT12, and NUDT14 as a second subgroup (Carreras-Puigvert et al., 2017), implying that their active sites share some structural similarity. In this context, the SiteMap parameter profile of NUDT5 is a good reference as it has the highest Dscore of all members. In comparison, NUDT9 has a less favorable balance in hydrophobic and hydrophilic character, while the active site of NUDT12 is somewhat more exposed than NUDT5, but also larger. Although these 3 members have high Dscores, classing them clearly as druggable, they vary in their capacity to accommodate different fragments, as it is reflected by their different median docking scores further down (Figure 5). NUDT14 is considered challenging, primarily due to its smaller active site which also is more exposed.
NUDT15 and NUDT18 have been hypothesized to be able to act as back-up enzymes for NUDT1 due to their overlapping substrate specificities. Comparison of their SiteMap parameter profiles shows clear differences despite NUDT1 and NUDT15 being among the NUDIX members with highest Dscores. The active site of NUDT15 is somewhat smaller and more enclosed than for NUDT1 due to an inward movement of a helix (Carter et al., 2015). NUDT18 is considered to be challenging due to its small and more exposed active site. This also results in a poor fragment scoring profile (please see below). Collectively, these differences in site parameters allow for the development of highly selective chemical probes, as witnessed for NUDT1 and NUDT5 (Gad et al., 2014;Page et al., 2018).
User Arm -Step 2: in silico Docking of ZINC Library As a final druggability assessment and potential to identify starting points amenable for a fragment growing-based drug discovery campaign out of commercial fragment space, we performed in silico docking campaigns of the ZINC Frag now database (Irwin et al., 2012) against the structures under consideration. The comprehensive fragment library was filtered against unwanted structural motifs and prepared for docking using a KNIME workflow (Berthold et al., 2008). For a detailed description, please refer to the Methods part of this manuscript. Ultimately, 205,891 fragments remained after filtering the original set of 704,041 ZINC fragments. Using the Schrödinger suite, ligand preparation and grid generation for the highestranking pocket as identified by SiteMap were performed to enable virtual screening of this subset applying three stages of accuracy. In each stage, the top-ranked 10% of compounds were retained and passed on to the next stage. Finally, the top-ranked 1,000 fragments were used to calculate a median docking score enabling assessment of druggability based on commercially available fragment space. The returned median docking scores, where lower is better, ranged from−4.0 to −11.4 kcal/mol with an average of−6.8 kcal/mol. NUDT1 and NUDT5, both validated drug targets in the literature, scored−11.4 and −9.9 kcal/mol respectively. In addition, and judged by the median docking score, NUDT17 (−8.8 kcal/mol) is a third promising drug target. PTP1B (−6.8 kcal/mol) scores average among the NUDIX family members, while the scores for NUDT4 (−4.0 kcal/mol), NUDT10 (−4.3 kcal/mol), NUDT20 (−4.7 kcal/mol) and NUDT18 (−5.0 kcal/mol) indicate a potentially challenging drug discovery campaign ( Figure 5A). When using the median docking scores and plotted against their respective SiteMap Dscores, a good inverse correlation (R = −0.717, Bravais-Pearson) can be observed (Figure 5), suggesting an in silico-based prioritization scheme of drug discovery campaigns against NUDIX proteins. Thus, fragment docking against the top-ranked SiteMap pockets recapitulates their druggability potential but additionally provides potential starting points readily accessible for fragment-based drug discovery campaigns.
The hydrophobicity of small-molecule drugs is a property which needs to be delicately balanced since it affects multiple parameters including solubility, permeability, plasma protein binding and metabolism. Druggable binding pockets of target proteins therefore require a certain hydrophobic-hydrophilic balance to accommodate ligands with drug-like properties. When applying a balance of at least 0.5 the SiteMap assessment prefers NUDT1 (3Q93, 0.69), NUDT5 (6GRU, 1.34), NUDT7 (5T3P, 0.72), NUDT15 (5BON, 0.62), NUDT17 (5LF8, 0.50), and NUDT22 (5LF9, 0.58) and disfavors NUDT3 (2FVV, 0.01), NUDT4 (5LTU, 0.00), NUDT10 (3MCF, 0.01), and PTP1B (2HNP, 0.05). With regard to their returned median fragment docking scores, pocket polarity might correlate with either higher or lower scores (Supplementary Material -ZINC fragment docking and SiteMap). A possible explanation is, that the library was filtered to fit a drug-like profile and thus preselects for druggable proteins itself, ignoring their respective pocket properties. Importantly, none of the crystal structures used here were bound to high-affinity lead compounds originating from drug discovery campaigns and hence no hydrophobic subpockets induced by such compounds where probed in this study.
When combined, the top-1,000 ranked fragments obtained for the 18 protein targets comprised 13,203 unique fragments, indicating a certain amount of "promiscuity, " i.e., fragments binding to 2 or more proteins (36% of fragments). In fact, 73 fragments bound to 6 or more targets (see Supplementary Material -Fragment promiscuity), with one fragment hitting 11 out of 18 proteins. It should be noted that the average docking scores were rather poor, ranging from −7.56 to −5.53 kcal/mol. Of interest is the notion that the proteins deemed undruggable by SiteMap appeared to be enriched for promiscuous fragment hits (except NUDT10 and NUDT20), as opposed to druggable proteins (except NUDT15 and NUDT22).
A certain degree of promiscuity should be expected when docking 200K fragments to multiple targets, as this is in line with the basic concept of fragment-based drug discovery, i.e., the ability of low-complexity fragments to interact with a multitude of (sub)pockets across a wide range of proteins.

Correlation With Experimental Fragment-Based and Biochemical Screening Data
A number of fragment screens against NUDIX proteins have been performed by others and us 1 . For a list of applied screening techniques, library sets and hit rates, please see Table 2. When the hit rates of the fragment screens were compared with the in silico-derived median ZFN docking scores a good correlation was observed (Bravais-Pearson 0.826; Figure 5B). This underscores the applicability of in silico docking for rapid protein druggability assessment. In agreement with most computational assessments, NUDT1 and NUDT5 yield high hit rates of 9.8% and 14.7%, respectively, while the experimental hit rate of 0.9% for NUDT4 confirms its challenging character predicted by computational assessment. Other NUDIX proteins are in the range of common hit rates for fragment screens and between 1.6 and 5.6% (Aretz et al., 2014). This observation holds true for different sets screened by different groups (Figures 5B, 7). Interestingly, the DSF screen against NUDT1 found two structures with a strong thermal stabilization of 5 • C. These structures are fragments of the reported NUDT1 inhibitors TH588 (IC50: 2.1 nM) and Crizotinib (IC50: 48 nM) and thus underscore the suitability of DSF to find starting points for lead generation ( Figure 5C). However, DSF is not feasible for proteins with high native melting points (e.g., NUDT5, 76 • C), and here in silico fragment screening against druggable sites may be particularly advantageous. Several biochemical screening campaigns against NUDIX proteins have also been performed in our laboratories. While compound libraries have varied somewhat between targets, reflecting development of the compound libraries over time, there is a small core set of about 5,300 chemically diverse compounds that have been tested for all proteins. It is noteworthy that these screens were performed based on a common screening platform employing a coupled enzymatic assay with a malachite green readout. This cost-effective assay has been frequently employed in our lab, including screens on other nucleotideprocessing targets such as dCTPase (Llona-Minguez et al., 2016), ITPase and dUTPase, and with robust performance in compound sets beyond 100,000 compounds (all unpublished). A key reason for this is the low rates of interference with the coupled enzymes and the absorbance readout at 630 nm, as evidenced by a low appearance of common hits. Also, the presence of PAINS and aggregators within hit lists is generally low for this family of proteins (Supplementary Material -Screens using malachite green), demonstrating robust screening performance of the recombinantly produced proteins and other assay components. The biochemical screen outcomes are summarized in detail in Table 2 and in the Supplementary Material -Screens using malachite green. In line with assessments of chemical amenability and learnings in the fragment-based screens, the majority of targets generated hits that confirmed activity in follow-up studies, with NUDT1 demonstrating an extreme hit rate in this sub-set. This significant amenability is in line with the publication of hits from multiple groups. A critical outlier in this set was NUDT5, which demonstrated hit rates as low as notoriously challenging targets dUTPase and ITPase, while predictions and fragmentbased screening showed the opposite (Supplementary Material -Fragment screening hit rates). Already at the time of screening we had a reason to revisit the screening data for NUDT5 and follow-up studies demonstrated competition between active site hits and a structurally important Mg 2+ (Costa and Dieckmann, 2011;Vardakou et al., 2014). After correction of the assay buffer, by lowering the MgCl 2 concentration 10-fold, we observed significantly higher hit rates in the subsequently applied compound sets and identified compounds that could be further optimized to nM potencies (Page et al., 2018). As a general observation and for the five NUDIX protein members screened, no correlation between ZINC fragment docking scores and observed hit rates from biochemical screening can be observed (Supplementary Material -Screens using malachite green). In contrast to the covered fragment library chemical space, a rule-of-five compliant library of few thousand compounds may complement the search for a chemical starting point but may be limited in coverage of chemical space itself. However, in the past we have shown, that embarking on drug discovery campaigns from observed hits in both fragment and biochemical screen lead to successful generation of chemical probes for a number of NUDIX protein family members and other pyrophosphatases (Gad et al., 2014;Llona-Minguez et al., 2016;Page et al., 2018)

CONCLUSION AND SUMMARY
Here we presented a dual in silico druggability assessment workflow suitable for large-scale evaluation of proteins and protein families, applied to the NUDIX family. Initially, we introduced a hands-on workflow solely based on the protein crystal structure using the open access server of DogSite (Volkamer et al., 2012) and FTMap (Kozakov et al., 2015) for rigid and CryptoSite (Cimermancic et al., 2016) and FTMap for dynamic assessment. Importantly, before using these servers, thorough manual protein structure verification is necessary to exclude artifacts due to crystal packing, construct used and resolution limits. On the one hand, DogSite returns both identity and score of druggable sites, while FTMap docks small organic solvent molecules. Especially in cases of sparsely evaluated proteins or protein complexes this dual assessment may be beneficial for structural assessment and potential chemical probe generation. On the other hand, CryptoSite identifies conformationally active amino acid residues. In the past, the returned cryptic scores have been correlated with FTMap solvent docking and eased decision on whether or where potential allosteric sites may be situated (Vajda et al., 2018). Timewise, this quick computational assessment may be achieved within days for singular proteins and weeks for small protein families. Depending on local load and choice of sever location the return time is usually minutes for DogSite, hours for FTMap, and 1 day for CryptoSite. The detailed assessment and correlation of data from the different algorithms allows for the rationalization of targeting strategies. In case of the NUDIX proteins, NUDT22 for example showed to have high scores in DogSite and FTMap with CryptoSite confirming flexibility around two closely related sites. Further, in the past we have shown that comparing different crystals structures of the same protein can allow for the observation of targetable conformations more suitable smallmolecule development (Michel et al., 2019). With the open access deposition of all screening data by the SGC, a similar albeit more time consuming approach is possible for a number of NUDIX proteins 1 .
In a second, user-guided arm we assessed protein druggability employing several implemented functions in Schrödinger's commercial small-molecule modeling suite combined with freely available KNIME (Berthold et al., 2008). First, proteins were interrogated for potential binding pockets and the corresponding DScores using SiteMap. The highest ranking pockets were then FIGURE 7 | Comparison of fragment libraries used in screens against NUDIX protein family members: Laboratory for Chemical Biology at Karolinska Institutet (LCBKI) DSF fragment library of the first generation (DSF 1.0, blue), additional members of a second generation DSF fragment library (DSF 2.0, red), Nucleobase analogs (NCI, light blue) and Diamond-SGC Poised Library (DSPL, green) with respect to: (A) physicochemical property coverage and diversity, expressed as the first two principal components (pc1 and pc2) obtained from a principal component analysis (PCA) on six Lipinski-type properties; (B) clogP against Molecular weight. used to perform cascade docking with a filtered ZINC fragment library (Irwin et al., 2012). Subsequently, the median docking score of the top-1,000 ranked fragments was used as a chemical space-based druggability assessment. Both parameters, DScore and median docking score form the basis of this second in silico druggability assessment and require few days per protein, depending on the user set up and the size of the used in silico library. The observed docking scores correlate well with predicted DScores ( Figure 5A) and additionally provide commercially accessible chemical starting points for the development of chemical probes. At last, when compared with experimental fragment screens based on X-ray crystallography, a covalent set and thermal stabilization in DSF, a similar correlation was observed between hit rates and median docking scores ( Figure 5B), even when using chemically distinct screening sets (Figure 7). This supports the applicability of an in silico druggability workflow as a standalone method for protein assessment and speaks for the chemical space coverage of fragment libraries generated at CBCS and Diamond/SGC (Michel et al., 2019) 7 .
In summary, we report here a fully in silico druggability assessment of the NUDIX protein family, that serves as a standalone method and a workflow to identify the most suitable members for a drug discovery campaign. We show that the dual assessment correlates well with experimental results and further allows for the in silico identification of secondary druggable sites, alternative targeting strategies and structural basis for fragment growing campaigns. Importantly, the workflow allows for rapid assessment of any protein with reported structures in the protein data bank and as such should be broadly applicable in early drug discovery campaigns.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.