HMM-based profiling identifies the binding to divalent cations and nucleotides as common denominators of suramin targets

Introduction: Suramin is one of the pharmacopeia’s most promiscuous drugs. Originally developed for African trypanosomiasis, suramin was also used for onchocerciasis, and it has been proposed as an anticancer agent, an antiviral drug, a therapy for arthritis and autism, and an antidote for snakebite. Target proteins of suramin have been described from different species. Here we identify the common motifs among these various targets, aiming to explain the promiscuous nature of suramin. Methods: We searched for suramin target proteins in the literature and in chemical databases. Applying rigorous inclusion criteria, we assembled a list of 44 diverse proteins with experimental evidence for direct interaction with, and inhibition by, suramin. Hidden Markov model-based target profiling was performed by scanning these proteins against the full set of Pfam protein family domains. Results: Common denominators were identified by mapping the identified Pfam domains to molecular function gene ontology (GO) terms. This in silico pipeline identified nucleotide binding, nucleic acid binding, and binding to divalent cations as the most common denominators of the suramin targets. Discussion: Our results suggest that the extraordinary polypharmacology of suramin may be caused by its ability to inhibit the interaction of proteins with nucleotides or nucleic acids and with divalent cations (Mg2+, Ca2+, Zn2+). Suramin is well known to inhibit nucleotide receptors and nucleic acid-binding enzymes. The association with divalent cations is new and might be key to the design of better, more selective inhibitors.


Introduction
Suramin is one of the oldest drugs in use today. It was developed by Bayer in 1916 for African trypanosomiasis, has been on the WHO Model List of Essential Medicines since the list's inception in 1977, and is still the drug of choice for treating the first stage of human African trypanosomiasis caused by Trypanosoma brucei rhodesiense (Lejon et al., 2013). Suramin is a colorless derivative of the azo dye trypan blue (Wainwright, 2010). It is a large molecule (the hexasodium salt has a molecular weight of 1429 g/mol), carries six negative charges at physiological pH, is not orally bioavailable, binds strongly to albumin and other serum proteins, and lacks drug-like properties with respect to the numbers of hydrogen bond donors and acceptors (Wiedemar et al., 2020). Furthermore, suramin causes various adverse effects, in particular hypersensitivity reactions and nephrotoxicity (WHO, 2013). Yet in spite of all these shortcomings, suramin has found numerous potential areas of application over its hundred-year history.
Besides human African trypanosomiasis, suramin is also used against Surra (also known as mal de caderas), a livestock disease caused by Trypanosoma evansi (Giordani et al., 2016). Suramin was in clinical use against river blindness (caused by the nematode Onchocerca volvulus) (Hawking, 1978) until it was replaced by ivermectin in the early 1990s. Suramin entered clinical development against various forms of cancer (Larsen, 1993) and against human immunodeficiency virus (De Clercq, 1987). It inhibits host cell entry by several viruses, including SARS-CoV-2 (Salgado-Benvindo et al., 2020). Other potential uses include arthritis and autism (Sahu et al., 2012; Naviaux et al., 2017). Furthermore, suramin has been proposed as a protective agent against liver or kidney damage (Liu & Zhuang, 2011), and even as an antidote for snakebite due to its ability to inhibit the thrombin-like proteases of snake venom (Murakami et al., 2005). In accordance with such multifaceted use, a large variety of proteins have been proposed as targets of suramin. These include enzymes of core metabolism, enzymes involved in nucleic acid replication and epigenetics, proteases, kinases, and several membrane receptor channels [summarized in (Wiedemar et al., 2020)]. To our knowledge, no other drug has as many different targets as suramin.
Here we perform a bioinformatic target profiling of suramin based on the hypothesis that the many targets of suramin, although biologically highly diverse, possess common motifs to which suramin binds. To identify such common motifs we use HMMer, which implements profile hidden Markov models (HMMs) built from multiple sequence alignments as probabilistic models that score sequence homology in a position-dependent way (Eddy, 1998, 2011). Combining HMMer searches with GO term classification (Ashburner et al., 2000; Alborzi et al., 2018), we aim to identify common denominators, i.e., protein domains that are overrepresented among the suramin targets. The overall in silico approach is outlined in Figure 1. It is sequence-based and complementary to the structure-based approach taken by Dey and co-workers (Dey et al., 2021). Both approaches share the same aim: to understand the nature of suramin's promiscuous mode of action and, based on this knowledge, to design more specific inhibitors with fewer side effects.

Protein sequences
All protein sequences were obtained from UniProt (www.uniprot.org) (UniProt-Consortium, 2021) (RRID:SCR_004426) except those of viruses, which were obtained from PDB (https://www.rcsb.org) (RRID:SCR_012820) (PDB-Consortium, 2019). PDB was used for viral proteins to ensure that the processed, functional polypeptides were retrieved rather than the whole viral polyproteins. Reviewed entries were preferred. For posttranslationally cleaved proteins (e.g., thrombin), the sequence of the precursor was used (e.g., prothrombin). For proteins with several isoforms, only the isoform stated in the reference was included; if no such information was provided, the longest isoform was selected.
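The selection rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' Perl code: the `Candidate` record, its fields, and the order in which the rules are applied (reviewed status first, then the cited isoform, then length) are assumptions made for the example; precursor handling and the PDB route for viral proteins are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One candidate sequence for a target (hypothetical record layout)."""
    accession: str
    sequence: str
    reviewed: bool   # True for a reviewed UniProt entry
    isoform: str     # isoform label as given by the database


def choose_entry(candidates: list[Candidate], cited_isoform: Optional[str] = None) -> Candidate:
    """Pick one sequence per target: reviewed first, then cited isoform, else longest."""
    if not candidates:
        raise ValueError("no candidate sequences")
    # Prefer reviewed entries whenever at least one is available.
    pool = [c for c in candidates if c.reviewed] or candidates
    # Use the isoform named in the reference, if it is present in the pool.
    if cited_isoform is not None:
        cited = [c for c in pool if c.isoform == cited_isoform]
        if cited:
            return cited[0]
    # Otherwise fall back to the longest sequence.
    return max(pool, key=lambda c: len(c.sequence))


if __name__ == "__main__":
    entries = [
        Candidate("EX1", "MKT", True, "1"),
        Candidate("EX1-2", "MKTAAA", True, "2"),
        Candidate("EX9", "MKTAAAAAAA", False, "1"),
    ]
    print(choose_entry(entries).accession)                     # EX1-2
    print(choose_entry(entries, cited_isoform="1").accession)  # EX1
```

Applying the reviewed-entry filter before length means an unreviewed but longer entry (EX9 above) is only chosen when no reviewed entry exists.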

Perl scripting
All procedures were automated with custom Perl (RRID:SCR_018313) scripts on a BioLinux platform (Field et al., 2006) (RRID:SCR_005399). The scripts served to run the described programs for profile and motif searching, and to parse the programs' output into tabular format for further analysis. All scripts were tested for accuracy by monitoring the overall numbers of sequences processed and by manually re-testing individual samples. The scripts are available on request.

Figure 1. Flowchart of the bioinformatic pipeline from published proteins to common denominators of suramin targets (*note that Table 2 does not show all 924 identified GO terms, only those associated with at least five different suramin target proteins).

Frontiers in Drug Discovery

Calculation of isoelectric points
Isoelectric points of amino acid sequences were determined with the command "iep" of EMBOSS 6.6.0 (Rice et al., 2000) (RRID:SCR_008493), which calculates the isoelectric point of an amino acid sequence by estimating the overall charge at different pH values. This was performed for the suramin targets as well as for the human proteome, downloaded from UniProt (www.uniprot.org; accession UP000005640; date: 06.01.20). Statistical tests were done in RStudio (version 1.2.1335) (RRID:SCR_000432) using R (version 3.6.0) (RRID:SCR_001905).
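The principle behind "iep" (estimating net charge across pH values and locating the pH where it crosses zero) can be illustrated with a short bisection sketch. The pK values below are assumed textbook-style values chosen for illustration and are not guaranteed to match the table EMBOSS uses, so results will differ somewhat from `iep` output; disulfide bonds and modifications are ignored.

```python
# Assumed pK values for illustration only; EMBOSS iep reads its own pK table.
PK_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pk)) for g, pk in PK_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pk - ph)) for g, pk in PK_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection."""
    # Net charge falls monotonically with pH, so the sign change brackets the pI.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for peptide in ("KKKKK", "DDDDE"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection is used because the net charge is a strictly decreasing function of pH, so a single zero crossing exists between pH 0 and 14.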

Motif searches and GO terms
Motifs were identified using "hmmscan" with tabular output from the HMMer 3.2.1 package (hmmer.org) (Eddy, 2009, 2011) (RRID:SCR_005305) against Pfam version 32.0 (El-Gebali et al., 2019) (RRID:SCR_004726). The expectation value (E-value) cut-off was set to 0.01. Pfam accessions were linked to 'molecular function' GO terms using a text file produced by GODomainMiner (godm.loria.fr) (Alborzi et al., 2018), which provides associations between GO term IDs and Pfam accession numbers. The GO names were retrieved from QuickGO (www.ebi.ac.uk/QuickGO/) (Binns et al., 2009) (RRID:SCR_004608). For quality control, the targets associated with these GO terms were compared against the Denylist of the respective GO term on QuickGO (where the Denylist is called Blacklist) (Binns et al., 2009). QuickGO was further used to link GO terms via "is a" relationships to higher-order terms using the ancestor chart.
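As an illustration of the parsing step, the sketch below reads the per-sequence table that `hmmscan --tblout` writes (a run might look like `hmmscan --tblout hits.tbl Pfam-A.hmm targets.fasta`; file names are placeholders) and keeps Pfam accessions whose full-sequence E-value is at most 0.01. Which E-value column the authors thresholded (full sequence or best domain), and whether the cut-off was inclusive, are assumptions here; the function name is hypothetical and stands in for the authors' Perl scripts.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 0.01


def parse_hmmscan_tblout(lines, cutoff=E_VALUE_CUTOFF):
    """Map each query protein to the Pfam accessions it hits at or below `cutoff`.

    Expects the whitespace-delimited per-sequence table written by
    `hmmscan --tblout`: index 1 = target (Pfam) accession, index 2 = query name,
    index 4 = full-sequence E-value (0-based); lines starting with '#' are comments.
    """
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)  # the 19th field is the free-text description
        pfam_acc, query_name, evalue = fields[1], fields[2], float(fields[4])
        if evalue <= cutoff:
            hits[query_name].add(pfam_acc.split(".")[0])  # PF00069.25 -> PF00069
    return dict(hits)
```

Because it accepts any iterable of lines, it can be called on an open file handle, e.g. `parse_hmmscan_tblout(open("hits.tbl"))`. Version suffixes are stripped so that accessions match mapping files that list bare Pfam IDs.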

Collection of the published suramin targets
A comprehensive list of suramin targets was required as the starting point for target profiling. We aimed to assemble all proteins that had been published as putative targets of suramin in the scientific literature by performing compound searches in the chemical databases ChEMBL and PubChem. The reported proteins were supplemented with those obtained from papers on suramin targets found in PubMed and from the references therein. Finally, proteins with solved co-crystal structures with suramin deposited in PDB (Wiedemar et al., 2020) were added. This resulted in an initial, maximally inclusive list of 127 candidate suramin target proteins from 36 different species, encompassing mammals (n = 131 sequences) and other vertebrates (n = 6), fungi (n = 1), protozoa (n = 18), plants (n = 1), bacteria (n = 10), and viruses (n = 13) (Supplementary Table S1). Additional information collected alongside the targets included the type of assay used, the potency of suramin in that assay, and the nature of the evidence for interaction of suramin with its proposed target.
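Merging the candidate lists from the different sources amounts to a union keyed by protein identifier. The sketch below assumes, hypothetically, that each source has already been exported as records with an `id` (e.g., a UniProt or PDB identifier) and a taxonomic `group`; it records which sources reported each protein and tallies candidates per group. The record layout and function names are illustrative only.

```python
from collections import Counter


def merge_candidates(*sources):
    """Union of candidate targets keyed by identifier, remembering which sources reported each."""
    merged = {}
    for source_name, records in sources:
        for rec in records:
            entry = merged.setdefault(rec["id"], {"group": rec["group"], "sources": set()})
            entry["sources"].add(source_name)
    return merged


def count_by_group(merged):
    """Number of distinct candidates per taxonomic group."""
    return Counter(entry["group"] for entry in merged.values())


if __name__ == "__main__":
    chembl = [{"id": "EX_A", "group": "mammals"}, {"id": "EX_B", "group": "protozoa"}]
    pubmed = [{"id": "EX_A", "group": "mammals"}, {"id": "EX_C", "group": "viruses"}]
    merged = merge_candidates(("ChEMBL", chembl), ("PubMed", pubmed))
    print(len(merged), dict(count_by_group(merged)))
```

Keeping the set of sources per protein preserves provenance, which is useful later when the nature of the evidence is assessed during curation.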

Curation of the suramin target list
Special care was taken to use only proteins that physically interact with suramin. Thus the priority in curating the suramin target list was to minimize the number of false positives; this meant accepting a few false negatives (i.e., proteins wrongly excluded from the list) rather than including proteins that did not actually bind suramin. The inclusion criterion was inhibition of activity by at least 50% at a suramin concentration of no more than 50 μM, determined in an enzyme-based assay (as opposed to a whole-cell assay), except for cell-based assays with viral proteins. For protein complexes of multiple subunits, only the subunit interacting with suramin was included. A subunit was considered to interact with suramin if either only that subunit had been included in the assay, or if binding to that specific subunit had been validated experimentally. Otherwise, or if no such information was provided, the whole complex was excluded. Cases where suramin inhibited a protein-protein interaction (rather than protein function) were excluded as well. The resulting list consisted of 50 proteins experimentally validated to be inhibited by suramin (column E of Supplementary Table S1; Figure 1).
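The inclusion criteria can be written down as a simple predicate, which makes the decision for each literature report explicit. The `Evidence` fields and assay labels are hypothetical encodings chosen for this sketch; in practice the curation was done by reading the primary literature, and borderline cases required judgment that a predicate cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One published report on suramin and a protein (hypothetical encoding)."""
    inhibition_percent: float      # inhibition observed in the assay (%)
    concentration_um: float        # suramin concentration giving that inhibition (µM)
    assay: str                     # "enzyme" or "cell"
    viral_protein: bool
    ppi_only: bool = False         # suramin only blocked a protein-protein interaction
    subunit_resolved: bool = True  # False for complexes without an identified binding subunit


def passes_inclusion(ev: Evidence) -> bool:
    """At least 50% inhibition at no more than 50 µM, in an admissible assay."""
    if ev.ppi_only or not ev.subunit_resolved:
        return False
    potent = ev.inhibition_percent >= 50 and ev.concentration_um <= 50
    # Enzyme-based assays are required, except that cell-based assays count for viral proteins.
    admissible_assay = ev.assay == "enzyme" or (ev.assay == "cell" and ev.viral_protein)
    return potent and admissible_assay
```

An IC50 of at most 50 µM satisfies the potency part of this rule by definition, since it marks 50% inhibition at that concentration.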

Redundancy reduction of the suramin targets
To avoid a possible bias from overrepresentation of certain proteins among the suramin targets, e.g., due to closely related orthologues from different species, redundancy in the sequence set was reduced as follows. Pairwise global alignments were performed for all pairs of the 50 amino acid sequences, and the distance d between each sequence pair was calculated. Based on the frequency distribution of d, a cut-off of 0.6 was chosen (Figure 2). Sequence pairs with a distance below that cut-off were regarded as highly similar, and from each group of highly similar sequences only the longest sequence was kept. After this final curation step, 44 diverse proteins from 14 different species remained (Table 1; column E of Supplementary Table S1). This makes suramin the most promiscuous drug, surpassing other polypharmacological agents with respect to the number of reported targets (Haupt et al., 2013).
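The section does not define d, so the sketch below assumes d = 1 - (identical aligned positions / alignment length) for a pair of globally aligned sequences, and it treats "groups of highly similar sequences" as connected components of the graph linking pairs with d < 0.6 (single linkage via union-find). Both choices are assumptions; the alignments themselves (e.g., from a Needleman-Wunsch aligner) are taken as given.

```python
def p_distance(aln_a: str, aln_b: str) -> float:
    """Assumed distance: 1 - fraction of identical, non-gap columns in a global alignment."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("expects two aligned strings of equal, non-zero length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 1 - identical / len(aln_a)


def reduce_redundancy(lengths: dict, distances: dict, cutoff: float = 0.6) -> list:
    """Keep the longest sequence from each group of sequences linked by d < cutoff.

    lengths: {seq_id: sequence length}; distances: {(id_a, id_b): d} for each pair.
    """
    parent = {s: s for s in lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d < cutoff:
            parent[find(a)] = find(b)  # join the two groups

    groups = {}
    for s in lengths:
        groups.setdefault(find(s), []).append(s)
    # Longest member per group; ties are broken by identifier so the result is deterministic.
    return sorted(max(members, key=lambda s: (lengths[s], s)) for members in groups.values())


if __name__ == "__main__":
    lengths = {"A": 500, "B": 480, "C": 300, "D": 900}
    distances = {("A", "B"): 0.2, ("A", "C"): 0.8, ("A", "D"): 0.9,
                 ("B", "C"): 0.75, ("B", "D"): 0.9, ("C", "D"): 0.55}
    print(reduce_redundancy(lengths, distances))  # ['A', 'D']
```

Single linkage is the most aggressive grouping: two sequences end up in one group if they are connected by any chain of close pairs, even when they are not close to each other directly.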

Isoelectric points of the suramin targets
Suramin is much more active against the glycolytic enzymes of T. brucei than against their mammalian orthologues (Willson et al., 1993). At the same time, the glycolytic enzymes of T. brucei have clearly higher isoelectric points (pI between 9 and 11) than their mammalian counterparts (Misset & Opperdoes, 1987). This observation led to the hypothesis that the negatively charged suramin preferentially binds the trypanosomal enzymes because it interacts with clusters of positively charged amino acids that are absent from the mammalian enzymes (Willson et al., 1993). We therefore tested whether the suramin target set shows a general overrepresentation of positive charges. However, the mean isoelectric point of the 44 suramin targets (Table 1) was only slightly higher (pI 7.67) than that of the predicted human proteome (pI 7.40). This difference was not statistically significant (p = .27, Welch two-sample t-test).
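For reference, Welch's t statistic and the Welch-Satterthwaite degrees of freedom can be computed with the Python standard library as sketched below; the p-value is then read from a t distribution with that (non-integer) number of degrees of freedom, which is what R's `t.test()` does by default (`var.equal = FALSE`). The function name is hypothetical, and the inputs would be per-protein pI estimates.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
    print(round(t, 4), round(df, 4))  # -1.7321 4.4118
```

Welch's variant is appropriate here because a set of 44 targets and a whole proteome cannot be assumed to share the same variance in pI.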

HMM profiling of the suramin targets
To identify all functional motifs in the suramin target sequences, the set of 44 proteins (Table 1) was scanned against the complete Pfam collection of protein domain families (El-Gebali et al., 2019) with the program hmmscan of the HMMer3 suite (Eddy, 2009). The Pfam database contained approximately 18,000 profile hidden Markov models, i.e., position-dependent scoring models for profile searches (El-Gebali et al., 2019). Using an E-value cut-off of 0.01, this search returned on average eight hits per protein. In total, 142 different Pfam domains were detected in the suramin targets. Only 16 of these were associated with more than one protein, and none was associated with more than three, underscoring the heterogeneity of the presumed suramin targets.
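Given per-protein Pfam hits (for instance the output of a tblout parser), the summary figures quoted above (mean hits per protein, number of distinct domains, and how many proteins share a domain) can be tallied as sketched below. The sketch counts each distinct Pfam accession once per protein; whether repeated domains within one protein were counted separately is not stated, so this is an assumption.

```python
from collections import Counter


def summarize_hits(hits):
    """Summary statistics for {protein_id: set of Pfam accessions}."""
    if not hits:
        raise ValueError("no proteins")
    proteins_per_domain = Counter(dom for doms in hits.values() for dom in doms)
    return {
        "mean_hits_per_protein": sum(len(doms) for doms in hits.values()) / len(hits),
        "distinct_domains": len(proteins_per_domain),
        "domains_in_more_than_one_protein": sum(1 for n in proteins_per_domain.values() if n > 1),
        "max_proteins_per_domain": max(proteins_per_domain.values(), default=0),
    }


if __name__ == "__main__":
    toy = {"p1": {"PF_A", "PF_B"}, "p2": {"PF_B", "PF_C", "PF_D"}, "p3": {"PF_B"}}
    print(summarize_hits(toy))
```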

Common denominators of the HMM profiles
Given the diversity not only of the suramin targets but also of the associated Pfam domains, we had to move up yet another level of abstraction to identify potential common denominators. This was done by linking the identified Pfam domains to GO (gene ontology) terms (Ashburner et al., 2000) based on the annotations provided by GODomainMiner (Alborzi et al., 2018). Thus GO terms for molecular function were assigned to the suramin targets via their Pfam domains. The resulting annotations were checked against the Denylist provided by QuickGO (Binns et al., 2009), and annotations that were likely to be incorrect were removed. After this purification step, thirteen GO terms remained that matched five or more targets (Table 2). Two common themes emerged from this analysis: binding to nucleotides or nucleic acids, and binding to divalent cations such as Mg2+, Ca2+, or Zn2+. This was confirmed by using the QuickGO ancestor chart (Binns et al., 2009) to determine the higher-order GO terms, which identified "cation binding" and "nucleotide binding" as the two most frequent entries (Table 3).
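The Pfam-to-GO step can be sketched as follows. The two-column, tab-separated `GO id` / `Pfam accession` layout assumed for the GODomainMiner file, the representation of the denylist as (GO term, target) pairs, and the function names are all hypothetical; the threshold of five targets mirrors Table 2.

```python
from collections import defaultdict


def load_pfam2go(lines):
    """Assumed file layout: one 'GO_id<TAB>Pfam_accession' pair per line."""
    pfam2go = defaultdict(set)
    for line in lines:
        if line.strip() and not line.startswith("#"):
            go_id, pfam = line.rstrip("\n").split("\t")[:2]
            pfam2go[pfam.strip()].add(go_id.strip())
    return pfam2go


def go_terms_per_target(hits, pfam2go, denylist=frozenset()):
    """GO terms inherited by each target from its Pfam domains, minus denylisted pairs."""
    result = {}
    for target, domains in hits.items():
        terms = set().union(*(pfam2go.get(d, set()) for d in domains))
        result[target] = {t for t in terms if (t, target) not in denylist}
    return result


def common_denominators(target_terms, min_targets=5):
    """GO terms shared by at least `min_targets` distinct targets, with their counts."""
    targets_per_term = defaultdict(set)
    for target, terms in target_terms.items():
        for term in terms:
            targets_per_term[term].add(target)
    return {t: len(ts) for t, ts in targets_per_term.items() if len(ts) >= min_targets}
```

Counting distinct targets per term (rather than Pfam hits) ensures that a protein with several domains mapping to the same GO term is counted only once.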

Discussion
Suramin stands out as an atypical molecule for a drug due to 1) its high molecular weight, 2) its comparatively high degree of flexibility (Haupt et al., 2013), and 3) the fact that it carries six negative charges at physiological pH. These properties likely account for suramin's polypharmacology, allowing it to bind to diverse kinds of target proteins. However, while suramin is a promiscuous drug, it is not indiscriminate. It binds its many targets in a selective way, which accounts for the fact that suramin is actually used as a therapeutic agent (and has been for over a century). Suramin is not metabolized in the human body, and it has an extremely long elimination half-life of over 50 days (Burri et al., 2014). Understanding why suramin binds to so many different target proteins is the first step towards better, more specific inhibitors.
The prerequisite for this is a carefully scrutinized list of suramin targets. To our knowledge, Supplementary Table S1 and Table 1 provide the first comprehensive list of proteins that are directly inhibited by suramin based on experimental evidence. After an extensive literature search resulting in a maximally inclusive list of 127 putative suramin targets (Supplementary Table S1), the focus of the subsequent bioinformatic pipeline (Figure 1) was on specificity rather than sensitivity. Stringent criteria were applied to ensure that only proteins that physically interact with suramin were included in Table 1. While these inclusion criteria were somewhat subjective, the subsequent analyses proceeded in an unbiased way. The cut-off for redundancy reduction of d = .6 was obvious from the frequency distribution of the pairwise distances (Figure 2). Direct mapping of the identified suramin targets to GO terms was not possible in an unbiased way because the targets stemmed from different species (Table 1), not all of which have the same high level of annotation as, e.g., H. sapiens. This is why the targets were first linked to the complete set of HMM profiles from Pfam, and the Pfam profiles were then linked to GO terms in an unbiased way. Finally, the QuickGO denylist of frequent matchers allowed us to remove likely wrong associations. Thus we are confident that the common denominators shown in Table 2 are unbiased and indeed reflect the binding properties of suramin.
The predominant GO terms associated with the identified Pfam motifs of the suramin targets were "nucleotide binding", "anion binding", and "cation binding". The terms "nucleic acid binding" and "endopeptidase activity" were less frequent. Nucleotide binding as well as nucleic acid binding were to be expected, given that suramin is well known to inhibit not only polymerases and other enzymes of nucleic acid metabolism but also ATP receptors (Wiedemar et al., 2020). Seventeen of the 22 targets linked to the GO term "anion binding" were also linked to the more specific term "ATP binding", which in turn is associated with the broader term "nucleotide binding". In addition, anion binding can be explained by the frequent binding of the anionic suramin to positively charged amino acids, which can bind other anions as well, in particular heparin (Table 3; Dey et al., 2021). Although associated with only eight targets, the GO term "endopeptidase activity" is in agreement with previous findings (Morty et al., 1998). Cation binding was more surprising, at least to us, but actually emerged at the top of the list of common denominators (Table 3). This indicates that suramin might interfere with the binding of proteins to divalent cations (Mg2+, Ca2+, or Zn2+). The negative charge of suramin suggests that it might disturb ion binding by interacting with the cations themselves; an interaction with Mg2+ might even explain some of suramin's effects on DNA- and RNA-binding enzymes. However, suramin's action was not dependent on the concentration of divalent cations (Fong & Good, 1972), which argues against a direct interaction between suramin and the cations. Direct interaction of suramin with cation binding sites on the target proteins is an alternative possibility. Suramin was shown to bind to the same amino acids on the P2X1 receptor that are involved in the binding of divalent cations (Igawa et al., 2015).
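The move from specific terms such as "ATP binding" to broader ones such as "nucleotide binding" corresponds to taking the ancestor closure of each annotation along "is a" edges, as the QuickGO ancestor chart does. The sketch below uses a tiny, deliberately simplified toy graph (intermediate GO terms omitted; it is not the real ontology) and counts each target at most once per term, even if the term is reached via several paths.

```python
# Toy "is a" graph for illustration only: intermediate GO terms are omitted.
TOY_IS_A = {
    "ATP binding": {"nucleotide binding"},
    "nucleotide binding": {"binding"},
    "zinc ion binding": {"cation binding"},
    "cation binding": {"binding"},
}


def ancestors(term, is_a):
    """All terms reachable from `term` along "is a" edges (the term itself excluded)."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_counts(target_terms, is_a):
    """Targets per term after adding all ancestors; each target counts once per term."""
    counts = {}
    for terms in target_terms.values():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, is_a)
        for term in closure:
            counts[term] = counts.get(term, 0) + 1
    return counts


if __name__ == "__main__":
    toy_targets = {"t1": {"ATP binding"}, "t2": {"nucleotide binding", "zinc ion binding"},
                   "t3": {"zinc ion binding"}}
    print(propagate_counts(toy_targets, TOY_IS_A)["binding"])  # 3
```

Building the closure as a set per target is what prevents double counting when two annotations of the same protein share an ancestor.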
Co-crystal structures with suramin have been solved mainly for viral proteins and snake venom proteases (Wiedemar et al., 2020). In the co-crystal structures with myotoxin I of Bothrops moojeni (Salvador et al., 2018) and myotoxin II of Bothrops asper (Murakami et al., 2005), suramin attaches to the so-called calcium binding loop. However, these phospholipases are catalytically inactive, and their calcium binding loops harbor mutations that prevent Ca2+ from binding. It therefore remains to be resolved whether suramin binding is a consequence of these mutations or whether suramin would also bind to functional calcium binding loops. Co-crystal structures of suramin with proteins that contain functional binding sites for divalent cations will be necessary to understand the polypharmacology of suramin. Elucidating the role of divalent cations in the mode of action of suramin may be key to designing new and more selective inhibitors.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.