# WEAK INTERACTIONS IN MOLECULAR MACHINERY

EDITED BY: Irene Díaz-Moreno and Rivka Isaacson

PUBLISHED IN: Frontiers in Molecular Biosciences

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-787-8 DOI 10.3389/978-2-88945-787-8

# WEAK INTERACTIONS IN MOLECULAR MACHINERY

Topic Editors:

Irene Díaz-Moreno, University of Seville, Spain

Rivka Isaacson, King's College London, United Kingdom

Diverse cellular processes depend on weak interactions between biological components. In this volume, we bring together a wealth of recent information on this topic, combining original research articles with up-to-date reviews, organized under four separate themes. In our first section on nucleic acid regulation, we include a study of the PII-NAGK-PipX-NtcA regulatory axis of cyanobacteria and discover much about the role of RNA binding protein regulation and cross-talk in the control of AU-rich mRNA. Peptide-mediated weak interactions are our second theme in which we review weak molecular interactions in clathrin-mediated endocytosis, investigate the selectivity of the G7-18NATE inhibitor peptide for the Grb7-SH2 domain target and present new structure and interactions of the TPR domain of Sgt2 with yeast chaperones and Ybr137wp. Our third part focuses on carbohydrates and includes a thorough review of how to use NMR to study transient carbohydrate–protein binding and a structural and functional study of lysostaphin–substrate interaction. In our final section, we look at functional sensors driving weak interactions by presenting the molecular basis of the dual regulation of bacterial iron sulfur cluster biogenesis by CyaY and IscX alongside a review of intramolecular fuzzy interactions involving intrinsically disordered domains. Taken together, our eBook chapters offer some recent insight into this area of scientific understanding which is still expanding exponentially.

Citation: Díaz-Moreno, I., Isaacson, R., eds. (2019). Weak Interactions in Molecular Machinery. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-787-8

# Table of Contents

*Editorial: Weak Interactions in Molecular Machinery*

Rivka L. Isaacson and Irene Díaz-Moreno

#### SECTION 1

#### DNA AND RNA REGULATION


Sofía M. García-Mauriño, Francisco Rivero-Rodríguez, Alejandro Velázquez-Cruz, Marian Hernández-Vellisca, Antonio Díaz-Quintana, Miguel A. De la Rosa and Irene Díaz-Moreno

#### SECTION 2

#### PEPTIDE-MEDIATED WEAK INTERACTIONS

*Insight Into the Selectivity of the G7-18NATE Inhibitor Peptide for the Grb7-SH2 Domain Target*

Gabrielle M. Watson, William A. H. Lucas, Menachem J. Gunzburg and Jacqueline A. Wilce

*Structure and Interactions of the TPR Domain of Sgt2 With Yeast Chaperones and Ybr137wp*

Ewelina M. Krysztofinska, Nicola J. Evans, Arjun Thapaliya, James W. Murray, Rhodri M. L. Morgan, Santiago Martinez-Lumbreras and Rivka L. Isaacson

*Weak Molecular Interactions in Clathrin-Mediated Endocytosis*

Sarah M. Smith, Michael Baker, Mary Halebian and Corinne J. Smith

#### SECTION 3

#### TARGETING CARBOHYDRATES AND GLYCANS


#### SECTION 4

#### FUNCTIONAL SENSORS DRIVING WEAK INTERACTIONS

*The Molecular Bases of the Dual Regulation of Bacterial Iron Sulfur Cluster Biogenesis by CyaY and IscX*

Salvatore Adinolfi, Rita Puglisi, Jason C. Crack, Clara Iannuzzi, Fabrizio Dal Piaz, Petr V. Konarev, Dmitri I. Svergun, Stephen Martin, Nick E. Le Brun and Annalisa Pastore

*Intramolecular Fuzzy Interactions Involving Intrinsically Disordered Domains*

Miguel Arbesú, Guillermo Iruela, Héctor Fuentes, João M. C. Teixeira and Miquel Pons

# Editorial: Weak Interactions in Molecular Machinery

#### Rivka L. Isaacson<sup>1</sup> and Irene Díaz-Moreno<sup>2</sup>\*

<sup>1</sup> Department of Chemistry, King's College London, London, United Kingdom, <sup>2</sup> cicCartuja, Institute for Chemical Research (IIQ), University of Seville - CSIC, Seville, Spain

Keywords: transient interactions, molecular machineries, structural biology, biophysics, weak contacts

#### **Editorial on the Research Topic**

#### **Weak Interactions in Molecular Machinery**

Individuals are often sustained by intense relationships with others, although few would deny the importance of those more transient interactions with service providers, colleagues, and casual acquaintances in facilitating our continued existence. These latter associations are necessarily and conveniently weaker. Nobody has time for a deep conversation with everyone they encounter during a day, and society, like so many complex systems, runs on a hierarchy of interaction strengths. Similarly, inside each crowded cell of our bodies, functions arise when communication is established through biomolecules that physically and specifically contact each other in a concerted manner to transmit messages efficiently.

"Molecular sociology" within cells involves binding events between macromolecules, such as protein–protein, protein–nucleic acid, protein–carbohydrate, and protein–membrane interactions, that are acutely orchestrated to sustain the life–death balance. These interactions between biomolecules occur on a wide range of timescales. Stable complexes, with lifetimes ranging from minutes to days, involve high affinity and high specificity binding. Amongst many others, these include irreversible enzyme inhibition and the assembly of proteins supporting the cell's ultrastructure.

Weak complexes are characterized by a fine balance between specificity of binding and fast turnover rate, with the majority displaying equilibrium dissociation constants within the micromolar or even millimolar range. Perhaps counter-intuitively, these weaker molecular recognition mechanisms are not uncommon and play key roles in many biological processes, including electron transfer chains in respiration and photosynthesis and the cell signaling cascades involving kinases and phosphatases. Due to technical limitations, weak complexes remain poorly understood despite their critical role in many biological events inside the cell.
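To make the micromolar-to-millimolar range more concrete, the following minimal sketch computes the equilibrium occupancy and the mean complex lifetime for a simple 1:1 binding model. It is only an illustration: the function names, the ligand concentration and the association rate constant are hypothetical values chosen here, not data from any article in this collection, and the model assumes that the free ligand concentration is not depleted by binding.

```python
"""Equilibrium occupancy and complex lifetime for simple 1:1 binding (illustrative)."""


def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium: [L] / (K_D + [L])."""
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return free_ligand_molar / (kd_molar + free_ligand_molar)


def mean_lifetime_s(kd_molar: float, k_on_per_molar_s: float) -> float:
    """Mean lifetime of the complex, 1 / k_off, using k_off = K_D * k_on."""
    if kd_molar <= 0 or k_on_per_molar_s <= 0:
        raise ValueError("K_D and k_on must be > 0")
    k_off_per_s = kd_molar * k_on_per_molar_s
    return 1.0 / k_off_per_s


if __name__ == "__main__":
    LIGAND_MOLAR = 10e-6  # hypothetical free partner concentration (10 uM)
    K_ON = 1e6            # hypothetical association rate constant (M^-1 s^-1)
    for label, kd in (("1 nM", 1e-9), ("10 uM", 10e-6), ("1 mM", 1e-3)):
        occupancy = fraction_bound(LIGAND_MOLAR, kd)
        lifetime = mean_lifetime_s(kd, K_ON)
        print(f"K_D = {label:>5}: occupancy = {occupancy:.3f}, lifetime = {lifetime:.3g} s")
```

Under these assumed values, a complex whose K<sub>D</sub> equals the partner concentration is half occupied, and raising K<sub>D</sub> at a fixed association rate shortens the lifetime in proportion, which is the fast turnover referred to above.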

Over the last few years, specific tools have been developed to analyse more transient intermolecular interactions, including NMR paramagnetic relaxation enhancement and kinetic approaches. These methods have recently been complemented by high-throughput techniques to identify novel biomolecules that weakly bind to each other. Such advances make the analysis of the transient biointeractome (the so-called trans-biointeractome) more affordable. Overall, trans-biointeractome analysis remains highly dependent on the specific technique employed. In addition, the dynamics of the system can enhance the strength of interactions that are weak: "one person may not be able to pull a truck from the mud but many would be able to." This may also be the reason that interactions detected under controlled conditions in vivo become extremely difficult to study in isolation. Protein compartmentalization, elevated protein levels at defined moments of the cell cycle, and multivalent binding can (temporarily) increase the affinities of transient interactions, suggesting that the so-called trans-biointeractome is inherently plastic and that it evolves during the cell lifespan.

#### Edited by:

Anastasia S. Politou, University of Ioannina, Greece

#### Reviewed by:

Konstantinos Tripsianes, Masaryk University, Czechia

> \*Correspondence: Irene Díaz-Moreno idiazmoreno@us.es

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

Received: 26 November 2018 Accepted: 22 December 2018 Published: 23 January 2019

#### Citation:

Isaacson RL and Díaz-Moreno I (2019) Editorial: Weak Interactions in Molecular Machinery. Front. Mol. Biosci. 5:117. doi: 10.3389/fmolb.2018.00117

Therefore, the study of weakly interacting systems can be reliably tackled in depth only by an integrative and holistic approach combining in vitro strategies with other methods to detect such interactions within the cell. We are delighted by the breadth of techniques and systems covered in the articles of our research topic.

Unsurprisingly, given its well-documented strengths in measuring weak binding, NMR spectroscopy features heavily as a method used in this Special Issue in Frontiers in Molecular Biosciences.

Nieto provides a comprehensive mini review on using NMR to investigate binding between carbohydrates and proteins, which is particularly useful given the challenges of isotopically labeling glycans. In their original research article, Krysztofinska et al. use NMR alongside X-ray crystallography and isothermal titration calorimetry (ITC) to analyse the structure and binding of different heat shock chaperones, via a carboxylate clamp mechanism, to a cochaperone involved in targeting proteins to membranes. Multiple proteins competing for the same binding site, as analyzed using NMR, is a theme that continues in the research article by Adinolfi et al., who also employ small-angle X-ray scattering (SAXS), biolayer interferometry (BLI), and native mass spectrometry (MS) to dissect the dual regulation of bacterial iron sulfur cluster biogenesis by two proteins, CyaY and IscX. Tossavainen et al. also use NMR and SAXS, alongside molecular dynamics simulations, to explore the variable-affinity interactions between the Staphylococcus aureus cell-wall digester, lysostaphin, and its substrates.

Surface plasmon resonance (SPR) is another widely used technique to measure interactions, and along with microarray technology, is employed by Watson et al. to establish the efficacy of a peptide inhibitor in blocking an important weak interaction between an SH2 domain and its phosphotyrosine target. This is a step on the way to rational drug design for cancer.

The importance and unique attributes of intrinsically disordered proteins (IDPs) are becoming increasingly recognized, acknowledged and valued. Arbesú et al. feed into this theme by presenting a fascinating mini review on the intramolecular "fuzzy" interactions that occur within intrinsically disordered domains of proteins. This mini-review perfectly links with the Frontiers topic on Function and Flexibility: Friend or Foe?

Perhaps the most self-contained story in our collection, one that truly conveys the variety of weak interactions in a well-characterized pathway, is the review by Forcada-Nadal et al. on the PII-NAGK-PipX-NtcA regulatory axis of cyanobacteria, which beautifully compiles the information that led to an understanding of this network. Clathrin-mediated endocytosis is an elegant process, as delineated in a review from Smith et al., who examine the vital roles of weak interactions in this choreographed cellular system. On the other hand, it is becoming more and more evident that a transition from a static to a dynamic point of view is needed in order to take into account the biological environment during RNA binding and RNA metabolism, as reviewed by García-Mauriño et al. Indeed, understanding mRNA processing in highly dynamic and often transient macromolecular complexes also remains challenging.

The subjects covered in our research topic span a wide range of biological questions and methodology. We believe this is merely the tip of the iceberg and that the future will bring vast insight into the importance of weak interactions in molecular machinery.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Isaacson and Díaz-Moreno. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The PII-NAGK-PipX-NtcA Regulatory Axis of Cyanobacteria: A Tale of Changing Partners, Allosteric Effectors and Non-covalent Interactions

Alicia Forcada-Nadal<sup>1,2</sup>, José Luis Llácer<sup>1,3</sup>, Asunción Contreras<sup>2</sup>, Clara Marco-Marín<sup>1,3</sup> and Vicente Rubio<sup>1,3</sup>\*

<sup>1</sup> Instituto de Biomedicina de Valencia del Consejo Superior de Investigaciones Científicas, Valencia, Spain, <sup>2</sup> Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, Alicante, Spain, <sup>3</sup> Group 739, Centro de Investigación Biomédica en Red de Enfermedades Raras – Instituto de Salud Carlos III, Valencia, Spain

#### Edited by:

Irene Diaz-Moreno, Universidad de Sevilla, Spain

#### Reviewed by:

Karl Forchhammer, Universität Tübingen, Germany

Juan A. Hermoso, Consejo Superior de Investigaciones Científicas (CSIC), Spain

> \*Correspondence: Vicente Rubio rubio@ibv.csic.es

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

Received: 26 August 2018 Accepted: 18 October 2018 Published: 13 November 2018

#### Citation:

Forcada-Nadal A, Llácer JL, Contreras A, Marco-Marín C and Rubio V (2018) The PII-NAGK-PipX-NtcA Regulatory Axis of Cyanobacteria: A Tale of Changing Partners, Allosteric Effectors and Non-covalent Interactions. Front. Mol. Biosci. 5:91. doi: 10.3389/fmolb.2018.00091

PII, a homotrimeric, very ancient and highly widespread (bacteria, archaea, plants) key sensor-transducer protein, conveys signals of abundance or scarcity of carbon, energy and usable nitrogen, converting these signals into changes in the activities of channels and enzymes, or in gene expression. PII sensing is mediated by the PII allosteric effectors ATP, ADP (and, in some organisms, AMP), 2-oxoglutarate (2OG; it reflects carbon abundance and nitrogen scarcity) and, in many plants, L-glutamine. Cyanobacteria have been crucial for clarification of the structural bases of PII function and regulation. They are the subject of this review because the information gathered on them provides an overall structure-based view of a PII regulatory network. Studies on these organisms yielded the first structure of a PII complex with an enzyme (N-acetyl-L-glutamate kinase, NAGK), deciphering how PII can cause enzyme activation, and how it promotes nitrogen stockpiling as arginine in cyanobacteria and plants. They have also revealed the first clear-cut mechanism by which PII can control gene expression. A small adaptor protein, PipX, is sequestered by PII when nitrogen is abundant and is released when it is scarce, swapping partner by binding to the 2OG-activated transcriptional regulator NtcA, co-activating it. The structures of PII-NAGK, PII-PipX, PipX alone, and of NtcA in inactive and 2OG-activated forms and as the NtcA-2OG-PipX complex, explain structurally the PII regulatory functions and reveal the changing shapes and interactions of the T-loops of PII depending on the partner and on the allosteric effectors bound to PII. Cyanobacterial studies have also revealed that in the PII-PipX complex PipX binds an additional transcriptional factor, PlmA, thus possibly expanding PipX roles beyond NtcA-dependency. Further exploration of these roles has revealed a functional interaction of PipX with PipY, a pyridoxal-phosphate (PLP) protein involved in PLP homeostasis whose mutations in the human ortholog cause epilepsy. Knowledge of cellular levels of the different components of this PII-PipX regulatory network and of K<sub>D</sub> values for some of the complexes provides the basic background for gross modeling of the system at high and low nitrogen abundance. The cyanobacterial network can guide searches for analogous components in other organisms, particularly of PipX functional analogs.

Keywords: protein structure, nitrogen regulation, gene expression regulation, signaling, PII complexes, PipX complexes, NtcA structure and complexes, PlmA

Protein PII was discovered in the late 1960s (Stadtman, 2001), when Escherichia coli glutamine synthetase (GS) was found to exist in forms that are susceptible or refractory to feed-back inhibition depending on the adenylylation state of one tyrosine per GS subunit. PI and PII were the first and second peaks from a gel filtration column (Shapiro, 1969). PI is a bifunctional enzyme (ATase) that adenylylates or deadenylylates GS (Jiang et al., 2007). PII controls the activity of the ATase. We now know that PII proteins are highly conserved and very widespread sensors used to transduce energy/carbon/nitrogen abundance signals in all domains of life (Kinch and Grishin, 2002; Sant'Anna et al., 2009). They are found in archaea, bacteria (Gram+ and Gram−), unicellular algae and plants. Many organisms have two or more genes for PII proteins (reviewed in Forchhammer and Lüddecke, 2016). One example is E. coli, which has two paralogous genes encoding PII proteins with distinct functions, one (GlnB) involved in the control of GS, and the other (GlnK) involved in the regulation of ammonia entry into the cell. By binding to target proteins, including channels, enzymes, or molecules involved in gene regulation, and by altering the function of these target molecules, PII proteins can regulate ammonia entry, nitrogen metabolism and gene expression (Forchhammer, 2008; Llácer et al., 2008). Cyanobacteria, and particularly among them Synechococcus elongatus PCC 7942 (hereafter S. elongatus), have been and continue to be very useful organisms for studies of PII actions, fuelling structural understanding of PII regulation. Studies on these organisms exemplify very clearly how enzyme activity and gene regulation can be controlled by PII via formation of several complexes (summarized in **Figure 1**) mediated by weak intermolecular interactions that are crucially regulated by allosteric effectors of the proteins involved in these complexes. This is the focus of the present review.

## THE PII SIGNALING PROTEIN

S. elongatus PII (**Figure 2**), like other PII proteins, is a homotrimer of a polypeptide chain of 112 amino acids that exhibits the ferredoxin fold (βαβ)<sub>2</sub> followed by a β-hairpin (Xu et al., 2003). The trimer (**Figure 2A**) has a hemispheric body nucleated by three antiparallel oblique (relative to the three-fold axis) β-sheets, each one formed by the 4-stranded sheet (topology ↓β2↑β3↓β1↑β4) of a subunit (see for example subunit B in the central panel of **Figure 2A**) extended on its β4 end by the C-terminal hairpin (β5-β6) of an adjacent subunit (subunit A) and on the β2 end by the β2-β3 hairpin stem (the root of the T-loop, see below) of the other subunit of the trimer (subunit C). The three sheets become continuous on the flat face of the hemispheric body via their β2-β3 hairpins (**Figure 2A**). The subunit sheets encircle the three-fold axis like a 3-sided pyramid, filling the inner space between them with their side-chains. They are covered externally by 6 helices (two per subunit) that run parallel to the β strands, contributing to the rounded shape of the hemispheric trimer (**Figure 2A**, panel to the right) and to the outer part of its equatorial flat face. In the convex face, three crevices are formed at subunit junctions between adjacent β-sheets, over the β2-β3 hairpins (**Figure 2B**). These crevices host the sites for the allosteric effectors ATP/ADP [and in some species AMP (Palanca et al., 2014)] and 2-oxoglutarate (2OG) that endow PII with its sensing roles (Kamberov et al., 1995; Zeth et al., 2014) (**Figure 2B**), the nucleotides reflecting the energy status (Fokina et al., 2011) and 2OG reflecting the abundance of carbon and, inversely, the nitrogen richness (see for example Muro-Pastor et al., 2001).

Very salient structural features of PII are the long flexible T-loops (**Figures 2A,C**) formed by the 18 residues that tip the β2-β3 hairpin of each subunit (Xu et al., 2003). These loops are key elements (although not the exclusive ones, see Rajendran et al., 2011 and Schumacher et al., 2015) for PII interaction with its targets (Conroy et al., 2007; Gruswitz et al., 2007; Llácer et al., 2007, 2010; Mizuno et al., 2007; Zhao et al., 2010b; Chellamuthu et al., 2014). By binding at the boundary between the T-loop and the PII body, at the crevice formed between adjacent subunits, the adenine nucleotides and MgATP/2OG promote the adoption by the T-loop of different conformations (**Figure 2C**) (Fokina et al., 2010a; Truan et al., 2010; Maier et al., 2011; Zeth et al., 2014) that favor or disfavor PII binding to a given PII target.

**Abbreviations:** PII, a homotrimeric signaling protein; GlnK, GlnK3 and GlnB, different paralogous forms of PII proteins; GlnD, the bifunctional enzyme that uridylylates and deuridylylates GlnB in E. coli; GS, glutamine synthetase; AmtB and Amt, homologous trimeric bacterial transporters of ammonia; PamA, putative channel of unknown function that is encoded by sll0985 of Synechocystis sp. PCC 6803; NtcA and CRP, homologous homodimeric transcription factors of cyanobacteria and of E. coli, respectively; the imperfectly palindromic target DNA sequences to which they bind specifically are called NtcA box and CRP box, respectively; PlmA, putative homodimeric transcription factor of the GntR family that is found in cyanobacteria; PI or ATase, bifunctional enzyme that adenylylates and deadenylylates glutamine synthetase in E. coli and other enterobacteria; NAGK, N-acetyl-L-glutamate kinase; NAG, N-acetyl-L-glutamate; PipX, a small monomeric protein of cyanobacteria that can interact with PII and with NtcA; FRET, fluorescence resonance energy transfer (also known as Förster resonance energy transfer), a phenomenon in which a fluorophore emits light of its characteristic frequency when a nearby different absorbing group is excited by light; the FRET signal decreases with the 6th power of the distance between the absorbing and emitting groups; 2OG, 2-oxoglutarate, also called α-ketoglutarate; AcCoA carboxylase, acetyl coenzyme A carboxylase; BCCP, biotin carboxyl carrier protein, the protein subunit that hosts the covalently bound biotin in many bacterial AcCoA carboxylases; PLP, pyridoxal phosphate; PipY, product of the gene that in S. elongatus is the next downstream of pipX, forming a bicistronic operon with it; it is a PLP-containing protein.

FIGURE 1 | Summary of the PII-PipX-NtcA network of S. elongatus. The network illustrates its different elements and complexes depending on nitrogen abundance (inversely related to 2OG level) and the structures of the macromolecules and complexes formed (when known). For PlmA (dimer in darker and lighter blue hues for its dimerization and DNA-binding domains, respectively) and its complex the architectural coarse model proposed (Labella et al., 2016) is shown, with the C-terminal helices of PipX (schematized in the extended conformation) pink-colored and the two PII molecules in dark red. The DNA complexed with NtcA and with NtcA-PipX is modeled from the structure of DNA-CRP (Llácer et al., 2010), since no DNA-NtcA structure has been reported. BCCP, biotin carboxyl carrier protein of bacterial acetyl CoA carboxylase (abbreviated AcCoA carboxylase); the other two components of this enzyme, biotin carboxylase and carboxyl transferase, are abbreviated BC and CT, respectively. No structural model of BCCP has been shown because the structure of this component has not been determined in S. elongatus and also because the structures of this protein from other bacteria lack a disordered 77-residue N-terminal portion that could be highly relevant for interaction with PII. The yellow broken arrow highlights the possibility of further PipX interactions not mediated by NtcA or PII-PlmA resulting in changes in gene expression (Espinosa et al., 2014). The solid semi-transparent yellowish arrow emerging perpendicularly from the flat network symbolizes the possibility of functional interactions of PipX not mediated by physical contacts between the macromolecules involved in the interaction, giving as an example the functional interaction with PipY. Its position outside the network tries to express the different type of interaction (relative to the physical contacts shown in the remainder of the network) as well as to place it outside the field of 2OG concentrations.

The T-loop is also the target of regulatory post-translational modification (reviewed in Merrick, 2015), first recognized in the regulatory cascade of the GS of E. coli as uridylylation of Tyr51 (see **Figure 2C**, 3rd panel from the left) mediated by a glutamine-regulated bifunctional PII uridylylating-deuridylylating enzyme, GlnD (Stadtman, 2001). Thus, in the enterobacterial GS regulating cascade PII is uridylylated or deuridylylated depending on whether 2OG is abundant and L-glutamine is low or the reverse. PII-UMP activates the GS deadenylylating activity of ATase (Jiang et al., 2007), activating GS by decreasing its susceptibility to feed-back inhibition (Stadtman, 2001). This uridylylation (or in Actinobacteria adenylylation of Tyr51) occurs at least in proteobacteria and actinobacteria (Merrick, 2015), but it might be more widespread, since it has also been reported in an archaeon (Pedro-Roig et al., 2013). Structural studies with E. coli PII (Palanca and Rubio, 2017) have excluded the stabilization of the T-loop into a fixed conformation by Tyr51 uridylylation, suggesting that the Tyr51-bound UMP physically interacts with the ATase. Although Tyr51 is conserved in cyanobacteria, it is not uridylylated. The T-loop serine 49 (**Figure 2C**, 1st, 2nd and 4th panels from the left) is phosphorylated in S. elongatus under conditions of nitrogen starvation by an unknown mechanism (Forchhammer and Tandeau de Marsac, 1994), whereas the phosphatase that dephosphorylates phosphoSer49 has been identified and proven to be 2OG-sensitive (Irmler et al., 1997).

complex with AmtB (PDB 2NUU, Conroy et al., 2007); the rightmost panels show Chlamydomonas reinhardtii PII, taken from its complex with Arabidopsis thaliana NAGK (PDB 4USJ; Chellamuthu et al., 2014). (C) Illustration of different shapes of the T-loops found in distinct complexes with allosteric effectors or with partner proteins. The T-loop is shown in cartoon representation, within a semi-transparent surface representation as if this loop were isolated from the remainder of PII and from the protein partner in the complex. In the third panel, the side chain of Arg47 of E. coli GlnK is represented in sticks, given its importance for inhibiting the AmtB channel. Taken, from left to right, from: S. elongatus PII with MgATP and 2OG bound (PDB file 2XZW; Fokina et al., 2010a); S. elongatus PII-PipX (PDB 2XG8; Llácer et al., 2010); E. coli GlnK-AmtB complex (PDB 2NUU, Conroy et al., 2007); and PII-NAGK complex (PDB 2V5H; Llácer et al., 2007).

FRET studies with engineered fluorescent S. elongatus PII used as an ADP- and ATP-sensitive probe (Lüddecke and Forchhammer, 2015) have challenged the claim (Radchenko et al., 2013) that PII proteins have a very slow ATPase activity that would regulate PII in a manner similar to the signaling GTPases with bound GTP and GDP. Although this ATPase was reported as a 2OG-triggered switch that appeared to be an intrinsic trait of PII proteins (Radchenko et al., 2013), the FRET experiments with S. elongatus PII (Lüddecke and Forchhammer, 2015) appear to indicate that an endogenous ATPase is not a relevant mechanism for the transition of PII into the ADP state.
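The distance sensitivity that makes FRET useful as a conformational probe can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R<sub>0</sub>)<sup>6</sup>). The sketch below is a generic illustration only: the function name and the Förster radius of 5 nm are assumptions made here, and it does not reproduce the analysis of the cited FRET studies.

```python
"""Distance dependence of FRET efficiency from the Förster relation (illustrative)."""


def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Transfer efficiency E = 1 / (1 + (r / R0)^6) for donor-acceptor distance r."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and the Förster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    R0_NM = 5.0  # hypothetical Förster radius; real values depend on the dye pair
    for r_nm in (2.5, 4.0, 5.0, 6.0, 10.0):
        print(f"r = {r_nm:4.1f} nm -> E = {fret_efficiency(r_nm, R0_NM):.3f}")
```

Because of the sixth-power term, the efficiency is one half when the distance equals R<sub>0</sub> and changes steeply around that distance, so modest movements of the labeled groups can produce a measurable change in signal.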

### PII COMPLEXES WITH CHANNELS

The first structurally solved PII complex was that of E. coli GlnK with the AmtB ammonia channel (Conroy et al., 2007; Gruswitz et al., 2007) (**Figure 3A**), formed under nitrogen-rich conditions. This structure showed that AmtB was inhibited by GlnK because the extended T-loop fits the channel entry, with a fully extended arginine emerging from the T-loop, inserting into the channel and blocking the channel space (**Figure 2C**, 3rd panel from the left, and **Figure 3A**, zoom). The ADP-bound and MgATP/2OG-bound structures of the Archaeoglobus fulgidus GlnK3 protein (Maier et al., 2011) indicated that 2OG may prevent GlnK binding to the ammonia channel because it induces outward flexing (relative to the 3-fold molecular axis) of the T-loops, preventing their topographical correspondence with the three holes of the trimer of ammonia channels (**Figure 3B**). Interestingly, the T-loops of MgATP/2OG-bound S. elongatus PII (**Figure 2C**, leftmost panel) and A. fulgidus GlnK3 (**Figure 3B**) exhibited different flexed conformations (relative to the ADP-bound extended forms), and thus 2OG binding by itself does not determine a single T-loop conformation, at least with different PII proteins.

Yeast two-hybrid approaches (Osanai et al., 2005) detected the interaction between PII and the putative channel PamA (encoded by sll0985) of Synechocystis sp. PCC 6803 (from now on Synechocystis), but molecular detail on this protein is non-existent, and, therefore, it is uncertain whether such interaction might resemble the GlnK-AmtB interaction. PamA is not conserved in many cyanobacteria, and the most closely related putative protein of S. elongatus, the product of the Synpcc7942\_0610 gene, failed to give an interaction signal with S. elongatus PII in yeast two-hybrid assays (Castells, M.A., PhD Dissertation, Universidad de Alicante, 2010), despite the fact that the sequence identity with PamA is concentrated in the C-terminal region, where PII binds in Synechocystis (Osanai et al., 2005). In vitro studies with the recombinantly produced Synechocystis PamA and PII showed that their interaction was lost in the presence of ATP and 2OG. Thus, similarly to the GlnK-AmtB and GlnK3-Amt complexes, the PII-PamA complex is formed under conditions of nitrogen abundance. However, T-loop phosphorylation did not dissociate this complex (Osanai et al., 2005). The function of PamA is not known, but its deletion from Synechocystis changed the expression of some NtcA-dependent genes (Osanai et al., 2005) by unclarified mechanisms.

FIGURE 3 | PII proteins and the ammonia channel. (A) The structure (PDB file 2NUU; Conroy et al., 2007) of the E. coli complex of GlnK (a PII protein in charge of ammonia channel regulation) and the ammonia channel AmtB is shown to the right, whereas the zoom to the lower left shows only a part of the complex, to highlight the interaction of one T-loop with one channel. AmtB is in semi-transparent surface representation. In the main figure GlnK is in cartoon representation with each subunit colored differently, and with the side chain of the T-loop residue Arg47 shown in sticks representation. In the zoomed image GlnK is shown in surface representation in yellow with the T-loop residues highlighted in space-filling representation, illustrating the fact that the side chain of Arg47 is the element that penetrates deep into the channel and blocks it. (B) Superimposition of the structures of Archaeoglobus fulgidus GlnK3 (one of the three PII proteins of the GlnK type in this archaeon; Maier et al., 2011) with ADP bound (green; PDB file code 3TA1) or with ATP and 2OG bound (yellow; PDB 3TA2) to illustrate how 2OG binding fixes the T-loops in an outwards-flexed position (relative to the positions without 2OG) that would be inappropriate for fitting the topography of the entry chambers to the three ammonia channels in trimeric Amt (the ammonia channel in this organism).

### PII COMPLEXES WITH ENZYMES IN CYANOBACTERIA (AND BEYOND)

The complexes of PII with the N-acetyl-L-glutamate kinase (NAGK) enzymes from S. elongatus and Arabidopsis thaliana presented a very different architecture with respect to the structure of the GlnK-AmtB complex of E. coli (Llácer et al., 2007; Mizuno et al., 2007) (**Figures 4A,B**). The PII-NAGK complex is an activating complex in which the T-loops of PII are flexed (**Figure 2C**, rightmost panel) and integrated into a hybrid (both proteins involved) β-sheet with NAGK, forming also a hybrid ion-pair network (**Figure 4C**; Llácer et al., 2007). Apparently this flexing from an extended conformation could occur in two steps (Fokina et al., 2010b). The initial step would be mediated by a smaller loop of PII called the B-loop (**Figures 2A**, **4C**). PII binding of 2OG also favors the flexing of the T-loop (**Figure 2C**, leftmost panel) (Fokina et al., 2010a; Truan et al., 2010), although the resulting conformation appears inappropriate for interacting with NAGK. In addition, 2OG can also promote the disassembly of the PII-NAGK complex because certain PII residues, like Arg9, that are involved in the binding of 2OG are also involved in the interaction with NAGK (and also with PipX, another target of PII, see below). Therefore, 2OG, an indicator of low ammonia levels (Muro-Pastor et al., 2001), abolishes PII-NAGK complex formation (Maheswaran et al., 2004) (**Figure 1**). In S. elongatus 2OG can also promote the disassembly of the PII-NAGK complex by favoring the phosphorylation of Ser49 (Forchhammer and Tandeau de Marsac, 1994; Irmler et al., 1997), since the bound phosphate sterically prevents formation of the PII-NAGK hybrid β-sheet (Llácer et al., 2007).

FIGURE 4 | PII-NAGK complex and active and arginine-inhibited NAGK. (A) The PII-NAGK complex of S. elongatus (PDB 2V5H; Llácer et al., 2007). Surface representations of the complex formed by two PII trimers (yellow) capping on both ends the doughnut-like NAGK hexamer (trimer of dimers; each dimer in a different color). The three-fold axis is vertical (top) or perpendicular to the page (bottom). Figure of J.L. Llácer and V. Rubio taken from Chin (2008). Reprinted with permission from AAAS. (B) Cartoon representation of the S. elongatus PII-NAGK complex after removing the back NAGK dimer for clarity. The three-fold symmetry axis is vertical. Reprinted from Current Opinion in Structural Biology, 18, Llácer et al., Arginine and nitrogen storage, 673–681, 2008, with permission from Elsevier. (C) PII subunit-NAGK subunit contacts. PII, NAGK, and NAG are shown as strings, ribbons, and spheres, respectively. The contacting parts of the T-loop, B-loop, and β1–α1 connection, including some interacting side chains (in sticks), are blue, red, and green, respectively. The surfaces provided by these elements form meshworks of the same colors. The NAGK central β-sheet is green, and other β-strands and the α-helices are brownish and grayish for N- and C-domains, respectively. Some NAGK elements and PII residues are labeled. This figure and its legend reproduce with some modifications a figure and its legend of Llácer et al. (2007). The crystal structure of the complex of PII and acetylglutamate kinase reveals how PII controls the storage of nitrogen as arginine. Copyright (2007) National Academy of Sciences. (D,E) Active and inactive conformations, respectively, of hexameric arginine-inhibitable NAGK. The active form is from a crystal of the enzyme from Pseudomonas aeruginosa (PDB 2BUF) while the inactive form is from the Thermotoga maritima enzyme (PDB 2BTY) (Ramón-Maiques et al., 2006). Note that the inactive form is widened relative to the active form, and that it has arginine sitting on both sides of each interdimeric junction. In the active form the nucleotide (in this case the product ADP rather than the substrate ATP) and NAG sit one in each domain of individual subunits. The NAGK observed in the PII-NAGK complex is in the active form, being stabilized in this form by its contacts with PII.

Plants and cyanobacteria stockpile ammonia as arginine, the protein amino acid with the largest nitrogen content (four atoms per arginine molecule). Arginine-rich proteins are very abundant in plant seeds (VanEtten et al., 1963). Cyanobacteria synthesize non-ribosomally an arginine-rich amino acid polymer called cyanophycin (Oppermann-Sanio and Steinbüchel, 2002; Watzer and Forchhammer, 2018). Stockpiling arginine in the form of arginine-rich macromolecules minimizes the osmotic effect while permitting rapid nitrogen mobilization for protein-building processes such as seed germination and cell multiplication. The selection of NAGK as the regulatory target stems from the fact that in many bacteria (including cyanobacteria) and in plants NAGK controls arginine synthesis via feed-back inhibition by L-arginine (Hoare and Hoare, 1966; Cunin et al., 1986; Lohmeier-Vogel et al., 2005; Beez et al., 2009). This inhibition must be overcome if large amounts of ammonia have to be stored as arginine (Llácer et al., 2008). Indeed, the PII-NAGK complex exhibits decreased inhibition by arginine (Maheswaran et al., 2004; Llácer et al., 2008).

In arginine-sensitive NAGK (**Figures 4D,E**) the N-terminal α-helix of each subunit interacts with the same helix of an adjacent dimer, chaining three NAGK homodimers into a doughnut-shaped hexameric ring with three-fold symmetry and a large central hole (Ramón-Maiques et al., 2006). The NAGK reaction (phosphorylation of the γ-COOH of N-acetyl-L-glutamate, NAG, by ATP) occurs within each NAGK subunit. NAG and ATP sit over the C-edge of the central 8-stranded largely parallel β-sheet of the N-terminal and C-terminal domains, respectively (**Figure 4D**; Ramón-Maiques et al., 2002). Catalysis requires the mutual approach of both domains of each subunit to allow the contact of the ATP terminal phosphate with the attacking NAG γ-COOH (Ramón-Maiques et al., 2002; Gil-Ortiz et al., 2003). Arginine, by binding in each subunit next to the N-terminal α-helix (**Figure 4E**), expands the hexameric ring, hampering the contact of the reacting groups and preventing catalysis (Ramón-Maiques et al., 2006).

In the PII-NAGK complex two PII trimers sit on the three-fold axis of the complex, one on each side of the NAGK ring, making contacts with the inner circumference of this ring (**Figures 4A,B**). Each PII subunit interacts via its T- and B-loops with one NAGK subunit (**Figure 4C**), gluing together the two domains of that subunit (**Figure 4A**). By restricting NAGK ring expansion (**Figure 4C**) even when arginine is bound, PII renders NAGK highly active (Llácer et al., 2007; Mizuno et al., 2007). PII does not compete physically with arginine for its sites on NAGK; rather, these sites are widened in the PII-NAGK complex (Llácer et al., 2007), resulting in decreased apparent affinity of NAGK for arginine (as reflected in the dependency of the NAGK activity on the arginine concentration). In addition, the hybrid PII-NAGK ion-pair network (**Figure 4C**) enhances the apparent affinity for NAG (assessed as the K<sub>m</sub> or S<sub>0.5</sub> value of NAGK for NAG) of cyanobacterial NAGK (Maheswaran et al., 2004; Llácer et al., 2007). Overall, the NAGK bound to PII exhibits decreased apparent affinity for arginine and increased activity, rendering NAGK much more active in the presence of arginine than when not bound to PII (Llácer et al., 2008), something that is crucial for nitrogen storage as arginine.

NAGK appears to be a PII target only in organisms performing oxygenic photosynthesis (cyanobacteria, algae, and plants; Burillo et al., 2004). PII proteins from plants have lost the ability to bind ADP, while still binding ATP and 2OG (Lapina et al., 2018). In addition, except in Brassicaceae, the C-terminal part of plant PII is extended to form two helical segments and a connecting loop (Q-loop; **Figure 2B**, rightmost panels), creating a novel glutamine site and resulting in glutamine sensitivity of the PII-NAGK interaction (Chellamuthu et al., 2014). This is not the case with cyanobacterial PII, which binds both ADP and ATP and is glutamine-insensitive (Chellamuthu et al., 2014).

PII has also been shown to interact in plants (Feria-Bourrellier et al., 2010) and bacteria, including cyanobacteria (Rodrigues et al., 2014; Gerhardt et al., 2015; Hauf et al., 2016), with the biotin carboxyl carrier protein (BCCP) of the enzyme acetyl coenzyme A carboxylase (AcCoA carboxylase) (**Figure 1**), although this complex has not been characterized structurally. BCCP is the component that hosts the covalently bound biotin that shuttles between the biotin carboxylase component and the transferase component of AcCoA carboxylase (Rubio, 1986). PII-BCCP complex formation tunes down AcCoA utilization and thus subsequent fatty acid metabolism (Feria-Bourrellier et al., 2010; Gerhardt et al., 2015; Hauf et al., 2016), promoting uses of AcCoA for purposes other than the synthesis of fatty acids, and therefore linking PII to AcCoA and fatty acid metabolism. For interaction, PII has to be in the ATP-bound and 2OG-free form (**Figure 1**) (Gerhardt et al., 2015; Hauf et al., 2016), the same conditions under which PII also binds to NAGK (Llácer et al., 2008). Therefore, PII could simultaneously activate NAGK and inhibit AcCoA carboxylase in vivo. Mutational evidence suggests the involvement of the T-loop in this interaction with AcCoA carboxylase (Hauf et al., 2016), in principle excluding PII-NAGK-BCCP ternary complex formation and raising the possibility of competition between NAGK and BCCP for PII.

The classical example of interaction of PII with an enzymatic target is that with the ATase of E. coli (see introductory section), with which uridylylated or deuridylylated GlnB (GlnB is one of the two PII proteins of E. coli) can interact (Stadtman, 2001). We will not deal with this enzyme here because the PII/ATase/GS cascade of enterobacteria does not appear to be of general occurrence (it is absent, for example, in cyanobacteria), and also because we have only partial information on the structure of the ATase (Xu et al., 2004, 2010) and no direct information on the structure of the GlnB-ATase complex, although a model for such a complex has been proposed (Palanca and Rubio, 2017).

### THE PipX ADAPTOR PROTEIN AND ITS COMPLEX WITH PII

A yeast two-hybrid search for proteins interacting with PII in S. elongatus identified (Burillo et al., 2004), in addition to NAGK, a small novel protein (89 amino acids) that was named PipX (PII-interacting protein X). This protein was identified later in a search (Espinosa et al., 2006) for proteins interacting with NtcA, the global nitrogen regulator of cyanobacteria (Vega-Palas et al., 1992). PipX binding to PII occurs under conditions of ammonia abundance (**Figure 1**), the same conditions prevailing for PII-NAGK complex formation (Espinosa et al., 2006). NAGK-PipX competition for PII was revealed in NAGK assays showing that PipX decreased PII activation and increased arginine inhibition of NAGK (Llácer et al., 2010), excluding NAGK-PII-PipX ternary complex formation. 2OG binding to PII disassembles the PII-PipX complex (Espinosa et al., 2006; Llácer et al., 2010), leaving PipX free to interact with NtcA (**Figure 1**).

The crystal structures of PII-PipX complexes of S. elongatus (Llácer et al., 2010) and of Anabaena sp. PCC7120 (Zhao et al., 2010b) provided the first structural information on PipX (**Figure 5A**), revealing that it is formed by a compact body folded as a Tudor-like domain (a horseshoe-curved β-sheet sandwich) (Lu and Wang, 2013), followed by two C-terminal helices. In the PII-PipX complex (**Figure 5B**) the three PipX molecules are enclosed in a cage formed between the flat face of the hemispheric PII trimer and its three fully extended T-loops (see **Figure 2C**, 2nd panel from the left) emerging perpendicularly to the PII flat face at its edge. The shape and orientation of these T-loops are very different from those of the PII bound to NAGK (**Figure 5C**). In turn, the caged Tudor-like domains form a homotrimer over the PII flat surface (**Figure 5B**, bottom), with the PipX self-interaction detected in yeast three-hybrid assays using PII as bridging protein (Llácer et al., 2010). Tudor-like domains characteristically interact with RNA polymerase (Steiner et al., 2002; Deaconescu et al., 2006; Shaw et al., 2008), suggesting that PipX could have some role in gene expression that would be blunted by sequestration of these domains in the PII cage.

In the structure of the Anabaena PII-PipX complex, the two C-terminal helices of each PipX molecule lie one along the other in antiparallel orientation ("flexed"), being exposed between two adjacent T-loops in transversal orientation relative to these loops (Zhao et al., 2010b). Recent structural NMR data on isolated PipX showed that when PipX is alone (that is, not bound to a partner) the C-terminal helices are "flexed" (**Figure 5A**) (Forcada-Nadal et al., 2017). As shown below, the C-terminal helices of PipX in the NtcA-PipX complex are also flexed (Llácer et al., 2010). However, in the S. elongatus PII-PipX complex only one PipX molecule presents the "flexed" conformation, whereas in the other two PipX molecules the C-terminal helix is "extended," not contacting the previous helix and emerging centrifugally outwards from the complex, between two T-loops (**Figure 5B**) (Llácer et al., 2010). PII binding might facilitate the extension of the PipX C-terminal helix, endowing the PII-PipX complex with a novel surface and novel potentialities for interaction with other components. These novel potentialities were substantiated recently by the identification, in yeast three-hybrid searches (Labella et al., 2016), of interactions of PipX in the PII-PipX complex with the homodimeric transcription factor PlmA (see proposal for the architecture of this complex in **Figure 1** bottom left; Labella et al., 2016). Interactions were not observed in yeast two-hybrid assays between PipX or PII and PlmA. Residues involved in three-hybrid interactions, mapped by site-directed mutagenesis, are largely localized in the C-terminal helix of PipX. PlmA belongs to the GntR super-family of transcriptional regulators, but is unique to cyanobacteria (Lee et al., 2003; Hoskisson and Rigali, 2009; Labella et al., 2016). Little is known about PlmA functions other than that it is involved in plasmid maintenance in Anabaena sp. strain PCC7120 (Lee et al., 2003), in photosystem stoichiometry in Synechocystis sp. PCC6803 (Fujimori et al., 2005), and in regulation of the highly conserved cyanobacterial sRNA YFR2 in marine picocyanobacteria (Lambrecht et al., 2018), and that it is reduced by thioredoxin without altering its dimeric nature in Synechocystis sp. PCC6803 (Kujirai et al., 2018). The PII-PipX-PlmA ternary complex suggests that PipX can influence gene expression regulation via PlmA, although the PlmA regulon remains to be defined.

### THE GENE EXPRESSION REGULATOR NtcA

When ammonia becomes scarce, the increasing 2OG levels should cause the disassembly of the PII-NAGK, PII-BCCP, and PII-PipX complexes (**Figure 1**). These same conditions promote the binding of PipX (see below) to the transcriptional regulator NtcA (**Figure 1**), a factor exclusive to cyanobacteria and universally present in this phylogenetic group (Vega-Palas et al., 1992; Herrero et al., 2001; Körner et al., 2003). The determination of the structures of NtcA from S. elongatus (**Figures 6A,B**) (Llácer et al., 2010) and from Anabaena sp. PCC7120 (Zhao et al., 2010a) confirmed the sequence-based inference (Vega-Palas et al., 1992) that NtcA is a homodimeric transcriptional regulator of the family of CRP (the cAMP-regulated transcriptional regulator of E. coli) (McKay and Steitz, 1981; Weber and Steitz, 1987). Similarly to CRP, NtcA has a C-terminal DNA binding domain of the helix-turn-helix type. In CRP, the DNA binding helices of its two C-terminal domains are inserted in two adjacent turns of the major groove of DNA that host the imperfectly palindromic target DNA sequence (called here the CRP box) (McKay and Steitz, 1981; Weber and Steitz, 1987). The consensus DNA sequence to which NtcA binds (consensus NtcA box) is quite similar to the consensus CRP box (Berg and von Hippel, 1988; Luque et al., 1994; Jiang et al., 2000; Herrero et al., 2001; Omagari et al., 2008), and thus NtcA and CRP are expected to bind in similar ways to their target DNA sequences.

In vitro studies revealed that 2OG is an NtcA activator (Tanigawa et al., 2002; Vázquez-Bermúdez et al., 2002), increasing NtcA affinity for its target sequences. As in the case of cAMP for CRP, 2OG binds to the NtcA regulatory domain. This domain is responsible for the dimeric nature of NtcA (**Figure 6A**) (Llácer et al., 2010; Zhao et al., 2010a). The regulatory domain of NtcA is highly similar to the corresponding domain of CRP (Llácer et al., 2010). The main differences reflect the changes in the characteristics of the site for the allosteric effector that enable the accommodation of 2OG instead of cAMP. Each 2OG molecule interacts in NtcA with the two (one per subunit) long interfacial helices that form the molecular backbone, crossing the molecule in its longer dimension and linking both domains in each subunit (**Figures 6A,C**) (Llácer et al., 2010; Zhao et al., 2010a). 2OG interactions with both interfacial helices favor a twist of one helix relative to the other, dragging the DNA binding domains and helices to apparently appropriate positions and interhelical distance for binding in two adjacent turns of the major groove of DNA where the NtcA box should be found (**Figures 6A,B**, and inset therein), although the experimental structure of DNA-bound NtcA should be determined to corroborate these proposals.

(bottom). (C) Superimposition of S. elongatus PII in the complex with PipX and in that with NAGK. The changes in the T-loops are clearly evident.

FIGURE 6 | NtcA structure, 2OG binding to it and associated conformational changes. (A,B) Structures of "active" (A) and "inactive" (B) S. elongatus NtcA (PDB files 2XHK and 2XGX, respectively) (Llácer et al., 2010). The two subunits of each dimer are in different colors, with the DNA-binding domains in a lighter hue than the regulatory domain of the same subunit. In the cartoon representation used, helices are shown as cylinders to best illustrate the changes in position of the DNA binding helices and of the long interfacial helices (labeled) upon activation. Bound 2OG is shown in "active" NtcA (in spheres representation, with C and O atoms colored yellow and red, respectively). The DNA, in surface representation in white, has been modeled from the CRP-DNA structure (for details see Llácer et al., 2010). The inset superimposes the "active" and "inactive" forms colored as in the main figure to illustrate the magnitude of the changes. (C) Stereo view of sticks representation of the 2OG site residues of the "active" (green) and "inactive" (raspberry) forms of NtcA. The 2OG bound to the "active" form is distinguished by its yellow C atoms. Note that only two residues, both 2OG-interacting and highly polar, experience large changes in their positions between the inactive and the active forms: Arg128 from the long interfacial helix of the subunit that provides the bulk of the residues of the site, and Glu133 from the interfacial helix of the other subunit. They are believed to trigger the changes in the relations between the two interfacial helices that result in NtcA "activation".

Although NtcA and CRP boxes are quite similar, plasmon resonance experiments (Forcada-Nadal et al., 2014) revealed that CRP exhibits complete selectivity and specificity for the CRP box, with absolute dependency on the presence of cAMP. In contrast, NtcA had less strict selectivity, since it could still bind to its promoters in the absence of 2OG, although with reduced affinity, and it could also bind to the CRP promoter tested. Nevertheless, it is unlikely that NtcA could bind in vivo to the CRP boxes of cyanobacteria in those species where CRP is also present, given the much higher affinities for the CRP sites of cyanobacterial CRP and the relative concentrations in the cell of both transcriptional regulators (Forcada-Nadal et al., 2014).

While the structures of 2OG-bound NtcA of S. elongatus (Llácer et al., 2010) and of Anabaena (Zhao et al., 2010a) are virtually identical, the reported structures of "inactive" NtcA of Anabaena without 2OG (Zhao et al., 2010a) and of S. elongatus (Llácer et al., 2010) differed quite importantly in the positioning of the DNA binding domains (**Figure 1**, "Inactive forms" under "NtcA"), although in both cases the DNA binding helices were misplaced for properly accommodating the NtcA box of DNA, raising the question of whether these structural differences are species-specific or whether "inactive" NtcA can be in a multiplicity of conformations.

### PipX AS AN NtcA CO-ACTIVATOR

Soon after PipX was found to interact with NtcA (Espinosa et al., 2006), it was also shown to activate in vivo transcription of NtcA-dependent promoters under conditions of low nitrogen availability (Espinosa et al., 2006, 2007). Direct binding studies with the isolated molecules proved that PipX binding to NtcA requires 2OG (Espinosa et al., 2006). Nevertheless, as PipX was not totally essential for transcription of NtcA-dependent promoters (Espinosa et al., 2007; Camargo et al., 2014), it was concluded (1) that PipX was a coactivator of 2OG-activated NtcA-mediated transcription (Espinosa et al., 2006; Llácer et al., 2010); and (2) that the degree of activation by PipX depended on the specific NtcA-dependent promoter (Espinosa et al., 2007; Forcada-Nadal et al., 2014). Detailed plasmon resonance studies (Forcada-Nadal et al., 2014) using sensorchip-bound DNA confirmed for three Synechocystis promoters that PipX binding to promoter-bound NtcA has an absolute requirement for 2OG, since no PipX binding was observed when NtcA was bound to the DNA in the absence of 2OG. In these studies PipX increased by about one order of magnitude the apparent affinity of NtcA for 2OG. In other in vitro experiments with four NtcA-dependent promoters of Anabaena sp. PCC 7120, PipX was also found to positively affect NtcA binding to its DNA sites (Camargo et al., 2014). The induction by PipX of increased NtcA affinity for 2OG and for its promoters could account for the PipX-triggered enhancement of NtcA-dependent transcription.

The crystal structure of the NtcA-PipX complex of S. elongatus (**Figure 7**) (Llácer et al., 2010) corresponded to one "active" NtcA dimer with one molecule each of 2OG and PipX bound to each subunit. PipX is inserted via its Tudor-like domain (**Figure 7A**), filling a crater-like cavity formed over each NtcA subunit (**Figure 7B**) largely over one regulatory domain, being limited between the DNA binding domain and the long interfacial helix of the same subunit, and the regulatory domain of the other subunit. PipX extensively interacts with the entire crater, with nearly 1200 Å<sup>2</sup> of NtcA surface covered by each PipX molecule. Of this surface, 65% belongs to one subunit (40%, 15% and 10% belonging to the DNA-binding domain, the interfacial helix and the regulatory domain, respectively) and 35% belongs to the regulatory domain (including the interfacial helix) of the other subunit. PipX thus glues together the elements of half of the NtcA dimer in its active conformation, stabilizing this conformation (Llácer et al., 2010). This conformation is the one that binds 2OG and that should have high affinity for the DNA, thus explaining the requirement of 2OG for PipX binding and the increased affinities of NtcA for 2OG and DNA when PipX was bound to NtcA (Forcada-Nadal et al., 2014). Since a similar crater-like cavity exists in other transcription factors of the CRP family including CRP (see for example McKay and Steitz, 1981 or Weber and Steitz, 1987), it is conceivable that PipX-mimicking proteins could exist for these other transcriptional regulators of the CRP family, although PipX itself cannot play such a role since it does not bind to CRP (Forcada-Nadal et al., 2014). Furthermore, a large set of highly specific contacts (Llácer et al., 2010) ensures the specificity of the binding of PipX to NtcA.

The elements of the Tudor-like domain that interact with NtcA are largely the same that interact with the flat surface of the hemispheric body of PII (many of them mediated by the upper layer of the Tudor-like β-sandwich, particularly strands β1 and β2), predicting total incompatibility for the simultaneous involvement of PipX in the NtcA and PII complexes (Llácer et al., 2010). While the Tudor-like domain monopolizes the contacts of PipX with NtcA, the C-terminal helices of PipX do not participate in these contacts and remain flexed, as in isolated PipX (Forcada-Nadal et al., 2017), protruding away from the complex (**Figure 7A**). The coactivation functions of PipX for NtcA-mediated transcription could also involve these helices. However, in vitro experiments (Llácer et al., 2010; Camargo et al., 2014) and modeling (based on CRP) of DNA binding by the NtcA-PipX complex (**Figure 7A** and inset therein) (Llácer et al., 2010) did not support the idea of PipX binding to DNA. Alternatively, these helices could interact with RNA polymerase, particularly given the location, in the homologous CRP-DNA complex, of the binding site for the C-terminal domain of the α-subunit (αCTD) of RNA polymerase (**Figure 7C**; and discussed in Llácer et al., 2010). Further structures of NtcA-PipX bound to DNA alone or with at least some elements of the polymerase are needed to clarify these issues.

(αCTD), to show that the C-terminal helices of PipX could reach this part of the polymerase. In this figure the C-terminal helix of NtcA is colored red because it has no counterpart in CRP and is involved in the interactions with PipX.

### THE PipX REGULATORY NETWORK IN QUANTITATIVE PERSPECTIVE

The gene encoding PII could not be deleted in S. elongatus unless the pipX gene was previously inactivated (Espinosa et al., 2009). Further studies led to the conclusion that decreasing the PII/PipX ratio results in lethality in S. elongatus, indicating that PipX sequestration into PII-PipX complexes is crucial for survival and implicating both proteins in the regulation of essential processes (Espinosa et al., 2010, 2018; Laichoubi et al., 2012). The ability of PII to prevent the toxicity of PipX suggests that PII acts as a PipX sink even under conditions in which the affinity for NtcA would be highest, supporting the idea that not all PipX effects are related to its role as NtcA co-activator. Mutational studies (Espinosa et al., 2009, 2010) and massive transcriptomic studies of S. elongatus mutants centered on PipX (Espinosa et al., 2014) also support the multifunctionality of PipX, stressing the need for additional studies, including the determination of PlmA functions and the search for further potential PipX-interacting proteins (**Figure 1**, broken yellow arrow).

Massive proteomic studies (Guerreiro et al., 2014) have estimated the number of chains of each protein of the PII-PipX network (**Figure 1**) in S. elongatus cells. The values obtained (**Table 1**) are corroborated by those obtained in focused western blot studies for some of these macromolecules (**Table 1**) (Labella et al., 2016, 2017). These quantitative data give an opportunity to evaluate the possible frequency of the different complexes and macromolecules of the PII-PipX-NtcA network (**Figure 1**) in one or another form (schematized in **Figure 8**). Of all the proteins mentioned here until now, PII is by far the most abundant in terms of polypeptide chains (**Table 1**). In comparison, the sum of all the chains of other known PII-binding proteins represents no more than 20% of the PII chains. Among these molecules is PipX, which represents only ∼10% of all the PII chains. This indicates that PII has the potential to sequester all the PipX that is present in the cell (**Figure 8**). In turn, PlmA could be fully trapped in the PII-PipX-PlmA complex if this complex has the 1:1:1 stoichiometry proposed for it (**Figure 1**) (Labella et al., 2016), since the number of PlmA chains represents only about 10% of the number of PipX chains. Thus, about 10% and 1% of the PII trimers could be in the form of the PII-PipX and PII-PipX-PlmA complexes, respectively, under nitrogen abundance conditions. In contrast, with nitrogen starvation all of the NtcA could be bound to PipX (**Figure 8**), given the ∼five-fold excess of PipX chains over NtcA chains (**Table 1**). Thus, assuming that under conditions at which 2OG and ATP reach high levels PII is totally unable to bind PipX, ∼80% of the PipX molecules could be free to interact with additional protein partners.
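The stoichiometric reasoning in the preceding paragraph can be retraced with simple arithmetic. The Python sketch below is only illustrative: it uses the rounded relative abundances quoted in the text (PII = 100, PipX ≈ 10, PlmA ≈ 1, and NtcA ≈ 2, the last value inferred from the ∼five-fold excess of PipX over NtcA) rather than the measured values of Table 1, and it assumes that complex formation is complete and limited only by the scarcer partner. The function and variable names are hypothetical.

```python
# Rounded relative chain abundances (PII chains = 100), as quoted in the text.
# Illustrative approximations only, NOT the measured values of Table 1.
CHAINS = {"PII": 100.0, "PipX": 10.0, "PlmA": 1.0, "NtcA": 2.0}


def max_fraction_engaged(abundant: float, scarce: float) -> float:
    """Upper bound on the fraction of the abundant partner's chains held in a
    1:1 (chain:chain) complex, assuming all of the scarcer partner is bound."""
    return min(abundant, scarce) / abundant


# Nitrogen abundance: PII can sequester all PipX, and PII-PipX all PlmA.
pii_with_pipx = max_fraction_engaged(CHAINS["PII"], CHAINS["PipX"])
pii_with_plma = max_fraction_engaged(CHAINS["PII"], CHAINS["PlmA"])

# Nitrogen starvation: assume PII releases all PipX and NtcA is saturated.
pipx_with_ntca = max_fraction_engaged(CHAINS["PipX"], CHAINS["NtcA"])
pipx_free = 1.0 - pipx_with_ntca

print(f"PII engaged in PII-PipX:      {pii_with_pipx:.0%}")  # 10%
print(f"PII engaged in PII-PipX-PlmA: {pii_with_plma:.0%}")  # 1%
print(f"PipX free under starvation:   {pipx_free:.0%}")      # 80%
```

These bounds ignore binding affinities altogether; they only state what the relative chain numbers allow.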



When given as percentages, the data are relative to the number of PII chains (given the value of 100).

<sup>a</sup>Data from massive proteomic study (Guerreiro et al., 2014). Percentages within parentheses are data based on immunoquantification in Western blots (Labella et al., 2016, 2017).

<sup>b</sup>Rounded to the closest integer or to the closest first decimal figure.

<sup>c</sup>Given for reference, since there is no evidence of physical interaction with any of the other proteins.

<sup>d</sup>The physical interaction of this putative channel with PII was found in Synechocystis sp. PCC6803, but the findings were not replicated in S. elongatus with the homologous product of gene Synpcc7942\_0610.

FIGURE 8 | Protein complexes of the PII regulatory system in S. elongatus according to availability of ammonium in the cell, and their corresponding functional consequences. The frequencies of the different chains in the various forms are based on the levels of the proteins in the cell found in proteomic studies (Table 1). The PII trimer has been colored blue, PipX and its C-terminal helices are red, PlmA dimers have their DNA binding domains yellow or orange and their dimerization domains in two hues of green, NAGK is shown as a purple crown, and the regulatory and DNA binding domains of NtcA are given dark and light shades of blue, respectively.

These inferences are consistent with the K<sub>D</sub> values for the PII-NAGK (Llácer et al., 2007) and PII-PipX complexes in the absence of 2OG (Llácer et al., 2010) and for the PipX-NtcA complex at high 2OG and ATP (Forcada-Nadal et al., 2014) (∼0.08, 7 and 0.09 µM, respectively). For the estimated cellular levels of the different components (**Table 1**), assuming a cell volume of 10<sup>−12</sup> ml, virtually all the NAGK and ∼95% of the PipX could be PII-bound in the absence of 2OG, and ∼98% of the NtcA could be PipX-complexed in the presence of 2OG. However, the impacts of varying concentrations of 2OG on the disassembly or assembly of the complexes most likely differ for the various complexes. For example, a two-orders-of-magnitude increase in the K<sub>D</sub> value for the PII-NAGK complex due to 2OG binding might have much less impact (a 7% decrease in the amount of NAGK bound to PII would be estimated from the mere total protein levels and K<sub>D</sub> value) than a two-orders-of-magnitude increase in the K<sub>D</sub> for the PII-PipX complex (an 80% decrease in the amount of PipX bound, estimated similarly). These estimations are very crude, since they do not take into consideration that in S. elongatus PII phosphorylation prevents NAGK binding (Heinrich et al., 2004), and that this phosphorylation is greatly influenced by the abundance of ammonia (Forchhammer and Tandeau de Marsac, 1994). Furthermore, we have not considered in these estimates the influence of the ATP concentrations, recently shown to decrease in vivo in S. elongatus upon nitrogen starvation (Doello et al., 2018). Therefore, the situation is much more complex than would be expected from the mere consideration of the abundances of the different proteins and of the K<sub>D</sub> values for the non-phosphorylated form of PII. Nevertheless, it appears desirable to estimate the influence of different 2OG levels on K<sub>D</sub> values as an important element to take into consideration in future attempts to model the concentrations of the different complexes of PII and PipX under intermediate conditions of nitrogen richness.
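An estimate of this kind, based only on total protein levels and a K<sub>D</sub> value, can be sketched with the standard 1:1 binding equilibrium. The code below is a minimal illustration, not the calculation actually used for the review: the copy numbers are hypothetical placeholders rather than the Table 1 values, each chain is treated as one independent binding site, and the function names are ours. Only the cell volume (10<sup>−12</sup> ml) and the PII-PipX K<sub>D</sub> of 7 µM are taken from the text.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def copies_to_micromolar(copies, cell_volume_ml=1e-12):
    """Convert copies per cell into a micromolar concentration."""
    litres = cell_volume_ml * 1e-3
    return copies / AVOGADRO / litres * 1e6


def complex_concentration(total_a, total_b, kd):
    """Equilibrium [AB] for 1:1 binding A + B <-> AB (all values in the same units).

    Solves [AB]^2 - (A_t + B_t + K_D)[AB] + A_t*B_t = 0 for the physical root,
    written in a form that avoids subtracting two nearly equal numbers.
    """
    s = total_a + total_b + kd
    root = math.sqrt(s * s - 4.0 * total_a * total_b)
    return 2.0 * total_a * total_b / (s + root)


def fraction_bound(total_ligand, total_partner, kd):
    """Fraction of the ligand that is in complex with its partner."""
    return complex_concentration(total_ligand, total_partner, kd) / total_ligand


# Hypothetical copy numbers per cell (placeholders, NOT the Table 1 values).
pii_um = copies_to_micromolar(90_000)
pipx_um = copies_to_micromolar(9_000)

# K_D of 7 uM (no 2OG) versus a hypothetical 100-fold weaker complex.
for label, kd_um in [("K_D = 7 uM", 7.0), ("K_D = 700 uM", 700.0)]:
    bound = fraction_bound(pipx_um, pii_um, kd_um)
    print(f"{label}: fraction of PipX bound to PII = {bound:.2f}")
```

Because the partner in large excess buffers the equilibrium, the same 100-fold change in K<sub>D</sub> affects a tight complex much less than a weak one, which is the point made above for PII-NAGK versus PII-PipX.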

### A NOVEL NETWORK MEMBER FUNCTIONALLY RELATED TO PipX

Recently, a novel protein has been identified as belonging to the PipX regulatory network (**Figure 1**, yellow 3D arrow projecting upwards from the plane of the network). In this case no direct protein-protein interaction with PipX has been shown (Labella et al., 2017). This protein, PipY, is the product of the downstream gene in the bicistronic pipXY operon. The regulatory influence of PipX on PipY was originally detected in functional, gene expression and mutational studies (Labella et al., 2017). More recently it has been concluded (Cantos et al., 2018) that PipX enhances pipY expression in cis, preventing operon polarity, a function that might implicate additional interactions of PipX with the transcription and translation machineries, by analogy with the action of NusG paralogues, proteins that, like PipX, bear Tudor-like domains. It has been proposed (Cantos et al., 2018) that the cis-acting function of PipX might be a sophisticated strategy for maintaining the appropriate PipX-PipY stoichiometry.

PipY is an intriguing pyridoxal phosphate (PLP)-containing protein that is folded as a modified TIM-barrel (**Figure 1**). The PipY structure (Tremiño et al., 2017) gives full structural support to the negative results of experimental attempts to show enzymatic activity of PipY and its orthologs (Ito et al., 2013). Because of these negative findings, and given the pleiotropic effects of the inactivation of the PipY orthologs in microorganisms and humans, it has been concluded that these proteins have as yet unclarified roles in PLP homeostasis (Ito et al., 2013; Darin et al., 2016; Prunetti et al., 2016; Labella et al., 2017; Plecko et al., 2017). Interestingly, these proteins are widespread across phyla (Prunetti et al., 2016) and the deficiency of the human ortholog causes vitamin B6-dependent epilepsy (Darin et al., 2016; Plecko et al., 2017; Tremiño et al., 2018), providing an excellent example of how investigations of cyanobacterial regulatory systems like the one summarized here can have far-reaching consequences, extending to the realm of human and animal pathology. If any lesson can be inferred from all of the above, it is that the investigation of PII, and particularly of PipX, requires further effort.

### FINAL REMARKS

The rich PII regulatory network summarized in **Figure 1** of this review, even for a unicellular microorganism with a single type of PII protein, attests to the importance of PII and of its regulatory processes. This importance, possibly underrecognized until now, is highlighted, for example, by the very wide distribution of PII proteins among microorganisms and plants. Furthermore, in the many organisms with several genes for PII proteins, the levels of complexity expected from PII regulatory networks may be much greater than the one presented here. Each one of these paralogous PII proteins may command a regulatory network, and it would be unlikely that these networks would not be interconnected into a large meshwork that will require the instruments of systems biology to be fully understood.

PipX also deserves deeper attention than it has received until now. Massive transcriptomics studies (Espinosa et al., 2014) have ascribed to this protein a paramount regulatory role in S. elongatus. For a full understanding of this role, further searches for PipX-interacting or functional partners like PipY appear desirable, with detailed investigation of the molecular mechanisms of the physical or the functional interactions. PlmA merits particular attention to try to characterize the roles of the PII-PipX-PlmA complex. PipY and its orthologs deserve similar attention, to try to define molecularly their PLP homeostatic functions, a need that is made more urgent by the role in pathology of the human ortholog of PipY. In addition to all of this, the structural evidence reviewed here makes it conceivable that adaptor proteins capable of stabilizing active conformations of other transcriptional regulators of the CRP family could exist outside cyanobacteria, mimicking the PipX role. In summary, there are many important questions to be addressed arising from the field reviewed here, some within cyanobacteria, but others concerning whether the mechanisms and complexes exemplified here could have a parallel in other bacterial or even plant species. Clearly, more investigations on PII and its partners in other phylogenetic groups, using the approaches and experimental instruments used to uncover the cyanobacterial PII regulatory network, would appear highly desirable.

## AUTHOR CONTRIBUTIONS

AF-N, JLL, AC, CM-M, and VR reviewed the literature and their own previous work and contributed to the discussions for writing this review. The main writer was VR, but all the authors contributed to the writing of the manuscript. AF-N and CM-M prepared the figures, with key inputs from VR.

### FUNDING

Supported by grants BFU2014-58229-P and BFU2017-84264-P from the Spanish Government.

### ACKNOWLEDGMENTS

We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI). We are grateful to ESRF (Grenoble, France), Diamond (Oxfordshire, UK) and Alba (Barcelona, Spain) synchrotrons for access and for staff support to collect the data used for the determination by our group of most of the structures mentioned in this paper, which have been previously published as referred. CM-M holds a contract of CIBERER.

### REFERENCES


activity is 2-oxoglutarate-regulated by interaction of PII with the biotin carboxyl carrier subunit. Proc. Natl. Acad. Sci. U.S.A. 107, 502–507. doi: 10.1073/pnas.0910097107


revealed by the structures of two hexameric N-acetylglutamate kinases, from Thermotoga maritima and Pseudomonas aeruginosa. J. Mol. Biol. 356, 695–713. doi: 10.1016/j.jmb.2005.11.079


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JAH declared a shared affiliation, with no collaboration, with several of the authors, AF-N, JL, CM-M, and VR to the handling editor at time of review.

Copyright © 2018 Forcada-Nadal, Llácer, Contreras, Marco-Marín and Rubio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# RNA Binding Protein Regulation and Cross-Talk in the Control of AU-rich mRNA Fate

Sofía M. García-Mauriño, Francisco Rivero-Rodríguez, Alejandro Velázquez-Cruz, Marian Hernández-Vellisca, Antonio Díaz-Quintana, Miguel A. De la Rosa and Irene Díaz-Moreno\*

Instituto de Investigaciones Químicas, Centro de Investigaciones Científicas Isla de la Cartuja, Universidad de Sevilla, Consejo Superior de Investigaciones Científicas, Seville, Spain

#### Edited by:

Maria Rosaria Conte, King's College London, United Kingdom

#### Reviewed by:

Graeme L. Conn, Emory University School of Medicine, United States; Santiago Martinez-Lumbreras, King's College London, United Kingdom; Scott A. Tenenbaum, University at Albany, SUNY, United States; Teresa Carlomagno, Leibniz University of Hanover, Germany

> \*Correspondence: Irene Díaz-Moreno idiazmoreno@us.es

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

> Received: 26 July 2017 Accepted: 04 October 2017 Published: 23 October 2017

#### Citation:

García-Mauriño SM, Rivero-Rodríguez F, Velázquez-Cruz A, Hernández-Vellisca M, Díaz-Quintana A, De la Rosa MA and Díaz-Moreno I (2017) RNA Binding Protein Regulation and Cross-Talk in the Control of AU-rich mRNA Fate. Front. Mol. Biosci. 4:71. doi: 10.3389/fmolb.2017.00071

mRNA metabolism is tightly orchestrated by highly-regulated RNA Binding Proteins (RBPs) that determine mRNA fate, thereby influencing multiple cellular functions across biological contexts. Here, we review the interplay between six well-known RBPs (TTP, AUF-1, KSRP, HuR, TIA-1, and TIAR) that recognize AU-rich elements (AREs) at the 3′ untranslated regions of mRNAs, namely ARE-RBPs. Examples of the links between their cross-regulations and modulation of their targets are analyzed during mRNA processing, turnover, localization, and translational control. Furthermore, ARE recognition can be self-regulated by several factors that lead to the prevalence of one RBP over another. Consequently, we examine the factors that modulate the dynamics of those protein-RNA transient interactions to better understand the final consequences of the regulation mediated by ARE-RBPs. For instance, factors controlling the RBP isoforms, their conformational state or their post-translational modifications (PTMs) can strongly determine the fate of the protein-RNA complexes. Moreover, mRNA specific sequence and secondary structure or subtle environmental changes are also key determinants to take into account. To sum up, a full understanding of such fine-tuned regulation is a challenge for future research and requires the integration of all the available structural and functional data by in vivo, in vitro and in silico approaches.

Keywords: mRNA fate, post-transcriptional regulation, RNA binding proteins, stability, translation

#### POST-TRANSCRIPTIONAL REGULATION OF GENE EXPRESSION BY ARE-RBPs

In eukaryotes, gene expression levels and protein abundance are often correlated but are subject to strict regulation. The control of mRNA metabolism allows cells to rapidly adapt to changing environmental conditions. Regulatory processes occurring after mRNA transcription (namely post-transcriptional control) strongly influence mRNA fate and, consequently, final protein levels (Vogel and Marcotte, 2012). Once mRNA transcription occurs in the nucleus, RNA Binding Proteins (RBPs) recognize the primary transcript or pre-mRNA to regulate its alternative splicing, polyadenylation, and capping (**Figure 1**). The generated mature mRNA is then transported to the cytoplasm by various other RBPs. Once in the cytoplasm, RBPs govern the stability, distribution to different cellular compartments and the translation of target mRNAs into their corresponding protein products (Matoulkova et al., 2012).

Within RBPs, ARE-RBPs function as trans-acting factors recognizing cis-acting elements in the 3′-Untranslated Regions (UTRs) of eukaryotic mRNA enriched in adenylate and uridylate (AU-rich elements or AREs). AREs are present in 5–8% of human genes with diverse functions such as cell growth and differentiation, signal transduction, apoptosis, nutrient transport, and metabolism. This list is dominated by genes involved in transient processes, which therefore require strict expression control (Barreau et al., 2005). For instance, the length and specific pattern of AREs may contribute to mRNA lifetime (Khabar, 2005). However, the final mRNA fate will be determined by the variable and dynamic ARE-RBPs/mRNA interactions or by RBP competition for the same transcript. In addition, ARE-RBPs bind to AREs via a variety of domains including the so-called RNA-Recognition Motif (RRM), the CCCH tandem zinc finger and the K-Homology domain (KH) (Stoecklin and Anderson, 2006; Clery et al., 2008; Valverde et al., 2008; Daubner et al., 2013). A single protein can contain several of these motifs, leading to simultaneous interactions with either multiple targets or multiple sites within a particular target (Shen and Malter, 2015). Additionally, most ARE-RBPs shuttle between nucleus and cytoplasm, and their functions are linked to their specific subcellular distribution (Gama-Carvalho and Carmo-Fonseca, 2001).

In this mini-review we focus on the post-transcriptional regulation exerted by six of the best studied ARE-RBPs whose cross-talk has biological relevance and has been widely reported in the literature. Moreover, we examine the multiple intracellular signals and factors controlling the interactions between these proteins. AU-binding Factor 1 (AUF1), also known as Heterogeneous Nuclear RiboNucleo-Protein D (hnRNPD), is included for being the first identified ARE-RBP (Brewer, 1991). AUF1 is generally considered to promote the decay of target mRNAs, although the stabilization of some other transcripts has also been reported (Xu et al., 2001; Stoecklin and Anderson, 2006). Since the discovery of AUF1, 20 additional ARE-RBPs have been identified. That list includes those that primarily promote mRNA degradation, such as Tristetraprolin (TTP) and KH-type splicing regulatory protein (KSRP) (Gherzi et al., 2004; Sanduja et al., 2011); those stabilizing mRNA, such as Human antigen R (HuR) (Brennan and Steitz, 2001); and translational control proteins, such as T-cell intracellular antigen 1 (TIA-1) and TIA-1–related protein (TIAR) (Kawai et al., 2006; Mazan-Mamczarz et al., 2006).

**Abbreviations:** AREs, AU-Rich Elements; ARE-RBPs, RNA Binding Proteins that recognize AU-Rich Elements; AUF-1, AU-binding Factor 1; c-fos, Finkel-Biskis-Jinkins murine osteosarcoma viral oncogene homolog; c-myc, Avian myelocytomatosis virus oncogene cellular homolog; COX-2, Cyclooxygenase-2; DDR, DNA Damage Response; hnRNPD, Heterogeneous Nuclear RiboNucleo-Protein D; HuR, Human antigen R; KH, hnRNP K-Homology; KSRP, KH type Splicing Regulatory Protein; PAR, Poly (ADP-Ribose); P-bodies, Processing bodies; PTMs, Post-Translational Modifications; PRD, Prion-Related Domain; RBPs, RNA Binding Proteins; RNPs, Ribonucleoprotein Particles; RRM, RNA Recognition Motif; SGs, Stress Granules; TIA-1, T-cell Intracellular Antigen 1; TIAR, TIA-1 Related protein; TNFα, Tumor Necrosis Factor α; TTP, Tristetraprolin; UTR, Untranslated Region; VEGF, Vascular Endothelial Growth Factor.

### INTERPLAY BETWEEN ARE-RBPs IN THE POST-TRANSCRIPTIONAL REGULATION OF mRNAs

It is well-known that the substrates of post-transcriptional control are ribonucleoprotein particles or RNPs containing mRNA molecules covered with RBPs, rather than naked mRNA (Szostak and Gebauer, 2013) (**Figure 1**). However, our understanding of how ARE-RBPs interact with each other at different regulatory levels is rather limited. Notably, some RBPs regulate the mRNA that encodes their own gene products, as well as those of other RBP counterparts, establishing self-regulatory loops controlling mRNA metabolism (Pullmann et al., 2007).

A good example of cross-talk between RBPs is the one involving HuR, KSRP and TTP proteins. These three proteins compete with each other for binding to common recognition sequences in the AREs that they regulate. Hence, TTP and KSRP negatively control the stability of several mRNAs (such as c-fos, TNFα and COX-2), whereas HuR generally acts in the opposite way, stabilizing them (Chen et al., 2001, 2002; Dean et al., 2001; Sawaoka et al., 2003; Katsanou et al., 2005; Winzen et al., 2007), with some exceptions (Katsanou et al., 2005; Kim et al., 2009) (**Table 1**). Moreover, TTP acts as a negative regulator of its own mRNA (Tchen et al., 2004; Lin et al., 2007) as well as of HuR mRNA, its direct antagonist in mRNA regulation (Al-Ahmadi et al., 2009). On the other hand, HuR acts as a positive translational regulator of both KSRP and HuR mRNAs (Pullmann et al., 2007; Yi et al., 2010), while both proteins regulate the stability of their own mRNAs (Winzen et al., 2007; Al-Ahmadi et al., 2009) (**Table 1**, dashed square). Consequently, the redundant feedback involving KSRP, TTP, and HuR may provide a bi-stable signal transduction circuit in which either all or none of their target mRNAs are stabilized and/or translated. More intriguing is the role of AUF1 in this regulatory loop, as it presents four isoforms generated by alternative splicing of a single mRNA transcript (Wagner et al., 1998) with different RNA-binding affinities and specificities for its target mRNAs, such as c-fos, c-myc, TNFα, VEGF, and COX-2 (Brewer, 1991; Loflin et al., 1999; Lasa et al., 2000; Xu et al., 2001; Fellows et al., 2012).

In addition to recognizing AU-rich sequences at the 3′ UTR of target mRNAs, some ARE-RBPs are able to activate 5′ splice sites followed by U-rich sequences. This is the case of TIA-1 and TIAR, which upregulate the translation of their own coding mRNAs (Le Guiner et al., 2001). Conversely, although consistent with their functional redundancy, their translation levels are negatively cross-regulated by each other (Le Guiner et al., 2001; Izquierdo and Valcárcel, 2007b; Pullmann et al., 2007) (**Table 1**, dotted square). Interestingly, TIA-1 and TIAR share common functions, acting as negative translational regulators of diverse mRNAs (such as c-myc, TNFα, VEGF, and COX-2), and are able to compensate for each other (Gueydan et al., 1999; Piecyk et al., 2000; Zhang et al., 2002; Cok et al., 2003; Dixon et al., 2003; Lu et al., 2009; Hamdollah Zadeh et al., 2015). In addition, it has been shown that HuR positively controls TIA-1 expression by enhancing its mRNA stability (Pullmann et al., 2007). By contrast, TIA-1 knockdown causes a marked increase in HuR levels, indicating that TIA-1 may contribute to lowering HuR levels in the cell (Kawai et al., 2006) (**Table 1**, black square). This is of great importance because both HuR and TIA-1 bind to cytochrome c (Cc) mRNA, respectively promoting or inhibiting its translation without affecting its mRNA stability. The struggle between HuR (antiapoptotic factor) and TIA-1 (proapoptotic factor) for the control of Cc mRNA translation underlies possible mechanisms to regulate both cellular respiration and programmed cell death. A direct binding between HuR and TIAR mRNA has also been reported (Pullmann et al., 2007) but, unexpectedly, TIAR does not seem to form a complex with Cc mRNA, despite the extensive homology shared between TIAR and TIA-1 (Kawai et al., 2006).

### FACTORS THAT MODULATE ARE-RBP/mRNA INTERACTIONS

Several examples of cross-talk between ARE-RBPs highlight that there must be an intricate network of regulatory events that lead to the prevalence of one RBP over the others when recognizing the same mRNA target. Thus, the regulatory activity of RBPs on gene expression is dynamic and adapts to cell conditions continuously. In this section, we briefly describe those factors for which there is evidence of their influence on the interaction between RBPs and their mRNA targets (Supplemental Figure 1).

#### RBP Isoforms

Alternative splicing is a highly regulated process that allows the synthesis of multiple different transcripts from the same gene, and is therefore an important source of protein diversity and complexity. The slight differences in amino acid sequence between isoforms can be decisive for their function (Gallego-Páez et al., 2017). For example, TIA-1 and TIAR present two isoforms, a and b, in humans. Isoform a in TIA-1 and TIAR possesses 11 or 17 extra amino acids, respectively, that are critical for distinct functional properties. For instance, only TIAR isoform a (but not TIAR b or either of the TIA-1 isoforms) has a translational silencing activity on the proteolytic enzyme Human Matrix Metalloproteinase-13 (HMMP13) in HEK293 cells. High levels of expression of HMMP13 have been documented in a certain subset of cancers. Therefore, its downregulation by TIAR a may act as a tumor suppression mechanism (Yu et al., 2003).

As previously mentioned, AUF1 isoforms come from the alternative splicing of the same pre-mRNA. They differ as a function of the presence or absence of two independent domains encoded by exons 2 and 7. While p37AUF1 lacks both domains, p42AUF1 and p45AUF1 include a 49-amino acid domain encoded by exon 7, and p40AUF1 and p45AUF1 both contain a 19-amino acid domain encoded by exon 2. Inclusion of the exon 2-encoded sequence reduces the affinity of the first and second binding events of AUF1 dimers toward their mRNA substrates, but incorporation of the exon 7-encoded sequence increases the affinity of the second binding event. The isoform-specific differences provide unique biochemical characteristics that explain the diversity of AUF1 functions and complex regulation (Zucconi et al., 2010).

TABLE 1 | Matrix representation of the interaction of selected RBPs (vertical axis) with the mRNA of those RBPs (Upper table) and several ARE-containing mRNA targets (Lower table, horizontal axis). Blue colors show positive regulation, whereas negative regulations are colored in red. Gray color indicates interactions that have been described but whose effects were not examined. n.d., not described; S, Splicing; \*, Postulated regulations. The cross-talk between TTP, KSRP and HuR is highlighted by a dashed square; between HuR and TIA-1 by a black square; and between TIA-1 and TIAR by a dotted square.

#### RBP Post-Translational Modifications

Post-Translational Modifications (PTMs), such as phosphorylation, isomerization, methylation, NEDDylation, acetylation, and ubiquitination of RBPs have a major influence on their function and/or their affinity toward their targets, with the consequent impact on mRNA stability, turnover and translation efficiency (Lee, 2012). For instance, the phosphorylation of p40AUF1 at residues Ser83 and Ser87 influences the sequential binding of dimers to TNFα mRNA (Wilson et al., 2003). Single phosphorylation of Ser83 inhibits the initial dimer binding to the mRNA substrate by 40%, whereas single phosphorylation of Ser87 induces a 2-fold increase in the affinity of the second binding event. In addition, when simultaneous phosphorylation of both residues occurs, the negative effect of Ser83 on the binding affinity prevails over the positive effect of Ser87 (Wilson et al., 2003). Several phosphorylation sites have also been identified in TTP (Cao et al., 2006, 2014). Phosphorylated TTP binds to target AREs with a lower affinity than dephosphorylated TTP (Carballo et al., 2001; Hitti et al., 2006). Phosphorylation of RBPs can also modify their activity without altering the affinity for mRNA targets. Such is the case of TIA-1 and TIAR, whose splicing control over the Fas gene sequence determines the expression of the pro-apoptotic membrane-bound form to the detriment of the anti-apoptotic soluble one (Izquierdo and Valcárcel, 2007a). Moreover, HuR methylation has been proposed to increase the nuclear export of HuR, which could be important for mRNA localization (Li et al., 2002). NEDDylation of HuR increases its stability and lifetime, which, in turn, can affect the total levels of HuR target mRNAs due to its main stabilizing action (Embade et al., 2012; Fernández-Ramos and Martínez-Chantar, 2015).

#### RBP Conformational Changes

ARE-RBPs can undergo conformational changes upon binding to their targets (Ellis and Jones, 2008). These variations can be detected in the contact surface with mRNAs as well as in distant areas, meaning that ARE-RBPs can adapt both their local and global structure. An example of conformational changes that influence ARE recognition has been reported for KSRP. An interdomain rearrangement that orients the two central KH domains and their RNA-binding surfaces, creating a two-domain unit, is crucial for its role in ARE-mediated mRNA decay (Supplemental Figure 2) (Díaz-Moreno et al., 2010). Additionally, some of the PTMs mentioned above can also influence the conformation of RBPs. Hence, the phosphorylation of Ser193 within the N-terminal KH motif (KH1) of KSRP leads to the unfolding of this structurally atypical and unstable domain, creating a binding site for 14-3-3ζ, driving the nuclear localization of KSRP and controlling its mRNA-degradation activity (Díaz-Moreno et al., 2009).

Another important regulation factor is the RBP oligomerization state upon mRNA recognition. HuR RRM1 domain and RRM1-2 di-domain (the main platform of cytoplasmic mRNA binding in HuR) form homodimers in solution (Benoit et al., 2010). This phenomenon is dependent on Cys13, which is able to form disulfide bonds. Such homodimerization may modulate HuR function upon oxidative stress. Moreover, the HuR RRM3 domain has been found to be involved in protein oligomerization and RNA recognition, both functions regulated by the same RRM but using different surfaces at opposite sides of the domain. The conserved Trp261 residue is key for dimerization, as the substitution by glutamic acid alters its dimerization dynamics and stabilizes the monomeric state (Scheiba et al., 2014; Díaz-Quintana et al., 2015).

#### Cellular Conditions and Stress Response

Eukaryotic cells have evolved sophisticated strategies to overcome stress. One of them is the assembly of Stress Granules (SGs), which allows mRNA translation silencing and protection from degradation. Among RBPs with critical roles in neurodegenerative diseases, TIA-1 proteins are essential in SG formation (Mazan-Mamczarz et al., 2006; Vanderweyde et al., 2012). Hence, under hypoxic conditions, TIA-1 and TIAR block the expression of hypoxia-inducible factor (HIF)-1α through binding to its ARE-containing mRNA (Gottschald et al., 2010). Inhibition of this transcription factor is enhanced when both RBPs are organized into SGs. In addition, HuR also aggregates into SGs to halt the translation of specific housekeeping mRNAs under stress conditions (Bergalet et al., 2011). The deregulation of SGs results in cytoplasmic accumulation and subsequent pathologies such as Parkinson's and Alzheimer's diseases (Vanderweyde et al., 2012).

Variations in pH values can also modulate the binding of TIA-1 to nucleic acids, acting as a pH-dependent molecular switch. The pK<sub>a</sub> values of the histidine imidazole groups of TIA-1 RRM2 and RRM3 are substantially higher in complexes with short RNA and DNA oligonucleotides than in the isolated domains. Interestingly, those pK<sub>a</sub> values are also controlled by slight environmental pH changes (Cruz-Gallardo et al., 2013, 2015). This fact provides valuable information to understand the pH effect on ARE-RBPs when shuttling among cellular compartments with different pHs (nucleus, cytoplasm, SGs, etc.).
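The effect of a pK<sub>a</sub> shift on the protonation state of a histidine side chain can be illustrated with the Henderson-Hasselbalch relation. The sketch below is only a schematic example: the two pK<sub>a</sub> values are hypothetical placeholders, not the values measured for TIA-1 RRM2 or RRM3, a single independent titratable group is assumed, and the function name is ours.

```python
def protonated_fraction(ph, pka):
    """Fraction of a titratable group that is protonated at a given pH
    (Henderson-Hasselbalch relation for a single independent site)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical imidazole pKa values (placeholders, not measured data):
# a lower value for the isolated domain, a higher one for the complex.
pka_free = 6.0
pka_bound = 7.0

for ph in (6.0, 6.5, 7.0, 7.5):
    free = protonated_fraction(ph, pka_free)
    bound = protonated_fraction(ph, pka_bound)
    print(f"pH {ph:.1f}: protonated fraction free = {free:.2f}, bound = {bound:.2f}")
```

Under these assumptions, a higher pK<sub>a</sub> in the complex means that the imidazole stays protonated over a wider pH range, so small pH differences between compartments would shift the balance between bound and free forms.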

During oxidative stress, AUF1 binding to mRNAs containing 8-oxo-7,8-dihydro-guanine could play a role in the selective elimination of oxidized mRNA, presumably by driving their degradation (Ishii et al., 2015). Finally, HuR localization can also be altered upon different stress signals such as UV, actinomycin D or hydrogen peroxide, leading to the cytoplasmic accumulation of the protein. However, after a heat shock treatment, the decrease in HuR protein levels enhances cell survival. This phenomenon is linked to the ubiquitination of Lys182, promoting protein degradation, which finally interferes with the binding of HuR to its target mRNAs (Abdelmohsen et al., 2009).

#### mRNA Specific Sequence and Conformation

RBPs do not interact with the same affinity with every ARE-containing mRNA; instead, preferences exist for certain sequences. For instance, TIA-1 RRM domains display different binding constants during nucleic acid recognition. Indeed, the central domains (RRM2 and RRM3) constitute the mRNA binding platform of the protein. RRM2 drives the interaction with RNA, and shows the highest affinities for pyrimidine-rich sequences. In turn, RRM3 enhances the overall TIA-1 binding affinity for RNA, preferentially interacting with C-rich motifs (Cruz-Gallardo et al., 2014; Wang et al., 2014; Waris et al., 2017). Moreover, HuR and TIAR interact with U- and AU-rich mRNAs in vitro, with greater affinity (≈10-fold) for the former. This higher affinity for U-rich mRNAs results from a higher association rate constant, mainly derived from the presence of a greater number of effective binding positions (Kim et al., 2011). However, in vivo analysis showed that HuR stabilized AU-rich mRNAs to a greater extent than U-rich mRNAs (Brennan and Steitz, 2001). Additionally, the KH domains of KSRP behave as independent binding modules with different affinities for AU-rich mRNAs, explaining the broad range of targets recognized by the protein. While the fourth KH domain (KH4) is primarily responsible for mRNA binding and decay through an essential structural element in its β4, KH3 is also necessary to drive the recognition of AU- and G-rich sequences. On the other hand, all KH domains show a clear negative selection for C-rich sequences (García-Mayoral et al., 2007, 2008). Interestingly, many RNA targets of HuR, which acts antagonistically to KSRP, often contain isolated Gs but very rarely Cs (López De Silanes et al., 2004).
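The kinetic argument made above for HuR and TIAR can be written out explicitly. As a simplified sketch, assuming 1:1 binding and, as an assumption of ours, similar dissociation rate constants for U-rich and AU-rich targets, the equilibrium dissociation constant and the resulting affinity ratio are:

$$
K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
\frac{K_D^{\mathrm{AU}}}{K_D^{\mathrm{U}}} \approx \frac{k_{\mathrm{on}}^{\mathrm{U}}}{k_{\mathrm{on}}^{\mathrm{AU}}} \approx 10
\quad \text{when } k_{\mathrm{off}}^{\mathrm{U}} \approx k_{\mathrm{off}}^{\mathrm{AU}}
$$

In this reading, a roughly 10-fold higher association rate constant for U-rich sequences is enough to account for the ≈10-fold higher affinity, without requiring a longer-lived complex.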

Conformational changes in the ARE-mRNA structure also have the potential to regulate the binding affinity of RBPs. These changes may precede the binding of RBPs, as occurs with TNFα mRNA as a consequence of the stabilization of its folding mediated by divalent cations such as Mg<sup>2+</sup> (Wilson et al., 2001a,b). In addition, the AU-rich motif of TNFα mRNA can also adopt a hairpin-like structure that specifically inhibits p37AUF1 binding, but hardly affects its interaction with HuR (Fialcowitz et al., 2005). On the other hand, the association of RBPs can cause local changes in the structure of their cognate mRNAs, which may affect the recruitment of new trans-acting factors or establish preferences for one RBP over another. Consequently, these changes would directly impact the turnover rates of such ARE-containing mRNAs (Wilson et al., 2001b; Zucconi et al., 2010).

### DNA Recognition and Role of RBPs in DNA Damage Response

Some ARE-RBPs also have the ability to bind to DNA. Importantly, in the case of TIA-1 and TIAR, this binding occurs with a markedly higher affinity than both RBPs show for their mRNA targets (Suswam et al., 2005; Waris et al., 2017). In fact, it has been hypothesized that the formation of the RBP-mRNA complexes would require the direct displacement of the RBP from its DNA-binding site by the polymerase. This dual binding capacity of TIA-1 and TIAR could potentially provide a link between transcription and splicing (Suswam et al., 2005; Mcalinden et al., 2007; Waris et al., 2017).

Interestingly, several RBPs are involved in DNA Damage Response (DDR), being recruited to DNA breaks in a Poly (ADP-Ribose) (PAR)-dependent manner and/or forming liquid-like compartments by phase separation (Kai, 2016). The formation of these phases requires the presence of an unstructured Prion-Related Domain (PRD) like the one that is present in TIA-1 and TIAR proteins (Gilks et al., 2004). Importantly, abnormal phase separation by mutated PRD-containing proteins leads to pathological protein aggregation and is associated with neurodegenerative and aging-associated diseases (Kai, 2016).

### CONCLUSIONS AND FUTURE PERSPECTIVES

Before being translated into proteins, mRNAs are subjected to a sequential and strict control by RBPs exerted by the recognition of AREs in their 3′-UTRs. Regulation of mRNA homeostasis through ARE-RBPs allows the fine tuning of responses by controlling mRNA translation, degradation, or storage in diverse eukaryotic cell compartments (Glisovic et al., 2008; Ganguly et al., 2016). As reviewed above, many examples of ARE-RBP interactions have been reported in the literature, but it is still not well understood how RBP domains collaborate or compete with each other for the modulation of their targets. The proper inspection of such a convoluted interplay between RBPs requires the combination of different methods in order to balance the specific strengths and weaknesses of each technique. On the other hand, the need for a transition from a static to a dynamic point of view, which takes into account the biological environment during RNA binding, is becoming increasingly evident. Consequently, the integration of the information obtained by in vivo approaches with the structural data would be of great interest. Moreover, understanding the processing of ARE-mRNAs in highly dynamic and often transient macromolecular complexes also remains challenging (Rissland, 2017). Finally, the key role of intrinsically disordered linkers connecting RNA binding domains has acquired significant relevance in the latest reports (Basu and Bahadur, 2016). Altogether, the examples of mRNA-protein interactions by ARE-RBPs herein reviewed highlight the need for integrative studies to fully understand such finely tuned regulation.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

#### ACKNOWLEDGMENTS

Financial support was provided by the Andalusian Government (P11-CVI-7216, BIO198); the Spanish Ministry of Economy, Industry and Competitiveness (BFU2015-71017-P); the Spanish Ministry of Education, Culture and Sports (FPU013/04373, FPU016/01513) and the Ramón Areces Foundation.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2017.00071/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer SM-L and handling Editor declared their shared affiliation.

Copyright © 2017 García-Mauriño, Rivero-Rodríguez, Velázquez-Cruz, Hernández-Vellisca, Díaz-Quintana, De la Rosa and Díaz-Moreno. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Insight into the Selectivity of the G7-18NATE Inhibitor Peptide for the Grb7-SH2 Domain Target

Gabrielle M. Watson, William A. H. Lucas, Menachem J. Gunzburg and Jacqueline A. Wilce\*

Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia

Growth factor receptor bound protein 7 (Grb7) is an adaptor protein with established roles in the progression of both breast and pancreatic cancers. Through its C-terminal SH2 domain, Grb7 binds to phosphorylated tyrosine kinases to promote proliferative and migratory signaling. Here, we investigated the molecular basis for the specificity of a Grb7 SH2-domain targeted peptide inhibitor. We identified that arginine 462 in the BC loop is unique to Grb7 compared to Grb2, another SH2 domain-bearing protein that shares the same consensus binding motif as Grb7. Using surface plasmon resonance we demonstrated that Grb7-SH2 binding to G7-18NATE is reduced 3.3-fold when the arginine is mutated to the corresponding Grb2 amino acid. The reverse mutation in Grb2-SH2 (serine to arginine), however, was insufficient to restore binding of G7-18NATE to Grb2-SH2. Further, using a microarray, we confirmed that G7-18NATE is specific for Grb7 over a panel of 79 SH2 domains, and identified that leucine at the βD6 position may also be a requirement for Grb7-SH2 binding. This study provides insight into the specificity-defining features of Grb7 for the inhibitor molecule G7-18NATE, which will assist in the development of improved Grb7-targeted inhibitors.

#### Edited by:

Rivka Isaacson, King's College London, United Kingdom

#### Reviewed by:

Krystle J. McLaughlin, Lehigh University, United States Emeric Miclet, Université Pierre et Marie Curie, France

> \*Correspondence: Jacqueline A. Wilce jackie.wilce@monash.edu

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

Received: 25 July 2017 Accepted: 13 September 2017 Published: 26 September 2017

#### Citation:

Watson GM, Lucas WAH, Gunzburg MJ and Wilce JA (2017) Insight into the Selectivity of the G7-18NATE Inhibitor Peptide for the Grb7-SH2 Domain Target. Front. Mol. Biosci. 4:64. doi: 10.3389/fmolb.2017.00064

Keywords: Grb7, SH2 domain, inhibitor specificity, BC loop, peptide inhibitor

### INTRODUCTION

For over 20 years, Src homology 2 (SH2) domains have been explored as targets for the development of potential therapeutics (Kraskouskaya et al., 2013; Morlacchi et al., 2014). SH2 domains mediate the formation of protein complexes by recognizing phosphorylated tyrosines (pY) on target tyrosine kinases and subsequently mediating their downstream effects. They are utilized by proteins with a range of functions, including enzymes such as the PLCγ and the Src kinase (Rebecchi and Pentyala, 2000; Roskoski, 2004), transcriptional regulators including STAT 1-6 (Darnell, 1997) and adaptor proteins including growth factor receptor bound protein 7 (Grb7) (Han et al., 2001). The reversible association of the SH2 domain/pY complex allows for the efficient regulation of signaling and is therefore frequently found in proteins mediating intracellular signaling pathways—including those regulating proliferation, migration and growth (Schlessinger and Lemmon, 2003). It is these same signaling networks that are susceptible to dysregulation, leading to the amplification and transmission of signals that drive cells into a cancerous state. Thus, targeting SH2 domains can serve as an effective point for the development of anti-cancer agents that effectively block the downstream effects of signaling.

Grb7 has been identified as a therapeutic target due to its role in the progression of a number of cancers including breast, pancreatic, ovarian and esophageal cancers (Stein et al., 1994; Tanaka et al., 1997, 2006; Wang et al., 2010). In HER2+ breast cancer, Grb7 is co-overexpressed with HER2, leading to enhanced tumorigenesis and cell proliferation (Bai and Luoh, 2007; Chu et al., 2010; Pradip et al., 2013). While HER2 has a well-known role in breast cancer progression, overexpression of Grb7 has now also been identified as a significant predictor for reduced cancer-free periods and a worse prognosis for breast cancer patients (Ramsey et al., 2011). In pancreatic, esophageal and triple negative breast cancers, Grb7 drives migratory and invasive events, with Grb7 knockdown inhibiting these processes when tested in vitro (Tanaka et al., 1997, 2000, 2006; Giricz et al., 2012). Furthermore, a significant relationship has been identified between Grb7 expression and tumor metastasis in pancreatic and esophageal cancers (Tanaka et al., 1997, 2006).

Grb7 has a multi-domain architecture consisting of an N-terminal proline-rich domain, a Ras-associating (RA) domain, a pleckstrin homology (PH) domain, a between the PH and SH2 (BPS) domain, and lastly a C-terminal SH2 domain (Shen and Guan, 2004). It is via the SH2 domain that Grb7 interacts with phosphorylated tyrosine kinases including growth factor receptors such as HER2, HER3 and EGFR, as well as cytoplasmic kinases including the focal adhesion kinase (FAK) (Stein et al., 1994; Daly et al., 1996; Han and Guan, 1999). Through these interactions, Grb7 mediates signaling networks controlling proliferation, migration and growth, making the Grb7-SH2 domain an attractive candidate for the development of targeted inhibitors (Han and Guan, 1999; Pero et al., 2003; Pradip et al., 2013).

SH2 domains are extremely prevalent in the proteome, with over 110 proteins bearing the domain. Thus, ensuring target selectivity is a critical aspect of the development of molecules targeting SH2 domains. Despite the high number, SH2 domains display exquisite selectivity for their substrates (Pawson, 2004). SH2 domains contain a well-characterized positively charged cleft for the pY to bind, but it is typically the residues C-terminal to the target pY that dictate the binding specificity for a substrate (Songyang et al., 1993). It has been determined that individual SH2 domains recognize characteristic binding motifs. In the case of Grb7 this motif has been identified as pYXN, where any residue is accommodated at the +1 position, and an asparagine is preferred at the +2 position (Pero et al., 2003). The SH2 domains of Grb2 and Grb7 have low amino acid identity (29%); however, the Grb2-SH2 also recognizes this binding motif, and both adaptor proteins bind to HER2 at pY1139 (Janes et al., 1997). Despite this binding similarity, Grb7 and Grb2 can display remarkable selectivity, and an inhibitor has been developed that specifically binds and inhibits Grb7-SH2 (Pero et al., 2002).

The cyclic, non-phosphorylated peptide, named G7-18NATE (cyclo-(CH2CO-WFEGYDNTFPC)-amide), was developed via phage display and was found to specifically decrease binding between Grb7 and tyrosine phosphorylated HER family members in breast cancer cell extracts, but to have no effect on the interaction between Grb2 and HER3 (Pero et al., 2002). When G7-18NATE was attached to the Penetratin cell permeability sequence, the peptide inhibited growth, migration and proliferation in breast cancer cell lines, and displayed synergistic effects on cell proliferation with the currently available chemotherapeutics Doxorubicin and Trastuzumab (Pero et al., 2007; Pradip et al., 2013). G7-18NATE-Penetratin was also shown to specifically inhibit the interaction between Grb7 and FAK, but not to interfere with the Grb2/FAK or Grb2/EGFR interactions (Tanaka et al., 2006). This specificity for the Grb7-SH2 has also been demonstrated in vitro, with surface plasmon resonance (SPR) experiments confirming that G7-18NATE binds preferentially to Grb7-SH2 over the Grb2-SH2 domain (Gunzburg et al., 2012). G7-18NATE also displayed minimal binding to the SH2 domains of Grb10 and Grb14, proteins that share the same domain structure as Grb7 (with the three Grbs collectively termed the Grb7 family). Derivatives of G7-18NATE have also been developed that bind to the Grb7-SH2 domain with higher affinity than G7-18NATE (K<sub>D</sub> = 0.83 µM) but with equivalent specificity (Watson et al., 2015; Gunzburg et al., 2016).
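The pYXN consensus described above can be illustrated with a short motif scan. The following is a minimal sketch, not part of the original study: the function name `find_yxn_motifs` is hypothetical, a plain one-letter sequence does not encode phosphorylation, and the cyclization of G7-18NATE is not modelled. It only locates candidate Y-X-N positions in a linear sequence.

```python
import re

def find_yxn_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of Y, matched tripeptide) for each Y-X-N match."""
    # A lookahead is used so that overlapping matches (e.g. in "YYNN") are reported.
    pattern = re.compile(r"(?=(Y.N))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Linear amino acid sequence of G7-18NATE as given in the text.
print(find_yxn_motifs("WFEGYDNTFPC"))
```

In this sequence the single match is the YDN tripeptide starting at Y5, which is consistent with the peptide numbering used later in the text (F2, E3, Y5).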

While it has been demonstrated repeatedly that G7-18NATE and its derivatives are specific for Grb7, there has been little investigation into the molecular basis of Grb7 ligand selectivity, except for analysis of the βD6 amino acid (Leu481 in Grb7). It is important to note that the nomenclature to be used to describe SH2 domain secondary structure is as derived by Eck et al. as illustrated in **Figure 1A** (Eck et al., 1993). While Grb10 and Grb14 possess a polar glutamine at the βD6 position, Grb7 has a hydrophobic leucine residue. When Leu481 in Grb7-SH2 was mutated to glutamine, binding to the HER2 receptor was completely abolished (Janes et al., 1997). Conversely, the opposite mutation in Grb14 (glutamine to leucine) imparted binding of Grb14 to the HER2 receptor. Together this established the importance of a leucine at the βD6 position for Grb7-SH2 domain interactions with its target motif. In the case of the Grb2-SH2 domain, however, the βD6 amino acid is a lysine and this would be predicted to interfere with ligand binding. Yet some binding to Grb2-SH2 domain by G7-18NATE still takes place, showing that it is not only the βD6 amino acid that dictates G7-18NATE binding (Gunzburg et al., 2012).

The structure of the Grb7-SH2 domain in complex with G7-18NATE revealed the amino acid residues forming the binding site for the peptide ligand (Ambaye et al., 2011b). It showed the way in which G7-18NATE Y5 and F2 (note that single letter code will be used in reference to the peptide ligand amino acids for clarity) straddle the Grb7-SH2 domain Leu481 residue, helping to visualize the role of this amino acid in the specificity of the interaction. Leu481, and other Grb7-SH2 domain amino acids that form hydrogen bond interactions or contribute to the buried surface area with G7-18NATE are indicated in **Figure 1C**. For the most part, these residues are conserved between Grb7-SH2 domain and the closely related Grb10, Grb14, and Grb2 SH2 domains and, thus, do not help to explain the basis for Grb7 ligand specificity. Only the βE4 methionine and EF4 glutamine are unique to the Grb7-SH2 domain and may act alongside Leu481 to contribute to ligand specificity.

**Abbreviations:** BPS, between PH and SH2 domain; FAK, focal adhesion kinase; Grb, growth factor receptor bound protein; PH, pleckstrin homology; pY, phosphorylated tyrosine; RA, ras-associating; SPR, surface plasmon resonance; SH2, src homology 2; WT, wild-type.

Frontiers in Molecular Biosciences | www.frontiersin.org

FIGURE 1 | […] displayed above the alignment at the corresponding position, and sequence numbering for the Grb7-SH2 displayed below the alignment. Grb7 amino acids that form hydrogen bond interactions with the G7-18NATE peptide are identified with filled circles and additional amino acids that contribute to the buried surface area are identified with empty circles. The unique Grb7-SH2 arginine at position 462 (BC2) is highlighted by an orange box.

In this study, we identified an additional amino acid in the Grb7-SH2 BC loop that is involved in binding to G7-18NATE and potentially also contributes to Grb7-SH2 domain binding specificity. Based on another Grb7-SH2/peptide complex X-ray crystal structure, we postulated that the Grb7 BC2 arginine contributes to the Grb7-SH2/peptide interaction. In Grb2, Grb10, and Grb14 the BC2 residue is a serine. Using mutational analysis we demonstrate that this Grb7 residue does indeed contribute to the G7-18NATE interaction, and thus may dictate some of the Grb7 selectivity. We identified that mutating the BC2 residue in the Grb2-SH2 from serine to arginine is not sufficient to confer binding to G7-18NATE, indicating that other Grb7 amino acids also contribute to binding strength and specificity. Lastly, using a pY micro-array we confirmed that G7-18NATE is specific for Grb7 compared to a panel of 79 SH2 domains, providing extra verification of the specificity of this peptide. Together, this study provides insight into the specificity of G7-18NATE for Grb7, which is critical for the development of effective therapeutics targeted to the Grb7-SH2 domain signaling in cancer.

### MATERIALS AND METHODS

### Protein and Peptide Preparation

Grb7-SH2 and Grb2-SH2 were prepared as GST fusion proteins and expressed and purified as previously described (Watson et al., 2015). Briefly, Grb7-SH2 (residues 415-532) and Grb2-SH2 (residues 58-160) were incorporated into the pGEX-2T plasmid and expressed in BL21(DE3)pLysS competent cells following IPTG induction (0.4 mM) at 25 °C. Subsequently, glutathione affinity chromatography and size exclusion chromatography were used to purify the soluble GST-tagged proteins. To generate the Grb7-SH2 R462S and Grb2-SH2 S90R mutants, site-directed mutagenesis was utilized (Liu and Naismith, 2008). For this, the wild-type SH2 domain constructs were used as templates for polymerase chain reaction with the primers listed in Supplementary Material. The parent plasmid was subsequently removed by treatment with DpnI and the presence of the mutation verified by DNA sequencing. The mutant proteins were expressed and purified as per the wild-type proteins. GST alone was purified similarly to the GST-SH2 domain proteins with the exception that size exclusion chromatography was not necessary for purification.

G7-18NATE (cyclo-(CH2CO-WFEGYDNTFPC)-amide) was synthesized using standard Fmoc-chemistry and purchased from Purar Chemicals (Australia). The synthesis of G7-18NATE-PB ((cyclo-(CH2CO-WFEGYDNTFPC-RRMKWKKK(Biotin)) amide)) has been previously described (Ambaye et al., 2011a). The purity of both peptides was >95% as determined using LC-MS.

The final solution concentrations of all proteins and peptides used in this study were determined spectroscopically at 280 nm using extinction coefficients predicted by the ProtParam server (Gasteiger et al., 2005).
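The spectroscopic concentration determination described above relies on the Beer-Lambert law, c = A / (ε · l). The sketch below is illustrative only: the function name `concentration_molar`, the 1 cm path length default and the example absorbance and extinction coefficient are assumptions, not values from the study.

```python
def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (M) = A280 / (epsilon [M^-1 cm^-1] * path [cm])."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (epsilon * path_cm)

# Hypothetical absorbance and extinction coefficient, for illustration only.
c = concentration_molar(a280=0.50, epsilon=25000.0)
print(f"{c * 1e6:.1f} uM")
```

In practice the extinction coefficient would be the value predicted for the specific construct (here, by ProtParam), and the absorbance should fall within the linear range of the spectrophotometer.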

### Binding Studies Using Surface Plasmon Resonance

SPR experiments were performed on a Biacore T100 using CM5 series S sensor chips. The GST-tagged proteins were captured on the sensor chip surface via an anti-GST antibody that had been amine coupled to the chip. For this, the chip was first activated with 0.4 M 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride and 0.1 M N-hydroxysuccinimide. The polyclonal antibody (Abcam) was passed over the chip surface at 30 µg.mL<sup>−1</sup> before the flow cells were blocked with 1 M ethanolamine. The running buffer used for the immobilization contained 50–150 mM NaPO4, 150 mM NaCl and 1 mM DTT (pH 7.4), with levels reaching between 3289 RU and 6324 RU. The GST fusion proteins were passed over the active chip surface at 0.7–0.9 µM to achieve final immobilization levels of 838-1283 RU. GST alone was immobilized on the reference flow cell (462-616 RU) as a negative control to allow for double referencing. Lyophilized G7-18NATE was resuspended in running buffer containing 150 mM NaPO4, 150 mM NaCl and 1 mM DTT (pH 7.4) and injected over the chip surface for 60 s at 30 µL.min<sup>−1</sup> in triplicate. Some injections were removed before analysis because machine error introduced spikes that rendered the sensorgrams uninterpretable. The interpretable data were analyzed using Scrubber2.0 (BioLogic Software, Campbell, ACT, Australia) and GraphPad Prism.
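The double referencing mentioned above can be sketched as follows. This is a minimal illustration and not the processing performed by Scrubber2.0: the function name `double_reference`, the use of a buffer-only (blank) injection as the second reference, and the example responses are assumptions introduced here.

```python
def double_reference(active, reference, blank_active, blank_reference):
    """Return double-referenced responses (RU) for one injection.

    Step 1: subtract the reference flow cell (GST alone) from the active flow cell.
    Step 2: subtract the identically processed buffer-only (blank) injection.
    """
    lengths = {len(active), len(reference), len(blank_active), len(blank_reference)}
    if len(lengths) != 1:
        raise ValueError("all sensorgrams must have the same number of points")
    return [
        (a - r) - (ba - br)
        for a, r, ba, br in zip(active, reference, blank_active, blank_reference)
    ]

# Hypothetical responses in RU, for illustration only.
corrected = double_reference([10.0, 20.0], [1.0, 2.0], [0.5, 0.5], [0.2, 0.1])
print([round(x, 2) for x in corrected])
```

The first subtraction removes bulk refractive index changes and non-specific binding to GST; the second removes systematic drift that is common to every injection.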

### SH2 Domain pY Microarray

The microarray was conducted by the Protein Array and Analysis Core at the MD Anderson Cancer Center (University of Texas, USA) using standard protocols. Briefly, lyophilized G7-18NATE-PB (100 µg) was resuspended in PBST and incubated with the nitrocellulose slide pre-spotted with GST-fusion SH2 domains (full list provided as Supplementary Material). Each SH2 domain was spotted in duplicate.

### RESULTS

### The Unique Grb7-SH2 BC2 Amino Acid Engages E3 in the G7-B1 Peptide

We recently characterized the interaction between the Grb7-SH2 and a G7-18NATE derivative peptide, G7-B1 (B for bicyclic) (Gunzburg et al., 2016). In the course of our investigation, we solved the X-ray crystal structure of the Grb7-SH2/G7-B1 complex to 1.6 Å (PDB ID: 5EEQ). This revealed an interesting feature potentially underlying binding specificity of the Grb7 targeting peptides. We identified that an arginine at the BC2 position (Arg462) in the Grb7 BC loop reaches up and hydrogen bonds with the G7-B1 E3 backbone carbonyl and forms a salt bridge with the E3 sidechain carboxylate (**Figure 1B**; Gunzburg et al., 2016).

A sequence alignment of the SH2 domains of Grb2, Grb7, Grb10, and Grb14 revealed that an arginine at this BC2 position is unique to Grb7, with Grb10, Grb14, and Grb2 possessing a serine at the equivalent position (**Figure 1C**). Although the Arg462/E3 interaction has not been clearly observed in other Grb7-SH2/peptide crystal structures (PDB ID: 3PQZ, Ambaye et al., 2011b; PDB ID: 4X6S, Watson et al., 2015; Gunzburg et al., 2016), we postulated that this interaction might occur transiently and help to define the molecular basis of binding specificity of G7-18NATE for Grb7.

### Arg462 Contributes to the Binding Affinity between Grb7-SH2/G7-18NATE

To test this hypothesis, we mutated the Grb7-SH2 R462 (BC2) amino acid to a serine (the corresponding residue in the Grb2, Grb10, and Grb14 SH2 domains). Using SPR, we directly compared the binding of the cyclic peptide G7-18NATE to the Grb7-SH2 wild-type (WT) and the mutated Grb7-SH2 (R462S). The Grb7-SH2 proteins were prepared as GST fusions to enable efficient immobilization on the sensor chip surface via anti-GST antibodies. The G7-18NATE peptide was flowed over the sensor chip surface and the measured response at equilibrium used to construct binding curves that were fitted by a single-site binding model.

Upon injection of G7-18NATE, equilibrium was rapidly reached for both WT Grb7-SH2 (**Figure 2A**) and the mutated R462S Grb7-SH2 (**Figure 2B**), followed by rapid dissociation after the 60 s injection was complete. Excellent fits by a 1:1 interaction model were obtained with an R<sup>2</sup> of 0.99 for both interactions. The binding curves indicated that the R462 of the BC loop did indeed contribute to the binding affinity of the G7-18NATE/Grb7-SH2 interaction. While G7-18NATE bound to WT Grb7-SH2 with a K<sub>D</sub> of 2.43 µM, the R462S mutant had a 3.3-fold reduction in binding affinity for G7-18NATE, with a K<sub>D</sub> of 7.97 µM (**Figure 2C**; **Table 1**). This suggested that the BC2 arginine does indeed engage with the peptide ligand, and therefore may impart some of the specificity of Grb7 for its substrates.
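The single-site (1:1) equilibrium model used for these fits has the form R<sub>eq</sub> = R<sub>max</sub>·C / (K<sub>D</sub> + C). The following dependency-free sketch shows one way such a fit could be carried out; it is not the procedure implemented in Scrubber2.0 or GraphPad Prism. The function names, the search bounds and the synthetic noise-free data are assumptions made for illustration, and only the two K<sub>D</sub> values in the final line are taken from the text.

```python
import math

def single_site(conc, kd, rmax):
    """Equilibrium response for a 1:1 (single-site) interaction."""
    return rmax * conc / (kd + conc)

def _profile(concs, responses, kd):
    """For a fixed KD, return the least-squares Rmax and the residual sum of squares."""
    f = [c / (kd + c) for c in concs]
    rmax = sum(y * fi for y, fi in zip(responses, f)) / sum(fi * fi for fi in f)
    sse = sum((y - rmax * fi) ** 2 for y, fi in zip(responses, f))
    return rmax, sse

def fit_single_site(concs, responses, kd_min=1e-3, kd_max=1e4, grid=141, steps=60):
    """Estimate (KD, Rmax): coarse log-spaced scan, then golden-section refinement."""
    lo10, hi10 = math.log10(kd_min), math.log10(kd_max)
    logs = [lo10 + i * (hi10 - lo10) / (grid - 1) for i in range(grid)]
    sses = [_profile(concs, responses, 10 ** x)[1] for x in logs]
    best = min(range(grid), key=sses.__getitem__)
    lo, hi = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    phi = (math.sqrt(5) - 1) / 2
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    for _ in range(steps):
        if _profile(concs, responses, 10 ** a)[1] < _profile(concs, responses, 10 ** b)[1]:
            hi, b = b, a                      # minimum lies in [lo, b]
            a = hi - phi * (hi - lo)
        else:
            lo, a = a, b                      # minimum lies in [a, hi]
            b = lo + phi * (hi - lo)
    kd = 10 ** ((lo + hi) / 2)
    return kd, _profile(concs, responses, kd)[0]

# Synthetic, noise-free data generated from assumed parameters (not the study's sensorgrams).
concs = [0.5, 1, 2, 4, 8, 16, 32, 64]                      # uM
responses = [single_site(c, 2.43, 100.0) for c in concs]   # RU
kd, rmax = fit_single_site(concs, responses)
print(f"KD = {kd:.2f} uM, Rmax = {rmax:.1f} RU")
print(f"fold change R462S/WT = {7.97 / 2.43:.1f}")
```

Because R<sub>max</sub> enters the model linearly, it can be solved in closed form for each trial K<sub>D</sub>, which reduces the fit to a one-dimensional search. A production analysis would also report parameter uncertainties, as in the standard errors quoted in Table 1.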

#### The Corresponding Grb2 Mutation Is Insufficient to Confer Binding to G7-18NATE

We next postulated that the opposite mutation at the BC2 position in the Grb2-SH2 (serine to arginine) could confer binding of the Grb2-SH2 to G7-18NATE. This would indicate the degree to which Arg462 in Grb7 is responsible for binding specificity. We therefore generated the Grb2-SH2 S90R (BC2) mutant protein using site-directed mutagenesis and tested its binding to G7-18NATE using the same SPR experimental procedure described above. As shown in **Figure 3A**, there was only a low binding response to the Grb2-SH2 WT following injection of G7-18NATE over the chip surface, reflecting low binding affinity as previously established for Grb2-SH2 compared to Grb7-SH2 (Gunzburg et al., 2012). The binding to G7-18NATE by Grb2-SH2 S90R, however, also gave rise to low responses, though slightly higher than seen for Grb2-SH2 WT (**Figure 3B**). Excellent fits were obtained by a single-site binding model (R<sup>2</sup> = 0.99); however, the calculated equilibrium dissociation constant was higher than the highest peptide concentration measured (125 µM; **Figure 3C**). Therefore, we could not reliably determine differences in the K<sub>D</sub> between WT Grb2-SH2 and the mutated Grb2-SH2. This demonstrated, nevertheless, that the incorporation of the arginine at the BC2 position was not sufficient to significantly increase the binding of Grb2-SH2 to G7-18NATE.

FIGURE 2 | […] Grb7-SH2 (A) or mutated (R462S) Grb7-SH2 (B) with the concentrations of peptide tested displayed on the right of the sensorgram; (C) The corresponding binding curves were calculated from the responses at equilibrium and fitted by a 1:1 interaction model. All data points from the repeated experiments are displayed in the binding profiles.

TABLE 1 | SPR investigations into the specificity of G7-18NATE for the Grb7-SH2.

#Equilibrium dissociation constants were calculated from fits by a single-site binding model. Errors displayed are the standard error of the fits.
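A short calculation illustrates why a K<sub>D</sub> above the highest tested concentration cannot be determined reliably. Under the single-site model, the fraction of sites occupied at concentration C is C / (K<sub>D</sub> + C), so when K<sub>D</sub> exceeds 125 µM less than half of the binding curve is sampled. The helper name below is hypothetical, and evaluating the Grb7-SH2 K<sub>D</sub> at 125 µM is for comparison only.

```python
def fractional_occupancy(conc_um: float, kd_um: float) -> float:
    """Fraction of binding sites occupied at equilibrium for a single-site model."""
    return conc_um / (kd_um + conc_um)

TOP_CONC_UM = 125.0  # highest peptide concentration stated in the text

# KD values of 2.43 and 7.97 uM are from the text; 125 and 250 uM are hypothetical.
for kd in (2.43, 7.97, 125.0, 250.0):
    print(f"KD = {kd:6.2f} uM -> occupancy at top concentration = "
          f"{fractional_occupancy(TOP_CONC_UM, kd):.2f}")
```

With so little of the saturation curve observed, R<sub>max</sub> and K<sub>D</sub> become strongly correlated in the fit, so a high R<sup>2</sup> does not by itself imply a well-determined K<sub>D</sub>.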

### G7-18NATE Is Specific for the Grb7-SH2 Compared to a Panel of SH2 Domains

In the course of our analysis of G7-18NATE and its derivatives, specificity was assessed by comparing the binding of the peptides to the SH2 domains of Grb2, Grb7, and Grb10, and in some instances Grb14-SH2 (Watson et al., 2015; Gunzburg et al., 2016). While Grb2 shares the same consensus pYXN binding motif, Grb10 and Grb14 share significant sequence identity (approximately 70%) with Grb7-SH2 (Daly et al., 1996). We therefore considered that these were the most closely related SH2 domains for assessment of specificity for Grb7-SH2 using SPR experiments.

In order to determine whether there were other SH2 domains that G7-18NATE could bind to that we had not yet encountered, we utilized the pY reader protein microarray at the Protein Array and Analysis Core (MD Anderson Cancer Center). Here, a biotinylated G7-18NATE derivative (G7-18NATE-PB) was tested for binding against 79 SH2 domains (a complete list of tested SH2 domains is provided as Supplementary Material). The G7-18NATE derivative also contained a shortened Penetratin sequence (displayed as a schematic in **Figure 4A**), and the presence of this sequence has been previously demonstrated not to interfere with G7-18NATE binding to the Grb7-SH2 (Ambaye et al., 2011a; Watson et al., 2017). As shown in **Figure 4B**, it was clear that G7-18NATE-PB preferentially binds to the Grb7-SH2 over the suite of SH2 domains. Although this array was not quantitative, the level of fluorescence was found to be significantly higher for the positions containing the Grb7-SH2, indicative of peptide binding. This was not due to differences in the amount of protein blotted on the array, as similar fluorescence levels were detected when probed with an anti-GST antibody (**Figure 4B**).

No fluorescence was observed at the positions corresponding to Grb2, Grb10, or Grb14, consistent with our previously published SPR experiments (Gunzburg et al., 2012). Fluorescence was observed, although at much lower intensity, at the positions corresponding to a previously untested SH2 domain, that of SH2D2A. A sequence alignment of the SH2 domains of Grb7 and SH2D2A revealed that SH2D2A also contains a serine at the BC2 position and, strikingly, a leucine at the βD6 position (**Figure 4C**). Although this is not the only SH2 domain in the array that possesses a leucine at the βD6 position, it suggests that leucine at this position is also a determining feature for binding the G7-18NATE peptide and its derivatives. Thus, as well as further confirming the specificity of the ligand G7-18NATE for the Grb7-SH2 domain, this experiment has also provided further insights into the defining features of Grb7 that affect ligand binding selectivity.

#### DISCUSSION

SH2 domains are frequently found in signaling pathways regulating critical processes that are perturbed in aggressive diseases like cancer. Due to this, and their well-characterized binding mode, a number of SH2 domains have been targeted for the development of novel therapeutics (Kraskouskaya et al., 2013; Morlacchi et al., 2014). However, with over 110 SH2 domains in the proteome, ensuring selectivity for the target SH2 domain is a significant challenge in the development of specific and potent inhibitors. Grb7 is a signaling adaptor protein that has been specifically targeted through the development of cyclic peptide inhibitors that are able to inhibit proliferation and migration in breast cancer cell lines and reduce tumor metastasis in a mouse model of pancreatic cancer (Tanaka et al., 2006; Pradip et al., 2013). The cyclic peptide, G7-18NATE, has been demonstrated to be specific for the Grb7-SH2 domain over the closely related SH2 domains of Grb2, Grb10, and Grb14, and this specificity has, to date, been attributed to the Grb7 specific leucine at the βD6 position (Janes et al., 1997; Gunzburg et al., 2012). Here, we have investigated whether additional amino acids also contribute to the specificity of Grb7 for its ligands.

Based on the high-resolution X-ray crystal structure of the Grb7-SH2/G7-B1 complex, we identified an arginine in the Grb7-SH2 BC loop that mediates additional hydrogen bond interactions with the inhibitor peptide. We identified that this arginine is unique to Grb7 compared to the SH2 domains of Grb2, Grb10 and Grb14 (where there is a serine at the comparable position) and considered that this interaction could contribute to the specificity for Grb7 of all the G7-18NATE derived peptide ligands. Using mutagenesis we subsequently determined that Arg462 contributes to the binding affinity between Grb7-SH2 and G7-18NATE. The converse mutation in the Grb2-SH2 domain did not confer binding to G7-18NATE, suggesting other amino acids also define Grb7 specificity. We also probed a panel of 79 SH2 domains with a biotinylated version of G7-18NATE and identified that the peptide is highly specific for the Grb7-SH2 domain. The only other SH2 domain that showed detectable binding by G7-18NATE was that of SH2D2A. Interestingly, while SH2D2A does not possess an arginine at the BC2 position, it has a leucine at the βD6 position.

The initial observation that Arg462 may contribute to the binding interaction came from the Grb7-SH2/G7-B1 X-ray crystal structure (PDB ID: 5EEQ). In this structure, the Arg462 side-chain extends and engages the E3 backbone carbonyl in G7-18NATE, as well as the E3 sidechain carboxylic acid. The former of these interactions is likely to be more influential to binding. In previously described crystal structures of cyclic peptides bound to the Grb7-SH2 domain, the E3 sidechain is positioned in various rotamer conformations whereas Arg462 is frequently observed in the proximity of the E3 backbone carbonyl (Gunzburg et al., 2016). Furthermore, peptide ligands without glutamic acid at the E3 position have also been shown to bind the Grb7-SH2 domain (Ambaye et al., 2011b). This was initially identified using a constrained phage display library, enriched for a CX<sub>1</sub>FX<sub>2</sub>GYDNX<sub>3</sub>X<sub>4</sub>X<sub>5</sub> motif, whereby tryptophan was selected in approximately 50% of clones at the corresponding E3 position. Thus, G7-18NATE derivatives that are modified at the E3 position are predicted to maintain the hydrogen bond interaction with the Grb7 Arg462 (BC2). It should be noted that the formation of this single hydrogen bond, which may only occur transiently (since it is not observed in all crystal structures), is consistent with the relatively small increase in affinity that is conferred. A 3.3-fold change in K<sub>D</sub> corresponds to a change in ΔG of less than 1 kcal/mol (RT ln 3.3 ≈ 0.7 kcal/mol at 25 °C), which is less than expected for a well-formed hydrogen bond.
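The relationship between a fold change in K<sub>D</sub> and the change in binding free energy, ΔΔG = RT ln(K<sub>D,mutant</sub> / K<sub>D,WT</sub>), can be checked with a few lines of Python. This is a sketch under a stated assumption: the temperature of 25 °C (298.15 K) is assumed here because the measurement temperature is not given in this section, and the function name is hypothetical. The two K<sub>D</sub> values are those reported in the Results.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def ddg_kcal_per_mol(kd_mutant: float, kd_wt: float, temp_k: float = 298.15) -> float:
    """Change in binding free energy from a ratio of dissociation constants."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wt)

# KD values (uM) reported for R462S and WT Grb7-SH2; the units cancel in the ratio.
print(f"ddG = {ddg_kcal_per_mol(7.97, 2.43):.2f} kcal/mol")
```

Because the dependence is logarithmic, even a 10-fold change in K<sub>D</sub> would correspond to only about 1.4 kcal/mol at this temperature.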

The G7-18NATE peptide and its derivatives are constrained in a β-turn conformation with extensive intra-molecular hydrogen bonds. A hydrogen bond between the F2 backbone carbonyl and the Y5 backbone amine positions the E3 backbone carbonyl for engagement with the Arg462 (BC2) sidechain. This suggests that the Arg462/substrate interaction may only occur in ligands that are naturally restricted in a β-turn conformation. To date, no X-ray crystal structures have been solved of Grb7 in complex with an in vivo binding partner (such as FAK or HER2) to determine whether or not Arg462 (BC2) contributes to the interaction with these molecules and defines Grb7 selectivity for natural substrates.

We have also described how the introduction of the arginine mutation to the Grb2-SH2 at the BC2 position was insufficient to confer binding by the G7-18NATE peptide (although it may have made a contribution to binding to Grb10 or Grb14 that are more closely related to Grb7). This suggests that additional amino acids underlie the specificity of G7-18NATE for Grb7-SH2 over Grb2-SH2. This is supported by our microarray results, where the only SH2 domain other than Grb7 to show detectable binding to G7-18NATE was that of SH2D2A, an SH2 domain that contains a leucine at the βD6 position. This finding is consistent with previous reports of the contribution of leucine at the βD6 position to the specificity of Grb7. When Leu481 is mutated to glutamine (the corresponding amino acid in Grb14), binding to the HER2 receptor is lost, and conversely in Grb14, the glutamine to leucine mutation enables HER2 binding (Janes et al., 1997). The X-ray crystal structure of the Grb7-SH2/G7-18NATE complex revealed how a leucine at this position enhances the interaction (Ambaye et al., 2011b). Leucine at the βD6 position is not, however, sufficient for G7-18NATE binding to SH2 domains, as other SH2 domains in the microarray with leucine present at this position showed no ability to be bound by G7-18NATE. Clearly, other amino acid residue differences contribute to binding specificity, such as the βE4 methionine or EF4 glutamine that are unique to the Grb7-SH2 domain.

Thus, in this study we have demonstrated that Grb7 Arg462 contributes to G7-18NATE binding, but that specificity is mediated by multiple factors including the leucine at the βD6 position. We have also established that the G7-18NATE peptide is exquisitely specific for Grb7-SH2 over other SH2 domains. Together this provides insight for the development of even more potent ligands that target the Grb7-SH2 domain, and demonstrates that SH2 domains, despite their common structural features, are readily able to differentiate between ligands and thus have high potential as targets for therapeutics development.

#### AUTHOR CONTRIBUTIONS

GW was responsible for the detailed experimental design, protein overexpression and SPR experiments, figure and manuscript preparation. WL undertook the cloning of SH2 domain mutants and was also involved in protein expression, purification and SPR experiments. MG provided SPR expertise and training. JW designed the conceptual framework for the experiments, oversaw the interpretation of the results and finalized the manuscript with the other authors.

#### ACKNOWLEDGMENTS

We wish to thank the Monash Micromon DNA sequencing facility, and Danielle Cini and the MD Anderson Cancer Center for assistance with the pY microarray. This work was supported by a grant from the National Health and Medical Research Council (APP3236668) awarded to JW.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmolb.2017.00064/full#supplementary-material


#### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Watson, Lucas, Gunzburg and Wilce. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Structure and Interactions of the TPR Domain of Sgt2 with Yeast Chaperones and Ybr137wp

Ewelina M. Krysztofinska<sup>1</sup>, Nicola J. Evans<sup>1</sup>, Arjun Thapaliya<sup>1</sup>, James W. Murray<sup>2</sup>, Rhodri M. L. Morgan<sup>2</sup>, Santiago Martinez-Lumbreras<sup>1</sup> and Rivka L. Isaacson<sup>1</sup>\*

*<sup>1</sup> Department of Chemistry, King's College London, London, United Kingdom, <sup>2</sup> Department of Life Sciences, Imperial College London, South Kensington, United Kingdom*

Small glutamine-rich tetratricopeptide repeat-containing protein 2 (Sgt2) is a multi-module co-chaperone involved in several protein quality control pathways. The TPR domain of Sgt2 and several other proteins, including SGTA, Hop, and CHIP, is a highly conserved motif known to form transient complexes with molecular chaperones such as Hsp70 and Hsp90. In this work, we present the first high resolution crystal structures of Sgt2\_TPR alone and in complex with a C-terminal peptide PTVEEVD from heat shock protein, Ssa1. Using nuclear magnetic resonance spectroscopy and isothermal titration calorimetry, we demonstrate that Sgt2\_TPR interacts with peptides corresponding to the C-termini of Ssa1, Hsc82, and Ybr137wp with similar binding modes and affinities.

#### Edited by:

*Piero Andrea Temussi, University of Naples Federico II, Italy*

#### Reviewed by:

*Filippo Prischi, University of Essex, United Kingdom Doriano Lamba, Consiglio Nazionale Delle Ricerche (CNR), Italy*

> \*Correspondence: *Rivka L. Isaacson rivka.isaacson@kcl.ac.uk*

#### Specialty section:

*This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences*

Received: *28 July 2017* Accepted: *21 September 2017* Published: *11 October 2017*

#### Citation:

*Krysztofinska EM, Evans NJ, Thapaliya A, Murray JW, Morgan RML, Martinez-Lumbreras S and Isaacson RL (2017) Structure and Interactions of the TPR Domain of Sgt2 with Yeast Chaperones and Ybr137wp. Front. Mol. Biosci. 4:68. doi: 10.3389/fmolb.2017.00068*

Keywords: Sgt2, TPR, carboxylate clamp, NMR, CSP, x-ray crystallography, ITC

### INTRODUCTION

Transient interactions between proteins confer functional versatility upon a range of cellular processes, including protein modification, transport, folding, and cell signaling pathways (Perkins et al., 2010). The co-chaperone, Small glutamine-rich tetratricopeptide repeat (TPR) protein alpha (SGTA), is involved in the decision to target various misfolded and mislocalized proteins into their appropriate pathways, upstream of either insertion to the endoplasmic reticulum (ER) or degradation (Hessa et al., 2011; Leznicki and High, 2012; Wunderley et al., 2014; Casson et al., 2016; Shao et al., 2017). SGTA interacts with many proteins and forms transient complexes with chaperones, membrane targeting proteins, and members of the ubiquitin/proteasome system (UPS; Rodrigo-Brenni et al., 2014; Leznicki et al., 2015; Krysztofinska et al., 2016; Thapaliya et al., 2016).

The yeast ortholog of SGTA, Sgt2, is best understood in the context of post-translational insertion of tail-anchored (TA) proteins into the ER membrane. The majority of membrane proteins undergo targeting to the endoplasmic reticulum in a co-translational process mediated by the signal recognition particle (SRP) as the nascent peptide chain emerges from the ribosomal tunnel. However, TAs are a special case of membrane proteins with obscured targeting signals at the extreme C-terminus. Therefore, their membrane delivery occurs post-translationally via the Guided Entry of Tail-anchored proteins pathway (GET; Schuldiner et al., 2008; Rabu et al., 2009; Borgese and Fasana, 2011; Hegde and Keenan, 2011). Sgt2 captures TA substrates after they are released from the ribosome and passes them on to the Get3 ATPase, the central targeting complex, in a process mediated by the Get4/Get5 heterodimeric scaffolding complex (Chartron et al., 2010; Wang et al., 2010; Simon et al., 2013; Mateja et al., 2015). This is followed by TA-protein release at the ER membrane, assisted by the Get1/Get2 heterodimeric membrane receptor complex (Wang et al., 2010; Mariappan et al., 2011; Vilardi et al., 2014). A new pathway in yeast has recently been discovered, which is suggested to be the back-up mechanism in the event of GET system failure (Aviram et al., 2016). This involves three proteins, named Snd1, Snd2, and Snd3 (for SRP-independent targeting), which have possible roles in targeting substrates to the ER translocation machinery Sec61 (Aviram et al., 2016).

Importantly for its role, Sgt2 associates with several heat-shock proteins such as Hsp104, Hsc82 (yeast ortholog of Hsp90), and Ssa1/Ssa2 (yeast orthologs of Hsp70), which can bind directly to its central TPR domain (Liou and Wang, 2005). The Hsp70 and Hsp90 chaperones are important parts of the cellular machinery for protein folding, maturation and structural stability (Richter and Buchner, 2001; Pratt and Toft, 2003). They often associate with co-chaperones containing multiple copies of TPR domains (Frydman and Hohfeld, 1997; Pratt, 1997) which help them to facilitate correct folding of client proteins (Wang et al., 2010; Morgan et al., 2015). It has also been proposed that SGTA regulates the ATPase activity and folding rates of Hsp70 (Angeletti et al., 2002).

Recently, Sgt2 was reported to interact with Ybr137wp, a protein of uncharacterized function that is specific to yeast (Yeh et al., 2014). The Sgt2 TPR domain binds to the C-terminal end of Ybr137wp. Ybr137wp is thought to be a decamer both in its crystal form and in solution (Yeh et al., 2014). The function of Ybr137wp is linked to the GET pathway, where it is able to rescue the TA protein delivery defect caused by a GET system that is impaired under starvation conditions (Yeh et al., 2014). However, the exact role of Ybr137wp in the TA targeting mechanism is not understood.

Sgt2 contains an N-terminal dimerization domain (Liou and Wang, 2005; Simon et al., 2013; Tung et al., 2013), followed by the conserved, central TPR domain and a glutamine-rich region toward the C-terminus. The N-terminal domain of Sgt2 can directly bind the ubiquitin-like (UBL) domain of Get5 and facilitate the handover of TA substrates downstream onto GET pathway components for membrane delivery (Chartron et al., 2012; Simon et al., 2013; Darby et al., 2014). The C-terminal domain is predicted to be flexible based on SAXS experiments and is structurally uncharacterized (Chartron et al., 2012). These domains of both Sgt2 and SGTA bind hydrophobic substrates including the TMDs of TA-proteins (Dutta and Tan, 2008; Wang F. et al., 2011; Leznicki et al., 2013). TPR domains typically consist of three or more tandem repeats of a loosely conserved 34-residue motif (Lamb et al., 1995; Smith, 2004). Each tandem motif is formed of two antiparallel α-helices. TPR domains are well-known for mediating protein-protein interactions (Das et al., 1998). The structure of the human SGTA TPR domain was determined previously by X-ray crystallography (Dutta and Tan, 2008) and, like the yeast ortholog, has been reported to interact directly with Hsp70/Hsp90 chaperones, the proteasomal subunit Rpn13 and a variety of disease-related proteins (Buchanan et al., 2007; Dutta and Tan, 2008; Roberts et al., 2015; Thapaliya et al., 2016). Moreover, the TPR domain structure (including some additional linker residues at the C-terminus) of the Sgt2 homolog from Aspergillus fumigatus was solved by crystallography (Chartron et al., 2011).

Several TPR domains of various proteins are known to interact with Hsp70/Hsp90 peptides. High-resolution structures of such complexes were reported for HOP TPR1 with an Hsp70-derived peptide, HOP TPR2A with an Hsp90-derived peptide (Scheufler et al., 2000) and CHIP TPR with an Hsp70 C-terminal peptide (Zhang et al., 2005; Wang Q. et al., 2011). The common mode of interaction involves the formation of a carboxylate clamp where both the side-chain and main-chain terminal carboxylate groups of the C-terminal aspartic acids of these peptides form salt bridges with conserved arginine residues within the groove of the co-chaperone TPR domains. Currently there is no structure of the Sgt2 TPR domain from Saccharomyces cerevisiae.

In this study, we report the first high resolution X-ray structure of the Sgt2\_TPR at 1.55 Å and also in complex with the PTVEEVD peptide corresponding to the C-terminus of Ssa1, at 2.0 Å. In addition, we have characterized the interaction between Sgt2\_TPR and C-terminal protein fragments of Ssa1/Ssa2 and Hsp82 (Hsp70 and Hsp90 in mammals, respectively) and Ybr137wp using isothermal titration calorimetry (ITC) and nuclear magnetic resonance (NMR).

### RESULTS

### Overall Sgt2\_TPR Structure

The structure of Sgt2\_TPR was determined by molecular replacement and refined to 1.55 Å resolution (**Figure 1A**, PDB: 5LYP; data collection and refinement parameters in **Table 1**). The coordinates of SGTA\_TPR (PDB: 2VYI) were used as a search model since there is high structural homology with Sgt2\_TPR (57% sequence identity; **Figure 1B**).

All residues were built into the electron density map except for the C-terminal Val229 and the solvent-exposed sidechains of Glu93 and Asp94. The final model also contains 90 water molecules and a single BO<sup>4</sup> ion from the crystallization condition. The TPR domain of Sgt2 consists of three TPR repeats, comprising six almost identical α-helices and a C-terminal "capping" helix (α1 = A96-N115; α2 = Y118-V131; α3 = A136-L149; α4 = Y152-I165; α5 = F170-Q183; α6 = P186-E200; α7 = E206-L225) connected by short loops and arranged in an antiparallel fold homologous to that of SGTA\_TPR. A structural overlay with the equivalent human domain is shown in **Figure 1A** (RMSD of 1.13 Å over 135 Cα).

### Complex Structure of Sgt2\_TPR with the C-Terminal Peptide of Ssa1

Initially it was not possible to form a crystal complex of Sgt2\_TPR (93–229) with the C-terminus of Ssa1 because the flexible N-terminal and C-terminal ends of symmetry-related molecules in the crystal occluded the binding interface. However, producing a shorter Sgt2\_TPR (96–225) construct by removing three residues (EDD) from the N-terminus and four (EKTV) from the C-terminus resulted in successful crystallization of the Sgt2\_TPR/PTVEEVD complex (**Figures 2A–C**, PDB: 5LYN, statistics in **Table 2**).

Two copies of the TPR domain were present in the asymmetric unit due to non-crystallographic symmetry (Figure S1A). All Sgt2\_TPR residues could be built into electron density maps in chain A and chain B. The two peptides (chains C and D) were partially occupied and were built with care. They were modeled into electron density (**Figure 2A**) and then verified by producing a simulated annealing omit map (Figure S2). The overall structures of the two chains, A and B, are very similar in their backbones (RMSD of 0.77 Å over 135 Cα), with deviations observed for the R171 sidechains possibly due to their flexibility and significant differences in the modeled C and D peptides (Figures S1A,B). The electron density was ambiguous for peptide C at chain B, especially for Pro1 with some electron density appearing in the Fo-Fc map which could be suggesting the presence of another atom. However, neither a zinc ion nor water molecule could be fitted. Nevertheless, all PTVEEVD (1–7) residues were successfully modeled. The occupancy for the peptide chains (C and D) was refined to an Rfactor of 0.158 and an Rfree of 0.202, and both converged to occupancy 0.93. The final model also contained 148 water molecules, nine zinc ions and a single BO<sup>4</sup> ion, which was present in the M9 medium we used for protein expression. Zinc ions were added at peaks of the phased anomalous difference map (DANO).

The interaction between the Sgt2\_TPR and the PTVEEVD peptide from Ssa1 is mostly driven by the formation of a two-carboxylate clamp. Most of the electrostatic interactions between the TPR domain and the peptide occur in the C-terminal EEVD region and anchor the peptide in place. Peptide chains C and D, whilst overlapping, show slight conformational differences with an RMSD of 1.24 Å (**Figure 2B**). PDBePISA highlights this difference, showing a binding surface area of 474.7 Å<sup>2</sup> between chains B and C, and 514.3 Å<sup>2</sup> between chains A and D, and a difference in solvation energy of binding (Δ<sup>i</sup>G) of −0.8 kcal/mol and −3.7 kcal/mol, respectively. In the interface between chains A and D, direct backbone contacts involve hydrogen bond formation between the carboxamide sidechains of Asn141 and Asn110 in Sgt2\_TPR, and the sidechain and backbone of the terminal Asp7 of the Ssa1 peptide. Moreover, the sidechain amine of Lys106 binds to the same carbonyl sidechain of Asp7. The guanidinium group of Arg171 forms a salt-bridge with the carbonyl main chain of Asp7 and forms an additional internal contact with Tyr169. The Arg175 sidechain interacts with the carbonyl main chain of Glu4 and Glu5 of the peptide (**Figure 2C**). In addition, the N-terminus of the peptide is involved in hydrophobic and van der Waals interactions. Phe178 and Tyr181 contribute to creating hydrophobic pockets and interact with the aliphatic part of the Pro1, Thr2, and Val3 residues. Met113 makes a hydrophobic contact with Val6 of the peptide (Figure S3). The "two-carboxylate clamp" binding mode is characteristic for TPR domains interacting with the conserved C-terminal IEEVD and MEEVD motifs of Hsp70 and Hsp90 chaperones, respectively (Scheufler et al., 2000; Zhang et al., 2005).
Comparing the Sgt2\_TPR/PTVEEVD binding interface with a previously published complex of HOP TPR1/GPTIEEVD (Hsp70-derivative; Scheufler et al., 2000) shows that the PTVEEVD peptide occupies the same position at the Sgt2 TPR groove as GPTIEEVD. The peptides overlap apart from a difference in the conformation of the main chain and sidechain of the terminal Asp7 of GPTIEEVD (**Figure 2B**). The alternative Asp7 conformation, however, overlaps with the Asp7 conformation in our chain C peptide in the complex structure. Comparing the two copies of the TPR-bound peptide indicates that, in the interface between chains B and C, the main difference is in the backbone of the N-terminus of peptide C. This is manifested through small variations in the orientation of Pro1, Thr2 and Val3 sidechains, changing hydrophobic interactions (particularly between Val3 and Phe178 of the TPR), and new electrostatic interactions between Thr2 and Asp211. These differences are not driven by the binding interface of Sgt2\_TPR, as the sidechains of A and B largely adopt the same conformation, except for Arg171 due to its inherently flexible sidechain (Najmanovich et al., 2000). Notably, the Arg171 backbone NH was missing in the HSQC spectrum, which is a common feature of flexible sidechains in intermediate exchange on NMR timescales. The change in orientation of chain B Arg171 creates a salt bridge with the sidechain of Glu4. In addition, slight conformational changes in chain C Asp7 facilitate formation of a hydrogen bond between the hydroxyl group of Tyr169 and the Asp7 peptide sidechain and the interaction between the sidechains of Asn141 and Asn110 of Sgt2\_TPR and the main chain of Asp7 (Figure S1B).

TABLE 1 | Data collection and refinement statistics of Sgt2\_TPR.

*Statistics for the highest-resolution shell are shown in parentheses.*

*CC*\* *is a derived quantity that links data and model and estimates the correlation of an observed data set with the underlying true signal.*

#### Interactions of the TPR Domain of Sgt2 with Yeast Chaperones and Ybr137wp

To understand the molecular details of the interactions between Sgt2\_TPR and the C-terminal fragments of Ssa1 (PTVEEVD), Hsp82 (MEEVD), and Ybr137wp (SLEEDLNLD) we also performed NMR and ITC experiments. We acquired a complete set of NMR triple resonance experiments using the longer construct of Sgt2\_TPR (93–229) to facilitate the full backbone assignment (BMRB Accession Number: 27044). All residues were assigned except for Arg171. Reciprocal chemical shift perturbation (CSP) experiments were carried out by titrating the unlabeled peptides PTVEEVD, MEEVD and SLEEDLNLD into <sup>15</sup>N-labeled Sgt2\_TPR (93–229) up to a six-fold molar excess of PTVEEVD and MEEVD and a five-fold excess of SLEEDLNLD. The NMR backbone assignment of Sgt2\_TPR allowed us to identify the residues involved in the interactions in all three titrations (Figures S4, S5A, S6A). The CSP analysis exhibited a similar pattern for all three peptide titrations (Figure S7) and showed binding in a fast exchange regime, with a number of peaks shifting non-linearly, suggesting the formation of an intermediate during the titration. We analyzed the Sgt2\_TPR CSPs (**Figures 3A–C**) by applying a titration cut-off at 3 molar equivalents of the peptide for all peaks and then dividing them into two groups. The first group consisted of peak shifts between 0 and 3 molar equivalents and was named "first event," and the second group, called "second event," comprised peak shifts that occurred between 3 and 6 (or 5 in the case of SLEEDLNLD) molar equivalents (marked as red and black arrows respectively in **Figure 3B**, Figures S5B, S6B). The CSPs of Sgt2\_TPR/PTVEEVD binding mapped onto our x-ray structure are shown in **Figure 3A**, with the "first event" on the left and the "second event" on the right. The most perturbed residues for the "first event" correspond to the residues at the binding interface in the x-ray complex structure and/or the neighboring residues.

We also characterized the binding between Sgt2\_TPR (93–229) and PTVEEVD, MEEVD, and SLEEDLNLD peptides by ITC (**Figure 3C**). The ITC results indicate a similar binding affinity for all three complexes with dissociation constants (Kd) of 9.04 ± 0.05 µM for Sgt2\_TPR/PTVEEVD, 2.95 ± 0.30 µM for Sgt2\_TPR/MEEVD and 1.53 ± 0.05 µM for Sgt2\_TPR/SLEEDLNLD. The favorable enthalpy and entropy values obtained from ITC suggest that all complex formations were driven by the establishment of both hydrogen bonds and hydrophobic interactions (**Figure 3C**).

### DISCUSSION

In this study we provide high-resolution X-ray structures of the free Sgt2\_TPR domain and its complex with the last seven amino acids of Ssa1 (Hsp70). We also assign the backbone of Sgt2\_TPR using NMR spectroscopy and characterize the interaction between Sgt2\_TPR and the extreme C-terminal fragments of Hsc82 (Hsp90) and Ybr137wp. Our structural data clearly show that Ssa1 binds to the TPR domain of Sgt2 via a carboxylate clamp mechanism and we can predict a similar mode of binding for Hsc82 and Ybr137wp from our consistent ITC and NMR data in all three systems. Analysis of the three protein-peptide complexes using the PDBePISA interactive tool indicates similarities. In all three complexes Glu4 and Glu5 are predicted to be involved in formation of hydrogen bonds and the terminal Asp7 can form a salt bridge with Arg171, Arg175, and Lys106. In addition, in the case of the SLEEDLNLD peptide there is potential for Glu3 to also be involved in hydrogen bond formation with Ser148. There is a binding surface area of 514.3 Å<sup>2</sup> between chains A and D, and a difference in solvation energy of binding (Δ<sup>i</sup>G) of −3.7 kcal/mol with PTVEEVD. In modeled examples of MEEVD and SLEEDLNLD there are binding surface areas of 420.8 and 579.5 Å<sup>2</sup>, and differences in solvation energy of binding (Δ<sup>i</sup>G) of −2.7 kcal/mol and −2.2 kcal/mol, respectively. There are many examples in the literature of carboxylate clamp mechanisms, most of them connecting the C-termini of heat shock proteins with different co-chaperones (Carrigan et al., 2004; Prasad et al., 2010; Panigrahi et al., 2014), but there are also non-chaperone examples which include the recognition of the proteasomal protein Rpn13 by SGTA (Thapaliya et al., 2016) and the interaction of Sgt2 with Ybr137wp presented here.

All carboxylate clamp interactions studied so far, including our recent and past investigations, describe dissociation constants in the low micromolar range by ITC or SPR (Scheufler et al., 2000; Brinker et al., 2002; Worrall et al., 2008; Thapaliya et al., 2016). In addition, our NMR data suggest the presence of an intriguing dual-event binding mode during titrations and a widespread perturbation along the whole TPR domain. A detailed analysis of the titration revealed that in the first event only residues in the binding interface appear perturbed, while in the second event the perturbation is not localized to a specific interface. A similar scenario had previously been observed in the Rpn13 interaction with SGTA TPR, where signals all over the TPR were affected upon titration (Thapaliya et al., 2016). This behavior appears conserved for carboxylate clamp recognition whether synthetic peptides or recombinant proteins were used for the titration experiments. It likely relates to the fact that the TPR domain helices undergo a subtle contraction to enclose the peptide in the TPR groove. The crystallographic structures we have obtained clearly show a slightly more compact conformation of the TPR where helices 1 and 7 are closer to each other in the complex structure than in the unbound TPR (see Figure S8 for a structural alignment). This observation was previously reported for a longer TPR motif (Zeytuni et al., 2011) and proposed for the TPR domains of co-chaperones (Panigrahi et al., 2014), where it was suggested that changes in the curvature of the cradle structure by concerted movement of the helices may be necessary for ligand binding.

The orientation of Hsp peptides in TPR structures varies between proteins. The conserved clamp mode of interactions is consistent, but there are some differences observed for the N-terminal parts of the peptides, which is not surprising given that the carboxylate clamp is the fixed point of attachment. The PTVEEVD peptide adopts an extended conformation within the Sgt2\_TPR groove similar to that observed in the structure of the HOP TPR1/GPTIEEVD complex (Scheufler et al., 2000). In contrast, structures of GPTIEEVD/CHIP TPR [PDB: 3Q49 (Wang L. et al., 2011) and PDB: 4KBQ (Zhang et al., 2005)] show the peptide in a curled conformation lining the groove. The structures also vary in orientations of the peptide Pro1. In comparison, we also observe differences between the Pro1, Thr2, and Val3 sidechains in the two chains of our Sgt2\_TPR/PTVEEVD complex structure, allowing for some flexibility in the association between TPR and Hsp peptides at the same interface. Sgt2\_TPR serves as a binding interface for transient interactions with a variety of chaperones and other proteins. However, the extended conformation of the peptides and their position within the TPR groove allows for a widespread contact surface with TPR domains, thus supporting the specific recognition of short amino acid stretches with sufficient affinity to bind (Figure S6). The residues preceding EEVD are also important for the binding affinity, and it has been reported that trimming the peptide sequence to EEVD only reduced the affinity at least 10-fold (Scheufler et al., 2000). Furthermore, Sgt2 is a homodimer and it can target a broad range of substrates by binding more than one protein simultaneously and bringing them into closer proximity, promoting interactions.

Little is known about the role of Ybr137wp in the GET pathway except that it binds to Sgt2 at the same binding interface as heat shock chaperones and that it can influence TA membrane insertion by mediating interactions between Sgt2 and chaperones. Previous ITC binding experiments reported that one full-length Ybr137wp decamer is capable of binding to five Sgt2\_TPR dimers with a dissociation constant (Kd) of 1.38 ± 0.09 µM (Yeh et al., 2014). This is almost identical to the ITC results we obtained for the association of Sgt2\_TPR with the extreme C-terminal nine-residue Ybr137wp-derived peptide (Kd of 1.53 ± 0.05 µM), suggesting that the SLEEDLNLD fragment is sufficient for the interaction. Moreover, it has been shown that removing ESLEEDLNLD from the C-terminus of Ybr137wp abolished the interaction, confirming that this flexible C-terminal region is also necessary for the interaction (Yeh et al., 2014).

Further work is required to define the distinct role of Ybr137wp in ER delivery of tail-anchored membrane proteins and examine whether there is any link between this protein and the recently discovered SND targeting pathway in yeast. This alternative to the GET and SRP mechanisms is proposed to act as a back-up targeting system (Aviram et al., 2016). It involves three proteins, localized at the ER or in the cytoplasm, Snd1 (encoded by YDR186C), Snd2 (encoded by ENV10, also known as YLR065C) and Snd3 (encoded by PHO88, also known as YBR106W), working together in a joint targeting pathway (Aviram et al., 2016). The function of Ybr137wp is also linked with altering the defect in TA protein delivery and cell viability derived from impairment of the GET system under starvation conditions.

Future investigations will improve our understanding of Ybr137wp function which will shed light on the importance of the carboxylate clamp interaction with Sgt2 that we delineate here.

#### METHODS

#### Plasmid Preparation

Gene fragments encoding the Sgt2\_TPR (residues 93–229 and 96–225 for the shorter construct) from S. cerevisiae were PCR amplified from synthetic cDNA (Life Technologies) and cloned into the BamHI/XhoI restriction sites of a home-modified pET28 vector which encodes an N-terminal thioredoxin A fusion protein followed by a hexahistidine tag and tobacco etch virus (TEV) protease cleavage site.

#### Protein Production

All plasmids carrying Sgt2\_TPR were transformed into E. coli BL21 (DE3) strain. Typically, protein expression was induced by adding 0.3–0.5 mM isopropyl-β-D-thiogalactopyranoside (IPTG) to cultures at OD<sub>600</sub> ≈ 0.8, followed by overnight incubation at 18 °C. For <sup>15</sup>N-labeled proteins, growth was carried out in M9 media supplemented with labeled ammonium chloride (>98% <sup>15</sup>N, Sigma-Aldrich) and/or glucose (>99% U-<sup>13</sup>C, Sigma-Aldrich). Harvested cells were resuspended in lysis buffer [20 mM potassium phosphate, pH 8.0, 300 mM NaCl, 10 mM imidazole, 250 µM tris(2-carboxyethyl)phosphine (TCEP)], supplemented with 1 mM phenylmethylsulfonyl fluoride (PMSF), and lysed by sonication or using a cell disruptor (Constant Systems Ltd). Cell debris and insoluble material were removed by centrifugation and overexpressed protein recovered from soluble fractions was purified using nickel affinity chromatography (HisTrap™ HP 5 ml, GE Healthcare). Recombinant proteins were eluted with buffer containing 300 mM imidazole, then dialyzed against cleavage buffer (20 mM potassium phosphate, pH 8.0 and 300 mM NaCl) and digested with homemade TEV protease (≈100 µg/ml) at 4 °C overnight. After TEV cleavage, a second nickel affinity chromatography step was performed to remove fusion protein, histidine tags, undigested protein, and TEV protease; the desired protein was then recovered in the flow through and loaded onto a HiLoad 16/60 Superdex 75 column (GE Healthcare), previously equilibrated in buffer containing 10 mM potassium phosphate pH 6.0, 100 mM NaCl and 250 µM TCEP or 20 mM Tris-HCl pH 7.5. Proteins were concentrated using Vivaspin concentrators with 5K cut-off (Sartorius Stedim) and sample purity and homogeneity were checked by SDS-PAGE, mass spectrometry and NMR.
The lyophilized peptides: PTVEEVD (corresponding to Ssa1 C-terminal; residues 634–640), MEEVD (corresponding to C-terminal of Hsp82; residues 705–709) and SLEEDLNLD (corresponding to C-terminal of Ybr137wp; residues 171–179) were purchased from Alpha BioScience (Birmingham, UK) and resuspended in water or an appropriate buffer before use. All peptides were purified and verified by HPLC and mass spectrometry.

#### NMR Titrations

Sgt2\_TPR (residues 93–229) and peptides used for NMR were prepared in 10 mM potassium phosphate pH 6.0, 100 mM NaCl and 250 µM TCEP buffer. Typically, <sup>1</sup>H-<sup>15</sup>N HSQC experiments were recorded for each titration point at 25°C, and the chemical shift perturbation (CSP) was calculated for each amide signal using the following formula, where Δδ<sub>1H</sub> and Δδ<sub>15N</sub> are the proton and nitrogen chemical shift differences, respectively, for the same amide in its free and bound spectra (δ<sub>free</sub> − δ<sub>bound</sub>):

$$
\Delta \delta^{av} = \sqrt{\left(\left(\Delta \delta_{1H}\right)^2 + \left(\Delta \delta_{15N} / 5\right)^2\right) \cdot 0.5}
$$

CSP results were mapped onto the structures using the PyMOL software.
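The CSP formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function name `csp`, the residue labels and the peak positions are hypothetical, and only the weighting (nitrogen shift divided by 5, then an averaged root sum of squares) comes from the formula in the text.

```python
import math


def csp(delta_h_ppm: float, delta_n_ppm: float, n_scale: float = 5.0) -> float:
    """Averaged chemical shift perturbation (ppm) for one amide.

    delta_h_ppm and delta_n_ppm are the free-minus-bound shift
    differences for 1H and 15N; the 15N difference is divided by 5.
    """
    return math.sqrt(0.5 * (delta_h_ppm ** 2 + (delta_n_ppm / n_scale) ** 2))


# Hypothetical (1H ppm, 15N ppm) peak positions; not data from this study.
free = {"res_A": (8.20, 120.0), "res_B": (7.95, 115.5)}
bound = {"res_A": (8.26, 120.4), "res_B": (7.95, 115.5)}

for name in free:
    d_h = free[name][0] - bound[name][0]
    d_n = free[name][1] - bound[name][1]
    print(name, round(csp(d_h, d_n), 4))
```

Because both differences are squared, the sign convention (free minus bound, or the reverse) does not affect the result.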

#### NMR Experiments

Protein samples at concentrations between 500 and 3,000 µM were prepared in 10% D<sub>2</sub>O (Sigma Aldrich), 10 mM potassium phosphate pH 6.0, 100 mM NaCl and 250 µM TCEP buffer. All NMR experiments were acquired in 5 mm NMR tubes at 25°C on Bruker Avance spectrometers operating at 500 and 800 MHz equipped with cryoprobes, controlled by the TopSpin 3.1 software package. Backbone assignments were carried out using 3D experiments [HNCO, HNCA, HN(CA)CO, CBCA(CO)NH, and CBCANH] for Sgt2\_TPR. All NMR spectra were processed with NMRPipe (Delaglio et al., 1995) and analyzed with CcpNMR Analysis (Vranken et al., 2005).

### ITC

ITC measurements were performed at 25°C using an ITC-200 MicroCal microcalorimeter (GE Healthcare) following standard procedures (Darby et al., 2014). Proteins were prepared in 10 mM potassium phosphate pH 6.0, 100 mM NaCl, 250 µM TCEP. In each titration, 20 injections of 2 µL of peptide solution at a concentration of 500 µM were added to Sgt2\_TPR (residues 93–229) at 50 µM in the reaction cell. Integrated heat data obtained for the titrations, corrected for heats of dilution, were fitted to a theoretical titration curve by nonlinear least-squares minimization, using the MicroCal-Origin 7.0 software package. ΔH (reaction enthalpy change in kcal/mol), K<sub>b</sub> (equilibrium binding constant in M<sup>−1</sup>), and n (molar ratio between the proteins in the complex) were the fitting parameters. The reaction entropy, ΔS, was calculated using the relationships ΔG = −RT·lnK<sub>b</sub> (R = 8.314 J/(mol·K), T = 298 K) and ΔG = ΔH − TΔS. Dissociation constants (K<sub>d</sub>) are shown in the figure legends for each interaction.
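The two thermodynamic relationships quoted above can be illustrated with a short Python sketch. The function name `itc_thermodynamics` and the input values are hypothetical and do not come from the fits reported in this study; R and T are the values stated in the text. Because ΔH is reported in kcal/mol while R is given in joules, the sketch converts ΔH to J/mol first (1 kcal = 4184 J).

```python
import math

R = 8.314            # gas constant, J/(mol·K), as stated in the text
T = 298.0            # temperature, K, as stated in the text
KCAL_TO_J = 4184.0   # unit conversion for ΔH reported in kcal/mol


def itc_thermodynamics(k_b: float, delta_h_kcal: float):
    """Return (Kd in M, ΔG in J/mol, ΔS in J/(mol·K)).

    k_b is the fitted binding constant in M^-1 and delta_h_kcal the
    fitted enthalpy change in kcal/mol.
    """
    k_d = 1.0 / k_b
    delta_g = -R * T * math.log(k_b)      # ΔG = −RT·ln(Kb)
    delta_h = delta_h_kcal * KCAL_TO_J
    delta_s = (delta_h - delta_g) / T     # rearranged from ΔG = ΔH − TΔS
    return k_d, delta_g, delta_s


# Hypothetical fit results, for illustration only.
k_d, d_g, d_s = itc_thermodynamics(k_b=1.0e5, delta_h_kcal=-10.0)
print(f"Kd = {k_d * 1e6:.1f} µM, ΔG = {d_g / 1000:.1f} kJ/mol, ΔS = {d_s:.1f} J/(mol·K)")
```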

### Crystallization

Sgt2\_TPR (residues 93–229) was concentrated to 35 mg/ml in 20 mM Tris-HCl pH 7.5 buffer and crystals were obtained after 4 days by the vapor-diffusion method at 293 K (20°C) using MRC plates in 0.1 M SPG, pH 6.0, 25% w/v PEG 1500 (PACT premier from Molecular Dimensions), with a drop volume of 400 nl. In the case of the Sgt2\_TPR (96–225)/PTVEEVD complex, the protein/peptide complex was eluted from a HiLoad 16/60 Superdex 75 column and concentrated to 20 mg/ml in 20 mM Tris-HCl pH 7.5, followed by a further peptide addition (up to a protein:peptide molar ratio of 1:3) prior to crystallization. The complex crystallized after 7 days by the vapor-diffusion method at 293 K in 0.2 M zinc acetate, pH 7.2, 30% w/v PEG 3350. All crystals were harvested in reservoir solution with 20% glycerol before flash cooling in liquid nitrogen.

#### Data Collection and Processing

A complete dataset was collected from a single crystal on Diamond Beamline I04 for the free Sgt2\_TPR and on I03 for the Sgt2\_TPR/PTVEEVD complex, using a Pilatus 6M-F detector and a single wavelength of 0.920 Å. Data were processed using Xia2 (Winter et al., 2013) with scaling and merging using Aimless (McNicholas et al., 2011).

#### Structure Solution and Refinement

The crystal structure of Sgt2\_TPR (93–229) was determined by molecular replacement using Phaser (McCoy et al., 2007) with the human SGTA\_TPR crystal structure (PDB: 2VYI) as a search model (57% sequence identity). This structure was then used as the search model to solve the Sgt2\_TPR (96–225)/PTVEEVD complex. Both structures were refined using REFMAC5 (Murshudov et al., 2011) and PHENIX (Adams et al., 2010) with manual model building using Coot (Emsley and Cowtan, 2004). A free R set comprising 4.9% of reflections was used for cross-validation for Sgt2\_TPR, and 4.8% for Sgt2\_TPR (96–225)/PTVEEVD. Water molecules, Zn atoms and BO<sub>4</sub> atoms were fitted manually using Coot. The free Sgt2\_TPR structure was solved in space group P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> with cell parameters: 36.86 Å (a), 50.76 Å (b), 67.12 Å (c), 90.00° (α), 90.00° (β), 90.00° (γ). The refined structure shows very good stereochemistry; statistics from the Molprobity (Chen et al., 2010, 2015) report are shown in **Table 1**. The complex structure was solved in space group P2<sub>1</sub> with cell dimensions: 45.49 Å (a), 61.09 Å (b), 55.25 Å (c), 90.00° (α), 108.81° (β), 90.00° (γ); statistics from the Molprobity (Chen et al., 2010, 2015) report are shown in **Table 2**.
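As a worked illustration of the cell parameters quoted above, the sketch below computes the unit-cell volume with the general triclinic formula, which reduces to a·b·c for the orthorhombic cell and to a·b·c·sin β for the monoclinic one. The function name `cell_volume` is hypothetical, and the calculation is an aside rather than part of the reported refinement.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit-cell volume in cubic Å from edge lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


# Cell parameters as quoted in the text.
free_cell = (36.86, 50.76, 67.12, 90.00, 90.00, 90.00)        # free Sgt2_TPR
complex_cell = (45.49, 61.09, 55.25, 90.00, 108.81, 90.00)    # PTVEEVD complex

print(f"free cell:    {cell_volume(*free_cell):.0f} Å^3")
print(f"complex cell: {cell_volume(*complex_cell):.0f} Å^3")
```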

#### AUTHOR CONTRIBUTIONS

EK, NE, AT, and RI conceived the ideas and designed experiments. EK, NE, and AT performed experiments. EK, NE, AT, JM, RM, SM, and RI analyzed data. All authors contributed toward writing the manuscript.

#### FUNDING

RI was supported by MRC New Investigator Research Grant: G0900936. RI is funded by BBSRC grants: BB/L006952/1 and BB/N006267/1. AT is funded by BBSRC grant: BB/J014567/1. NMR experiments were performed at the Centre for Biomolecular Spectroscopy, King's College London, established with a Capital Award from the Wellcome Trust. This work was supported by the Francis Crick Institute through provision of access to the MRC Biomedical NMR Centre. The Francis Crick Institute receives its core funding from Cancer Research UK (FC001029), the UK Medical Research Council (FC001029), and the Wellcome Trust (FC001029).

#### ACKNOWLEDGMENTS

The authors would like to thank Diamond Light Source for beamtime (proposals mx12579 and mx13597), and the staff of beamlines I03 and I04 for assistance with data collection. The dataset for the free Sgt2\_TPR was collected at I04 as part of proposal number mx12579 for the Imperial College London CSB BAG and datasets for the Sgt2\_TPR/PTVEEVD complex were collected at I03 as part of proposal number mx13597 for King's College University BAG. We thank Dr. R. A. Atkinson (KCL) and Dr. G. Kelly (The Crick Institute) for their assistance with NMR experiments, and Dr. R. Yan (KCL) for insightful scientific discussions. We are grateful to Dr. J. M. Pérez-Cañadillas (Rocasolano Chemical Physical Institute, Madrid, Spain) for providing the home-modified pET28 vector and plasmid encoding TEV protease.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2017.00068/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Krysztofinska, Evans, Thapaliya, Murray, Morgan, Martinez-Lumbreras and Isaacson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Weak Molecular Interactions in Clathrin-Mediated Endocytosis

Sarah M. Smith, Michael Baker, Mary Halebian and Corinne J. Smith\*

*School of Life Sciences, University of Warwick, Coventry, United Kingdom*

Clathrin-mediated endocytosis is a process by which specific molecules are internalized from the cell periphery for delivery to early endosomes. The key stages in this step-wise process, from the starting point of cargo recognition, to the later stage of assembly of the clathrin coat, are dependent on weak interactions between a large network of proteins. This review discusses the structural and functional data that have improved our knowledge and understanding of the main weak molecular interactions implicated in clathrin-mediated endocytosis, with a particular focus on the two key proteins: AP2 and clathrin.

Keywords: clathrin, endocytosis, adaptor-protein, structural biology, molecular interactions

#### Edited by:

*Rivka Isaacson, King's College London, United Kingdom*

#### Reviewed by:

*Eileen M. Lafer, University of Texas Health Science Center San Antonio, United States Aaron Neumann, University of New Mexico, United States*

\*Correspondence:

*Corinne J. Smith corinne.smith@warwick.ac.uk*

#### Specialty section:

*This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences*

Received: *21 August 2017* Accepted: *11 October 2017* Published: *14 November 2017*

#### Citation:

*Smith SM, Baker M, Halebian M and Smith CJ (2017) Weak Molecular Interactions in Clathrin-Mediated Endocytosis. Front. Mol. Biosci. 4:72. doi: 10.3389/fmolb.2017.00072*

#### INTRODUCTION

Biological processes are built on a complex interplay between proteins in the crowded, heterogeneous environment that exists within cells, and the functional protein interactions that are vital to these processes are often weak and transient. Insight into how such interactions are exploited in biological systems can help us understand how individual proteins contribute to functional networks and pathways.

One such pathway is clathrin-mediated endocytosis, a fundamental cellular process that serves to internalize cargo (that is, proteins or nutrients that need to be brought into the cell interior) and is implicated in numerous cellular functions, including nutrient uptake, membrane protein recycling, cell polarity, synaptic vesicle recycling and cell signaling. Defects in clathrin-mediated endocytosis have been linked to numerous pathological conditions such as Alzheimer's Disease, HIV/AIDS and hypercholesterolemia (Goldstein et al., 1985; McMahon and Boucrot, 2011; Zhang et al., 2011).

Clathrin-mediated endocytosis can be subdivided into six main steps: initiation, growth, stabilization, vesicle budding, scission and uncoating (summarized in **Figure 1A**). As the name suggests, this type of endocytosis is characterized by its reliance on a protein called clathrin, which interacts with a large network of adaptor proteins during the formation of a clathrin-coated vesicle and selection of cargo for internalization. Since clathrin cannot directly interact with the lipids or proteins of the plasma membrane (Maldonado-Báez and Wendland, 2006), adaptor proteins assist in the assembly of clathrin-coated vesicles by providing a link between clathrin and the membrane-bound cargo. The main adaptor protein that clathrin engages with at the plasma membrane is adaptor protein 2 (AP2). As well as binding to clathrin, AP2 also interacts with a significant number of binding partners, which include receptors destined for internalization as well as other adaptor proteins that facilitate endocytosis (summarized in **Figure 1B**). AP2 is a member of a family of five heterotetrameric complexes. Each complex contains four subunits: two large (∼100 kDa), one medium (∼50 kDa), and one small (∼17 kDa).

Key stages in clathrin-mediated endocytosis, such as receptor recruitment and assembly of the clathrin coat, appear to rely on weak interactions that are based on recognition of short peptide sequences. This review discusses how these weak molecular interactions are exploited by the crucial endocytic components: AP2 and clathrin.

FIGURE 1 | (A) Assembly and disassembly of a clathrin-coated pit. Adaptor proteins associate at the membrane through interactions with phosphoinositides. AP2 and clathrin-associated sorting proteins (CLASPs), such as AP180, interact with these membrane moieties and, once bound to the membrane, recruit clathrin triskelions to initiate lattice assembly. Recruitment of other adaptor proteins (e.g., Eps15, epsin, CALM/AP180) is required for stable lattice growth and vesicle closure. Dynamin, assisted by actin polymerization when the membrane is under tension, drives membrane scission and coated-vesicle release. Hsc70, recruited by the J-domain protein auxilin, mediates clathrin uncoating and release of a free vesicle, primed to fuse with a target membrane. (B) Key components involved in the initiation of clathrin-mediated endocytosis. Sites of active endocytosis are characterized by the accumulation of the key components: adaptor proteins, cargo, lipids and clathrin. Extracellular ligands (gold) are internalized by virtue of their signal-motif-bearing transmembrane receptor (blue) being recognized and bound by AP2 (or CLASP) (purple) at the intracellular side of the plasma membrane. CLASPs and AP2 bind to the PIP2 moieties of the inner membrane (orange). These proteins also serve to recruit individual clathrin triskelions (green) to the active endocytic site, where their subsequent polymerization results in the formation of the clathrin coat.

### AP2 INTERACTIONS—THE MOLECULAR BASIS OF RECEPTOR RECRUITMENT AND CLATHRIN COAT FORMATION

AP2 assists receptor internalization through several routes. It interacts directly with two types of internalization motif, LL and Y-X-X-Φ (Φ = bulky hydrophobic residue), found within the cytoplasmic domains of integral membrane protein receptors, via its σ2 (LL) and µ2 (Y-X-X-Φ) subunits (Ohno et al., 1995; Owen and Evans, 1998; Owen et al., 2001; Collins et al., 2002; Kelly et al., 2008; Jackson et al., 2010). It also associates with receptors indirectly by binding to other adaptors which are themselves directly associated with particular receptors, e.g., the LDL receptor with autosomal recessive hypercholesterolemia (ARH) protein and G-protein coupled receptors (GPCRs) with arrestin.

The first receptor internalization motif to be identified was YXXΦ (Ohno et al., 1995). Surface plasmon resonance (SPR) experiments revealed that the YXXΦ motif binds to the µ2 subunit of AP2 with affinities between 10 and 70 µM (Boll et al., 1996; Rapoport et al., 1997). Owen and Evans (1998) gave a structural explanation for the affinity between these tyrosine-based motifs and AP2. A 2.7 Å crystal structure of the signal binding domain of µ2 (residues 158–435) complexed with internalization signal peptides from EGFR (Sorkin et al., 1996) and TGN38 (Bos et al., 1993; Humphrey et al., 1993) revealed that hydrophobic pockets accommodate both the tyrosine and leucine residues of the sequence motif. When the target peptide binds, these pockets are positioned such that three additional H-bonds are made between the backbone of the peptide and AP2, resulting in β-strand formation. A similar mechanism of increased binding affinity upon correct recognition of key side chains has also been shown in other cases (Lowe et al., 1997).

The tyrosine residue of the YXXΦ sequence motif forms significant interactions with the binding pocket. For example, there are hydrophobic interactions between the tyrosine ring and Trp421 and Phe174. In addition, the tyrosine hydroxyl engages in a network of hydrogen bonds with Asp176, Lys203, and Arg423. The bulky hydrophobic residue (Φ) at position Y+3 of the internalization motif is also a major determinant of µ2 binding (Ohno et al., 1995; Boll et al., 1996), and binds in a cavity lined with aliphatic residues (**Figure 2**). Leu, Phe, Met or Ile residues at the Y+3 position could be accommodated in this cavity owing to the size and flexibility of the side chains in the pocket.
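The tyrosine-based motif described above (Tyr, any two residues, then a bulky hydrophobic residue at Y+3) lends itself to a simple sequence scan. The sketch below is illustrative only: the function name `find_tyr_motifs` and the example sequence are hypothetical, and restricting the Y+3 position to Leu, Phe, Met or Ile is an assumption taken from the residues listed in the preceding paragraph. A sequence match alone does not imply binding, since the motif must also be accessible in the cytoplasmic tail.

```python
import re

# Tyr, two arbitrary residues, then L/F/M/I at Y+3.
# The lookahead lets overlapping candidates be reported as well.
TYR_MOTIF = re.compile(r"(?=(Y..[LFMI]))")


def find_tyr_motifs(sequence: str):
    """Return (1-based position of the Tyr, matched tetrapeptide) pairs."""
    return [(m.start() + 1, m.group(1)) for m in TYR_MOTIF.finditer(sequence.upper())]


# Hypothetical cytoplasmic tail, for illustration only.
tail = "GSAYKELNDAYGGSK"
for position, motif in find_tyr_motifs(tail):
    print(position, motif)
```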

Collins et al. (2002) obtained the structure of the AP2 core complexed with the polyphosphatidylinositol headgroup mimic inositol hexakisphosphate (IP6), which revealed two potential polyphosphatidylinositol binding sites: one on α and one on µ2. Interestingly, the YXXΦ binding site (which localizes to the C-terminus of µ2) was occluded by part of the β2 trunk (**Figure 2**). This conformation of AP2 suggested to the authors a mechanism by which AP2 switches between an open and a closed conformation in order to interact with motifs presented at the cell membrane.

This then raised the question of what the "open" conformation of AP2 might look like. Data showed that a spacing of only seven amino acids between the end of a protein's transmembrane helix and a YXXΦ motif is sufficient to confer efficient internalization (Rohrer et al., 1996). In the "closed" AP2 core structure revealed by Collins et al. (2002), the YXXΦ binding site is ∼65 Å from the membrane surface; therefore, AP2 must undergo a significant conformational change not only to expose the YXXΦ binding site, but also to ensure that it is in close enough proximity to the transmembrane cargo.

Kelly et al. (2008) revealed the "open" conformation of AP2 upon crystallization of its core region bound to a peptide from CD4 (T-cell surface antigen protein). Analysis of the crystal structure showed that the peptide bound to the core region in an extended conformation, with the LL moiety binding two adjacent hydrophobic pockets on the σ2 subunit (**Figure 3**). SPR experiments showed that the CD4 LL-motif bound to WT AP2 with a Kd of 0.85 µM, a considerably higher affinity than that previously shown for YXXΦ bound to the AP2 core. Comparison of this ligand-bound, "open" crystal structure with the previously published "closed" AP2 structure (Collins et al., 2002) showed that for the LL motif to bind, the N-terminus of β2 must be displaced from the surface of σ2 in order to expose the hydrophobic binding pocket (**Figure 3**).

Whilst the "unblocking" of the LL motif binding site was explained by minor conformational changes in the AP2 core structure (Kelly et al., 2008), the YXXΦ motif-binding site remained blocked. As mentioned above, the AP2 core must undergo substantial conformational changes to permit binding of membrane-embedded YXXΦ-containing cargo. To gain molecular insight into the large conformational change of AP2, Jackson et al. (2010) solved the crystal structure of a form of AP2 in which both LL- and YXXΦ-motif binding sites are occupied. Driven partly by the phosphorylation of Thr156 on µ2 (Ricotta et al., 2002), and by the electrostatic attraction of the highly positive electrostatic surface of the C-terminal region of this domain (C-µ2) to the negatively charged lipid head groups of the membrane, C-µ2 moves to the orthogonal face of the complex. As a result, the LL-motif, YXXΦ-motif and phosphatidylinositol-4,5-bisphosphate (PtdIns4,5P2) binding sites become coplanar on the surface of AP2 and are therefore suitably positioned for contacting various motifs and/or signals at the plasma membrane. The adoption of an "open" AP2 conformation would therefore cause the β2 subunit to move out of the way and no longer occupy the motif binding sites.

Revelation of the "open" and "closed" conformations of the AP2 core structure was a significant milestone, providing mechanistic insight into how this adaptor protein is able to interact with internalization motifs found on the cytoplasmic tails of receptors. We have so far discussed how these interactions occur at the membrane, but in order for internalization to occur, the coated vesicle itself must form. This process of coat formation is driven by interactions between AP2 and its network of binding partners, which bind to the appendage (or "ear") domains of α2-adaptin and β2-adaptin. Here, weak interactions play a role, as a number of adaptor proteins were found to bind the α2-appendage of AP2 via short linear motifs with weak binding affinities. An early crystal structure of this appendage revealed that the domain interface contains tightly packed and mostly hydrophobic residues. Hydrophobic surface potential analysis revealed a single candidate protein-binding site centered around residue W840 (Owen et al., 1999).

Three linear motifs were found to bind the α2-appendage domain. These were DP[FW] (Owen et al., 1999; Brett et al., 2002), FXDXF (Collins et al., 2002), and WXX[FW]X[DE] (Ritter et al., 2003; Jha et al., 2004; Walther et al., 2004). Peptides containing these linear motifs were shown to bind the α2-appendage with relatively low affinities: 120 µM, 30–50 µM and 10 µM, respectively (Owen et al., 1999; Edeling et al., 2006). Furthermore, structural studies showed that peptides corresponding to these motifs bound to the α2-appendage in an extended conformation (Brett et al., 2002; Mishra et al., 2004; Praefcke et al., 2004; Ritter et al., 2004; **Figure 4**). Both the DPF/DPW and FXDXF motifs bind to the α2-appendage through an overlapping site in the platform subdomain (Brett et al., 2002), whereas the WXX[FW]X[DE] motif was shown to interact with the sandwich subdomain (Praefcke et al., 2004; Ritter et al., 2004; **Figure 4**). This additional, distinct peptide binding site on the sandwich subdomain of the α2-appendage could permit multiple different motifs to bind the appendage, or could allow multiple motifs of the same type to bind simultaneously, which would increase the avidity of the interaction (Walther et al., 2004).

The β2-appendage domain of AP2 was shown to possess a very similar bilobal structure to the α-appendage (Owen et al., 1999; Traub et al., 1999), with an N-terminal sandwich subdomain that is rigidly attached to a C-terminal platform subdomain (Owen et al., 2000). Also, as with the α-appendage, there was a single patch of highly hydrophobic surface potential on the β2 appendage platform subdomain that indicated a potential ligand interaction site. Charged residues adjacent to the hydrophobic pocket (such as R834, K842, E849, R879, E902, R904, and K917), could provide specificity for ligand-motif binding where the strength of the interaction was predominantly derived from hydrophobic interaction(s). For example, an abundance of positively charged arginine residues could confer electrostatic complementarity to a ligand rich in negatively charged amino acid side chains.

The β2-appendage domain binds a group of proteins that also bind the α2-appendage domain: AP180, epsin and eps15. Sequence analysis has shown that there is virtually no sequence homology between these proteins/ligands, except that they each contain multiple DPF/W sequences, so the authors proposed that the DØF/W motifs are likely to mediate binding to the β2-appendage (Owen et al., 2000). Interestingly, these proteins bind AP2 appendage domains with differing affinities: the α2-appendage binds appreciable amounts of amphiphysin and has epsin as its high affinity ligand (Owen et al., 1999; Traub et al., 1999), whereas the highest affinity ligand for the β2-appendage domain is eps15, with amphiphysin showing no significant signs of binding (Owen et al., 2000). This suggests that the context of the DPF/W motifs is also a factor in the interaction of these proteins with the AP2 appendage domains.

FIGURE 3 | […] β2-subunit to move outwards, resulting in its N-terminus being expelled, and consequently exposing the hydrophobic binding pocket and allowing the LL-containing peptide to bind. However, the µ2 subunit remains closely associated with the β2 subunit and is therefore unable to bind YXXΦ motifs. α2 subunit: purple. µ2 subunit: blue. β2 subunit: orange. σ2 subunit: green. PDB IDs: 2VGL from Collins et al. (2002) and 2JKR from Kelly et al. (2008).

Owen et al. (2000) also showed that the β2-appendage, together with its hinge region, is able to bind clathrin and also to displace AP180, epsin and eps15 that are already bound to the domain. The binding region identified on the β2-appendage is larger and more open than the α2-appendage binding domain, which may explain its ability to preferentially bind clathrin. In light of these data, the authors proposed a model for CCV formation: in the cytosol, distant from regions of active endocytosis (Gaidarov et al., 1999; Roos and Kelly, 1999), AP2 appendage domains bind to DØF/W motif-containing accessory proteins such as AP180, epsin or eps15. In this region, the clathrin concentration is low (Goud et al., 1985; Wilde and Brodsky, 1996; Gaidarov et al., 1999) and clathrin would therefore be unable to compete with these accessory proteins for AP2 binding. Conversely, at sites of active clathrin-mediated endocytosis, high clathrin concentrations would enable clathrin to compete effectively with DPF/W motif-containing accessory proteins for binding to the β2-appendage of AP2. Once bound, clathrin would be able to polymerize, form a lattice, and recruit more clathrin.

A subset of the AP2 appendage-binding accessory proteins are also able to bind cargo. These proteins, termed clathrin-associated sorting proteins (CLASPs), increase the catalog of endocytic cargo that can be recruited by AP2 beyond transmembrane proteins bearing cytoplasmic internalization motifs. Two key examples of cargo that rely on CLASPs are low density lipoproteins (LDL) and GPCRs, which are internalized by the CLASPs ARH or Dab2 (Traub, 2005; Maurer and Cooper, 2006) and β-arrestin (Lefkowitz and Shenoy, 2005), respectively.

The binding of ARH to AP2 is highly selective for the β2-appendage (He et al., 2002; Laporte et al., 2002; Mishra et al., 2002). The 1.6 Å crystal structure of a β2-appendage in complex with an ARH-derived peptide (<sup>252</sup>DDGLDEAFSRLAQSRT) (Edeling et al., 2006) revealed a completely different mode of interaction in comparison to all other known appendage ligands. The ARH peptide adopted an α-helical conformation which bound a deep groove on the top of the β2-appendage (**Figure 4**). Analysis of the helix-binding region showed that Leu262 of the ARH peptide was accommodated in a hydrophobic pocket of the β2 platform subdomain (termed the [FL] pocket). Phe259 of the ARH peptide fitted into an adjacent, complementary hydrophobic pocket on the β2-appendage, which the authors denote the "[F] pocket," and the side chain of Arg266 extends along a small channel on the surface of the β2 subdomain (the [R] pocket), where it forms hydrogen bonds with the acidic residues Glu902 and Glu849. For these [F], [FL] and [R] pocket interactions to occur, the α-helical motif must fit into its binding groove, which provides the specificity for binding. In agreement with this, a 2.5 Å crystal structure of the β2-appendage co-complexed with an Eps15 peptide and β-arrestin confirmed that the core motif for interaction with this AP2 appendage is DxxFxxFxxxR, and that it exhibits an α-helical conformation (Schmid et al., 2006).

Furthermore, β-arrestin, which has already been shown to bind the β2-appendage (Laporte et al., 2000; Kim and Benovic, 2002; Milano et al., 2002), displays significant sequence similarity at its C-terminus with the β2-appendage binding motif of ARH. Subsequent sequence analysis and mutagenesis of this C-terminal peptide region demonstrated the importance of the FXX[FL]XXXR motif in binding the platform subdomain of the β2-appendage (Edeling et al., 2006). Isothermal titration calorimetry (ITC) measurements confirmed that this region (<sup>383</sup>DDDIVFEDFARQRLKG) of β-arrestin binds the β2-appendage with a Kd of 2.6 µM, a value very similar to the Kd of 2.4 µM for the ARH peptide (Mishra et al., 2005). These affinities are higher than those reported for a YXXΦ motif binding to the µ2 subunit of AP2 (Boll et al., 1996; Rapoport et al., 1997).
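To give a feel for what these dissociation constants mean in practice, the sketch below compares fractional occupancy for the Kd values quoted in this review. It assumes the simplest case, 1:1 binding with the ligand in large excess, so that the fraction of sites occupied is [L] / ([L] + Kd). The function name `fraction_bound` and the 5 µM free ligand concentration are hypothetical choices for illustration, not measured values.

```python
def fraction_bound(free_ligand_um: float, kd_um: float) -> float:
    """Fraction of binding sites occupied for simple 1:1 binding.

    Assumes the free ligand concentration is not depleted by binding.
    """
    return free_ligand_um / (free_ligand_um + kd_um)


# Kd values (µM) as quoted in the text.
kd_um = {
    "ARH peptide / β2-appendage": 2.4,
    "β-arrestin peptide / β2-appendage": 2.6,
    "Tyr-based motif / µ2 (tight end of range)": 10.0,
    "Tyr-based motif / µ2 (weak end of range)": 70.0,
}

ligand_um = 5.0  # hypothetical free ligand concentration

for label, kd in kd_um.items():
    print(f"{label}: {fraction_bound(ligand_um, kd):.2f}")
```

When the free ligand concentration equals Kd, exactly half of the sites are occupied, which is a convenient sanity check for any such calculation.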

Moreover, protein database searching revealed that mammalian epsins 1 and 2 also possess FXX[FL]XXXR motifs located in their unstructured region, which ITC experiments confirmed to bind the β2-appendage (Edeling et al., 2006). Further analysis showed that epsin, ARH and β-arrestin also contain acidic residues N-terminal to the proximal phenylalanine, and thus the β2-appendage binding motif is more accurately described as [DE]<sub>n</sub>X<sub>1–2</sub>FXX[FL]XXXR.

Taken together, these data reveal fundamental differences in the mode of interaction between the β2 platform domain and CLASPs compared to other appendage-ligand interactions. Instead of numerous, avidity-based interaction motif repeats, β-arrestin, ARH and epsin contain only a single [DE]<sub>n</sub>X<sub>1–2</sub>FXX[FL]XXXR motif, which adopts an α-helical conformation to bind the β2-appendage. Therefore, not only are there charge and hydrophobic components to the interaction between ligand and AP2, but extra specificity is conferred by the requirement that the CLASP motif folds into a helix with the interacting residues on one face of the helix.
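The "one face of the helix" requirement can be illustrated with a helical-wheel calculation. In an ideal α-helix (3.6 residues per turn) each residue is rotated about 100° around the helix axis relative to the previous one. In the FXX[FL]XXXR motif the three key residues sit at offsets 0, +3 and +7, and the sketch below computes the smallest arc that contains them. The function names are hypothetical, and the ideal-helix geometry is an assumption rather than a measurement from the crystal structures discussed above.

```python
def helix_angles(offsets, degrees_per_residue=100.0):
    """Angular position (degrees) of each residue around an ideal α-helix axis."""
    return [(offset * degrees_per_residue) % 360.0 for offset in offsets]


def smallest_arc(angles):
    """Width (degrees) of the smallest arc on the wheel containing all angles."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360.0 - ordered[-1] + ordered[0])  # wrap-around gap
    return 360.0 - max(gaps)


# Key residues of FXX[FL]XXXR: F at i, [FL] at i+3, R at i+7.
motif_offsets = (0, 3, 7)
print(helix_angles(motif_offsets))                # [0.0, 300.0, 340.0]
print(smallest_arc(helix_angles(motif_offsets)))  # 60.0

# For comparison, three consecutive residues are spread over a much wider arc.
print(smallest_arc(helix_angles((0, 1, 2))))      # 200.0
```

Under this idealized geometry the three side chains fall within a 60° arc, which is consistent with them presenting together on one face when the peptide is helical.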

In addition to high affinity interactions between AP2 appendages and accessory proteins bearing a single appendage binding motif, α2- and β2-appendages also engage in high avidity interactions. SPR experiments between immobilized α2- or β2-appendages and the motif domain of Eps15 showed very tight interactions, such that an off-rate could not be measured (Schmid et al., 2006). It was assumed that these tight interactions were due to the presence of multiple appendage interaction sites in a single protein domain of Eps15. Therefore, if appendages are linked or bound to the same surface, there is a high avidity for ligand interaction that is much stronger than the sum of individual affinities (Praefcke et al., 2004). In the context of clathrin-mediated endocytosis, such an environment would be akin to "assembly zones," where AP2 is clustered at the membrane, presenting multiple appendages that are available for accessory protein binding. Proteins with multiple appendage interaction sites will not only aid adaptor clustering; the presentation of many juxtaposed motifs also leads to an increased apparent affinity for the adaptor appendage. Therefore, individual weak affinity interactions between AP2 and its ligands can make significant contributions to protein-protein interactions, provided there are multiple copies.
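The avidity argument can be made concrete with a deliberately simple model. The sketch below uses the common effective-concentration approximation for a bivalent interaction, in which the apparent dissociation constant is Kd1 × Kd2 / C_eff, where C_eff is the effective local concentration of the second site once the first is bound. This model, the function name `bivalent_kd` and the value of C_eff are all assumptions introduced for illustration; the two Kd values are taken from the α2-appendage motif affinities quoted earlier in this review, but pairing them in this way is hypothetical.

```python
import math


def bivalent_kd(kd1_m: float, kd2_m: float, c_eff_m: float) -> float:
    """Apparent Kd (M) for two linked sites under an idealized avidity model.

    Assumes independent sites and a single effective concentration c_eff_m (M)
    for the second binding event.
    """
    return kd1_m * kd2_m / c_eff_m


kd1 = 120e-6   # 120 µM, a weak single-motif affinity quoted in the text
kd2 = 10e-6    # 10 µM, a second single-motif affinity quoted in the text
c_eff = 1e-3   # hypothetical effective local concentration (1 mM)

apparent = bivalent_kd(kd1, kd2, c_eff)
print(f"apparent Kd = {apparent * 1e6:.2f} µM")
print(f"tighter than the stronger single site by a factor of {kd2 / apparent:.0f}")
```

The point of the sketch is qualitative: two individually weak contacts can combine into a much tighter interaction when they are held close together, and the gain depends strongly on the assumed effective concentration.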

Schmid et al. (2006) proposed that clathrin coated pit (CCP) formation proceeds as a result of high avidity interactions of accessory proteins being replaced by the weak interactions of the clathrin coat with adaptors; meaning that initially low affinity (and therefore readily reversible) interactions between cargo and adaptors, between adaptors and accessory proteins, and between accessory proteins and clathrin, are used to build the network.

#### CLATHRIN-ADAPTOR INTERACTIONS MEDIATED BY SHORT PEPTIDE MOTIFS

A pivotal step following the recruitment of clathrin to sites of endocytosis is the interaction between individual clathrin triskelia and an array of accessory proteins that assist in the formation of a clathrin-coated pit. Clathrin has more than 20 binding partners and interacts with most of these via a seven-bladed β-propeller domain at its N-terminus (**Figures 5A,B**). Interactions between the clathrin N-terminal domain (TD) and peptides corresponding to multiple binding motifs are in the micromolar range. Thus, weak interactions feature in the role of clathrin, as well as that of AP2, in endocytosis.

The mode of binding of adaptor proteins to clathrin was investigated by Dell'Angelica et al. (1998) using a combination of GST pull-down assays and mutagenesis. Through this they identified a segment of residues (SLLDLDDFN, residues 817–825) in the β3-appendage of AP-3 that contributed to binding to the clathrin TD. Subsequent sequence analysis also identified similar amino acid residues in other adaptor proteins that have been shown to mediate interaction(s) with clathrin, namely amphiphysin II (Ramjaun and McPherson, 1998), segments of arrestin3 (Krupnick et al., 1997) and the clathrin-binding region of β1 and β2 (Shih et al., 1995). Alignment of these sequences enabled the definition of a motif for clathrin binding, comprised of acidic and bulky hydrophobic residues, L(L, I)(D, E, N)(L, F)(D, E), termed the "clathrin box" motif (**Figures 5B,C**). Although they vary between proteins, clathrin-box motifs are highly conserved and are found in many proteins known to interact with the TD, for example AP1, epsin, AP180 and amphiphysin (Shih et al., 1995; Drake et al., 2000; Kirchhausen, 2000).
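The clathrin box definition above, L(L, I)(D, E, N)(L, F)(D, E), translates directly into a regular expression. The sketch below scans two peptide sequences that appear in this review: the β3 segment SLLDLDDFN (residues 817–825) and the β2-adaptin peptide CGDLLNLDLG from the Figure 5 legend. The function name `find_clathrin_box` and its `first_residue` parameter are hypothetical conveniences for mapping matches back to residue numbers.

```python
import re

# L, then L/I, then D/E/N, then L/F, then D/E.
CLATHRIN_BOX = re.compile(r"L[LI][DEN][LF][DE]")


def find_clathrin_box(sequence: str, first_residue: int = 1):
    """Return (residue number of the first Leu, matched pentapeptide) pairs."""
    return [
        (m.start() + first_residue, m.group())
        for m in CLATHRIN_BOX.finditer(sequence.upper())
    ]


print(find_clathrin_box("SLLDLDDFN", first_residue=817))  # β3 segment of AP-3
print(find_clathrin_box("CGDLLNLDLG"))                    # β2-adaptin peptide
```

As with any short linear motif, a pattern match only flags a candidate; whether the segment binds the TD depends on its structural context.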

FIGURE 5 | Location of the clathrin heavy chain N-terminal domain (TD) and the location of the adaptor binding sites. (A) Clathrin forms a polymerized lattice structure around the growing vesicle in concert with adaptor proteins. The functional monomer, the triskelion, is formed of a trimer of heavy chains (∼190 kDa, orange) with a smaller light chain (∼25 kDa, pink) located along the top edge near the dimerization domain. The TD (cyan) is the primary binding location for adaptor proteins, and is located on the inside of the cage closest to the plasma membrane. (B) The TD is a 7-bladed β-propeller that has 4 known sites for binding adaptor proteins. Site 1, between blades 1 and 2, is known as the clathrin box site (LΦXΦ[DE]) (turquoise); Site 2, situated in the center of the propeller, is known as the W-box site (PWXXW) (blue); Site 3 is the arrestin-box site ([LI][LI]GXL) (pink); and Site 4 is the Royle box, which as yet has no defined interaction sequence (purple). The numbers indicate the blade number. Peptides or proteins bound to the 4 sites are shown in the following four panels: (C) clathrin box peptide of β2 adaptin (CGDLLNLDLG) bound to site 1; (D) βarrestin1L peptide (ALDGLLGG) bound to site 3; (E) amphiphysin peptide (TLPWDLWTT) bound to site 2; (F) an amphiphysin peptide bound to site 4. Structures are derived from PDB codes: 3IYV (A) (Fotin et al., 2004), 5M5R (B,C) (Muenzner et al., 2017), 3GD1 (D) (Kang et al., 2009), 1UTC (E) (Miele et al., 2004), 5M5T (F) (Muenzner et al., 2017).

A number of X-ray structures of the clathrin TD provided both its tertiary structure and the binding location of peptides from several different adaptor proteins, revealing 3 independent binding sites for clathrin binding partners on this relatively small (40 kDa) (ter Haar et al., 2000) domain.

Almost two decades ago, ter Haar et al. (1998) showed that the clathrin TD comprised a 7-bladed β-propeller structure linked to a series of short alpha helices which formed the start of the clathrin "leg" region. ter Haar et al. (2000) then determined structures of complexes of clathrin TD with peptides derived from adaptors β-arrestin 2, and the β-subunit of AP3. The residues contacting the TD in both structures were consistent with the five-residue clathrin box motif identified previously: LΦXΦ[DE] (where X denotes any amino acid, Φ denotes a bulky hydrophobic residue and [DE] is a glutamate or aspartate). Both structures revealed very similar peptide interactions, with each peptide binding in an extended conformation in a groove between blades 1 and 2 of the propeller structure (**Figures 5B,C**). The sharing of the same binding site by both peptides was surprising given previous evidence suggesting that β-arrestin 2 and AP2 could bind at different sites on the TD (Goodman et al., 1997).

The situation became more complex when a 2.3 Å crystal structure of clathrin TD bound to a peptide of amphiphysin 1 revealed that a second TD-binding motif, PWXXW (termed "the W-box"), binds at a site remote from the "clathrin-box" binding site (Miele et al., 2004; **Figure 5B**). The presence of a second motif-binding site had previously been suggested by biochemical data which indicated that the binding sequence, PWDLW, could bind to the TD without competing with the canonical LΦXΦ[DE] clathrin-box motif (Ramjaun and McPherson, 1998; Slepnev et al., 2000; Drake and Traub, 2001).

Unlike the clathrin-box motif (which adopts an extended conformation when bound to the TD), the bound W-box was compact and helical, buried in a solvent-exposed cavity of complementary shape on the membrane-proximal "top" surface of the TD; a location spatially distinct from where the clathrin-box peptides bind (Miele et al., 2004; **Figure 5E**). Affinity measurements showed that the W-box peptide binds the TD with a similar affinity (Kd of 28 µM) to that of the clathrin-box peptide (Kd of 22 µM).

Finally, a third, spatially distinct adaptor binding site was identified by Kang et al. (2009), who showed that an extended surface loop of the arrestin 2 long isoform occupied a site between blades 4 and 5 of the TD, which binds peptides with the motif [LI][LI]GXL, termed the "arrestin-box" (**Figures 5B,D**).

The above crystal structures revealed the location of adaptor protein binding sites on the TD; however, the role of these interactions in coat assembly has been difficult to define. The structures obtained so far are of peptides corresponding to clathrin-binding motifs co-crystallized with the clathrin TD. It would be interesting to know how TD binding to the full-length clathrin-binding domains of these proteins compares, but this will be hard to achieve by crystallography (ter Haar et al., 2000; Miele et al., 2004) owing to the unstructured nature of these clathrin binding regions. AP180, implicated in clathrin assembly, is one such example. It has a 33 kDa N-terminal ANTH domain, which is involved in membrane binding, and a largely unstructured 58 kDa C-terminal region that is responsible for clathrin binding and assembly. More specifically, the C-terminal region binds to the TD of the clathrin heavy chain (Morgan et al., 2000), and self-homology analyses of this region showed that it contains 12 repeats, each ∼23 aa in length and containing a single DLL/DLF sequence. The large number of clathrin binding motifs along the length of the AP180 sequence suggests that the organization of AP180 binding to clathrin must go beyond a straightforward 1:1 interaction between a single DLL motif and the clathrin TD.

#### MORE COMPLEX INTERACTIONS—MULTIPLE DLL MOTIFS

The potential for complex binding interactions between clathrin binding motifs and clathrin TD led Zhuo et al. (2010) to further investigate the binding of DLL and DLF motifs in AP180 to clathrin TD. They demonstrated that the DLL and DLF sequences within the clathrin binding site are critical for clathrin binding, and bind clathrin TD relatively weakly, with Kd values of ∼2 × 10<sup>−4</sup> M. The weak binding of these sites to the clathrin TD and the observation that chemical exchange kinetics are in the intermediate to fast-exchange regime (Schlosshauer and Baker, 2004) indicate that both association and dissociation rates for these interactions are rapid, with dissociation rates in the range of 2 × 10<sup>3</sup> to 4 × 10<sup>3</sup> s<sup>−1</sup>, and association rates in the range of 1 × 10<sup>7</sup> to 2 × 10<sup>7</sup> M<sup>−1</sup> s<sup>−1</sup>.
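As a consistency check on the values quoted above, the equilibrium dissociation constant of a simple two-state binding reaction is the ratio of the dissociation and association rate constants. The minimal sketch below assumes such a two-state model; the pairing of the lower rates with each other and of the upper rates with each other is an assumption made only for illustration, since the text gives ranges rather than matched pairs.

```python
def dissociation_constant(k_off_per_s, k_on_per_molar_per_s):
    """Kd (in M) for a two-state binding model: Kd = k_off / k_on."""
    if k_on_per_molar_per_s <= 0:
        raise ValueError("k_on must be positive")
    return k_off_per_s / k_on_per_molar_per_s


# Lower and upper ends of the ranges quoted in the text (pairing is assumed).
kd_low_pair = dissociation_constant(2e3, 1e7)
kd_high_pair = dissociation_constant(4e3, 2e7)

print(f"Kd (lower rates): {kd_low_pair:.1e} M")
print(f"Kd (upper rates): {kd_high_pair:.1e} M")
```

Both pairings give a value of the order of 10<sup>−4</sup> M, in line with the Kd reported for the individual DLL/DLF sites.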

In light of these data, the authors were able to expand on previous models for how AP180 mediates clathrin coat assembly. AP180 binds to the membrane via its ANTH domain, resulting in its unstructured, flexible C-terminal region being exposed to the cytoplasm and available for binding any clathrin molecules encountered. Although the clathrin binding site of AP180 binds clathrin weakly with rapid dissociation rates, the likelihood of clathrin diffusing away is minimized given that each AP180 molecule contains up to 12 clathrin binding sites; therefore, if a clathrin molecule fails to bind one site, or dissociates from it, it can still interact with the many other clathrin binding sites. Moreover, the rapid dissociation rates mean that each triskelion is able to move and reorient itself, enabling interactions with other triskelia to be established. The clathrin-clathrin interactions that occur will determine both the geometry and stability of the clathrin lattice. In this way, weak binding by multiple clathrin triskelia to binding sites dispersed throughout the AP180 sequence allows efficient recruitment of clathrin to endocytic sites and dynamic assembly of the clathrin lattice. This example shows how weak, multivalent binding, in combination with intrinsic disorder of a protein binding partner, can create a highly dynamic mode of protein-protein interaction.

#### COMPLEXITY OF BINDING TO MULTIPLE CLATHRIN TERMINAL DOMAIN SITES

The crystal structures of ter Haar et al. (2000), Miele et al. (2004), and Kang et al. (2009) suggested that clathrin-box motif, W-box motif and arrestin splice loop peptides each bind uniquely to an individual site on the TD, giving a total of 3 binding sites. However, data published since then have suggested that such a situation is likely to be an oversimplification. For example, yeast epsin (Ent2p) was shown to bind a TD in which the clathrin-box and W-box binding sites were mutated. Additionally, deletion of the Ent2p C-terminal clathrin-box sequence eliminated Ent2p binding to the TD (Collette et al., 2009). Together, these data indicate that clathrin-box sequences are able to bind the TD at site(s) distinct from site 1 or 2. In fact, a fourth adaptor binding site was identified by Willox and Royle (2012). This study found that mutating all 3 binding sites did not block clathrin/AP2-mediated endocytosis in human cell lines, whereas deleting the TD inhibited endocytosis. Using an in silico approach they were able to identify a conserved patch located ∼120° relative to the other binding sites. This site encompasses the end of strand d of blade 7 and the helical segment in the loop connecting blades 7 and 1 (**Figures 5B,F**). Mutating E11 to K at this 4th site, in combination with mutations to the other 3 sites, resulted in the same phenotype as deletion of the TD. The identity of the motif conferring binding to this site remains undefined.

A deeper understanding of adaptor binding to the four, distinct adaptor-binding sites on the clathrin TD must account for observations that three out of four of the aforementioned binding sites can be mutationally eliminated without causing loss of CME (Willox and Royle, 2012).

In an effort to gain insight into the ambiguities regarding adaptor/accessory protein binding to clathrin TD, Zhuo et al. (2015) adopted a solution-based NMR approach to study the interaction of clathrin TD with clathrin-box peptides derived from the AP2 adaptor protein and the accessory protein AP180. Results showed that these peptides simultaneously bound the clathrin-box site, the W-box site and the β-arrestin splice loop site of a single TD with a similar, low affinity (Kd values in the range of 800–900 µM). The high promiscuity and stoichiometry of binding of peptide to the TD could be a reflection of the functional redundancy of these sites, and could also be important for the dynamic reorganization of the clathrin TD during endocytosis.

In agreement with biochemical data showing that clathrin only precipitated in GST-binding assays upon immobilization of a high density of clathrin-box peptides on a GST-resin (Drake et al., 2000; Drake and Traub, 2001), the weak molecular interactions between clathrin-box peptides and the clathrin TD (Zhuo et al., 2015) mean that multiple interactions are required for a stable association of adaptor/accessory protein with clathrin. Furthermore, these data also suggest that each TD can bind up to 3 such peptides, which not only increases the potential avidity of peptide-TD interaction, but also offers an explanation as to why individual binding sites in the TD can be mutationally eradicated without significantly compromising CME (Lemmon and Traub, 2012). Also, it has been proposed that weak molecular interactions between TD and peptides would facilitate the dynamic reorganization of clathrin during lattice assembly (Zhuo et al., 2010). The temporal regulation of this event, and the fact that it involves transfer of clathrin between different adaptor and accessory proteins during the process of internalizing cargo (Drake et al., 2000; McMahon and Boucrot, 2011), are central to our developing understanding of clathrin-mediated endocytosis. The authors (Zhuo et al., 2015) propose that the differing affinities and number of clathrin binding sequences in an adaptor/accessory protein could be an important factor in aiding clathrin transfer: a protein with tighter binding, or with more clathrin binding sequences, could displace a protein that has weaker or fewer clathrin binding elements.

Muenzner et al. (2017) endeavored to investigate the suggested potential degeneracy of clathrin binding (Willox and Royle, 2012; Zhuo et al., 2015) by solving high-resolution structures of clathrin TD complexed with cellular and viral peptide motifs. In contrast to previous crystallographic structures (ter Haar et al., 2000; Miele et al., 2004), where the co-complexed peptide was shown only to bind a single site of the clathrin TD, the structures solved by Muenzner and colleagues demonstrated that 2 distinct sequence motifs (the arrestin-box and the clathrin-box) can bind the arrestin-box binding site of clathrin TD.

Furthermore, the authors also note that the sequences capable of binding the Royle box are somewhat variable (amphiphysin I clathrin-binding motif peptide: ETLLDLDFLE and hepatitis D virus large antigen peptides: SDILFPADS and SPRLPLLES), preventing the unambiguous identification of a consensus binding sequence. Thus, they suggest that the model of "1 consensus motif binds a single peptide-binding site on the clathrin TD" may require revision since binding could rely on the peptide's structural environment upon contacting the TD (Muenzner et al., 2017).

The fact that a clathrin TD is capable of simultaneously binding multiple adaptors emphasises the dynamic nature of clathrin-adaptor interactions. The authors (Muenzner et al., 2017) go on to discuss that differences in the affinity of protein-protein interactions come as a result of differing rates of dissociation (Pollard, 2010). Weak molecular interactions, such as those between clathrin and its adaptor proteins (Shih et al., 1995; Zhuo et al., 2015), have dissociation rates on the order of approximately 1 per second (Pollard, 2010). Given that the timeframe of complete clathrin-coated pit formation and disassembly is ∼90 s (Loerke et al., 2009), we would anticipate that adaptors undergo rapid cycles of binding and dissociation from clathrin, which would enable the recruitment of many different adaptor proteins to a given clathrin TD. Also, the promiscuity of clathrin motif binding would permit a single adaptor protein that contains multiple clathrin interaction motifs (e.g., AP180; Zhuo et al., 2010) to simultaneously bind multiple sites on clathrin TD, consequently increasing the affinity of the interaction.
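The timescale argument above can be made explicit with a back-of-the-envelope calculation. The sketch below assumes first-order dissociation, for which the mean lifetime of a complex is the reciprocal of the dissociation rate; the event count is an upper bound that ignores the time a site spends unoccupied between binding events.

```python
def mean_bound_lifetime(k_off_per_s):
    """Mean lifetime (s) of a complex with first-order dissociation rate k_off."""
    if k_off_per_s <= 0:
        raise ValueError("k_off must be positive")
    return 1.0 / k_off_per_s


def max_dissociation_events(k_off_per_s, window_s):
    """Upper bound on dissociation events in a time window (rebinding assumed instantaneous)."""
    return k_off_per_s * window_s


K_OFF = 1.0        # s^-1, order of magnitude quoted in the text
PIT_LIFETIME = 90  # s, approximate coated-pit lifetime quoted in the text

lifetime_s = mean_bound_lifetime(K_OFF)
events = max_dissociation_events(K_OFF, PIT_LIFETIME)

print(f"mean bound lifetime: {lifetime_s:.1f} s")
print(f"dissociation events within one pit lifetime: up to {events:.0f}")
```

Even at this conservative rate a single site could turn over tens of times during the life of one coated pit; the faster rates reported for individual AP180 motifs would imply far more exchange.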

## CONCLUSION

This review of the structural and functional experiments that investigated the binding between cargo, adaptor and accessory proteins, as well as clathrin, has demonstrated the different ways weak molecular interactions are exploited in clathrin-mediated endocytosis.

AP2 is a key regulatory factor in clathrin-mediated endocytosis, and its activation commences upon recruitment and subsequent low-affinity interactions with PtdIns4,5P2 of the plasma membrane (Höning et al., 2005). Phosphorylation of Thr156 of the µ2 domain causes AP2 to adopt an "open" conformation allowing it to interact with the large network of other accessory proteins and clathrin.

FIGURE 6 | A diagram detailing the AP2 and clathrin binding motifs present in a number of adaptor proteins with diverse structure and function. Motifs are listed in the key on the right along with their binding location on AP2 or clathrin. The other domains detailed in the figure are as follows: ANTH, AP180 N-Terminal Homology domain; Arr, Arrestin domain; CC, coiled-coil domain; EH, Eps15 Homology domain; ENTH, Epsin N-Terminal Homology domain; J, J-domain; SH3, SRC Homology domain 3; PH, Pleckstrin Homology domain; PR, Proline Rich domain; PTB, Phosphotyrosine Binding domain; N-BAR, N-terminal Bin/Amphiphysin/Rvs domain; UIM, Ubiquitin Interacting Motif.

The evolution of AP2 to adopt "open" and "closed" conformations allows this protein to spatially and temporally control different stages of cargo internalization, from the point of cargo recognition and sorting, to downstream clathrin-coated pit formation.

Together, the AP2 core and its appendage domains bind their respective ligands via motifs (summarized in **Figure 6**), which engage in weak molecular interactions. This network of low-affinity protein interactions provides not only high avidity and specificity, but also reversibility of protein interactions that allows for rapid exchange of binding partners, accounting for the dynamic nature of CME in vivo (Avinoam et al., 2015).

Whilst low-affinity interactions are common throughout clathrin-mediated endocytosis, AP2 is also able to engage in high affinity interactions with some accessory proteins. These interactions are conferred by the requirement that a ligand motif binds in an α-helical conformation, as opposed to an extended conformation that is more commonly used. These two different modes of interaction between AP2 appendages, along with the multiple variants in binding motifs, explain how AP2 is able to act as a central hub for interactions between adaptors and clathrin. However, a greater understanding of how these motifs interact with and compete with each other for AP2 would greatly enhance our understanding of how adaptors and cargo are spatially and temporally regulated in vivo.

The clathrin TD has been identified as the major interaction site on clathrin for adaptor proteins. Initial crystallographic structures identified 3 potential binding sites for specific adaptor motifs (**Figure 5B**), suggesting that adaptors would bind to discrete locations on the TD (ter Haar et al., 2000; Miele et al., 2004; Kang et al., 2009). However, recent studies both in vivo and in vitro have identified a 4th binding site (**Figures 5B,F**) and present evidence that the binding of these motifs is much more degenerate than previously expected, with peptides of a given sequence found to bind in more than one location (Willox and Royle, 2012; Zhuo et al., 2015; Muenzner et al., 2017). As with AP2-adaptor interactions, a greater understanding of the relative affinities of these motifs or associated proteins would give us better insight into how adaptor recruitment is regulated in CME.

To conclude, clathrin-mediated endocytosis is a versatile pathway, not just in terms of the diversity of cargos that can be internalized, or in the large number of accessory and adaptor proteins used, but also in the pivotal role of weak molecular interactions in orchestrating and controlling the internalization of specific cargo and its delivery to early endosomes.

## AUTHOR CONTRIBUTIONS

All authors (SS, MB, MH, and CS) fulfill the "author criteria" of: Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; and Drafting the work or revising it critically for important intellectual content; and Final approval of the version to be published; and Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

### ACKNOWLEDGMENTS

SS and CS are funded by BBSRC Research grant BB/N008391/1. CS was also supported by a Royal Society Leverhulme Trust Senior Research Fellowship LT150036. MH is funded by the MRC grant number MR/J003964/1. MB thanks the BBSRC Midlands Integrative Biosciences Training Partnership (MIBTP) for support.

### REFERENCES


long-splice isoform of synaptojanin 1, SJ170. J. Biol. Chem. 279, 2281–2290. doi: 10.1074/jbc.M305644200


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Smith, Baker, Halebian and Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Use of NMR to Study Transient Carbohydrate—Protein Interactions

#### Pedro M. Nieto\*

Glycosystems Laboratory, Instituto de Investigaciones Científicas, cicCartuja, CSIC/USE, Seville, Spain

Carbohydrates are biologically ubiquitous and are essential to the existence of all known living organisms. Although they are better known for their role as energy sources (glucose/glycogen or starch) or structural elements (chitin or cellulose), carbohydrates also participate in molecular recognition processes. Such interactions with other biomolecules (nucleic acids, proteins, and lipids) are fundamental to life and disease. This review focuses on the application of NMR methods to understand at the atomic level the mechanisms by which sugar molecules can be recognized by proteins to form complexes, creating new entities with different properties to those of the individual component molecules. These processes have recently gained attention as new techniques have been developed, while at the same time old techniques have been reinvented and adapted to address newer emerging problems.

#### Edited by:

Rivka Isaacson, King's College London, United Kingdom

#### Reviewed by:

Gyula Batta, University of Debrecen, Hungary Peter J. Simpson, Imperial College London, United Kingdom

> \*Correspondence: Pedro M. Nieto pedro.nieto@iiq.csic.es

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

> Received: 08 January 2018 Accepted: 26 March 2018 Published: 11 April 2018

#### Citation:

Nieto PM (2018) The Use of NMR to Study Transient Carbohydrate—Protein Interactions. Front. Mol. Biosci. 5:33. doi: 10.3389/fmolb.2018.00033 Keywords: NMR, protein–carbohydrate interaction, STD-NMR, transfer-NOESY, transient interactions

#### INTRODUCTION

Glycobiology, which can be defined as the study of the structure, chemistry, biosynthesis, and biological functions of glycans and their derivatives, is fundamental in many critical biological processes (Varki, 1993, 2017). A challenge of structural glycobiology is to reconcile the large variety of 3D shapes that carbohydrates can assume with the high degree of selectivity found between closely related glycans (DeMarco and Woods, 2008). In addition, the binding constants between individual carbohydrates and proteins are generally low. Under these conditions, most of the interactions are in the fast exchange region of the NMR chemical shift scale and a single averaged set of signals is detected. This behavior can be exploited for structural elucidation. The rationale behind transient or transfer experiments is that if the equilibrium is fast enough and the property is dependent on the correlation time, it is possible to observe properties characteristic of the bound state in the averaged ligand signals, due to the larger correlation time of the bound form (Hyde et al., 1980; Feeney, 2000; Meyer and Peters, 2003).

In order to understand the evolution of NMR methods for the study of glycans, the scarcity of simple methods for obtaining isotopically-labeled glycans needs to be kept in mind. When such methods are available, the use of isotope-filtered/edited NMR experiments should be considered. When there is no such alternative, however, transfer techniques need to be employed. These can be applied to systems in fast equilibrium on the relaxation time scale and to properties weighted by the correlation time. Under these circumstances, the average of the NMR property obtained is biased toward the minor population of the bound carbohydrate due to its larger correlation time, even in the presence of an excess of free ligand. This is the basis of the transfer NOE technique and its analogs (Hyde et al., 1980; Ni, 1994). Methodological improvements, in particular monodimensional analogs with a better signal-to-noise ratio, have effectively relaunched the transfer techniques (Stott et al., 1997). Other parameters that can be used to study molecular complexes in a transient state are T1sel (selective longitudinal relaxation time), T<sub>2</sub> (transverse relaxation time) and ligand-detected <sup>1</sup>H relaxation dispersion, because they depend on the correlation time, which is a function of the size of the molecule. Theoretical descriptions of T<sub>1</sub> and T<sub>2</sub> and of their application to the analysis of binding constants have been reviewed by Stockman and Dalvit (2002) and Peng et al. (Lepre et al., 2004).

A further experiment that benefits from the difference in the correlation times is the STD (saturation transfer difference) experiment (Mayer and Meyer, 1999; Meyer and Peters, 2003). This experiment exploits the efficient spread of saturation within the receptor, a consequence of its long correlation time; the saturation is then transferred to the ligand within the complex and is finally observed in the signals of the free ligand. Since its formulation, the STD technique has grown rapidly, both in its range of applications and in the number of labs using it. Finally, WaterLOGSY uses protonated water molecules to distinguish those that are in fast exchange from those that are buried in the interface with the receptor and therefore reflects the interaction surfaces between receptor and ligand.

### NMR CHARACTERISTICS OF FREE CARBOHYDRATES

In general, the principal sources of glycans are chemical synthesis and natural product isolation. In each case, they are almost completely restricted to non-isotopically-labeled compounds. Because, from an NMR viewpoint, few experimental restraints are available for unlabeled carbohydrates, the analysis conditions must be carefully documented and the data carefully quantified.

### Coupling Constants

Three-bond H–H coupling constants depend on the dihedral angles, and consequently they are key to determining the ring conformation. In general, most hexoses are monoconformational and adopt a <sup>4</sup>C<sub>1</sub> or <sup>1</sup>C<sub>4</sub> chair conformation. Due to the cyclic nature of the sugars, many redundant interprotonic three-bond coupling constants are available and it is relatively simple to define the conformation in terms of the Cremer–Pople polar coordinates (Cremer and Pople, 1975). Residual dipolar couplings also provide data about the relative orientation of two vectors in partially oriented media (Martin-Pastor and Bush, 2001). In carbohydrates, few examples of conformational equilibrium are known; however, one such example is the iduronate residue, which can exist in at least three conformations, <sup>1</sup>C<sub>4</sub>, <sup>4</sup>C<sub>1</sub>, and <sup>2</sup>S<sub>O</sub>, easily distinguishable by their coupling constants. Generally, internal iduronate residues are found in fast equilibrium between <sup>1</sup>C<sub>4</sub> and <sup>2</sup>S<sub>O</sub> (**Figures 1A,B**) (Mulloy et al., 1993). This equilibrium is fundamental in the molecular recognition events in which heparin is involved (Canales et al., 2005) and has also been studied by residual dipolar coupling (Jin et al., 2009).
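The dependence of three-bond couplings on the dihedral angle is usually described by a Karplus-type relation, J(φ) = A cos²φ + B cos φ + C. The sketch below uses this general form only; the default coefficients are illustrative placeholders and are not taken from this review, and quantitative work on sugars would normally use a parameterization that corrects for substituent electronegativity.

```python
import math


def karplus_j(phi_deg, a=7.76, b=-1.10, c=1.40):
    """Three-bond H-H coupling (Hz) from a Karplus-type relation.

    The coefficients a, b, c are illustrative placeholders (assumption).
    """
    phi = math.radians(phi_deg)
    return a * math.cos(phi) ** 2 + b * math.cos(phi) + c


# Idealized chair geometries: trans-diaxial protons (~180 deg)
# versus axial-equatorial or diequatorial protons (~60 deg).
j_diaxial = karplus_j(180.0)
j_gauche = karplus_j(60.0)

print(f"J(180 deg) = {j_diaxial:.2f} Hz")
print(f"J(60 deg)  = {j_gauche:.2f} Hz")
```

The large difference between the two values is what makes a set of ring couplings diagnostic of the chair form, and why conformers such as those of iduronate can be told apart by their coupling constants.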

### NOE, Distances: The ISPA (Isolated Spin Pair Approach)

NOE (and other related parameters such as ROE, transverse NOE, and off-resonance ROE)-based distance constraints are often used to study the behavior of the glycosidic angles and can thus be used to determine the 3D shapes of saccharides. The scarcity of experimental restraints in the analysis of carbohydrates demands a more accurate quantification of the NOE. Fortunately, for small molecules, distances are easily quantifiable from the NOE by analyzing the NOE build-up curve at several mixing times (Macura et al., 1986). Then, assuming equal motional behavior for an undetermined distance and for a known reference, the initial NOE growth rate (cross-relaxation rate) is inversely proportional to the sixth power of the interprotonic distance (Neuhaus and Williamson, 2000).
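The isolated spin pair approach reduces to a one-line calculation once the initial build-up rates are known. The sketch below assumes, as stated above, that the unknown and reference proton pairs share the same motional behavior; the reference distance and the rates in the example are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def ispa_distance(sigma_ij, sigma_ref, r_ref):
    """Distance from the isolated spin pair approach.

    sigma is proportional to r**-6, so r_ij = r_ref * (sigma_ref / sigma_ij)**(1/6).
    Both rates must be non-zero and of the same sign.
    """
    if sigma_ij == 0 or sigma_ref / sigma_ij <= 0:
        raise ValueError("cross-relaxation rates must be non-zero and of the same sign")
    return r_ref * (sigma_ref / sigma_ij) ** (1.0 / 6.0)


# Hypothetical example: a reference pair at 2.5 angstrom builds up
# 64 times faster than the pair of interest.
R_REF = 2.5          # angstrom (assumed reference distance)
SIGMA_REF = 0.064    # s^-1 (hypothetical)
SIGMA_IJ = 0.001     # s^-1 (hypothetical)

distance = ispa_distance(SIGMA_IJ, SIGMA_REF, R_REF)
print(f"estimated distance: {distance:.2f} angstrom")
```

Because of the sixth-root dependence, a 64-fold difference in build-up rate corresponds to only a two-fold difference in distance, which is why moderate errors in the measured rates translate into small errors in the distance.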

Applications of this methodology have increased since the arrival of modern selective 1D NOESY methods based on the double pulsed field gradients spin echo technique (Stott et al., 1997). This scheme has advantages over the bidimensional analogs; it is faster and has better resolution. Moreover, as the protons are relaxing by T1sel instead of T1, the growing curves have longer linearity, and thus better-fitting and more precise results can be obtained (Hu and Krishnamurthy, 2006; Munoz-Garcia et al., 2013). The technique has also been applied to off-resonance ROESY, so as to obtain reference-independent distances. Using this approach, it has been possible to obtain the interglycosidic distances for a strongly anisotropic heparin hexasaccharide in which the interprotonic correlation time depends upon the orientation of the vector relative to the axis of the molecule (Munoz-Garcia et al., 2013).

### NMR CHARACTERISTICS OF BOUND SMALL GLYCANS

The ability of NMR to observe and analyze specific signals of individual atoms, focusing analysis on a particular aspect of a complex without the need to solve its entire structure, is its most important advantage in this context.

A bound ligand has NMR properties that are governed by its geometry in the bound state and these can therefore be different from those of the ligand in its free state. In a situation of fast equilibrium on the NMR timescale, the magnitude of the parameter that is measured corresponds to the population-weighted average of the NMR properties of the free and the bound states. For the special case of correlation time-dependent properties, the contribution of the complexed ligand is so large that it predominates in the averaged values, even at a low molar ratio (Ni, 1994; Neuhaus and Williamson, 2000).
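A short numerical sketch shows why a small bound population can dominate the observed value. It assumes a two-site fast-exchange average of the cross-relaxation rate; the rates and the bound fraction are hypothetical, chosen so that the free ligand has a small positive rate (fast tumbling) and the bound ligand a large negative one (slow tumbling).

```python
def fast_exchange_average(p_bound, value_bound, value_free):
    """Population-weighted average of an NMR property under fast exchange."""
    if not 0.0 <= p_bound <= 1.0:
        raise ValueError("p_bound must lie between 0 and 1")
    return p_bound * value_bound + (1.0 - p_bound) * value_free


P_BOUND = 0.05        # 5% of the ligand bound (hypothetical)
SIGMA_BOUND = -5.0    # s^-1, slow-tumbling complex (hypothetical)
SIGMA_FREE = 0.05     # s^-1, fast-tumbling free ligand (hypothetical)

sigma_obs = fast_exchange_average(P_BOUND, SIGMA_BOUND, SIGMA_FREE)
print(f"observed cross-relaxation rate: {sigma_obs:.4f} s^-1")
```

With these numbers the averaged rate is negative, so the ligand signals report the behavior of the bound state even though 95% of the ligand is free.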

#### Transfer NOE (TR-NOE)

Most conformational analyses of sugars have been based on NOE data. For the determination of the structures of ligands bound to receptor proteins, transfer NOE (TR-NOE)-based experiments are an excellent tool (Ni, 1994). Under suitable conditions, a fast equilibrium on the NMR timescale is established between the free ligand (a small to medium-sized fast-tumbling molecule with a short correlation time) and the ligand bound to the high-molecular-weight receptor (a large and slow-tumbling molecule with a long correlation time), which therefore behaves as the receptor, with a large and fast-growing negative NOE. This averaged situation is reflected in the observation of a large-molecule negative NOE in the signals of the ligand (see **Figures 1C,D**). In general, modifications of the magnetic field strength or changes in the sample conditions such as temperature or viscosity can be used for fine adjustment if the free ligand is not clearly in the fast-tumbling regime.

In such a case, it is possible to obtain the bound ligand NOE by the analysis of the ligand signals (Ni, 1994; Neuhaus and Williamson, 2000). As a result, a qualitative analysis of the changes between the free ligand conformation and that of the bound ligand is straightforward. On the other hand, the quantitative interpretation of the spectrum suffers from the potential effects of spin diffusion. In this situation, complete relaxation matrix calculations (CORCEMA or MARDIGRAS; Borgias and James, 1990; Jayalakshmi and Krishna, 2002) should be used.

In principle, any experiment that measures interprotonic relaxation rates can be used for the analysis of the structure of the bound ligand. Thus, ROESY has been used to improve the experiment, taking advantage of its lower sensitivity to spin diffusion (Poveda et al., 1997). As the ROESY experiment suffers from some spurious effects, transverse-ROESY (Hwang and Shaka, 1992) is better suited, as it is less prone to spin-lock artifacts. In addition, the quiet-NOESY experiment has been used. In this experiment, a biselective band inversion pulse is applied in the middle of the mixing time, which suppresses spin diffusion by inverting only the selected spins (Zwahlen et al., 1994).

Special mention should be made of the monodimensional analogs (Stott et al., 1997). By selecting a single signal, not only is a better resolution achieved due to the change from 2D to 1D but a new dependence on T1sel is created and the NOE exhibits a longer linearity, a feature that is particularly useful when precise distances are required (Hu and Krishnamurthy, 2006).

#### STD-NMR

The STD-NMR (Mayer and Meyer, 1999) experiment consists of the difference between two experiments. It is undertaken with low-power irradiation during the relaxation delay on a sample in equilibrium that comprises a large excess of ligand(s) (from 10:1 up to 1,000:1) relative to the receptor (a large molecule), which is present at low concentration (nM to µM). In one experiment, recorded as the reference, dummy irradiation far from the signals is performed, while in the other experiment some signals from the receptor are selectively irradiated. If binding occurs, magnetization from the receptor is transferred to the ligand through close contacts with the receptor in the complex, and the effect will emerge in the difference experiment (Mayer and Meyer, 1999; Meyer and Peters, 2003).

Since its formulation, several applications have been explored for STD-NMR. The first application proposed for the technique was ligand screening; this consists of the deconvolution of a library of potential binding molecules by using several compounds at once in each experiment (Henrichsen et al., 1999). The molecules that bind better will show STD signals. Epitope mapping was the next application proposed: the experiment can be used to determine the relative importance of diverse regions of the ligand in the interaction with the receptor (Mayer and Meyer, 2001). The technique can also be used for affinity constant evaluation. The first application in which calculation of a binding constant was described used the Cheng and Prusoff equation, since the STD values are biased by the relaxation properties of the ligand proton being considered (Meyer and Peters, 2003). More recently, the initial growth rate of the STD amplification factor has been used in order to avoid this dependence upon the relaxation of the protons, see **Figure 1E** (Angulo et al., 2010). One potential complication in STD experiments is peak overlap, which can make the interpretation of the results difficult. One solution relies on adding another transfer step so that the peaks are spread in two dimensions: STD-TOCSY (Mayer and Meyer, 1999) and STD-HMQC (Vogtherr and Peters, 2000). Similarly, spectral editing can also be performed using other nuclei, such as <sup>15</sup>N or <sup>19</sup>F (Kövér et al., 2007; Diercks et al., 2009).
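The quantities used in epitope mapping can be sketched in a few lines. The example below uses the commonly quoted definitions (fractional STD as the normalized difference between reference and saturated intensities, amplification factor as the fractional STD multiplied by the ligand excess, and the epitope map as STD values scaled to the strongest proton). These definitions are not spelled out in the text, and the proton labels and intensities are hypothetical.

```python
def std_fraction(i_ref, i_sat):
    """Fractional STD effect: (I0 - Isat) / I0."""
    return (i_ref - i_sat) / i_ref

def amplification_factor(i_ref, i_sat, ligand_excess):
    """STD amplification factor: fractional STD times the ligand-to-receptor excess."""
    return std_fraction(i_ref, i_sat) * ligand_excess

def epitope_map(std_by_proton):
    """Scale STD values so that the strongest proton is set to 100%."""
    strongest = max(std_by_proton.values())
    return {proton: 100.0 * value / strongest
            for proton, value in std_by_proton.items()}

# Hypothetical (reference, saturated) peak intensities for three ligand protons
peaks = {"H1": (1000.0, 900.0), "H2": (1000.0, 950.0), "H6": (1000.0, 980.0)}
std = {proton: std_fraction(i_ref, i_sat) for proton, (i_ref, i_sat) in peaks.items()}
relative = epitope_map(std)
```

Protons with the highest relative values would be read as being in closest contact with the receptor; the initial-slope analysis mentioned above additionally requires a saturation-time series, which is not modeled here.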

Using methods based on the relaxation matrix, it is possible to calculate the theoretical STD of a given complex. Thus, a quantitative analysis of the ligand within the complex can be carried out using a 3D structure. Based on a 3D model of the complex, using some NMR parameters and considering the saturation time and the irradiation frequency among others, CORCEMA-ST can estimate the different values of saturation transferred from the protein to a particular proton of the ligand (Jayalakshmi and Krishna, 2004, 2005). An example is given in **Figure 2**. This methodology has also been extended to the case of multiple binding modes within the same binding site (Angulo et al., 2008) and to the iterative refinement of a complex structure (Jayalakshmi and Krishna, 2005).
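Comparing calculated and experimental STD values requires a figure of merit. One common choice for this kind of comparison is a normalized root-mean-square R-factor. The sketch below uses an unweighted form as an assumption; the text does not state the weighting or exact expression used by CORCEMA-ST, and the function name is hypothetical.

```python
import math

def r_factor(experimental, calculated):
    """Unweighted R-factor: sqrt(sum((exp - calc)^2) / sum(exp^2)).

    Lower values indicate better agreement between the model and the data.
    """
    if len(experimental) != len(calculated):
        raise ValueError("experimental and calculated lists must have equal length")
    numerator = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    denominator = sum(e ** 2 for e in experimental)
    return math.sqrt(numerator / denominator)
```

In an iterative refinement, candidate ligand poses would be ranked by such a score, with the lowest value taken as the best fit to the measured STD intensities.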

#### Selective T<sub>1</sub> (T1sel) and T<sub>2</sub>

T1sel and T<sub>2</sub> also depend on correlation times and, in principle, they can be used to study association processes for fast equilibrium between ligand and receptor, observing the ligand signals at low concentration of the receptor. These values can be used for the analysis of binding constants by performing a titration (García-Jiménez et al., 2017). A series of T1sel/T<sub>2</sub> values are obtained at different ratios of ligand to receptor, and by fitting these to the variation in the ligand/receptor ratio, K<sub>D</sub> or IC<sub>50</sub> can be extracted. Recently the application of <sup>1</sup>H relaxation dispersion measurements has been described for weak binding systems using both T<sub>2</sub> and T<sub>1ρ</sub> (Moschen et al., 2015; Trigo-Mouriño et al., 2017).
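The titration analysis can be sketched with a single-site binding model under fast exchange, in which the observed ligand relaxation rate is the population-weighted average of the free and bound rates. This is an assumed, simplified model for illustration; the function names and all parameter values are hypothetical, and concentrations only need to share the same unit.

```python
import math

def complex_concentration(p_total, l_total, kd):
    """Concentration of the 1:1 complex from the quadratic binding equation."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0

def observed_rate(p_total, l_total, kd, r_free, r_bound):
    """Fast-exchange average of ligand relaxation rates (R = 1/T1sel or 1/T2)."""
    fraction_bound = complex_concentration(p_total, l_total, kd) / l_total
    return r_free + fraction_bound * (r_bound - r_free)

# Hypothetical titration: fixed ligand, increasing receptor (same concentration units)
ligand = 1000.0
rates = [observed_rate(p, ligand, kd=200.0, r_free=1.0, r_bound=50.0)
         for p in (0.0, 5.0, 10.0, 20.0, 40.0)]
```

Fitting measured rates against such a model with K<sub>D</sub> and the bound-state rate as free parameters is one way to extract the dissociation constant.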

#### WaterLOGSY

The WaterLOGSY experiment (Dalvit et al., 2000) takes into account explicitly the water molecules surrounding the macromolecular complex. It is easy to implement and consists essentially of a selective excitation of the water molecules in a sample prepared in H2O, followed by a NOESY mixing period and a water suppression scheme for detection (Dalvit et al., 2001).

This experiment traces the pathways by which magnetization passes from excited water to ligand protons. The transfer can occur directly from bulk water to the ligand, through exchangeable protons of the receptor, or via water molecules buried in the receptor interface. These buried molecules have tumbling times closer to that of the receptor than to those of bulk water or ligand. Owing to the long correlation times of the bound ligand and of the buried water molecules, the peaks arising through the two latter mechanisms will have the opposite sign to those arising from direct transfer of magnetization to the free compounds (Stockman and Dalvit, 2002).

The WaterLOGSY competition experiment is used for ligand library deconvolution, in which several relay pathways are used constructively, transferring bulk water magnetization to the ligand in a selective manner (Dalvit et al., 2002). In this experiment, the resonances of non-binding compounds appear with opposite sign, corresponding to NOE in the narrowing limit, and they tend to be weaker than those of the interacting ligands.

#### NEW DEVELOPMENTS

A recent application of WaterLOGSY is the LOGSY titration (Geist et al., 2017). From the slopes of the curves of WaterLOGSY signal vs. protein concentration, LOGSY-titration factors can be obtained, which can be used for epitope mapping as an alternative to STD-AF0. In this case, a great advantage is that isolated water molecules are always in the narrowing limit of the NOE curve.

Complementary information can also be obtained using paramagnetic tags attached to carbohydrates. At least two methods can be exploited: PCS, where changes in chemical shifts are produced, and PRE, where additional relaxation is caused. In both cases the effects depend on the distance to the paramagnetic center. The paramagnetic center can be either a lanthanide (Gao et al., 2016; Canales et al., 2017) or a nitroxide-carrying spin label such as TEMPO.

### INTEGRATING THE RESULTS. FROM NMR DATA TO 3D STRUCTURES

The process of deriving a 3D structure from NMR data is not straightforward and it requires the application of force field modeling, with experimental constraints taken from the experimental data. For carbohydrates, accurate distances have to be used and complemented by high-quality theoretical data; this is because an accurate description of the flexibility is required. This often implies the use of explicit solvent MD (molecular dynamics) calculations for the free ligand.

Several force field programs have been developed for carbohydrates, the most widely used being Amber, making use of the parameters for carbohydrates developed by Woods (GLYCAM) (Kirschner et al., 2008); CHARMM (Brooks et al., 2009), optimized by MacKerell; and GROMACS (Hess et al., 2008). These force fields are compatible with other force fields developed for proteins and peptides and can be used in mixed systems calculations.

Using the experimental NMR data from a complex, one can obtain a 3D structure of the ligand within the complex using transfer NOESY evaluation, or at least qualitative data on the potential conformational changes upon binding (Ni, 1994). In addition, CORCEMA-ST software is capable of calculating the STD values for individual protons (Krishna and Jayalakshmi, 2008), and of verifying the bound ligand conformation. STD-AF0 values have been used for the definition of the structure, position, and conformation of the ligand within the complex (Enríquez-Navas et al., 2011). This approach can also be used to tackle more complex situations, such as multiple binding modes in one or more than one site (Angulo et al., 2008). Finally, it has been used to refine NMR structures, using STD-NMR intensity-restrained optimization (Jayalakshmi and Krishna, 2005).

### FUTURE PERSPECTIVES

NMR provides tools for the analysis and study of low-resolution 3D structures of protein–carbohydrate complexes. A particular advantage is that there are no upper limits to the size of the receptor and the analyses can be performed even with complete cells. In order to improve the reliability of the results, however, efforts should be made to achieve an integrated semi-quantitative method combining all the transfer NMR techniques that can deliver spatial resolution, namely transfer NOESY, STD-NMR, and WaterLOGSY. Recently, promising new methods based on PCS (pseudocontact shifts) have emerged. Lanthanide paramagnetic tags that exploit PCS can be applied to protein–carbohydrate complexes, with the advantage of the larger dispersion of the signals induced by the lanthanide linked to the carbohydrate (Canales et al., 2017) or the protein (Gao et al., 2016).

#### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

#### FUNDING

Research described in this paper has been performed with financial support from MINECO grants (CTQ2012-32605, CTQ2015-70134-P) and Junta de Andalucía (FQM-1303), cofinanced by European Regional Development Funds (ERDF). The author acknowledges support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

#### ACKNOWLEDGMENTS

The author is grateful to his colleagues (J. L. de Paz, J. Rojo, S. Gil-Caballero, M.J. Garcia, and M. Torres) for their careful reading of the manuscript and their participation in the studies cited.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Nieto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Structural and Functional Insights Into Lysostaphin–Substrate Interaction

Helena Tossavainen<sup>1</sup>, Vytas Raulinaitis<sup>2</sup>, Linda Kauppinen<sup>3</sup>, Ulla Pentikäinen<sup>3,4,5</sup>, Hannu Maaheimo<sup>6</sup> and Perttu Permi<sup>1,3</sup>\*

<sup>1</sup> Department of Chemistry, Nanoscience Center, University of Jyvaskyla, Jyvaskyla, Finland, <sup>2</sup> Program in Structural Biology and Biophysics, Institute of Biotechnology, University of Helsinki, Helsinki, Finland, <sup>3</sup> Department of Biological and Environmental Science, University of Jyvaskyla, Jyvaskyla, Finland, <sup>4</sup> Institute of Biomedicine, University of Turku, Turku, Finland, <sup>5</sup> Turku Centre for Biotechnology, Turku, Finland, <sup>6</sup> VTT Technical Research Centre of Finland Ltd., Espoo, Finland

Lysostaphin from Staphylococcus simulans and its family enzymes rapidly acquire prominence as the next generation agents in treatment of S. aureus infections. The specificity of lysostaphin is promoted by its C-terminal cell wall targeting domain selectivity toward pentaglycine bridges in S. aureus cell wall. Scission of these cross-links is carried out by its N-terminal catalytic domain, a zinc-dependent endopeptidase. Understanding the determinants affecting the efficiency of catalysis and strength and specificity of interactions lies at the heart of all lysostaphin family enzyme applications. To this end, we have used NMR, SAXS and molecular dynamics simulations to characterize lysostaphin structure and dynamics, to address the inter-domain interaction, the enzyme-substrate interaction as well as the catalytic properties of pentaglycine cleavage in solution. Our NMR structure confirms the recent crystal structure, yet, together with the molecular dynamics simulations, emphasizes the dynamic nature of the loops embracing the catalytic site. We found no evidence for inter-domain interaction, but, interestingly, the SAXS data delineate two preferred conformation subpopulations. Catalytic H329 and H360 were observed to bind a second zinc ion, which reduces lysostaphin pentaglycine cleaving activity. Binding of pentaglycine or its lysine derivatives to the targeting domain was found to be of very low affinity. The pentaglycine interaction site was located to the N-terminal groove of the domain. Notably, the targeting domain binds the peptidoglycan stem peptide Ala-D-γ-Glu-Lys-D-Ala-D-Ala with a much higher, micromolar affinity. Binding site mapping reveals two interaction sites of different affinities on the surface of the domain for this peptide.

Keywords: lysostaphin, NMR structure, pentaglycine, peptidoglycan, protein dynamics, SH3b domain, Staphylococcus aureus, substrate binding

#### Edited by:

Maria Rosaria Conte, King's College London, United Kingdom

#### Reviewed by:

Filippo Prischi, University of Essex, United Kingdom; Vladimir I. Polshakov, Lomonosov Moscow State University, Russia

> \*Correspondence: Perttu Permi perttu.permi@jyu.fi

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

> Received: 19 April 2018 Accepted: 12 June 2018 Published: 03 July 2018

#### Citation:

Tossavainen H, Raulinaitis V, Kauppinen L, Pentikäinen U, Maaheimo H and Permi P (2018) Structural and Functional Insights Into Lysostaphin–Substrate Interaction. Front. Mol. Biosci. 5:60. doi: 10.3389/fmolb.2018.00060

## INTRODUCTION

The gram-positive pathogen Staphylococcus aureus causes numerous diseases, ranging from minor skin abscesses to meningitis and even toxic shock syndrome (Lowy, 1998). S. aureus bacteremia is estimated to be fatal in up to 30% of cases (van Hal et al., 2012) and annually incurs billions in overall medical costs in the US alone (Lee et al., 2013) and hundreds of millions in additional hospitalization costs in the EU (Köck et al., 2010). The problem is exacerbated by the pathogen being carried by one third of the human population (Chambers and Deleo, 2009) and by its vigor in developing resistance against drugs mobilized for its treatment. Hence, hospital-acquired S. aureus cases are particularly severe.

For seven decades treatment of S. aureus infections primarily relied on antibiotics interfering with the synthesis of its cell wall, specifically, the penicillin methicillin and the glycopeptide vancomycin. Inhibitors of protein synthesis have been recently added to the arsenal of therapeutic means (Brickner et al., 2008). However, the historic propensity of bacteria to rapidly develop resistance to small molecules impels the conception of new treatment paradigms (Taubes, 2008). Today S. aureus appears as one of the 12 most health-threatening pathogens in the list of resistant bacteria compiled by WHO to promote development of new antibiotics (WHO Mediacentre, 2017).

Pentaglycine cross-bridges of peptidoglycan (PG) are a distinct feature of the S. aureus cell wall and provide innate selectivity as targets (Vollmer et al., 2008). Some members of the Staphylococcus genus have taken advantage of this feature and secrete bactericidal enzymes which cleave this cross-bridge to lyse competing S. aureus (Schindler and Schuhardt, 1964; Browder et al., 1965; Sugai et al., 1997), with their own intrinsic resistance engineered by serine replacements of the glycine residues (Thumm and Götz, 1997; Tschierske et al., 1997). Scission of pentaglycine is executed by endopeptidases of the lysostaphin family, which also includes S. aureus inherent autolytic enzymes that maintain the integrity of the cell wall during cell life cycle (Ramadurai et al., 1999; Ercoli et al., 2015; Raulinaitis et al., 2017a).

A modular organization is common to many peptidoglycan hydrolases (Szweda et al., 2012). Mature lysostaphin consists of an N-terminal catalytic (CAT) domain, and a C-terminal cell wall targeting (CWT) domain (Baba and Schneewind, 1996). Its X-ray crystal structure has been reported recently (Sabala et al., 2014). The CAT domain forms a barrel fold typical for the MEROPS M23 family zinc-dependent endopeptidases, which bear conserved catalytic as well as Zn<sup>2+</sup>-coordinating amino acid residues (Rawlings et al., 2008). A fourteen-residue linker connects the CAT domain to the CWT domain. The latter assumes the fold corresponding to SH3b domains (Lu et al., 2006). The SH3b domains are the bacterial counterpart of the ubiquitous SH3 domains found in eukaryotes (Ponting et al., 1999; Whisstock and Lesk, 1999). These are small interaction modules present in proteins of signaling pathways, which interact with proline-rich sequences in binding partners (Saksela and Permi, 2012). The canonical PXXP binding site on the surface of SH3 domains is partially blocked in the SH3b domains, and the pentaglycine interaction site of SH3b domains has been located to a groove structure formed by a 20-residue N-terminal extension not present in the smaller SH3 domains (Lu et al., 2006; Hirakawa et al., 2009; Gu et al., 2014). Despite the linker, evidence for CAT–CWT domain interaction in vitro has been presented, although the same study underscored the benefits of flexibility between the domains in vivo (Lu et al., 2013).

Surprisingly little is known about the details of the interaction between lysostaphin and PG. Biochemical characterization has revealed that the CWT domain is indispensable for lysostaphin specificity, and that an intact pentaglycine cross-bridge is crucial for the CWT–PG interaction (Gründling and Schneewind, 2006). The interaction can be inhibited by PG fragments containing multiple cross-linked murein subunits, but not with smaller murein mono- or dimers. Lipoproteins, cell wall-anchored proteins and wall teichoic acids appear not to have a role in the interaction (Gründling and Schneewind, 2006). Similarly, PG binding of Staphylococcus capitis ALE-1 CWT domain, which is highly similar in sequence and structure to the lysostaphin CWT domain, is dependent on the length and amino acid composition of the cross-bridge, showing strong preference for Gly<sub>5</sub> bridges (Lu et al., 2006). The complex structure of lysostaphin CWT domain with pentaglycine peptide is a landmark in the characterization of the SH3b–PG interaction (PDB ID 5LEO). In this structure pentaglycine resides in the N-terminal groove, consistent with previous mutational, computational and NMR studies.

Medical applications of recombinant lysostaphin are advancing and some have entered clinical trials (Nelson et al., 2012). Lysostaphin CWT domain fusion proteins have also been considered as a promising source of alternative chemotherapeutics (Schmelcher et al., 2012; Osipovitch and Griswold, 2015; Jagielska et al., 2016). In addition, lysostaphin has been engineered to suppress its immunogenic potential (Blazanovic et al., 2015). It is clear that the key factor in the success of all lysostaphin family enzyme applications is understanding the determinants affecting efficiency of catalysis and strength and specificity of interactions. To this end, we have used NMR, SAXS and molecular dynamics (MD) simulations to characterize lysostaphin structure and dynamics, to address the inter-domain interaction, the enzyme-substrate interaction as well as the catalytic properties of pentaglycine cleavage in solution.

### MATERIALS AND METHODS

### Production and Purification of Lysostaphin and Its CWT Domain

The gene for mature lysostaphin (residues 251-493) was synthesized de novo at GenScript (NJ, USA) and cloned into the pGEX-2T vector (GE Healthcare Life Sciences) at the BamHI and EcoRI cloning sites. The resulting plasmid was transformed into the E. coli BL21 (DE3) strain. DNA encoding the CWT domain (residues 402-493) was amplified (primers 5′-CGCGGATCCTGGAAAACCAATAAATATGGCACGC-3′ and 3′-CGCGAATTCTTATTTGATGGTGCCCCACAGG-5′) using this plasmid as template and was likewise cloned separately into the pGEX-2T vector at the BamHI and EcoRI cloning sites.

Protein expression and purification was carried out as described in detail previously (Raulinaitis et al., 2017a). Overnight cell cultures were inoculated into a fresh medium. LB medium was used for non-labeled protein production and, when labeled protein was required for NMR data collection, cells were grown in M9 minimal medium (40 mM Na2HPO4, 22 mM KH2PO4, 12 mM NaCl, 1 mM MgSO4, 100 µM CaCl2, 10 µM FeSO4, 0.001% w/w biotin, and 0.1 mg/ml ampicillin, pH 7.0). Upon cells reaching OD600, protein expression was induced with isopropyl β-D-1-thiogalactopyranoside (IPTG), 0.5 mM final concentration, and further incubated for 3 h. Harvested cells were disrupted by sonication, and the soluble fraction was loaded on Glutathione Sepharose 4 Fast Flow resin (GE Healthcare). Lysostaphin was released from resin-bound glutathione tag by thrombin and the protein was further purified by gel filtration on a HiLoad 16/60 Superdex 75 column (GE Healthcare Life Sciences). Removal of innately bound metal ions was carried out with EDTA, which was thereafter dialyzed away in PBS buffer, and the protein was concentrated with Amicon® 5000 MWCO centrifugal filters.

#### NMR Spectroscopy

Spectra for chemical shift assignment and structure determination were acquired at 308 K on a Varian INOVA 800 MHz spectrometer equipped with a cryogenically cooled triple resonance probehead equipped with an actively shielded z-gradient. Resonance assignment was carried out with the following set of experiments: <sup>1</sup>H, <sup>15</sup>N HSQC, <sup>1</sup>H, <sup>13</sup>C HSQC, <sup>1</sup>H, <sup>13</sup>C CT-HSQC, HNCACB, CBCA(CO)NH, (H)CC(CO)NH, H(CCO)NH, HBHA(CO)NH, (HB)CB(CGCD)HD, (HB)CB(CGCDCE)HE, <sup>1</sup>H, <sup>15</sup>N NOESY-HSQC, <sup>1</sup>H, <sup>13</sup>C NOESY-HSQC in 7% D2O/93% H2O, and <sup>1</sup>H, <sup>13</sup>C HSQC-NOESY in 100% D2O. The latter five spectra were employed for the assignment of the aromatic residues as well as the source of distance restraints for structure determination. Additional structural restraints were derived from the chemical shifts in the form of TALOS-N (Shen and Bax, 2013) dihedral angle restraints and applied for residues in secondary structures as well as H2O to D2O exchange experiment in the form of hydrogen bond restraints for amides experiencing slow exchange. Histidine tautomeric state could be derived from the peak pattern in a <sup>1</sup>H, <sup>15</sup>N HMBC spectrum or the Cδ2-Cε1 chemical shift difference for four out of nine histidines (Barraud et al., 2012). Spectra were processed with NMRPipe (Delaglio et al., 1995) and analyzed with Sparky (T. D. Goddard and D. G. Kneller, University of California, San Francisco). The automated NOE peak assignment-structure determination routine of CYANA 2.1 (López-Méndez and Güntert, 2006) was used to generate an ensemble of 20 lowest target energy structures which were further refined with AMBER 16 (Case et al., 2005) in explicit solvent. Fifteen lowest AMBER energy structures were chosen to represent the solution state structure of lysostaphin.

The pentaglycine binding site on the CWT domain was determined by comparing the <sup>1</sup>H, <sup>15</sup>N, and <sup>1</sup>H, <sup>13</sup>C HSQC spectra of full-length lysostaphin and CWT domain acquired in the presence and absence of a large amount, ∼170-fold molar excess, of KGGGGG and GGGGGK hexapeptides (Sigma-Aldrich). Binding characteristics of a PG peptide mimic were determined by titration of the CWT domain with A-D-EKGGGGGA-D-EK-D-A and comparing <sup>1</sup>H, <sup>15</sup>N HSQC spectra acquired at approximate peptide to CWT domain molar ratios of 0, 0.5, 1, 2, 5, 10, and 24. Similarly, the pentapeptide A-D-γ-EK-D-A-D-A (Bachem) and muramyl dipeptide N-acetylmuramyl-L-alanyl-D-isoglutamine hydrate (Carbosynth Ltd) were titrated to a CWT domain sample, and <sup>1</sup>H, <sup>15</sup>N HSQC spectra were collected at approximate peptide to CWT domain molar ratios of 0, 0.5, 1, 2, 4, 8, 16, 32, 45, and 60. CSPs, calculated as $\Delta\delta = \left(\Delta\delta_{\mathrm{H}}^{2} + (0.154 \times \Delta\delta_{\mathrm{N}})^{2}\right)^{1/2}$, were plotted as a function of the molar ratio of titrated ligand to CWT domain. Dissociation constants for individual residues were obtained by nonlinear least squares fitting, as implemented in the program xcrvfit (www.bionmr.ualberta.ca/bds/software/xcrvfit). Site-specific dissociation constants were calculated as averages of dissociation constants of individual residues within the site. In all binding site studies protein to zinc concentration ratio was 1:1.
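The combined chemical shift perturbation and the site-averaged dissociation constant described above can be written out as follows. This is a minimal sketch: the 0.154 scaling factor is the one quoted in the text, while the function names and all numeric inputs are hypothetical. The per-residue fitting performed by xcrvfit is not reproduced here.

```python
import math

N_SCALE = 0.154  # 15N scaling factor quoted in the text

def csp(delta_h_ppm, delta_n_ppm, n_scale=N_SCALE):
    """Combined 1H/15N chemical shift perturbation in ppm."""
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)

def site_kd(residue_kds):
    """Site-specific Kd as the plain average of per-residue Kd values."""
    return sum(residue_kds) / len(residue_kds)

# Hypothetical shift changes (ppm) for one residue and per-residue Kd values (mM)
example_csp = csp(0.03, 0.20)
example_site_kd = site_kd([1.0, 2.0, 3.0])
```

A plain average treats every residue equally, so residues with poorly determined fits can bias the site value; the text gives no indication that any weighting was applied.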

CAT domain–SH3 domain interaction was probed by replacing zinc with paramagnetic manganese (1:1 protein to cation ratio) and observing peak bleaching in a <sup>1</sup>H, <sup>15</sup>N HSQC spectrum.

<sup>15</sup>N T<sub>1</sub>, T<sub>2</sub> and heteronuclear NOE spectra were acquired for one- and three-zinc bound lysostaphin in the presence and absence of substrate pentaglycine on a Bruker AVANCE III HD 800 MHz spectrometer, equipped with a TCI <sup>1</sup>H/<sup>13</sup>C/<sup>15</sup>N cryoprobe. For T<sub>1</sub>, relaxation delays of 20, 60, 100, 200, 400, 600, 900, 1,400, 2,000, and 2,600 ms were used. For T<sub>2</sub>, the delays were multiples of the loop length of 16.96 ms, with the loop counter set to 1, 2, 3, 4, 5, 6, 7, 8, 10, and 12. The recycle delay was set to 3.1 and 2.6 s for the T<sub>1</sub> and T<sub>2</sub> experiments, respectively. All spectra were acquired as pseudo 3D spectra and Fourier transformed with NMRPipe. Peak intensities were fitted to a decaying exponential function as implemented in Sparky. Heteronuclear NOE values were calculated as the intensity ratios of peaks from a pair of spectra measured with and without <sup>1</sup>H presaturation during the recycle delay. The relaxation delay was set to 5 s. The CWT domain relaxation spectra were recorded on a Bruker AVANCE III 500 MHz spectrometer. For T<sub>1</sub>, relaxation delays of 20, 60, 100, 200, 400, 600, 800, 1,200, 1,600, and 2,000 ms were used. The recycle delay was set to 2.5 s. For T<sub>2</sub>, the loop counter was set to 1, 2, 4, 8, 10, 12, 14, and 16 and the recycle delay was 1.5 s. Heteronuclear NOE spectra were acquired as for full-length lysostaphin.
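The relaxation analysis described above can be illustrated with a small sketch. It builds the T<sub>2</sub> delay list from the loop counter and loop length given in the text, estimates a decay constant from peak intensities, and computes a heteronuclear NOE ratio. As a simplifying assumption, it uses a log-linear least-squares fit rather than the nonlinear exponential fit implemented in Sparky, and the intensities are synthetic.

```python
import math

LOOP_MS = 16.96                                  # T2 loop length from the text
T2_LOOP_COUNTS = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12]  # full-length lysostaphin series

def t2_delays_ms(counts=T2_LOOP_COUNTS, loop_ms=LOOP_MS):
    """Relaxation delays as integer multiples of the loop length."""
    return [n * loop_ms for n in counts]

def fit_decay_time(delays, intensities):
    """Estimate T in I(t) = I0 * exp(-t / T) by linear regression on ln(I)."""
    logs = [math.log(i) for i in intensities]
    n = len(delays)
    mean_t = sum(delays) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(delays, logs))
             / sum((t - mean_t) ** 2 for t in delays))
    return -1.0 / slope

def het_noe(i_saturated, i_reference):
    """Heteronuclear NOE as the ratio of peak intensities with/without presaturation."""
    return i_saturated / i_reference

# Synthetic, noise-free decay with an assumed T2 of 80 ms
delays = t2_delays_ms()
synthetic = [1000.0 * math.exp(-t / 80.0) for t in delays]
t2_estimate = fit_decay_time(delays, synthetic)
```

On noisy data the log transform distorts the error weighting, which is one reason a direct nonlinear fit is usually preferred for real spectra.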

Pentaglycine cleavage by lysostaphin was monitored by <sup>1</sup>H NMR as described previously (Raulinaitis et al., 2017a). Reactions took place in phosphate-buffered saline (PBS), pH 7.2, with >90% D2O by volume as a solvent and final volumes were 600 µL. Initial pentaglycine (Sigma-Aldrich) and lysostaphin concentrations were 1 mM and 2.25 µM, respectively. Metal ions (ZnCl2, CoCl2, CuCl2, and MnCl2) were added as needed. Residual activity of lysostaphin "apo" form was taken into account in activity calculations. Measurements were carried out by using a Bruker Avance III 600 MHz spectrometer, equipped with a QCI <sup>1</sup>H/<sup>13</sup>C/<sup>15</sup>N/<sup>31</sup>P cryoprobe and a SampleJet automated sample changer. The preheater of the sample changer was used for sample incubation when product formation was measured as a time series with 3-h intervals for 48 h. Single time-point samples were incubated in an incubator for 21 h and reactions were quenched by heating at 90°C for 15 min before proton spectra acquisitions at 37°C. The data were processed with Bruker TopSpin 3.5 software.

#### MD Simulations

MD simulations in explicit solvent were performed with AMBER 16 (Case et al., 2005) using the ff14SB force field. Lysostaphin CYANA ensemble structures 1, 5, 7, and 14 were selected as targets of simulations. The topology and coordinate files were generated with the LEaP program in AMBER 16. The zinc ion was modeled using the cationic dummy atom approach (Pang, 1999; Pang et al., 2000). The protein molecule was placed in a cubic box with a minimum solute-box distance of 10 Å, and solvated with TIP3P water molecules. Nine chloride ions were added to maintain electrical neutrality. After minimization, heating and equilibration of the system, the production 30 ns MD simulations were performed with periodic boundary conditions at 300 K. The temperature was maintained by using the Langevin thermostat, whereas the pressure was kept at 1 bar using the Berendsen barostat (Berendsen et al., 1984). The time step was set to 2 fs. Long-range electrostatic interactions were treated using the Particle Mesh Ewald method (Darden et al., 1993) with a cut-off of 10 Å. Bond lengths involving hydrogen atoms were constrained by SHAKE (Ryckaert et al., 1997). Analyses of the trajectories were carried out with CPPTRAJ (Roe and Cheatham, 2013) and VMD (Humphrey et al., 1996).

#### SAXS

BM29 beamline, ESRF, Grenoble, France was used for collecting the SAXS data on a PILATUS 1M image plate using a sample to detector distance of 2.9 m and a wavelength of 1.0 Å (momentum transfer range 0.01 < q < 5 nm<sup>−1</sup>). The measurements were carried out at 288 K in PBS buffer. Three different protein concentrations (1.0, 3.0, and 5.0 mg/ml) were used in the data acquisition. PRIMUS (Konarev et al., 2003) in the ATSAS software package (Franke et al., 2017) was utilized in the data processing. To assess the conformational variability of lysostaphin, EOM (Tria et al., 2015) was used. In EOM calculations, the lysostaphin CAT domain (residues 251-384) and CWT domain (residues 410-493) of the NMR structure reported here were used as rigid bodies, and residues 385-409 as random coil. A total of 10,000 randomized conformations were generated for lysostaphin based on the amino acid sequence and the 3D structures of the CAT and CWT domains, using both the random coil and native options in EOM. The scattering profiles of these randomly generated conformations were compared, and 20 representative structures whose scattering curves fit the experimental scattering curve were selected by the EOM algorithm. The EOM run was repeated 100 times to provide statistical data about the lysostaphin radius of gyration (R<sub>g</sub>) and maximum dimension (D<sub>max</sub>) distributions.
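The two size descriptors reported from the ensemble analysis can be illustrated geometrically. The sketch below computes an equally weighted radius of gyration and the largest interatomic distance for a set of coordinates. It is not the EOM procedure, which works with scattering profiles; the function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def radius_of_gyration(coords):
    """Equal-weight Rg: root-mean-square distance of points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(sum((c[k] - center[k]) ** 2 for k in range(3))
                  for c in coords) / n
    return math.sqrt(mean_sq)

def max_dimension(coords):
    """Dmax: largest pairwise distance between any two points."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))

# Toy ensemble of two "conformers", each a short list of (x, y, z) points in angstrom
ensemble = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
rg_values = [radius_of_gyration(conf) for conf in ensemble]
dmax_values = [max_dimension(conf) for conf in ensemble]
```

Collecting these values over the selected conformers gives the kind of R<sub>g</sub> and D<sub>max</sub> distributions that are compared between the random pool and the fitted ensemble.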

### RESULTS AND DISCUSSION

### Lysostaphin Catalytic Groove Embracing Loops are Highly Mobile

The solution NMR structure of lysostaphin was determined based on distance restraints from NOESY spectra, ϕ/ψ dihedral restraints derived from chemical shifts using TALOS-N (Shen and Bax, 2013) and hydrogen bond restraints obtained from a H/D exchange experiment (**Figure 1A** and **Table 1**). In line with the X-ray crystal structure (Sabala et al., 2014), lysostaphin is structurally arranged into two domains. The domains are connected by a fourteen-residue linker, which is devoid of persistent structure. The domains have random mutual orientations in the ensemble of structures. The CAT domain shares the structural features of a MEROPS M23 metallopeptidase domain with a β-sheet core and four loops forming the bottom and the rims of the catalytic groove, respectively (Rawlings et al., 2008). In one end of the groove reside the catalytic residues surrounding a zinc cation. The CWT domain has the SH3b domain, barrel-like all β-sheet fold.

While the core of the CAT domain is well-defined, and nicely superimposable with the crystal structure (RMSDs of 0.7 and 1.3 Å for backbone and heavy atoms, respectively), the catalytic groove embracing loops and the catalytic histidines are conformationally dispersed, which results in a poorer overall structural match in these regions. The conformational spread in the NMR ensemble is the result of solvent exposure and flexibility of residues in these regions (**Figure 1B** and Supplementary Figure 1). Fast amide proton exchange with solvent is likely to be present in the loops, because these residues either show no peaks or show severely broadened peaks in a <sup>1</sup>H, <sup>15</sup>N HSQC spectrum. Additionally, the catalytic histidines probably exhibit solvent and/or conformational exchange, because their signals were not observed in a histidine-region <sup>1</sup>H, <sup>15</sup>N HMBC spectrum, showing cross-peaks between Nδ1, Nε2, Hδ2, and Hε1. Although carbon-bound Hδ2 or Hε1 do not exchange with solvent, they can be influenced by solvent or conformational exchange elsewhere in the aromatic ring. This manifests as additional line-broadening of histidine ring protons and disappearance of catalytic histidine correlations from the HMBC. In addition to exchange, micro- to millisecond time scale motions are present in the catalytic groove (**Figure 2** and **Table 2**), as identified by high R<sub>1</sub>R<sub>2</sub> products derived from <sup>15</sup>N relaxation measurements (Kneller et al., 2002).
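The R<sub>1</sub>R<sub>2</sub> screen mentioned above can be sketched as follows. The product is formed from the relaxation rates (the inverses of T<sub>1</sub> and T<sub>2</sub>), and residues with unusually high products are flagged as candidates for slow motions. The flagging rule used here (mean plus one standard deviation) is an assumption chosen for illustration, not the criterion used in the study, and the residue labels and values are hypothetical.

```python
def r1r2_product(t1_s, t2_s):
    """Product of relaxation rates R1 * R2, with R = 1 / T (T in seconds)."""
    return (1.0 / t1_s) * (1.0 / t2_s)

def flag_high_products(products, n_sd=1.0):
    """Return residues whose R1R2 exceeds mean + n_sd * population SD (assumed rule)."""
    values = list(products.values())
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return sorted(res for res, v in products.items() if v > mean + n_sd * sd)

# Hypothetical per-residue R1R2 products (s^-2)
products = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 20.0}
flagged = flag_high_products(products)
```

Because the product largely cancels the effect of anisotropic tumbling, elevated values point more specifically to exchange contributions to R<sub>2</sub> than either rate alone.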

Conformational dynamics of the catalytic groove are also evident from MD simulations. The root mean square fluctuation (RMSF) emphasizes the flexible nature of three of the loops embracing the groove (**Figure 1B** and Supplementary Figure 1). Consistent with the NMR ensemble, the outer half of the N-terminal loop, residues G270-G277, is highly dynamic, whereas the inner half is restrained by transient backbone hydrogen bonds within the loop, Y267-G281 and G268-H279, as well as by the H279 Nε2–Zn<sup>2+</sup> interaction. The loop encompassing residues N368-Q376 is the least flexible. It is transiently stabilized by backbone S371-T374 and side chain S369-T374 hydrogen bonds, the former twisting the loop into a three-residue 3<sub>10</sub> helix.
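The RMSF used above is the square root of the time-averaged squared deviation of each atom from its mean position. The following is a minimal sketch of that calculation, not the analysis pipeline of this study: the function name, the toy coordinates and the assumption that all frames have already been superimposed on a common reference are illustrative only.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom.
    Assumes the frames were already superimposed on a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation of atom i from its average position
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical toy trajectory: two atoms, three frames (coordinates in Å).
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(toy)
print(values)
```

In this toy case the first atom never moves and the second oscillates along x, so only the second atom has a non-zero RMSF; flexible loops would show up in the same way as residues with elevated values.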

The MD simulation reveals several possible mechanisms contributing to the disappearance of the catalytic histidine HMBC signals (Supplementary Figure 1). Firstly, while the two histidine rings remain approximately coplanar during the simulation, they slide relative to each other. Secondly, the aromatic ring of F285, next to H360, is able to rotate. Thirdly, the loop closest to the catalytic histidines, in particular its residues T353-Y355, adopts a multitude of conformations at a fluctuating distance from the catalytic histidines. Conformational exchange between all these different structural arrangements is likely to broaden the HMBC signal beyond detection. Moreover, and perhaps most importantly, H329 and H360 are transiently accessible to solvent, leading to signal broadening through exchange between tautomeric, ionic and H/D states. Notably, the outward orientation of the H360 aromatic ring in some of the NMR ensemble structures appears to be energetically unfavorable, as the ring turns toward the catalytic groove during the MD simulations.

Of the M23 family endopeptidases, for which the structure has been determined, lysostaphin CAT domain has the highest amino acid sequence identity with the sequences of LytM and LytU, 49 and 44%, respectively. LytM and LytU are S. aureus autolytic endopeptidases taking part in PG remodeling by cleaving the pentaglycine cross-bridges. The three structures are remarkably similar. Backbone atom RMSDs over the β-sheet core are 0.7 and 0.9 Å for LytM and LytU, respectively, when overlaid with the lysostaphin CAT domain. Besides structural correspondence they share a similar dynamic behavior with a stable core and mobile catalytic site and surrounding loops (Raulinaitis et al., 2017a,b).

The CWT domain is compact, stabilized by extensive hydrogen bonding within loops as well as between loops and the rest of the molecule, and high similarity is found between the solution and crystal structures. Backbone and heavy atom RMSDs are 0.5 and 1.1 Å, respectively, for all residues in the closest matching pair. Some dynamic features are, however, observed. Micro- to millisecond time scale motions are present in five residues, K412, T428, R476, G486, and L488 and four residues, G401, K406, N466, S467 are prone to fast exchange with solvent, all in loops (**Figure 1B**).

#### Lysostaphin Domains Do Not Interact in Vitro

No inter-domain NOE peaks were observed in the NOESY spectra, strongly indicating that the two domains do not possess a stable interaction interface. The lack of chemical shift differences between the isolated CWT domain and the CWT domain in full-length lysostaphin (Supplementary Figure 2) provides additional solid evidence for non-interacting domains. Nevertheless, we further inspected the possibility of domain interaction by replacing the CAT-bound zinc cation with manganese and analyzing peak bleaching in a <sup>1</sup>H, <sup>15</sup>N HSQC spectrum. Unpaired electrons in isotropic paramagnetic probes such as manganese induce transverse paramagnetic relaxation enhancement (PRE), which leads to line-broadening and reduced peak intensity up to 35 Å from the paramagnetic center, depending on the probe (Clore and Iwahara, 2009). In Mn<sup>2+</sup>-bound lysostaphin, peaks disappeared or broadened only in the CAT domain (Supplementary Figure 3), at a maximum average distance of 23.6 Å from the paramagnetic center, leaving peaks from the far backside of this domain intact. Conceivably, as the linker allows such an arrangement, the domains could interact through this backside CAT domain region without the CWT domain peaks experiencing line-broadening.

TABLE 1 | NMR restraints and structural statistics for the ensemble of 15 lysostaphin conformers of least restraint violations tabulated for the whole enzyme and domains separately.


<sup>a</sup>Backbone includes Cα, Cβ, N, and H atoms, except the N-terminal amide. For side chains, excluded are the highly exchangeable groups (Lys amino, Arg guanido, Ser/Thr/Tyr hydroxyl, His δ1/ε2), as well as all unprotonated carbons and nitrogens. <sup>b</sup>Ordered residues: 253–270, 279–308, 313–350, 361–387, 402–465, 468–492. Computed using PSVS (Bhattacharya et al., 2007).

As the PRE experiment remained inconclusive, we turned to the <sup>15</sup>N relaxation data and overall rotational correlation times (τ<sub>c</sub>), which can also be used to untangle the confines of inter-domain mobility. The τ<sub>c</sub> values derived from the <sup>15</sup>N R<sub>2</sub> and R<sub>1</sub> rates are notably different for the CAT and CWT domains, 7.8 and 6.0 ns at 35°C, respectively (**Table 2**). Also, the τ<sub>c</sub> of the CWT domain in full-length lysostaphin is significantly higher than that of the isolated CWT domain, 4.1 ns. These values indicate that the neighboring domain clearly hinders the motion of each domain in full-length lysostaphin. There still is, however, substantial flexibility, considering that a unified complex encompassing both domains has a predicted τ<sub>c</sub> of 13.4 ns.
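For readers who want to see how a τ<sub>c</sub> can follow from an R<sub>2</sub>/R<sub>1</sub> ratio, the sketch below uses the widely quoted slow-tumbling approximation τ<sub>c</sub> ≈ √(6·R<sub>2</sub>/R<sub>1</sub> − 7) / (4π·ν<sub>N</sub>). This is an illustration only and not necessarily the procedure used here: the <sup>15</sup>N Larmor frequency (60.8 MHz, corresponding to a 600 MHz spectrometer) and the example ratio are assumptions, and the approximation neglects internal motion, exchange and anisotropic tumbling.

```python
import math

def tau_c_from_r2_r1(r2_over_r1, nu_n_hz):
    """Approximate rotational correlation time in seconds.

    Uses tau_c ~ sqrt(6 * R2/R1 - 7) / (4 * pi * nu_N), valid for slowly
    tumbling, rigid residues without exchange contributions.
    """
    return math.sqrt(6.0 * r2_over_r1 - 7.0) / (4.0 * math.pi * nu_n_hz)

# Assumption: 15N Larmor frequency at a 600 MHz (1H) spectrometer.
NU_N_HZ = 60.8e6
# Hypothetical R2/R1 ratio averaged over rigid residues of one domain.
example_ratio = 7.1

tau_c = tau_c_from_r2_r1(example_ratio, NU_N_HZ)
print(f"tau_c = {tau_c * 1e9:.1f} ns")
```

Because the estimate depends only on the ratio, residues with significant internal motion or exchange broadening should be excluded before averaging, otherwise the apparent τ<sub>c</sub> is biased.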

FIGURE 2 | <sup>15</sup>N R<sub>1</sub>, R<sub>2</sub>, heteronuclear NOE and R<sub>1</sub>R<sub>2</sub> data of one-zinc (white circles), three-zinc (black circles) and three-zinc lysostaphin in the presence of ∼140-fold molar excess of KG<sub>5</sub> (red circles). Error bars have been omitted for clarity. Average errors are: 1Zn CAT 0.01 s<sup>−1</sup> (R<sub>1</sub>), 0.19 s<sup>−1</sup> (R<sub>2</sub>), 0.03 (HetNOE) and 0.29 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>); 1Zn CWT 0.01 s<sup>−1</sup> (R<sub>1</sub>), 0.10 s<sup>−1</sup> (R<sub>2</sub>), 0.01 (HetNOE) and 0.16 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>); 3Zn CAT 0.01 s<sup>−1</sup> (R<sub>1</sub>), 0.20 s<sup>−1</sup> (R<sub>2</sub>), 0.03 (HetNOE) and 0.27 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>); 3Zn CWT 0.01 s<sup>−1</sup> (R<sub>1</sub>), 0.11 s<sup>−1</sup> (R<sub>2</sub>), 0.01 (HetNOE), and 0.16 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>); 3Zn+KG<sub>5</sub> CAT 0.02 s<sup>−1</sup> (R<sub>1</sub>), 0.26 s<sup>−1</sup> (R<sub>2</sub>), 0.06 (HetNOE) and 0.40 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>); 3Zn+KG<sub>5</sub> CWT 0.01 s<sup>−1</sup> (R<sub>1</sub>), 0.08 s<sup>−1</sup> (R<sub>2</sub>), 0.04 (HetNOE) and 0.14 s<sup>−2</sup> (R<sub>1</sub>R<sub>2</sub>). In the R<sub>1</sub>R<sub>2</sub> panel residues experiencing a large change upon addition of excess zinc are marked with asterisks and highlighted in red in the CAT domain structure.

Residues with significant contribution from internal motions were not included in the calculation of averages (Tjandra, 1996).

the NMR structure were used as rigid bodies in EOM calculations. Spheres represent amino acids in a random coil linker.

To further investigate the structure and flexibility of lysostaphin in solution, SAXS data were collected and analysed with the ensemble optimization method (EOM). In the EOM calculations, the CAT (residues 251–384) and CWT (residues 410–493) domains of the NMR structure reported here were used as rigid bodies, and residues 385–409 were modeled as random coil. EOM runs yielded reproducible ensembles that fit the experimental data well, with χ<sup>2</sup> values around 1.4 (**Figure 3C**). The results clearly show that lysostaphin adopts two different conformations in solution: a more favored compact subpopulation with average R<sub>g</sub> and D<sub>max</sub> of 27 and 82 Å, respectively, and a less favored extended subpopulation with average R<sub>g</sub> and D<sub>max</sub> of 37 and 110 Å (**Figures 3A,B,D**). The D<sub>max</sub> of the predominant conformation is consistent with an average domain separation of 22–23 Å. This is considerably shorter than the length encompassed by a fully extended conformation of the linker containing two prolines, ∼42 Å. This raises the possibility of the presence of secondary structure in the linker or interactions of the linker with one or other domain. Neither is, however, supported by secondary chemical shifts or NOE peaks. Moreover, significantly lower R<sub>1</sub>R<sub>2</sub> products and heteronuclear NOEs found for linker residues K389-W402 with respect to the values in the CAT and CWT domains indicate that the linker is highly flexible. However, two sets of peaks were assigned for linker residues V394-P398, indicating the presence of two preferred conformations within this stretch, likely due to P398 cis/trans isomerism. This potentially offers a functional benefit to lysostaphin as it allows sampling of a larger conformational space and readjustment of domain orientations upon scission of pentaglycine bridges near its anchoring site (Aitio et al., 2012).

A dynamic inter-domain interaction in lysostaphin has been presented previously (Lu et al., 2013). Although the long flexible linker indeed allows the spatial proximity of the domains, there are no indications of such interaction in our data. The linker easily allows cleavage of adjacent and nearby pentaglycine bridges in PG near the bridge that serves as an anchor. Considering the architecture and dimensions of the PG lattice (Kim et al., 2015), formation of a CAT and CWT "envelope" around the very same target pentapeptide, which is also fixated between perpendicular peptidoglycan stem peptides that it connects, is very improbable due to steric restrictions. The current data remain in support of the flexible arrangements between partly independent domains.

### Binding of a Second Zinc by the Catalytic Histidines Reduces Lysostaphin Catalytic Activity

Proton NMR allows for direct and real-time reaction monitoring devoid of intricacies of cellular suspension (Raulinaitis et al., 2017a). Scission of pentaglycine by lysostaphin was monitored for 48 h by <sup>1</sup>H NMR, and under the conditions of this study its catalytic rate appeared to be limited only by saturation with the substrate (Supplementary Figure 4). The substrate turnover rate by the enzyme in the first 3 h of the reaction is estimated to be on the order of 0.006 s<sup>−1</sup>.
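A turnover rate of this kind is the amount of substrate consumed divided by the enzyme concentration and the elapsed time. The short sketch below shows the arithmetic only; the substrate and enzyme concentrations are hypothetical placeholders chosen to land near the reported order of magnitude, not values taken from this study.

```python
def turnover_rate(substrate_consumed_molar, enzyme_molar, elapsed_seconds):
    """Average substrate molecules cleaved per enzyme molecule per second."""
    return substrate_consumed_molar / (enzyme_molar * elapsed_seconds)

# Hypothetical inputs (assumptions for illustration, not measured values):
# 0.65 mM pentaglycine consumed by 10 µM enzyme during the first 3 h.
rate = turnover_rate(
    substrate_consumed_molar=0.65e-3,
    enzyme_molar=10e-6,
    elapsed_seconds=3 * 3600,
)
print(f"turnover = {rate:.4f} s^-1")
```

In practice the consumed substrate would be read from the decrease of the pentaglycine signal integral, or the increase of the product integrals, between successive <sup>1</sup>H spectra.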

We used the method to assess the dependence of lysostaphin catalytic activity on different metal cofactors (**Figure 4**). Samples were first incubated for 21 h and then quenched by heating, followed by acquisition of a <sup>1</sup>H spectrum. Results indicate that metal cations Zn<sup>2+</sup>, Co<sup>2+</sup>, and Cu<sup>2+</sup> are suitable cofactors of lysostaphin, whereas Mn<sup>2+</sup> is a weak cofactor. The activity increases when Cu<sup>2+</sup> is present in excess. This behavior might be explained by Cu<sup>2+</sup> being a better cofactor with a lower affinity, the enzyme reaching full metal cation saturation at higher cation concentrations. M23 family endopeptidases are recognized as zinc-dependent enzymes, yet other metal ions have been found to partially restore their activity (Firczuk et al., 2005; Wang et al., 2011) and recently Co<sup>2+</sup> ions have been shown to yield a hyperactive LytU (Raulinaitis et al., 2017a).

A partial inhibition of lysostaphin activity is observed in excess zinc. Even when inhibited by an extra zinc ion, the enzyme retains its fold, as demonstrated by its <sup>1</sup>H, <sup>15</sup>N HSQC spectrum displaying features of a structured protein, very similar to those of the one-zinc form (Supplementary Figure 5). Notably, chemical shift perturbations (CSPs) are observed for peaks of residues in the catalytic groove as well as the surrounding loops. Changes in the dynamics are also observed (**Figure 2**). The majority of the largest changes occur for residues having the largest exchange contribution in the one-zinc form, located in the vicinity of the catalytic site. At these sites, the R<sub>1</sub>R<sub>2</sub> products decrease, which indicates reduction of exchange on the µs-ms timescale and structure stabilization. Loop amides remain unprotected from fast solvent exchange, though, as no additional peaks appear in the <sup>1</sup>H, <sup>15</sup>N HSQC spectrum. Conclusively, two additional cross-peak sets appear in the <sup>1</sup>H, <sup>15</sup>N HMBC spectrum with <sup>15</sup>N chemical shifts corresponding to those of a zinc-coordinating nitrogen (Banci et al., 1998). Hence, binding of a second zinc, coordinated by the catalytic histidines, reduces catalytic activity.

The inhibitory effect of second zinc binding was also demonstrated and explained by the arrest of catalytic histidines in the autolytic LytU, where the process may have a regulatory role (Raulinaitis et al., 2017a). Whereas it is conceivable for an intracellular autolysin, such regulation by inhibition would be irrational for an extracellular bactericide like lysostaphin and is likely physiologically irrelevant.

#### Pentaglycine Interacts Transiently With the N-Terminal Groove in the CWT Domain

NMR is particularly well suited to the study of weak or transient molecular interactions. We mapped the CWT domain binding site with hexapeptides G<sub>5</sub>K and KG<sub>5</sub> (**Figure 5A** and Supplementary Figure 6). The presence of lysine significantly enhanced the solubility of the peptide, which was essential in order to reach the large excess of ligand needed to achieve noticeable CSPs. The two pentaglycine derivatives produced similar amide peak CSPs when present in ∼170-fold molar excess. Ligand interaction with CWT was independent of the presence of the CAT domain, as deduced from the similar CSPs observed for the CWT domain in full-length lysostaphin in the presence of ∼140-fold molar excess of KG<sub>5</sub> (data not shown).

The largest peak shifts, Δδ ≥ 0.10 ppm, in the presence of G<sub>5</sub>K were observed for N405, Y411, G430-F432, V452, M453, and Y472. Residues in strands β1 and β2 and the loop in between (K403-K412) form the bottom of the pentaglycine binding groove, and residues G430-F432 from the loop between strands β3 and β4 and Y472 in strand β7 form the ceiling of the groove. Binding site mapping demonstrates CSPs for a delimited set of residues, without secondary, rearrangement-induced perturbations. Our data agree well with the crystal structure of the lysostaphin CWT–pentaglycine complex (PDB ID 5LEO), which reveals that only very small structural rearrangements, namely a few side-chain reorientations, occur upon binding of pentaglycine. The interaction is estimated to be ultra-weak, with a dissociation constant > 10 mM, in line with those observed for other pentapeptide–SH3b-type CWT domain interactions (Gu et al., 2014; Benešík et al., 2017).

FIGURE 5 | Peptide binding sites as revealed by NMR titrations. CSPs induced by G<sub>5</sub>K (A), A-D-EK-GGGGG-A-D-EK-D-A (B), and A-D-γ-EK-D-A-D-A (C) mapped onto the CWT domain structure. The surfaces are colored by the associated dissociation constant, the scale of which is given on the top of the figure. In (A), the location of the bound pentaglycine is given from a structural superimposition of the free NMR structure and the X-ray CWT–substrate complex with PDB ID 5LEO. In (C), between the high- and low-affinity sites lies a group of residues which demonstrated a curved peak movement (see Supplementary Figure 4). These are shown in gray. (D) Conserved residues in S. aureus-targeting CWT domains mapped onto the lysostaphin CWT domain structure. Conservation was determined from a sequence alignment with ClustalW (Larkin et al., 2007) of lysostaphin CWT-like S. aureus-targeting sequences in the UniProt database (The Uniprot Consortium, 2017). In red are shown fully conserved residues and in orange residues with strongly similar properties. Conserved residues for which large CSPs were observed are labeled in italics, whereas conserved residues for which smaller CSPs were observed (0.06–0.09 ppm, not colored in A-C for clarity) are labeled in underlined italics. Sequence alignment is given in Supplementary Figure 5.

The <sup>15</sup>N relaxation data show interesting differences between free and pentaglycine-bound three-zinc lysostaphin. Both domains in the complex have larger τ<sub>c</sub> values than those in the free three-zinc form, 8.6 vs. 7.9 ns for the CAT and 6.3 vs. 5.9 ns for the CWT domain (**Table 2**). The difference is significant and cannot be attributed to the change in molecular weight alone, considering that an increase of ∼430 Da in molecular weight is predicted to increase τ<sub>c</sub> by only 0.2 ns. Exchange between the complex and free forms could be the source of the larger τ<sub>c</sub> values, and indeed many local changes in the R<sub>1</sub>R<sub>2</sub> products are observed. However, there is an equal number of positive and negative changes with values larger than ⟨R<sub>1</sub>R<sub>2</sub>⟩ ± 1 SD, and the average R<sub>1</sub>R<sub>2</sub> products remain the same for both domains. We gather that the additional rise can be interpreted in terms of transient inter-domain interaction in the presence of the ligand, or as a change in the degree of anisotropy of molecular tumbling induced by peptide or inter-domain interaction. Future studies in alignment media will help to derive information about relative domain orientation and inter-domain motions. The linker region shows similar dynamics in the absence and presence of the peptide, revealing that this stretch remains flexible, loosely connecting the domains.
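The comparison of R<sub>1</sub>R<sub>2</sub> products between the free and peptide-bound forms can be sketched as a per-residue difference followed by a one-standard-deviation cutoff. The code below is one possible reading of the "± 1 SD" criterion, applied to the distribution of product changes; the function name, the residue labels and all rate values are hypothetical and serve only to show the bookkeeping.

```python
import statistics

def r1r2_outliers(free, bound):
    """Residues whose R1*R2 change lies beyond mean +/- 1 SD of all changes.

    free, bound: dict mapping residue label -> (R1, R2), both in s^-1.
    Returns (increased, decreased) lists of residue labels.
    """
    shared = [res for res in free if res in bound]
    diffs = {
        res: bound[res][0] * bound[res][1] - free[res][0] * free[res][1]
        for res in shared
    }
    values = list(diffs.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    increased = sorted(r for r, d in diffs.items() if d > mean + sd)
    decreased = sorted(r for r, d in diffs.items() if d < mean - sd)
    return increased, decreased

# Hypothetical relaxation rates (R1, R2) for five residues.
free_form = {"r1": (1.0, 10.0), "r2": (1.0, 10.0), "r3": (1.0, 10.0),
             "r4": (1.0, 10.0), "r5": (1.0, 10.0)}
bound_form = {"r1": (1.0, 13.0), "r2": (1.0, 10.0), "r3": (1.0, 10.0),
              "r4": (1.0, 10.0), "r5": (1.0, 7.0)}

up, down = r1r2_outliers(free_form, bound_form)
print("increased:", up, "decreased:", down)
```

Equal numbers of increased and decreased residues, together with an unchanged mean product, would argue against a uniform exchange contribution as the source of a larger apparent τ<sub>c</sub>.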

### The CWT Domain Shows Higher Affinity Toward the Stem Peptide

We reasoned that in vivo the CWT–pentaglycine interaction is likely to be bolstered by auxiliary contacts from the glycan backbone or the stem peptides in the PG lattice (for a description of the chemical structure of PG see Vollmer et al., 2008). We found that the CWT domain had no significant affinity for the muramyl dipeptide N-acetylmuramyl-L-A-D-γ-Q hydrate as deduced by the very small CSPs observed at ∼80 molar excess of the peptide (data not shown).

Titration of the CWT domain with a synthetic peptide A-D-EK-GGGGG-A-D-EK-D-A, which would resemble the stem–cross-bridge–stem sequence in native PG, resulted instead in significant CSPs at considerably lower concentration ratios, ∼24-fold molar excess. In addition to CSPs of residues in the pentaglycine groove, CSPs were observed for two groups of peaks with distinct binding affinities. When the CWT domain is viewed with the pentaglycine groove lying horizontally toward the viewer and the N-terminus of the bound pentaglycine on the right (PDB ID 5LEO), K412-S413, S415-A416, V459, T491, and I492 form an elongated patch on the right side of the domain (**Figure 5B**). K412 is located between the pentaglycine and the elongated sites. These residues have an average K<sub>D</sub> of ∼2.3 mM. Residues I424-I425 and R427 form a small patch on the left side of the CWT domain with an average K<sub>D</sub> of ∼10.5 mM. W489 is located between the elongated and the small sites, but with a K<sub>D</sub> of ∼2.3 mM it belongs to the elongated site. The peptide is not large enough to reach all three surfaces simultaneously. Considering that residues in the pentaglycine site have a much higher affinity toward this peptide (average K<sub>D</sub> of ∼3.2 mM) than toward pentaglycine or KG<sub>5</sub>/G<sub>5</sub>K, we suspect that one peptide molecule binds via its tail to the elongated site and boosts the affinity of the pentaglycine site for the pentaglycine moiety of the peptide.

Titration with a shorter peptide A-D-γ-EK-D-A-D-A resulted in a complex peak behavior with three distinct patterns of peak position movement (**Figure 6**). Cross-peaks of residues L410, K412-S415, V452, W489, and I492 moved linearly upon peptide addition and reached saturation at ∼32-fold molar excess of peptide. The dissociation constant was determined to be ∼0.3 mM, notably lower than that for pentaglycine or the longer peptide. These residues mostly overlap with the elongated site of the previous titration (**Figure 5C**). Cross-peaks of the second group of residues also moved linearly, but began prominent movement and reached saturation at later steps of the titration, resulting in a dissociation constant of ∼3 mM. These residues form a patch on the surface, which covers the small site of the previous titration (I425, R427, and L473) and has extensions of polar residues (R433, S434, Q437, and S438). The third group of peaks exhibited non-linear peak movement. For some of these residues the CSPs are as large as for those in the two other groups, which rules out unspecific multiple binding as the cause of nonlinearity. Instead, as these residues are located between the first two surface patches, and include residues in the hydrophobic core, a conformational rearrangement brought about by binding to the high-affinity site might be responsible for the curved titrations. Core residues V461 and V475 are probably involved in this rearrangement because their methyl peaks shift substantially in the <sup>1</sup>H spectra during the titration (Supplementary Figure 7).
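Dissociation constants such as those quoted above are commonly extracted by fitting CSPs to a 1:1 fast-exchange binding isotherm, in which the observed shift equals the maximal shift multiplied by the fraction of protein bound. The sketch below illustrates that model with a simple grid search; it is not the fitting procedure of this study, and the protein concentration, titration points, grid and synthetic CSPs are all assumptions.

```python
import math

def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (same concentration units)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)

def fit_kd(p_total, ligand_totals, csps, kd_grid):
    """Grid search over KD; the best maximal shift for each KD is obtained
    analytically by linear least squares."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in ligand_totals]
        d_max = sum(y * fi for y, fi in zip(csps, f)) / sum(fi * fi for fi in f)
        sse = sum((y - d_max * fi) ** 2 for y, fi in zip(csps, f))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]

# Hypothetical titration: 0.2 mM protein, ligand in mM, noiseless synthetic
# CSPs (ppm) generated with KD = 0.3 mM and a maximal shift of 0.15 ppm.
protein_mm = 0.2
ligand_mm = [0.4, 0.8, 1.6, 3.2, 6.4]
synthetic = [0.15 * fraction_bound(protein_mm, l, 0.3) for l in ligand_mm]

# Log-spaced KD grid from 0.01 mM upward in 5% steps.
grid = [0.01 * 1.05 ** i for i in range(200)]
kd_fit, dmax_fit = fit_kd(protein_mm, ligand_mm, synthetic, grid)
print(f"KD ~ {kd_fit:.2f} mM, max shift ~ {dmax_fit:.3f} ppm")
```

The same isotherm explains why a very large ligand excess is needed for millimolar binders: with a K<sub>D</sub> far above the protein concentration, the bound fraction rises only slowly with added ligand, so saturation is approached late in the titration.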

The importance of R296 and W358 in ALE-1 SH3b binding to S. aureus PG has been demonstrated by selective mutations (Lu et al., 2006). Binding of ALE-1 R296A and W358A mutants was reduced 3- and 2-fold, respectively, as compared to wild type binding. These residues correspond to R427 and W489 of lysostaphin CWT, belonging to the lower- and higher-affinity sites, respectively, although structurally very close on the surface of CWT. R427 and W489 are strictly conserved in S. aureus-targeting SH3b domains (**Figure 5D**, Supplementary Figure 8). Based on the titration results and residue conservation, other residues which are likely to have an impact on CWT PG binding other than those in the pentaglycine binding groove are E414 and I492 of the high-affinity site.

We generated model structures of the CWT domain in complex with pentaglycine-stem peptide fragments (Supplementary Figure 9) with the assumption that the pentaglycine moiety occupies the N-terminal binding groove as in the structure 5LEO. The models indicate that the stem peptides on either side of pentaglycine seem to be too short to reach the CSPs observed in the presence of the A-D-γ-EK-D-A-D-A peptide. Also, CSPs in the region of the linkage between pentaglycine and the stem peptide are missing from the CWT domain surface. We gather that the observed CSPs might arise from a structural rearrangement occurring upon binding (see before), or be the result of stem peptide contacts present within the PG architecture, that is from stem peptides not directly linked to the pentaglycine recognized by the N-terminal groove. According to the parallel-stem architecture proposed by Kim et al. (2015) for S. aureus PG, contacts with stem peptides from further up or down in the densely packed glycan chains are probable.

These results clearly support the idea that affinity-enhancing accessory contacts are formed with the PG network. Altogether, it is advisable to consider using larger PG fragments in SH3b interaction studies in order to overcome the very low affinity of these domains toward pentaglycine (Gu et al., 2014; Benešík et al., 2017).

CWT–PG association studies have demonstrated that CWT binding to S. aureus cells cannot be blocked with excess of pentaglycine (Gründling and Schneewind, 2006), which is in accord with the extremely low affinity found for the pentaglycine–CWT interaction in the present study. The authors also demonstrated that inhibition of binding was achieved only with PG fragments containing multiple cross-linked murein subunits, not with smaller murein mono- or dimers. This suggests that the CWT domain affinity for S. aureus PG has to be higher than that observed here for the five-residue peptide, ∼300 µM. It also suggests that the PG lattice structure is needed for maximal affinity, only cross-linked fragments with multiple murein moieties being capable of reproducing this structure. In other words, binding to the pre-arranged PG lattice is likely to be entropically more favorable than binding, through induced folding, to a disordered peptide nonetheless containing the same interaction sites.

### CONCLUSIONS

This study reveals details on structure and dynamics within and between the lysostaphin domains, on metal ion preference of the CAT domain, as well as characterizes substrate binding sites and affinities of the CWT domain. Remarkably, the affinity of the CWT domain was found to be much higher toward the stem peptide than toward pentaglycine. Also, the CWT domain affinity for pentaglycine increased when the latter was annexed to a stem peptide-like moiety. These findings underscore the importance of auxiliary contacts in CWT PG recognition. The stem peptide analogs interacted with two surfaces on both sides of the N-terminal pentaglycine binding groove of the CWT domain. Residues E414, R427, W489, and I492 are predicted to be crucial to the PG interaction. Taking into account the PG architecture, rather than just the pentaglycine bridge, is imperative to appreciate the factors contributing to the catalytic activity of the CAT domain, to recognize the structural components influencing the interaction between lysostaphin and PG and to engineer novel lysostaphin- or CWT-based antimicrobials.

## DATA AVAILABILITY STATEMENT

The assigned chemical shifts and coordinates of lysostaphin have been deposited in the BioMagResBank (http://www.bmrb.wisc.edu/) and the PDB (https://www.rcsb.org/) with accession codes 34121 and 5NMY, respectively.

## AUTHOR CONTRIBUTIONS

PP conceived and designed the research. VR and LK produced and purified the proteins. VR and HM performed enzyme activity studies. UP performed SAXS measurements and SAXS data analysis. HT carried out MD simulations. HT, LK, and PP conducted CWT domain NMR studies. HT and PP performed full-length lysostaphin NMR studies. All authors wrote the paper.

#### FUNDING

The study was supported by the grants from Academy of Finland (number 288235 to PP and 28348 to UP) and Sigrid Juselius foundation to PP and the Helsinki Graduate Program in Biotechnology and Molecular Biology and Centre for International Mobility (CIMO) to VR.

#### ACKNOWLEDGMENTS

We acknowledge CSC-IT Center for Science Ltd. for the allocation of computational resources. ESRF is thanked for providing the beamline BM29 access.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2018.00060/full#supplementary-material


**Conflict of Interest Statement:** HM was employed by company VTT Technical Research Centre of Finland Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Tossavainen, Raulinaitis, Kauppinen, Pentikäinen, Maaheimo and Permi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Molecular Bases of the Dual Regulation of Bacterial Iron Sulfur Cluster Biogenesis by CyaY and IscX

Salvatore Adinolfi<sup>1†‡</sup>, Rita Puglisi<sup>1‡</sup>, Jason C. Crack<sup>2</sup>, Clara Iannuzzi<sup>1,3</sup>, Fabrizio Dal Piaz<sup>4</sup>, Petr V. Konarev<sup>5,6</sup>, Dmitri I. Svergun<sup>7</sup>, Stephen Martin<sup>8</sup>, Nick E. Le Brun<sup>2</sup> and Annalisa Pastore<sup>1,9</sup>\*

<sup>1</sup> The Wohl Institute, King's College London, London, United Kingdom, <sup>2</sup> School of Chemistry, University of East Anglia, Norwich Research Park, Norwich, United Kingdom, <sup>3</sup> DBBGP, Università degli Studi della Campania "L. Vanvitelli," Naples, Italy, <sup>4</sup> Dipartimento di Medicina, Chirurgia e Odontoiatria "Scuola Medica Salernitana"/DIPMED, Universita' di Salerno, Fisciano, Italy, <sup>5</sup> A.V. Shubnikov Institute of Crystallography of Federal Scientific Research Centre "Crystallography and Photonics" of Russian Academy of Sciences, Moscow, Russia, <sup>6</sup> National Research Centre "Kurchatov Institute," Moscow, Russia, <sup>7</sup> European Molecular Biology Laboratory, Hamburg, Germany, <sup>8</sup> The Francis Crick Institute, London, United Kingdom, <sup>9</sup> Department of Molecular Medicine, University of Pavia, Pavia, Italy

#### Edited by:

Irene Diaz-Moreno, Universidad de Sevilla, Spain

#### Reviewed by:

Antonio J. Díaz Quintana, Universidad de Sevilla, Spain

Michael Sattler, Technische Universität München, Germany

#### \*Correspondence:

Annalisa Pastore annalisa.pastore@crick.ac.uk

#### †Present Address:

Salvatore Adinolfi, Dipartimento di Scienza e Tecnologia del Farmaco, Universita' di Torino, Italy

‡These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

Received: 09 November 2017 Accepted: 22 December 2017 Published: 02 February 2018

#### Citation:

Adinolfi S, Puglisi R, Crack JC, Iannuzzi C, Dal Piaz F, Konarev PV, Svergun DI, Martin S, Le Brun NE and Pastore A (2018) The Molecular Bases of the Dual Regulation of Bacterial Iron Sulfur Cluster Biogenesis by CyaY and IscX. Front. Mol. Biosci. 4:97. doi: 10.3389/fmolb.2017.00097

IscX (or YfhJ) is a protein of unknown function which takes part in the iron-sulfur cluster assembly machinery, a highly specialized and essential metabolic pathway. IscX binds to iron with low affinity and interacts with IscS, the desulfurase central to cluster assembly. Previous studies have suggested a competition between IscX and CyaY, the bacterial ortholog of frataxin, for the same binding surface of IscS. This competition could suggest a link between the two proteins with a functional significance. Using a hybrid approach based on nuclear magnetic resonance, small angle scattering and biochemical methods, we show here that IscX is a modulator of the inhibitory properties of CyaY: by competing for the same site on IscS, the presence of IscX rescues the rates of enzymatic cluster formation which are inhibited by CyaY. The effect is stronger at low iron concentrations, whereas it becomes negligible at high iron concentrations. These results strongly suggest the mechanism of the dual regulation of iron sulfur cluster assembly under the control of iron as the effector.

Keywords: desulfurase, enzyme activity, frataxin, isc operon, iron chaperone, iron sulfur cluster

## INTRODUCTION

Iron and sulfur are elements essential for life thanks to their unique redox properties. Yet, they are highly toxic. An efficient way to store them in cells in a nontoxic form is through formation of iron sulfur (Fe-S) clusters, labile prosthetic groups involved in several essential metabolic pathways (for a review Roche et al., 2013; Py and Barras, 2014). Assembly of Fe-S clusters is carried out by highly conserved machines which, in prokaryotes, are encoded by the suf, nif, and isc operons. Isc is the most general machine with highly conserved orthologs in eukaryotes.

In E. coli, the isc operon contains eight genes (i.e., iscR, iscS, iscU, iscA, fdx, hscA, hscB, and iscX; Zheng et al., 1998). Among the corresponding gene products, the most important players are the cysteine desulfurase IscS (EC 2.8.1.7), which converts cysteine to alanine and IscS-bound persulfide (Flint, 1996), and IscU, a transient scaffold protein which forms a complex with IscS (Agar et al., 2000; Urbina et al., 2001). The last component of the machine according to the order of genes in the operon is IscX (also known as YfhJ), a small acidic protein about which very little is known (Tokumoto and Takahashi, 2001; Tokumoto et al., 2002). IscX, the Cinderella of the isc operon, is not essential, in contrast to the other isc proteins (Tokumoto and Takahashi, 2001). It is exclusively present in prokaryotes and in eukaryotes of the Apicomplexa, where it is highly conserved (Pastore et al., 2006). Based on its phylogenetic occurrence, IscX seems to depend on the presence of IscS whereas the reverse is not the case (Pastore et al., 2006).

The structure of E. coli IscX consists of a classical helix-turn-helix fold often found in transcription regulators (Shimomura et al., 2005; Pastore et al., 2006). In vitro studies have shown that IscX is able to bind IscS, thus suggesting a role for IscX as a molecular adaptor (Pastore et al., 2006; Shi et al., 2010; Kim et al., 2014). IscX also binds to iron, although with an affinity at least one order of magnitude weaker than that of CyaY, through a negatively charged surface, the same by which it recognizes IscS (Pastore et al., 2006). An exposed negatively charged iron binding surface which overlaps with the surface of interaction with IscS is also a feature of another protein, CyaY (the bacterial ortholog of eukaryotic frataxin). This protein has attracted much attention because in humans it is associated with Friedreich's ataxia (Pastore and Puccio, 2013). In contrast to IscX, frataxins are highly conserved from bacteria to higher eukaryotes (Gibson et al., 1996) and are essential in eukaryotes (Li et al., 1999). In prokaryotes, CyaY is external to the isc operon but has been extensively implicated in Fe-S cluster assembly (Huynen et al., 2001). We have previously shown that CyaY is an IscS regulator, which dictates the enzymatic assembly of Fe-S clusters (Adinolfi et al., 2009; Prischi et al., 2010a). Intriguingly, IscX and CyaY compete for the same site on IscS (Prischi et al., 2010a; Shi et al., 2010; Kim et al., 2014). A genetic interaction between CyaY and IscX was also demonstrated by a recent study which has validated a role of IscX as a new bona fide Fe-S cluster biogenesis factor (Roche et al., 2015). In some species, CyaY and IscX seem to replace each other.

These data raise the compelling question of whether there could be a functional link between CyaY and IscX, which could both elucidate the function of IscX and explain why these two proteins compete for the same binding site. Given the high conservation between prokaryotic and eukaryotic frataxins and desulfurases, answering this question could also inspire new studies on the regulation of eukaryotic frataxin. Using a complementary approach which makes use of enzymology, cross-linking, and structural methods, we provide conclusive evidence indicating that IscX is a modulator of CyaY that switches off the inhibitory properties of CyaY as a function of the iron concentration. At low iron concentrations, IscS is under IscX control which has no inhibitory capacity. At high iron concentrations the system becomes controlled by CyaY. Based on our results, we suggest a general scheme that supports a role of CyaY as an iron sensor and provides the first testable indications concerning the function of IscX.

## MATERIALS AND METHODS

### Protein Production

All proteins used are from E. coli. Their sequences were subcloned into a modified pET-24d vector as fusion proteins with His-tagged glutathione-S-transferase (GST). The constructs were expressed in BL21(DE3). Bacteria expressing IscU were grown in Luria Broth enriched medium containing 8.3 mg ZnSO<sub>4</sub> (Prischi et al., 2010b) to stabilize its fold. The proteins were purified as previously described (Musco et al., 2000; Pastore et al., 2006) by affinity chromatography using Ni-NTA agarose gel and cleaved from the His-GST tag by Tobacco Etch Virus (TEV) protease. The mixture was reloaded on Ni-NTA gel to separate the His-tagged GST. This was further purified by gel-filtration chromatography on a Sephadex G-75 column. All the purification steps were performed in the presence of 20 mM β-mercaptoethanol. The protein purities were checked by SDS-PAGE and by mass spectrometry of the final product. Protein concentrations were determined by absorbance at 280 nm using ε<sub>280 nm</sub> values of 41,370, 19,480, and 28,990 M<sup>−1</sup> cm<sup>−1</sup> for IscS, IscX, and CyaY, respectively.
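The concentration determination above follows the Beer-Lambert law. A minimal sketch of that arithmetic is given below; the extinction coefficients are those quoted in the text, whereas the function name and the 1 cm default path length are assumptions made for illustration only.

```python
# Molar extinction coefficients at 280 nm quoted in the text (M^-1 cm^-1).
EPSILON_280 = {"IscS": 41370.0, "IscX": 19480.0, "CyaY": 28990.0}


def concentration_uM(a280, protein, path_cm=1.0):
    """Return the protein concentration in micromolar from its A280.

    Beer-Lambert law: A = epsilon * c * l, hence c = A / (epsilon * l).
    The 1 cm default path length is an assumption, not a value from the text.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    molar = a280 / (EPSILON_280[protein] * path_cm)  # mol/L
    return molar * 1e6  # convert mol/L to micromolar
```

For instance, `concentration_uM(0.4137, "IscS")` corresponds to a 10 µM solution in a 1 cm cuvette.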

### Biolayer Interferometry

All experiments were performed in 20 mM HEPES (pH 7.5), 150 mM NaCl, 2 mM TCEP, 0.5 mg/mL BSA on an Octet Red instrument (ForteBio, Inc., Menlo Park, CA) operating at 25 °C. Streptavidin coated biosensors with immobilized biotinylated IscS were exposed to different concentrations of IscX (0–50µM). Alternatively, binding affinity between IscS and IscX was evaluated by competition with CyaY. For this experiment, streptavidin coated biosensors with immobilized biotinylated CyaY were exposed to 10µM IscS at different concentrations of IscX (0–60µM). Apparent K<sub>d</sub> values were estimated by fitting the response intensity as a function of the concentration. Since BLI does not give information about reaction stoichiometry, the data were fitted to a simple 1:1 binding model using non-linear least squares methods (Levenberg-Marquardt algorithm).
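The fitting step can be illustrated with a small, dependency-free sketch. It assumes the usual hyperbolic form of a 1:1 binding isotherm, R = R<sub>max</sub>·c/(K<sub>d</sub> + c), and uses a hand-written damped Gauss-Newton (Levenberg-Marquardt style) update; the function names, starting guesses, and iteration count are hypothetical and do not reproduce the instrument software used in the study.

```python
def binding_response(conc, rmax, kd):
    """Equilibrium response of a 1:1 binding model at analyte concentration conc."""
    return rmax * conc / (kd + conc)


def fit_one_to_one(concs, responses, rmax, kd, iterations=100):
    """Fit rmax and kd by a damped Gauss-Newton (Levenberg-Marquardt style) search."""

    def sse(r, k):
        # Sum of squared residuals for a given parameter pair.
        return sum((y - binding_response(c, r, k)) ** 2
                   for c, y in zip(concs, responses))

    damping = 1e-3
    best = sse(rmax, kd)
    for _ in range(iterations):
        # Accumulate J^T J (a11, a12, a22) and J^T r (g1, g2).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for c, y in zip(concs, responses):
            d_rmax = c / (kd + c)                # dR/dRmax
            d_kd = -rmax * c / (kd + c) ** 2     # dR/dKd
            resid = y - binding_response(c, rmax, kd)
            a11 += d_rmax * d_rmax
            a12 += d_rmax * d_kd
            a22 += d_kd * d_kd
            g1 += d_rmax * resid
            g2 += d_kd * resid
        # Marquardt damping scales the diagonal of J^T J.
        m11 = a11 * (1.0 + damping)
        m22 = a22 * (1.0 + damping)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        step_rmax = (g1 * m22 - a12 * g2) / det
        step_kd = (m11 * g2 - a12 * g1) / det
        trial_rmax, trial_kd = rmax + step_rmax, kd + step_kd
        trial = sse(trial_rmax, trial_kd) if trial_kd > 0 else float("inf")
        if trial < best:
            # Accept the step and trust the quadratic model more.
            rmax, kd, best = trial_rmax, trial_kd, trial
            damping /= 10.0
        else:
            # Reject the step and move toward gradient descent.
            damping *= 10.0
    return rmax, kd
```

In practice a library optimizer would normally be preferred; the explicit loop is shown only to make the damping logic visible.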

### Enzymatic Assays

All enzymatic experiments to form Fe-S clusters on IscU were performed in an anaerobic chamber (Belle technology) under nitrogen atmosphere. The kinetics were followed at 456 nm as a function of time by absorbance spectroscopy using a Cary 50Bio Varian spectrophotometer. The initial rates were measured by incubating 1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT, and 25µM Fe<sup>2+</sup> in 50 mM Tris-HCl (pH 7.5) and 150 mM NaCl. When added, CyaY was 5µM. The reactions were initiated after half an hour of incubation by addition of 1µM IscS and 250µM Cys. Each measurement was repeated at least five times on different batches of proteins. The data were always reconfirmed by CD performed under the same conditions.

### Cross Linking Assays

IscS (8µM) was mixed with increasing concentrations of IscX (0–80µM) in PBS buffer. Bis[sulfosuccinimidyl]suberate (BS3) was added to the protein mixture to a final concentration of 2.5 mM. The reaction mixture was incubated at room temperature for 30 min and quenched by adding Tris-HCl (pH 8) to a final concentration of 20 mM. The quenching reaction was incubated at room temperature for 15 min. Identification of cross-linked peptides was performed using a classic mass-based peptide mapping approach (see Supplementary Material). A further confirmation of the identity of the cross-linked peptides was achieved using the StavroX software, performing the "shuffle sequences but keep protease site" decoy analysis.

### Mass Spectrometry under Non-denaturing Conditions

Solutions of IscS (3µM) were mixed with increasing concentrations of IscX (0–16 IscX/IscS molar ratios). Samples were incubated at room temperature for 5 min before being loaded in a 500 µl gas-tight syringe (Hamilton) and infused directly in a Bruker micrOTOF-QIII mass spectrometer (Bruker Daltonics) operating in the positive ion mode. The ESI-TOF was calibrated online using ESI-L Low Concentration Tuning Mix (Agilent Technologies) and subsequently re-calibrated offline in the 4,000–8,000 m/z region. MS data were acquired over the m/z range 4,000–8,000 continuously for 10 min. For LC-MS, an aliquot of IscS or IscX was diluted with an aqueous mixture of 2% (v/v) acetonitrile, 0.1% (v/v) formic acid, and loaded onto a Proswift RP-1S column (4.6 × 50 mm, Thermo Scientific) attached to an Ultimate 3000 uHPLC system (Dionex, Leeds, UK). Processing and analysis of MS experimental data were carried out using Compass DataAnalysis v4.1 (Bruker Daltonik). Neutral mass spectra were generated using the ESI Compass v1.3 Maximum Entropy deconvolution algorithm. Titration data were fitted using the program DynaFit (BioKin, CA, USA). For further details see Supplementary Material.

### SAXS Measurements

SAXS data were collected on the EMBL P12 beamline on the storage ring PETRA III (DESY, Hamburg, Germany; Blanchet et al., 2015). Solutions of IscX-IscS complexes were measured at 25 °C in a concentration range 0.5–10.0 mg/mL at 1:1, 1:2, 1:20, and 1:40 molar ratios. The data were recorded using a PILATUS 2M detector (DECTRIS, Switzerland) at a sample-detector distance of 4.0 m and a wavelength of λ = 0.124 nm, covering the range of momentum transfer 0.04 < s < 5.0 nm<sup>−1</sup> (s = 4π sinθ/λ, where 2θ is the scattering angle). No measurable radiation damage was detected by comparison of 20 successive time frames with 50 ms exposures. The data were averaged after normalization to the intensity of the transmitted beam and the scattering of the buffer was subtracted using PRIMUS (Konarev et al., 2003). For further details on data treatment see Supplementary Material.
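The definition of the momentum transfer, s = 4π sinθ/λ, can be made concrete with a short sketch that converts between the scattering angle 2θ and s at the quoted wavelength. The function names are hypothetical; only the formula and the wavelength come from the text.

```python
import math

WAVELENGTH_NM = 0.124  # X-ray wavelength quoted in the text


def momentum_transfer(two_theta_deg, wavelength_nm=WAVELENGTH_NM):
    """Return s = 4*pi*sin(theta)/lambda in nm^-1 for a scattering angle 2*theta."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_nm


def scattering_angle(s_inv_nm, wavelength_nm=WAVELENGTH_NM):
    """Inverse relation: return the scattering angle 2*theta in degrees for a given s."""
    sin_theta = s_inv_nm * wavelength_nm / (4.0 * math.pi)
    return 2.0 * math.degrees(math.asin(sin_theta))
```

With these definitions, the upper limit of the measured range (s = 5.0 nm<sup>−1</sup>) corresponds to a scattering angle of roughly 5.7°.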

## RESULTS

### IscX Has Different Effects on the Kinetics of Cluster Formation as a Function of Concentration

We started by exploring the effect of IscX on the rates of enzymatic assembly of the Fe-S cluster on IscU using an assay in which the cluster forms, under strict anaerobic conditions, through IscS-mediated conversion of cysteine to alanine and persulfide and is reconstructed on IscU (Adinolfi et al., 2009). We performed the experiment at increasing concentrations of IscX in the range 1–50µM using 1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT, and 25µM Fe<sup>2+</sup>. We did not observe significant variations of the kinetics (**Figures 1A,B**) up to ca. 10 µM, a range in which the initial rates in the presence or absence of IscX were superposable, revealing that IscX does not have inhibitory effects under these conditions. Higher concentrations of IscX (from 20 to 50µM, i.e., higher IscX/IscS ratios) led instead to inhibition of the reaction. Similar observations were made using circular dichroism (CD) (**Figure 1C**). The results at higher IscX/IscS ratios are in agreement with a previous study (Kim et al., 2014) in which the kinetics were performed at high concentrations of both Fe<sup>2+</sup> (125µM) and IscX (25µM), but the results at lower IscX/IscS ratios are new and surprising. Why does IscS have a different behavior at low and high ratios? There are different hypotheses which could explain these observations. First, the effect we observed at high IscX/IscS ratios could be explained by considering that IscX binds to iron and at high iron:IscX molar ratios (i.e., 80:1) aggregates (Pastore et al., 2006; Prischi et al., 2010a). This could lead to a loss of available iron which could be sequestered by IscX. However, it was noted that aggregation occurs only when the assay is carried out at very low ionic strength. When salt is present, as was the case here, aggregation is not observed. Second, at low ratios the IscX occupancy on IscS could be too low to detect an effect because the complex has low affinity. 
Third, there could be a secondary binding site for IscX on IscS which is occupied only when the primary site is fully occupied. Only the 2:1 IscX-IscS complex would behave as an inhibitor.

FIGURE 1 | Effect of increasing concentrations of IscX on the enzymatic kinetics of Fe-S cluster formation on IscU. (A) Kinetics of cluster formation at increasing concentrations of IscX as measured by absorbance at room temperature. (B) Plot of the initial rates at increasing concentrations of IscX. Negative values are due to imperfect subtraction from the baseline and should be considered as zero (full inhibition). Only the initial part of the kinetics for each IscX concentration point was considered. The slope of these linear regions (for each IscX concentration) was used as the initial velocity. A molar extinction coefficient of 10.5 mM<sup>−1</sup> cm<sup>−1</sup> was assumed to convert the absorbance into the concentration of Fe-S cluster produced. (C) Kinetics followed by CD. The assays were carried out using 1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT, and 25µM Fe<sup>2+</sup> and increasing concentrations of IscX as indicated. For comparison, the experiment was also repeated in the presence of 5µM CyaY. The error bars are not shown for the sake of clarity but were typically within 5% or less.
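The conversion from an absorbance trace to an initial rate, as described in the Figure 1 legend, can be sketched as follows. The extinction coefficient and the rule that negative slopes count as zero are taken from the legend; the function name, the 1 cm path length, and the choice of which points belong to the "initial linear region" are assumptions left to the user.

```python
EPSILON_MM = 10.5  # mM^-1 cm^-1, extinction coefficient quoted in the Figure 1 legend


def initial_rate_uM_per_min(times_min, absorbance, path_cm=1.0):
    """Least-squares slope of the initial linear region, converted to uM Fe-S cluster per min.

    The caller is expected to pass only the points of the initial linear region.
    Negative slopes are reported as zero, following the figure legend.
    """
    n = len(times_min)
    if n < 2 or n != len(absorbance):
        raise ValueError("need at least two (time, absorbance) pairs of equal length")
    mean_t = sum(times_min) / n
    mean_a = sum(absorbance) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbance))
    slope = sxy / sxx                                # absorbance units per minute
    rate = slope / (EPSILON_MM * path_cm) * 1000.0   # mM/min converted to uM/min
    return max(rate, 0.0)
```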

### The Presence of IscU Does Not Influence the Affinity of IscX for IscS

To test the complex occupancy, we reconsidered the dissociation constants. We had previously estimated by fluorescence labeling and calorimetry dissociation constants (K<sub>d</sub> values) of 12 and 20µM for the binary IscX-IscS and CyaY-IscS complexes, respectively (Pastore et al., 2006; Prischi et al., 2010a). The IscX-IscS complex should thus be >50% populated under the conditions of our cluster formation assay. However, the presence of IscU could in principle modify these affinities, as is observed for binding to IscS when both IscU and CyaY are present. Also, although CyaY and IscX compete for the same site of IscS, the IscS dimer could have, at low IscX-IscS ratios, simultaneous occupancy of both CyaY and IscX. We used Biolayer Interferometry (BLI), a technique which can measure weak molecular interactions between several partners, to test this possibility. When we immobilized IscS on the surface and titrated with IscX only, we obtained a K<sub>d</sub> of 8 ± 3µM for the IscX-IscS binary complex, in excellent agreement with the previous measurements (data not shown). When we immobilized CyaY on the surface, saturated it with IscS and titrated with increasing quantities of IscX, we obtained a K<sub>d</sub> of 6 ± 2µM for the IscX-IscS complex (**Figures 2A,B**). Thus, the presence of CyaY on IscS does not modify the affinity of IscX for the desulfurase, and so major allosteric effects or binding cooperativity between the two proteins are unlikely. When we tested binding of IscX to the IscU-IscS complex (immobilizing IscS saturated with IscU), we obtained a K<sub>d</sub> of 8 ± 2µM, a value comparable to the one obtained in the absence of IscU (**Figures 2C,D**). These data indicate that the presence of IscU bound to IscS does not affect the affinity for IscX and that the affinity of IscX for IscS is higher than that of CyaY. Thus, the effect observed at different IscX-IscS ratios cannot be ascribed to insufficient occupancy.
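The occupancy argument can be checked with the exact solution of a single-site equilibrium. The sketch below is a simplification: it treats IscS as carrying one independent IscX site per stated concentration and ignores the secondary site and any competitor; the function name is hypothetical.

```python
import math


def fraction_bound(protein_total_uM, ligand_total_uM, kd_uM):
    """Fraction of protein sites occupied for a single-site equilibrium P + L <-> PL.

    Solves the quadratic obtained from mass conservation:
    [PL]^2 - (Pt + Lt + Kd)[PL] + Pt*Lt = 0, taking the physical (smaller) root.
    """
    if protein_total_uM <= 0 or kd_uM <= 0 or ligand_total_uM < 0:
        raise ValueError("concentrations and Kd must be positive")
    b = protein_total_uM + ligand_total_uM + kd_uM
    complex_uM = (b - math.sqrt(b * b - 4.0 * protein_total_uM * ligand_total_uM)) / 2.0
    return complex_uM / protein_total_uM
```

Under these assumptions, 1µM IscS with 10µM IscX and a K<sub>d</sub> of 8µM gives slightly more than half of the sites occupied.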

### IscX Has Two Different Binding Sites on IscS

BLI proved inconclusive regarding the presence of a secondary binding site. This could either be because we did not explore sufficiently high molar ratios, or because BLI depends on there being a significant change in the distance between the sensor's internal reference layer and the solvent interface. For formation of some complexes this can be rather small and thus undetectable. We thus used electrospray ionization (ESI) mass spectrometry (MS) under non-denaturing conditions, a technique which, if optimized, provides direct information on all complexes present in a solution. The m/z spectrum (4,100–6,300 m/z) of IscS displayed well-resolved charge states (Figure S1A). The deconvoluted spectrum of IscS revealed a major peak at 91,035 Da consistent with the presence of dimeric IscS (predicted mass 91,037 Da) (Figure S1B and Table S1). Occasionally, shoulder peaks at +32 Da intervals on the high mass side of the peak were observed; these are likely due to the presence of one or more sulfane sulfur atoms, as previously observed for other proteins (Crack et al., 2017). An unknown adduct at +304 Da relative to the main IscS peak was also observed (Figure S1B). Addition of IscX to IscS to an 8:1 IscX/IscS molar ratio resulted in significant changes in the m/z spectrum. A complex pattern of new charge states, superimposed over those of IscS, was observed, consistent with the presence of IscX-IscS complexes (Figure S2). The deconvoluted spectrum (spanning 90 to 125 kDa) revealed the presence of four new species, corresponding to dimeric IscS in complex with up to 4 IscX molecules (at intervals of ∼7.9 kDa) (**Figure 3A**). The observed mass of IscX was 7935 Da, 76 Da higher than the predicted mass of 7859 Da. LC-MS showed that only a minor amount (<1%) of IscX was detected at the expected mass of 7859 Da, suggesting a covalent modification of the majority of the IscX protein. 
This is most likely due to adduct formation of IscX with β-mercaptoethanol during purification to give a mixed disulfide, which would have a predicted mass of 7935 Da (7859 + 78 − 2 = 7935 Da). Collision induced dissociation (CID) data supported this conclusion (Figure S3).
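The mass bookkeeping above can be reproduced with a few lines of arithmetic. The sketch uses the nominal masses quoted in the text (IscS dimer 91,035 Da, IscX 7859 Da, β-mercaptoethanol 78 Da); the function names are hypothetical and the calculation is only a consistency check, not part of the published analysis.

```python
ISCS_DIMER_DA = 91035      # observed mass of the IscS dimer quoted in the text
ISCX_PREDICTED_DA = 7859   # predicted mass of unmodified IscX
BME_DA = 78                # nominal mass of beta-mercaptoethanol used in the text


def mixed_disulfide_mass(protein_da, thiol_da=BME_DA):
    """Mass of a protein-thiol mixed disulfide: two hydrogens are lost on S-S formation."""
    return protein_da + thiol_da - 2


def complex_masses(n_max=4):
    """Expected masses of the IscS dimer carrying 0..n_max modified IscX molecules."""
    iscx_da = mixed_disulfide_mass(ISCX_PREDICTED_DA)
    return [ISCS_DIMER_DA + n * iscx_da for n in range(n_max + 1)]
```

All five expected masses fall inside the 90–125 kDa window of the deconvoluted spectrum.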

When increasing concentrations of IscX were added to IscS, we observed the gradual formation of dimeric IscS complexes containing 1 to 4 IscX monomers (**Figures 3B–E**). The complex of IscS with a single IscX protein, (IscX)(IscS)<sub>2</sub>, formed readily at low levels of IscX ([IscX]/[IscS]≈0.3) and maximized at [IscX]/[IscS]≈4 (**Figure 3B**). The (IscX)<sub>2</sub>(IscS)<sub>2</sub> complex was detectable at a [IscX]/[IscS] ratio≈1 and reached maximum abundance by [IscX]/[IscS]≈8 (**Figure 3C**). (IscX)<sub>3</sub>(IscS)<sub>2</sub> and (IscX)<sub>4</sub>(IscS)<sub>2</sub> complexes were evident at [IscX]/[IscS] ratios ≈4 and ≈8, respectively (**Figures 3D–F**). The data were analyzed according to a sequential binding model. The resulting fit of the data revealed that binding of the first two IscX molecules to form (IscX)(IscS)<sub>2</sub> and (IscX)<sub>2</sub>(IscS)<sub>2</sub> occurs with a similar affinity, K<sub>d</sub> = 13.7 ± 0.4µM, consistent with the binding constant obtained by fluorescence and BLI. Binding of the third and fourth IscX molecules [to form (IscX)<sub>3</sub>(IscS)<sub>2</sub> and (IscX)<sub>4</sub>(IscS)<sub>2</sub>] was also found to occur with comparable affinity, with a K<sub>d</sub> = 170.0 ± 10µM, i.e., an affinity significantly lower than that of the primary site. That the binding behavior is well described by two pairs of dissociation constants indicates that the IscS dimer contains two equivalent pairs of IscX binding sites, with one primary (high affinity) and one secondary (low affinity) site per IscS protein.
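To visualize what two pairs of dissociation constants imply for the species distribution, the sketch below computes relative populations of the IscS dimer carrying 0 to 4 IscX molecules. It is a simplified stepwise scheme that treats the four fitted values as macroscopic stepwise constants without statistical factors and takes the free IscX concentration as an input; how the sequential model was actually parameterized in DynaFit is not stated here, so this should be read as an assumption-laden illustration.

```python
# Stepwise dissociation constants (uM) for binding of the 1st..4th IscX, from the text.
STEPWISE_KD_UM = [13.7, 13.7, 170.0, 170.0]


def species_fractions(free_iscx_uM, kds=STEPWISE_KD_UM):
    """Fractions of IscS dimer carrying 0..len(kds) IscX molecules.

    Sequential scheme: weight(n) = weight(n-1) * [IscX]free / Kd_n, then normalize.
    """
    weights = [1.0]
    for kd in kds:
        weights.append(weights[-1] * free_iscx_uM / kd)
    total = sum(weights)
    return [w / total for w in weights]
```

In this scheme the singly and doubly bound species dominate at free IscX concentrations near 14µM, while the triply and quadruply bound species require concentrations approaching the weaker K<sub>d</sub>.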

This evidence conclusively supports the presence of two binding sites for IscX on IscS.

### Mapping the Two Binding Sites by Cross-Linking

We next used cross-linking experiments to map the IscX binding sites on IscS, attempting to transform the noncovalent interaction into a stable covalent bond. In this assay, a cross-linking agent is added to covalently link proteins in close spatial proximity (Watson et al., 2012). We added bis[sulfosuccinimidyl]suberate (BS3), a cross-linking agent that reacts with primary amino groups up to ∼11.5 Å apart, to mixtures of IscS and IscX at different molar ratios. The reaction produced two covalently attached protein complexes with apparent molecular weights of 52 and 60 kDa, tentatively identified on PAGE as 1:1 and 2:1 IscX-IscS complexes, respectively (**Figure 4**). Mass spectrometry (MS) confirmed the presence of only these two proteins in both bands. The nature of the cross-links was identified by subjecting the two species, as well as isolated IscS, to trypsin digestion. The obtained peptide mixtures were then analyzed by MALDI/MS (Table S2). Comparison between the spectra acquired for IscS and for the two complexes allowed us to detect a signal at m/z 3121.5 ± 0.8 in the lower band which corresponds to the peptide 85–101 of IscS bound to the 1–9 region of IscX (theoretical mass 3120.6 Da). This value is uniquely compatible with a BS3 mediated cross-link involving Lys4 of IscX and Lys92, Lys93, or Lys95 of IscS. This identification was confirmed by HR LC-MS analyses and MS/MS data (Figures S4, S5). In the upper band, we could identify the same fragment together with another one (m/z 2424.2 ± 0.5), which involves cross-linking between Lys101 of IscS and Lys4 of IscX (theoretical mass 2423.3 Da). As a control, we used an IscS mutant (IscS\_R220/223/225E) which abolishes the interactions of IscS with IscX and CyaY (Prischi et al., 2010a; Kim et al., 2014). We also used IscS\_K101/105E which includes one of the lysines implicated in cross-linking with IscX. 
Formation of the IscX-IscS complex was strongly affected in both cases: binding was almost completely abolished with IscS\_R220/223/225E, whereas the secondary binding site was barely detectable with the IscS\_K101/105E mutant (**Figure 4**). As a further control, when the same cross-linker was used with mixtures of CyaY and IscS only one band was observed (data not shown). These results map the positions of the two binding sites on IscS.

### Structural Characterization of the Secondary Binding Site of IscX on IscS

Since it is difficult to assess the significance of these binding sites without a working structural model, we used small-angle X-ray scattering (SAXS) to grasp the shape of the 1:1 and 2:1 complexes. We recorded the scattering intensity patterns I(s) as a function of the momentum transfer s and computed the distance distribution function of the IscX-IscS data (at 1:1 molar ratio, 5 mg/ml; **Figure 5A**). The symmetric shape of the distance distribution function p(r) (**Figure 5A, inset**) is typical of a globular protein. The Dmax value of the IscX-IscS complex (11.0 ± 0.5 nm) is comparable with that found for the CyaY-IscS complex and significantly lower than that observed for the IscU-IscS complex (12.1 nm). This is compatible with an arrangement in which IscX does not bind along the main axis of the IscS dimer but laterally, making the overall shape more globular (Shi et al., 2010). The experimental Rg and excluded volume (VPorod) (3.09 ± 0.04 nm and 135 ± 10 nm<sup>3</sup>, respectively) are almost equal to those of the IscS homodimer (3.09 ± 0.04 nm and 136 ± 10 nm<sup>3</sup>, respectively), which suggests a partial dissociation of the complex (Table S3). The molecular mass (92 ± 5 kDa) is also lower than that expected for a 1:2 IscX-IscS complex with full occupancy (117 kDa). This is in agreement with the presence of unbound species in solution due to the weak binding constants. The Rg and VPorod dropped significantly when an excess of IscX was used (3.01 ± 0.03 nm/56 ± 5 nm<sup>3</sup> for the 20:1 ratio and 2.75 ± 0.03 nm/31 ± 4 nm<sup>3</sup> for the 40:1 ratio). The higher the volume fraction of small species in the mixture (i.e., IscX molecules), the smaller the average estimated excluded volume of the system. 
To further prove complex formation upon addition of IscX, all scattering curves recorded at different concentrations and different stoichiometries were analyzed using a model-independent Singular Value Decomposition (SVD) approach (Golub and Reinsch, 1970). SVD clearly points to the presence of four components contributing to the scattering data, in agreement with the presence of free IscS, free IscX, and their 1:1 and 2:1 complexes (Figure S6).

To construct a low resolution model of the IscX-IscS complex, rigid body modeling was used taking into account the possibility of 1:1 and 2:1 complexes. The interaction sites of IscS and IscX determined by NMR and cross-linking experiments were used as distance restraints between IscX and the possible binding sites of IscS (Pastore et al., 2006; Kim et al., 2014). Multiple runs of the rigid body modeling program SASREFMX, starting from random initial configurations with applied P2 symmetry, yielded models of the IscS/IscX complexes that fit the scattering data taken at a 1:1 molar ratio well, with a discrepancy χ<sup>2</sup> = 1.7. The presence of unbound (free) components was taken into account by using free IscS dimers and IscX monomers as independent components in addition to the SASREFMX models. Their relative volume fractions were further refined with the program OLIGOMER to best fit all the available data (**Table S4**). The volume fraction of free IscS decreases at a large excess of IscX, consistent with an almost full occupancy of IscX. While it is difficult to distinguish between the fitting for 1:1 and 2:1 complexes, the fits assuming both types of complexes are somewhat better than those obtained assuming the presence of 2:1 complexes only. We estimated ca. 60–70% of 2:1 IscX-IscS complexes. The structure of the 1:1 complex is in excellent agreement with a previous model obtained by the same hybrid technique (Kim et al., 2014): IscX sits in the catalytic pocket of IscS contributed by both protomers of the IscS dimer (**Figures 5B,C**). This is the site where CyaY also binds (Prischi et al., 2010a; Kim et al., 2014; **Figure 5D**), thus explaining the direct competition between the two molecules. Experimental validation of this binding site has been supported by mutation studies of IscS (Shi et al., 2010). The second site is close but shifted toward the IscU-binding site.

FIGURE 3 | ESI-MS investigation of complex formation between IscS and IscX. (A) Deconvoluted mass spectrum of IscS over the mass range 90–125 kDa, showing the presence of the IscS dimer (black spectrum). Addition of IscX at an 8:1 excess gave rise to a series of IscX-IscS complexes in which the IscS dimer is bound by 1–4 IscX protein molecules (red spectrum). (B–E) Deconvoluted mass spectra at increasing ratios of IscX to IscS showing the formation/decay of the four IscX-IscS complexes, as indicated. A +304 Da adduct species is present in each of the spectra, including that of the IscS dimer, indicating that it originates from IscS. The precise nature of the adduct is unknown, but it is likely to arise from two β-mercaptoethanol hetero-disulfides per IscS (4 × 76 = 304 Da), as the protein is in a solution containing 20 mM β-mercaptoethanol. (F) Plots of relative intensity of the four IscX-IscS complexes, as indicated, as a function of IscX concentration. Solid lines show fits of the data to a sequential binding model for 1–4 IscX per IscS dimer. IscS (3µM) was in 250 mM ammonium acetate, pH 8. Note that abundances are reported relative to the most abundant species, which is arbitrarily set to 100%.

FIGURE 4 | Cross-linking experiments to identify the interacting sites on IscX and IscS using different relative concentrations of IscS and IscX. (A) Experiments using wild-type IscS (8µM) and IscX at molar ratios 1:10, 1:7.5, 1:5, 1:2.5, and 1:1 from left to right. The last two lanes on the right are the controls carried out using isolated IscS and IscX. As expected, the IscS dimer disassembles in SDS and runs as a monomer of ca. 45 kDa. (B) The same as in (A) but with the IscS\_R220/223/225E mutant. (C) The same as in (A) but with the IscS\_K101/105E mutant. The markers are indicated on the right.

FIGURE 5 | SAXS measurements. (A) X-ray scattering patterns for IscX-IscS at molar ratios of 1:1 (at 5 mg/ml solute concentration), 2:1 (at 3 mg/ml), 20:1 and 40:1 (both at 0.5 mg/ml). The experimental data are displayed as dots with error bars. The scattering from a mixture of two rigid body models (1:1 and 2:1 IscX-IscS complexes) obtained by SASREFMX (for data with 1:1 molar ratio) and the fits by mixtures of unbound IscX, IscS dimers, and 1:1/2:1 rigid body complexes (for data at 2:1, 20:1, and 40:1 molar ratios) as obtained by OLIGOMER are shown by solid lines. The plots display the logarithm of the scattering intensity as a function of the momentum transfer. The distance distribution functions are indicated in the insert. (B) Backbone representation of the 2:1 IscX-IscS complex. The two IscS protomers are shown in different shades of gray. Residues which are either involved in cross-linking or have an effect on binding are indicated. The catalytic loop of IscS is indicated in purple. (C) Same as in (B) in a full atom representation. For reference, IscU (in green) is included (3LVL) to clarify the relative positions of the components. (D) Comparison with the model previously obtained for the ternary complex CyaY-IscS-IscU (Shi et al., 2010). Light and dark gray represent each of the protomers of the IscS dimer.

The presented models of 2:1 and 1:1 IscX:IscS complexes should be treated as tentative given the potential ambiguity of SAXS modeling and the limitations of the rigid body approach, which assumes that no conformational rearrangements occur upon complex formation. Still, the obtained models permitted us to neatly fit the complete SAXS data set recorded under different conditions, agree with the cross-linking data, and also offer a logical explanation for the observed effects. We have previously demonstrated by molecular dynamics that binding of CyaY in the primary site restricts the motions of the catalytic loop which is thought to transfer persulfide, bound to Cys237, from the catalytic site to IscU (di Maio et al., 2017). As compared to CyaY, IscX is smaller and would not be sufficient to block the movement of the catalytic loop as long as only the primary site of IscX is occupied. In this position, IscX, having higher affinity, simply prevents CyaY from binding but does not act as an inhibitor. However, binding of another IscX molecule with its N-terminus close to K101, as observed by cross-linking, can strongly increase the steric hindrance and result in inhibition of iron sulfur cluster formation, an effect similar to that observed for CyaY.

### Understanding the Competition between IscX and CyaY

We then explored more closely the conditions under which IscX competes with CyaY (Shi et al., 2010; Kim et al., 2014). We first repeated the enzymatic assays keeping the conditions unchanged (1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT, and 25µM Fe2+), adding CyaY (5µM) and progressively increasing the concentration of IscX in the range 0–50µM. Below 10µM (i.e., below 2:1 IscX-CyaY and 10:1 IscX-IscS molar ratios), where IscX has no effect on IscS activity as demonstrated above, the experiment results in a progressive rescuing of CyaY inhibition (**Figure S7**), in agreement with a competition between the two proteins for the same binding site on IscS. The effect of IscX becomes noticeable at substoichiometric concentrations as compared to CyaY, as expected from the respective dissociation constants. However, even under the maximal condition of rescuing activity (10µM), IscX is able to reach only ca. 70% of the rate compared to a control experiment performed in the absence of CyaY where there is no inhibition. This reflects the fact that, at relatively high IscX/IscS ratios, the secondary binding site, which also leads to inhibition, starts being populated probably favored by a conformational change or an electrostatic rearrangement. Above a 10:1 IscX/IscS ratio, increase of the IscX concentration results in progressively marked inhibition (**Figure 6A**). When we titrated instead the system with CyaY (0-20 µM), keeping IscX fixed at 5µM, we started observing a decrease of the intensities at 1:1 IscX/CyaY molar ratios and a marked inhibition for higher CyaY ratios (**Figure 6B**).
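The rescue of CyaY inhibition by IscX is what one expects from two ligands competing for a single site. A minimal sketch of this equilibrium is given below; it uses the K<sub>d</sub> values quoted earlier (8µM for IscX from BLI, 20µM for CyaY) as defaults, approximates free concentrations by total concentrations, and ignores both the secondary IscX site and any iron dependence. These simplifications and the function name are assumptions for illustration.

```python
def competitive_occupancies(iscx_uM, cyay_uM, kd_iscx_uM=8.0, kd_cyay_uM=20.0):
    """Fractions of the shared IscS site occupied by IscX and by CyaY.

    Single-site competition: theta_i = ([L_i]/Kd_i) / (1 + sum_j [L_j]/Kd_j),
    with free ligand concentrations approximated by the totals.
    """
    x = iscx_uM / kd_iscx_uM
    c = cyay_uM / kd_cyay_uM
    denom = 1.0 + x + c
    return x / denom, c / denom
```

With 5µM CyaY alone the site is 20% occupied by CyaY in this model; adding 10µM IscX halves that occupancy, in qualitative agreement with the progressive rescue described above.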

We then put our results within the context of the protein concentrations observed in E. coli. Several independent studies have reported these under different growth conditions (Ishihama et al., 2008; Taniguchi et al., 2010; Li et al., 2014; Schmidt et al., 2016). Under non-stress conditions, both CyaY and IscX are usually, but not always, present at substoichiometric ratios as compared to IscS, with IscX in excess over CyaY (Li et al., 2014). Different conditions can however remarkably change the molar ratios. We thus repeated the enzymatic assay using concentrations comparable to the non-stress conditions to verify that IscX controls the reaction under these conditions (4.1µM IscS, 250µM Cys, 2 mM DTT, 1.0µM IscX, 0.7µM CyaY, and 25µM Fe2+). We used IscU in a large excess (50µM) because it is the reporter. We preferred to use this protein rather than another reporter because the IscU binding site on IscS does not interfere with CyaY or IscX (Pastore et al., 2006; Prischi et al., 2010a). At these ratios we do not observe inhibition indicating that the reaction is under the control of IscX (**Figure 6C**). Taken together, these results fully confirm the competing role of IscX and CyaY.

### The Modulation Effect of IscX and CyaY on Cluster Formation on IscU Is Iron Dependent

Finally, we tested the effect of the competition between CyaY and IscX on Fe-S cluster formation on IscU as a function of increasing iron concentrations keeping the other concentrations unchanged (1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT) because we had already observed that CyaY inhibition is enhanced by iron (Adinolfi et al., 2009; **Figures 7A,B**). As a control, we compared the variation of the initial rates as a function of iron concentration in the presence of IscU or CyaY individually. Cluster formation in the presence of the acceptor IscU only increased with the concentration of iron. The presence of CyaY (5µM) drastically reduced the rate of cluster formation. The difference between the initial rates in the absence and in the presence of CyaY increased at increasing concentrations of iron. Addition of IscX to the reaction together with CyaY (1 and 5µM respectively) generated a "rescuing" effect but only at low iron concentrations. Under these conditions, the effect reached a maximum around 15µM iron. At higher iron concentrations, the curve obtained in the presence of both CyaY and IscX became similar to that obtained with CyaY alone. This implies that, at lower iron concentrations, IscX has higher affinity for IscS than does CyaY, impairing the ability of CyaY to inhibit cluster formation. The effect is reversed at higher iron concentrations where CyaY must acquire a higher affinity.

IscX:CyaY ratios. (B) As in (A) but with the concentration of IscX fixed to 5µM and various concentrations of CyaY. (C) Enzyme kinetics carried out with concentration ratios similar to those observed in the cell (4.1µM IscS, 250µM Cys, 2 mM DTT, 1.0µM IscX, 0.7µM CyaY, and 25µM Fe<sup>2+</sup>).

To dissect whether iron dependence is controlled by CyaY, IscX or both proteins together, we fixed the concentration of IscX to 1µM and repeated the experiments varying the concentrations of Fe<sup>2+</sup> from 10 to 100µM in the absence of CyaY and in the presence or in the absence of IscX. We observed that the initial rates increased at increasing Fe<sup>2+</sup> concentrations but were independent of the presence of IscX (**Figure 7C**). Thus, the interaction of IscX with IscS is Fe<sup>2+</sup> independent in this range of concentrations. This is at variance with CyaY whose interaction with IscS is clearly iron mediated, as previously demonstrated (Adinolfi et al., 2009): the same experiment carried out with CyaY showed a marked difference in the absence and presence of CyaY which increases with iron.

We thus must conclude that IscX acts as a modulator of CyaY inhibition, resulting from the ability of CyaY to sense iron and bind to IscS more tightly at high iron concentrations.
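The qualitative argument above can be illustrated with a minimal competitive-binding sketch. It assumes a single shared site on IscS, approximates free ligand concentrations by total concentrations (a rough simplification at the micromolar IscS concentrations used here), and uses hypothetical dissociation constants chosen only to show the trend; none of these values are measured quantities from this work.

```python
def iscs_occupancy(cyay_um, iscx_um, kd_cyay_um, kd_iscx_um):
    """Return (fraction bound by CyaY, fraction bound by IscX) for one shared site.

    Simple competitive model; free ligand is approximated by total ligand.
    """
    a = cyay_um / kd_cyay_um
    b = iscx_um / kd_iscx_um
    denom = 1.0 + a + b
    return a / denom, b / denom


# Hypothetical dissociation constants in µM, for illustration only.
KD_ISCX = 1.0            # assumed iron independent
KD_CYAY_LOW_IRON = 20.0  # assumed weak CyaY binding at low iron
KD_CYAY_HIGH_IRON = 0.2  # assumed tight CyaY binding at high iron

# Concentrations as in the assay: 5 µM CyaY and 1 µM IscX.
low_iron = iscs_occupancy(5.0, 1.0, KD_CYAY_LOW_IRON, KD_ISCX)
high_iron = iscs_occupancy(5.0, 1.0, KD_CYAY_HIGH_IRON, KD_ISCX)

print("low iron  (CyaY, IscX):", low_iron)
print("high iron (CyaY, IscX):", high_iron)
```

With these assumed constants, IscX dominates the site at low iron and CyaY at high iron, which is the behavior the model invokes; an iron-dependent change in CyaY affinity alone is sufficient to switch control.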

#### DISCUSSION

Since its discovery (Beinert, 2000), iron-sulfur cluster biogenesis has rapidly become an important topic of investigation both because it is a molecular machine at the very basis of life and because of its crucial role in an increasing number of human diseases (Rouault, 2012). The study of this topic in the bacterial system is also specifically important given the important role that iron-sulfur clusters play in bacterial infections (Bridwell-Rabb et al., 2012). Much progress has been made in understanding this metabolic pathway but several questions have so far remained unanswered. Why does IscS bind different proteins using the same surface and how is recognition regulated? What is the role of IscX in the isc machine? What is the relationship between IscX and CyaY? We now have tentative answers to these questions.

Previous studies carried out at high IscX concentrations (>10µM) led to the conclusion that IscX, like CyaY, is an inhibitor of cluster formation (Kim et al., 2014). We could reproduce these results but demonstrated that they strongly depend on the relative and absolute concentrations used in vitro. At high IscX-IscS ratios (i.e., >10:1), the inhibitory effect of IscX is the consequence of a secondary binding site on IscS which becomes populated only once the primary site is saturated. At low IscX-IscS ratios, IscX has no effect on cluster formation but efficiently competes with CyaY and modulates its strong inhibitory power also at substoichiometric ratios.

This is an important result which changes our perspective on IscX: this protein is not just "another frataxin-like protein" but the modulator of the inhibitory function of CyaY in bacteria; it is what silences CyaY. Our data are fully consistent with and explain in molecular terms a previous report that conclusively demonstrated a genetic interaction between CyaY and IscX and an additive positive effect on cluster maturation upon deletion of both genes (Roche et al., 2015).

FIGURE 7 | Iron dependence of the relative effect of CyaY and IscX on cluster assembly enzymatic kinetics on IscU. (A) Initial rates of kinetics of cluster formation on IscU as a function of iron concentration in the presence of IscU only (gray circle), IscU and CyaY (black rectangles) and IscU, IscX, and CyaY (gray triangles). (B) The same as in (A) but expanded for clarity. (C) Assays carried out in the absence and in the presence of IscX (1µM) at increasing concentrations of iron (10µM, 25µM, 50µM, 80µM) to assess whether IscX is affected by iron. No CyaY was added to the assay. All experiments contained 1µM IscS, 50µM IscU, 250µM Cys, 2 mM DTT.

Although the tentative model of 2:1 IscX-IscS complex constructed from SAXS and cross-linking experiments has its limitations, it clearly supports the presence of a secondary binding site and explains the enzymatic assays. The two proteins compete for the same binding site on IscS but IscX is appreciably smaller than CyaY. In the complex with only the primary site occupied, IscX can thus allow free movement of the catalytic loop, which transports the persulfide from the active site to IscU (di Maio et al., 2017). Occupancy of the secondary binding site would interfere instead with the loop movement by steric hindrance, thus producing an inhibitory effect. We also demonstrated that the effect of IscX on cluster formation depends on the iron concentration and that the iron sensor is CyaY and not IscX, whose effect on IscS is iron independent.

We can thus suggest a model based on our results which fully explains the role of both proteins (**Figure 8**). At low iron concentrations, cluster formation would be under the control of IscX and thus continue unperturbed. At higher concentrations, CyaY would have higher affinity for IscS and thus control the cluster formation rate. Additionally, it is likely that, under iron stress conditions, CyaY is up regulated (or IscS down regulated). Independent studies have indeed reported the concentrations of proteins in E. coli (Ishihama et al., 2008; Taniguchi et al., 2010; Li et al., 2014; Schmidt et al., 2016) under different growth conditions. Under normal conditions, IscX and CyaY are in substoichiometric ratios as compared to IscS with CyaY being also lower than, or equimolar, to IscX. The concentrations of the components can however vary considerably (up to 6–8 times) (Ishihama et al., 2008; Taniguchi et al., 2010; Li et al., 2014; Schmidt et al., 2016) according to the conditions, suggesting a tight regulation in which the ratios between the concentrations of IscS, IscX, and CyaY are fine-tuned according to the cellular conditions. While we started investigating the possibility of a secondary binding site of IscX on IscS in the attempt to explain a different behavior at high and low molar ratios, it is tempting to speculate that also the secondary binding site could under given conditions have a physiologic significance: even just a partial occupancy could further modulate the regulation of IscS.

Such a system thus represents a sophisticated way to control the enzymatic properties of IscS: a tight regulation is required for maintaining the number of Fe-S clusters formed under tight control at all times, to match the number of apoproteins present. If this balance is lost, there will be too many "highly reactive" Fe-S clusters not being delivered which will be degraded.

From an evolutionary perspective, our results explain the presence of IscX in prokaryotes and not in most eukaryotes: bacterial IscS is a fully active enzyme, whereas the eukaryotic ortholog Nfs1 is inactive in the absence of its activators frataxin and the eukaryotic specific Isd11 and ACP (Foury and Cazzalini, 1997; Calabrese et al., 2005). As demonstrated in a recent paper, this behavior can be explained by a different quaternary assembly of the desulfurase complex with Isd11 mediated by the acyl carrier protein in this structure (Foury and Cazzalini, 1997; Calabrese et al., 2005). These results indicate a different regulation of Fe-S cluster biogenesis in eukaryotes compared to prokaryotes. We propose the mechanistic basis of the bacterial regulation based on our results. We can also speculate that, as a regulator of the inhibitory function of CyaY in bacteria, IscX could have disappeared in the passage from prokaryotes to eukaryotes where the desulfurase was deactivated making the double repression unnecessary. This hypothesis would fully explain the difference in the function of frataxin between prokaryotes and eukaryotes, and suggests that IscX is the evolutionary "missing link."

In conclusion, our results place the role of IscX into a new and completely different perspective and provide one of the most elegant examples of a double regulation of an enzyme in which inhibition and counter-inhibition are achieved by exploiting a triple-component system in which two proteins compete for the same binding site on an enzyme depending on the concentration levels of an effector, iron. Similar examples are not rare in nature and have been already described, for instance, in muscles where calcium is able to discriminate between different conformational states of tropomyosin and troponin I to determine different conformational states of myosin. Through our findings, we can account for the previous literature and suggest a new perspective for the quest to fully elucidate the roles of frataxin and IscX and their functions in the isc machine.

#### REFERENCES

Adinolfi, S., Iannuzzi, C., Prischi, F., Pastore, C., Iametti, S., Martin, S., et al. (2009). Bacterial frataxin CyaY is the gatekeeper of iron-sulfur cluster formation catalyzed by IscS. Nat. Struct. Mol. Biol. 16, 390–396. doi: 10.1038/nsmb.1579

### AUTHOR CONTRIBUTIONS

SA and RP: carried out the enzymatic work; CI and SM: the BLI measurements; FD, JC, and NL: the mass spectrometry analysis; PK and DS: the SAXS measurements; AP: wrote the paper and coordinated the research.

#### ACKNOWLEDGMENTS

This work was supported by MRC (grant number U117584256) and BBSRC (grant BB/P006140/1).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2017.00097/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer AD and handling Editor declared their shared affiliation.

Copyright © 2018 Adinolfi, Puglisi, Crack, Iannuzzi, Dal Piaz, Konarev, Svergun, Martin, Le Brun and Pastore. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Intramolecular Fuzzy Interactions Involving Intrinsically Disordered Domains

#### Miguel Arbesú\*, Guillermo Iruela, Héctor Fuentes, João M. C. Teixeira and Miquel Pons\*

BioNMR Laboratory, Inorganic and Organic Chemistry Department, University of Barcelona, Barcelona, Spain

Structural disorder is an essential ingredient for function in many proteins and protein complexes. Fuzzy complexes describe the many instances where disorder is maintained as a critical element of protein interactions. In this minireview we discuss how intramolecular fuzzy interactions function in signaling complexes. Focussing on the Src family of kinases, we argue that the intrinsically disordered domains, which are unique to each family member and display a clear fingerprint of long-range interactions in Src, might have critical roles as functional sensors or effectors and mediate allosteric communication via fuzzy interactions.

#### Edited by:

Irene Diaz-Moreno, Universidad de Sevilla, Spain

#### Reviewed by:

Antonio J. Díaz Quintana, Universidad de Sevilla, Spain Monika Fuxreiter, University of Debrecen, Hungary

#### \*Correspondence:

Miguel Arbesú miguelarbesu@gmail.com Miquel Pons mpons@ub.edu

#### Specialty section:

This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences

> Received: 02 February 2018 Accepted: 03 April 2018 Published: 30 April 2018

#### Citation:

Arbesú M, Iruela G, Fuentes H, Teixeira JMC and Pons M (2018) Intramolecular Fuzzy Interactions Involving Intrinsically Disordered Domains. Front. Mol. Biosci. 5:39. doi: 10.3389/fmolb.2018.00039

Keywords: fuzzy complexes, intrinsically disordered proteins, Src family kinases, allostery, fuzzy domains, paramagnetic relaxation enhancement

A large majority of proteins are built from domains, classically defined as functional autonomous folding units. In a typical divide-and-conquer approach, the structure-function analysis proceeds through the characterization of the individual domains followed by the study of their mutual interactions. This approach makes a clear distinction between the "functional" domains and the linkers separating them.

The same strategy is taken in the analysis of the also very abundant multiprotein complexes, in which the individual proteins are considered the building blocks (equivalent to domains of multidomain proteins). Key components of multiprotein complexes are scaffolding proteins, which would play the role of linkers in multidomain proteins.

The current view of protein-protein interactions is quite dynamic and intrinsically disordered regions (IDR) are increasingly recognized as key players.

In this minireview we shall summarize some important aspects of (intermolecular) protein binding by disordered proteins and extend them to the case of interdomain (i.e., intramolecular) binding using the c-Src family of kinases as an example.

#### PROTEIN INTERACTIONS BY INTRINSICALLY DISORDERED PROTEINS

Intrinsically disordered proteins (IDP) or proteins with long IDR form a significant portion of the proteome of eukaryotes and are especially prevalent in signaling and regulation complexes (Iakoucheva et al., 2002).

Protein complexes involving IDRs span a wide range of affinities and lifetimes as well as specificities (Tompa et al., 2015). A recent analysis of K<sub>d</sub> value statistics in the curated DIBS database of IDR-folded protein complexes (Schad et al., 2017) confirms a wide range of affinities spanning from the subnanomolar to the millimolar regimes.

In their review on experimental thermodynamic data from binary protein complexes involving IDPs or ordered proteins, Teilum et al. (2015) found that the complexes involving IDPs were on average only 2.5 kcal mol<sup>−1</sup> less stable (in terms of ΔG°) than complexes between ordered proteins. In the two sets, isothermal enthalpy-entropy compensation was observed, a general phenomenon in biomolecular recognition processes (Chodera and Mobley, 2013). Thus, favorable binding enthalpy is associated with a loss of entropy due to a massive reduction of structural degrees of freedom.

The interaction surfaces involving folded proteins or IDRs showed similar amino acid composition and size and the distributions of ΔH° values were statistically equal; so, all the destabilizing contribution had an entropic origin (–TΔS° > 0).

An entropic cost for binding an IDR is intuitively expected; the surprising result is its relatively small value, suggesting that compensatory mechanisms are an important component of IDR interactions. The importance of entropy compensation is highlighted by the similarity in the distribution of ΔH° values, a very interesting result in itself that emphasizes the underlying short-range similarities between protein-protein interactions involving folded and disordered proteins.
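To put the 2.5 kcal mol<sup>−1</sup> average difference in perspective, the following minimal sketch converts a free energy difference into a fold change in dissociation constant using the standard relation ΔG° = RT ln K<sub>d</sub> (1 M standard state). The temperature of 298.15 K and the function names are assumptions made for illustration; the reviewed data sets may have been collected under other conditions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def kd_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Fold change in Kd corresponding to a difference in binding free energy."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) for a Kd given in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


print(f"2.5 kcal/mol ~ {kd_fold_change(2.5):.0f}-fold weaker Kd")
print(f"Kd = 1 µM -> dG = {binding_free_energy(1e-6):.2f} kcal/mol")
```

Under these assumptions, 2.5 kcal mol<sup>−1</sup> corresponds to a difference of somewhat less than two orders of magnitude in K<sub>d</sub>, which is small compared with the subnanomolar-to-millimolar range covered by IDR complexes.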

Flock et al. (2014) have reviewed the importance of entropy control to tune IDR function. The importance of entropy in the formation of complexes endows IDR-involving complexes with their unique functional characteristics as molecular rheostats and signal integrators, able to respond in a precise, continuous and dynamic way to varying combinations of inputs with specific outputs.

The formation of a stable complex between two proteins (i.e., with a negative ΔG° = ΔH° – TΔS°) can be achieved by optimizing the enthalpy gain (ΔH° < 0), increasing the entropic gain (–TΔS° < 0, i.e., ΔS° > 0) or minimizing the entropic loss (–TΔS° ≈ 0). A fundamental aspect of the interplay between enthalpy and entropy components is their "locality."

Enthalpy effects usually reflect local short-range interactions and may be considered additive and proportional to the contact surface. Thus, large enthalpy components are usually associated to large contact surfaces, although these interfaces do not have to be necessarily continuous. Electrostatic contributions to the enthalpy, however, are long range. They often drive the partners together (thus reducing the translational and possibly rotational entropy of the system) and, in the formed complex, enable dynamic interactions that minimize the entropy loss upon complex formation. A recent example of a picomolar interaction between two IDPs without a significant loss of flexibility is driven by electrostatics (Borgia et al., 2018).

#### INTERACTING ELEMENTS AND MULTIVALENCY

Entropy contains "local" components associated to the degree of structure achieved by the contact regions of the two interacting partners, as well as more global contributions of which we may distinguish (i) the effect of regions that can remain highly flexible in the complex (thus not contributing an entropic penalty to binding), (ii) the possible preexistence of long range intramolecular contacts restricting the conformational freedom in the free IDR (therefore minimizing the loss of entropy upon complex formation), and (iii) the configurational entropy arising from multiple alternative binding poses ("microstates") contributing to the bound state.

The first two situations reduce the entropic cost of binding through IDRs and correspond to the strategies of not to pay (i) or pre-pay (ii). The third situation actually contributes an entropic gain.

In the interacting regions, pre-pay strategies may take the form of preformed structural elements retained in the complex (Davey et al., 2012; Pancsa and Fuxreiter, 2012) or bound solvent molecules that are retained in the complex in water-mediated interactions (London et al., 2010).

The dominant role of entropy in protein interactions is not restricted to IDPs. The changes in internal dynamics of the catabolite activator protein (CAP), measured by NMR in the entire protein, explain the dramatic changes in affinity observed in CAP variants that form complexes with identical interfaces (Tzeng and Kalodimos, 2012).

In IDRs the change in entropy upon binding is determined by the interplay between local and global effects. Short linear motifs (LM) play an important role in IDR interfaces. Although the definition of LM is based on bioinformatic studies, they can be interpreted using structural and dynamic concepts. LM are often formed by hydrophobic residues grafted onto a malleable template (Fuxreiter et al., 2007). The expected lower enthalpy of the interaction by short elements, as compared to the large rigid interfaces between ordered proteins, can be partially compensated by the fact that IDPs often adopt extended conformations permitting short motifs to establish a variety of interactions through virtually any element of their backbone or side chains; thus, their interacting interfaces have a larger effective area per residue than those of ordered proteins (Gunasekaran et al., 2003). In addition, these short stretches are modular recognition elements that can be combined to form multivalent complexes. Thus, a favorable binding enthalpy, comparable to that associated with a large, rigid interface, can be achieved by weaker but multiple sparse anchoring elements (Cumberworth et al., 2013). The participating groups may be difficult to identify either experimentally, because interactions are weak, or statistically, because they may appear in a variety of combinations that are not repeated "motifs" (Van Roey et al., 2014).

The number and intrinsic properties of individual interacting regions, as well as the size and dynamic properties of the spacers between them collectively, and therefore non-linearly, determine the binding properties of IDRs.

The effects of multivalent binding have been recognized for a long time, beyond the field of IDPs. The strength by which a multivalent antibody binds to its antigen, termed avidity (Crothers and Metzger, 1972), can be explained by the fact that when one of the sites is bound to its cognate site receptor, a second site located close-by binds cooperatively, basically because of the lower entropic cost of a (pseudo)-intramolecular interaction (Kitov and Bundle, 2003). If the linker connecting the two sites is flexible, the average distance between the sites is the main factor determining the cooperativity. If the flexibility is limited, the linker may also modulate the relative orientation of the components of the second interacting site. A consequence of the model is that avidity may be modulated either by modifying the interacting sites or the flexibility of the linker (Cerofolini et al., 2013).

The avidity model assumes multiple binding sites but the interacting partners for each site are not interchangeable. The scenario in which the multiple interaction sites in one of the molecules can interact with, and therefore compete for, the same site of the second molecule is referred to as **allovalency** (Klein et al., 2003). Allovalency predicts a dependency on the number of interacting sites, not through simultaneous cooperative binding, because all of them compete for a single site in the second molecule, but through local concentration and rebinding. In the defining example, the binding of Sic1 to Cdc4, the number of interacting sites is actively modulated by the random phosphorylation of up to ten serine and threonine residues, showing a sharp increase in the fraction of bound form after six of them are phosphorylated (Mittag et al., 2010).

#### FUZZY COMPLEXES AND MULTIVALENCY

**Fuzzy complexes**, introduced by Tompa and Fuxreiter (2008), describe binding situations in which at least one of the elements in the complex remains dynamic. Therefore, the complex cannot be properly described by a defined structure but has the characteristics of a heterogeneous ensemble. Importantly, the interaction heterogeneity of the fuzzy complex is an essential component of the functional outcome of complex formation. The functional character of the retained disorder, thus, differentiates a fuzzy complex from a complex including a random region in non-specific contact with the partner. An expanded repertoire of examples can be found in recent reviews (Fuxreiter, 2012; Fuxreiter and Tompa, 2012; Sharma et al., 2015; Miskei et al., 2017).

Structural disorder in fuzzy complexes represents a continuum, from rather rigid polymorphic complexes displaying static disorder with only a few alternative conformations to highly dynamic random complexes. The proportion between regions directly involved in short range contacts and connector regions decreases in this series. Individual regions contributing to ΔH° < 0 become smaller but may increase in number, thus a favorable enthalpy contribution can be retained. Splitting the interaction interface in many smaller areas, each binding weakly and with high promiscuity, enhances binding degeneracy that contributes an additional entropic term, ΔS°<sub>configurational</sub>, which reflects the contribution stemming from the different forms in which the IDR and its partner can associate.
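The size of the configurational term can be estimated with a minimal sketch based on the Gibbs entropy of a set of bound microstates, S = −R Σ p<sub>i</sub> ln p<sub>i</sub>, which reduces to R ln W for W equally populated binding poses. The function names, the example populations and the temperature of 298.15 K are assumptions for illustration only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def configurational_entropy(populations):
    """Gibbs entropy (kcal mol^-1 K^-1) of a set of microstate populations."""
    total = sum(populations)
    probs = [p / total for p in populations if p > 0]
    return -R_KCAL * sum(p * math.log(p) for p in probs)


def entropic_free_energy(populations, temperature_k=298.15):
    """Contribution -T*S (kcal/mol) of the configurational entropy to binding."""
    return -temperature_k * configurational_entropy(populations)


# Ten equally populated binding poses versus one dominant pose (hypothetical).
print(entropic_free_energy([1.0] * 10))
print(entropic_free_energy([0.91] + [0.01] * 9))
```

In this picture, ten isoenergetic poses stabilize the complex by somewhat more than 1 kcal mol<sup>−1</sup> at room temperature, whereas the same number of poses with one strongly dominant member contributes much less.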

A recent experimental example is the detailed study of the thermodynamics of the fuzzy complex formed by the C-terminal IDR of antitoxin CcdA, which adopts α-helical structure upon binding the toxin dimer CcdB (Hadži et al., 2017). The authors perform a series of mutations that affect contacting and non-contacting residues. Their results show that mutations in residues not directly involved in protein:protein interaction reduce the degree of structuration both in the bound and free forms (ΔΔS°<sub>conformational</sub> ≈ 0), but also promote alternative isoenergetic configurations (ΔΔS°<sub>configurational</sub> > 0 and ΔΔH° ≈ 0) thus minimizing the particular ΔG° of the mutant complex.

### NMR FINGERPRINT OF LONG-RANGE ORGANIZATION OF IDR

Fuzziness is not associated with promiscuous binding. The selectivity is encoded in the dynamic, non-random organization of distant, potentially interacting regions.

Operationally, a very efficient method to map a set of long range interactions is by measuring the paramagnetic relaxation enhancement (PRE) along the sequence induced by one or several paramagnetic tags (Clore and Iwahara, 2009). Since paramagnetic effects are sensitive to transient interactions and efficient over considerable distances, in the case of disordered proteins the key aspect is to differentiate specific from random coil effects. In this respect, Konrat's group has introduced the concept of paramagnetic relaxation interference (PRI) by comparing the simultaneous effect of two paramagnetic centers (Kurzbach et al., 2016) with the sum of the individual effects. A difference between these values requires that the two sites move in a correlated fashion. An alternative, often simpler, approach is to compare the observed PREs with the predictions of a random coil model. The ΔPRE analysis (Arbesú et al., 2017) clearly identifies the relevant transient contacts in IDRs.
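The comparison with a random coil model can be sketched as a per-residue difference profile. The example below is only an illustration of the idea: it takes observed and random-coil-predicted intensity ratios (paramagnetic over diamagnetic) as inputs, and the sign convention, the threshold, the function names and the numbers are all hypothetical rather than the published implementation.

```python
def delta_pre(observed, random_coil):
    """Per-residue difference between random-coil and observed intensity ratios.

    Positive values mean more relaxation than expected for a random coil,
    i.e., the residue spends more time near the paramagnetic tag.
    """
    if len(observed) != len(random_coil):
        raise ValueError("profiles must have the same length")
    return [rc - obs for obs, rc in zip(observed, random_coil)]


def contact_residues(observed, random_coil, threshold=0.2, first_residue=1):
    """Residue numbers whose difference exceeds an (assumed) threshold."""
    profile = delta_pre(observed, random_coil)
    return [first_residue + i for i, d in enumerate(profile) if d > threshold]


# Hypothetical intensity ratios for four residues.
observed = [0.90, 0.50, 0.95, 0.30]
random_coil = [0.92, 0.90, 0.93, 0.85]
print(contact_residues(observed, random_coil))
```

Because the random coil contribution is subtracted residue by residue, profiles obtained for different variants can be compared directly, which is what makes this kind of analysis convenient for mutational studies.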

ΔPRE analysis of several paramagnetically tagged forms of the intrinsically disordered N-terminus of c-Src in the presence and in the absence of the folded neighboring SH3 domain shows a very similar profile, confirming the presence of a conserved set of non-random long-range interactions and validating the use of the term **domain** for this intrinsically disordered region (Arbesú et al., 2017). Substitution of residues important for IDR pre-organization and interdomain contacts showed that the ΔPRE profiles are generally conserved upon different perturbations (e.g., same profile trends, location of maxima and minima, etc.) but can also reflect the functional loss of interactions (i.e., consistent contact reduction upon substitution). ΔPRE profiling thus provides a structural signature that captures non-random ensemble conformational preferences and their associated dynamics. This simple method enables facile comparison for carrying out functional analysis based on mutations.

In an analysis of the Pfam database, which identifies domains based on multiple sequence alignments, Tompa et al. (2009) found that a substantial number of the sequence defined domains contained disordered regions and confirmed that disordered domains are inheritable, evolvable, and functional units. Some domains, such as the Unique domain, which is the most discriminating feature of the distinct Src family kinases (SFK), are not identified as domains using multisequence alignment methods. This is not surprising since sequence variability is its defining characteristic. We argue that the requirement of "autonomous folding," which would identify a domain without using sequence conservation data, could be replaced in the field of IDPs by that of a conserved, non-random, set of long range interactions.

The PRE and chemical shift perturbation analysis of wildtype Src as well as a number of mutated or truncated variants, also showed the interaction between the intrinsically disordered domain and specific regions of the SH3 domain. Interestingly, the most affected regions in the folded scaffold could be mapped to the loops that decorate its surface. The interaction between the disordered N-terminal region of c-Src and the SH3 domain has the characteristics associated to fuzzy complexes: (i) the disordered region remains highly dynamic, as seen by NMR, (ii) its overall dimension is affected by the presence of the SH3 domain, as seen by Small Angle X-ray Scattering, (iii) the local perturbations sensed by chemical shifts are affected by modifications in distant, well defined parts of the protein, and (iv) mutations in the disordered region cause strong functional effects in the entire protein.

#### INTRAMOLECULAR FUZZY COMPLEXES AS SIGNAL SENSORS

Classical descriptions of multidomain signaling proteins distinguish between regulatory/sensor and catalytical/effector domains. IDR can act as linkers, effectors or sensors (**Figure 1**). IDR-mediated signaling enables complex regulatory behavior, including multiple signal integration and rheostat-like graded responses (Tompa, 2014).

folded sensor and an effector domains. (B) The IDR can become an effector, e.g., by folding as a response to a stimulus. (C) A fuzzy complex, like in Src, can act as a sensor with the folded SH3 domain (in black) taking the role of the linker.

IDRs can act as linkers through which information is propagated to distant regions. This can occur without concomitant structuration through remodeling of the protein free energy landscape affecting the conformer populations and causing specific functional outputs (Tsai et al., 1999; Hilser and Thompson, 2007; Ma et al., 2011; Motlagh et al., 2014). Examples include the DNA binding Ets-1 transcription factor (Pufall, 2005), the Sic1 cell cycle protein (Mittag et al., 2008), or the Drosophila Ultrabithorax transcription factor (Liu et al., 2008).

IDR conformational ensembles can be modified by "external" signals and modulated by "internal" parameters, such as posttranslational modifications. Thus, they can act as sensitive **sensors** with tunable selectivity and sensitivity. In a recent work, the effect of a small drug interacting with a disordered region of p27 was shown to cause a shift in its conformational landscape (Ban et al., 2017), stressing the capacity of IDRs as sensors of their environment (and not trivially, as drug targets).

Borrowing concepts from information theory (Shannon and Weaver, 1949), the capacity to transfer information is determined by the signaling event rate, and the size of the set formed by possible events it permits. Fuzzy complexes provide fast interconversion dynamics and a large set of configurations in the interface. Thus, fuzzy interfaces have the ability to act as high-capacity channels.
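As a rough illustration of this argument, the upper bound on the information rate of a noise-free channel whose events are equally likely is the event rate multiplied by log<sub>2</sub> of the number of distinguishable events. The sketch below uses this idealized bound; the interconversion rates and numbers of configurations are hypothetical values chosen only to contrast a rigid and a fuzzy interface.

```python
import math


def channel_capacity_bits_per_s(event_rate_hz, n_states):
    """Idealized capacity: events per second times bits per event.

    Assumes noise-free, equally probable, distinguishable events.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    return event_rate_hz * math.log2(n_states)


# Hypothetical comparison: a two-state switch with slow exchange versus
# a fuzzy interface with many configurations and fast interconversion.
rigid_switch = channel_capacity_bits_per_s(event_rate_hz=10.0, n_states=2)
fuzzy_interface = channel_capacity_bits_per_s(event_rate_hz=1000.0, n_states=8)
print(rigid_switch, fuzzy_interface)
```

Both factors named in the text enter the bound: the rate enters linearly and the size of the configuration set enters logarithmically, so fast dynamics and a large set of configurations both raise the capacity under these assumptions.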

#### FUZZY INTERACTIONS IN SRC FAMILY KINASES

The Src N-terminal regulatory element (SNRE) studied by our group suggests an additional class of IDR allostery, in which the disordered region acts as a sensor but the connecting element is a folded SH3 domain.

Mutations in the Unique domain of c-Src induce strong phenotypes in Src-dependent colorectal cancer cells (Arbesú et al., 2017). The Unique domain interacts with proteins such as calmodulin (Pérez et al., 2013) or the N-methyl-D-aspartate receptor (Gingrich et al., 2004) and with lipids (Pérez et al., 2013), and is subjected to phosphorylation (Amata et al., 2014) and proteolytic processing (Hossain et al., 2013). In order to integrate these capabilities into a functional sensor-activator pair, the nature of the connector becomes a key issue. The SH3 domain has been shown to act as a scaffold of a fuzzy intramolecular complex (Maffei et al., 2015; Arbesú et al., 2017). These findings suggest that the SH3 domain may have a dual role in c-Src regulation: the traditionally recognized one, as a sensor (docking site) of polyproline peptide motifs, as well as that of a connector, relaying the information sensed by the preceding IDR (**Figure 2**). An enhanced capacity of SH3 motifs to interact with intrinsically disordered regions has been suggested (Beltrao and Serrano, 2005). Recently, the N-terminal IDR of Abl kinase has been shown to modulate its activity through the SH3 domain (Saleh et al., 2017).

Recent NMR data (Tong et al., 2017) as well as SAXS studies (Bernadó et al., 2008) confirm that the interaction between the SH3-SH2 regulatory domains and the kinase (SH1) domain is conserved in the active form of c-Src, i.e., in the absence of the autoinhibitory interaction between pTyr527 and the SH2 domain. The NMR results show that this interaction is dynamic and suggest that modulation of the interdomain dynamics may contribute to the regulation of c-Src activity.

In spite of their large sequence divergence, the IDRs of the various SFKs show coevolution with their respective SH3 domains, suggesting that a fuzzy interaction such as the one found in c-Src may be a functional element in all SFKs (Arbesú et al., 2017). The large sequence variations in the Unique domains contrast with the very high homology displayed by the SH3-SH2-SH1 cassette, suggesting that the Unique domain has evolved to read the distinct environments required by each SFK.

The ΔPRE method is a simple and robust analytical approach to generate a fingerprint of the long-range interactions within intrinsically disordered domains. The complete processing tools are part of the Farseer software (Teixeira et al., in press).

Simulations based on fuzzy logic recapitulate many features of a kinase network (Aldridge et al., 2009). Proteins like the SFKs can be considered as algorithms reading complex signaling inputs to generate the proper responses. Thus, fuzzy interactions by IDRs may, in fact, be implementing fuzzy logic at the level of individual proteins.

#### AUTHOR CONTRIBUTIONS

This is a mini-review article based on the Ph.D. thesis of MA, supervised by MP and with contributions from GI, HF, and JT, working on fuzzy complexes of Src Family Kinases.

#### ACKNOWLEDGMENTS

We acknowledge financial support from the Spanish MINECO (BIO2016-78006R), co-financed with EU structural funds, and from the Fundació Marató de TV3 (2013-2830/31), as well as insightful discussions with M. Fuxreiter, O. Millet, and R. Crehuet during the defence of M.A.'s thesis.

#### REFERENCES

of an intrinsically disordered protein. J. Am. Chem. Soc. 139, 13692–13700. doi: 10.1021/jacs.7b01380


Fuxreiter, M., and Tompa, P. (eds.). (2012). Fuzziness. New York, NY: Springer US.


protein ultrabithorax. J. Biol. Chem. 283, 20874–20887. doi: 10.1074/jbc.M800375200


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer AJDQ and handling Editor declared their shared affiliation.

Copyright © 2018 Arbesú, Iruela, Fuentes, Teixeira and Pons. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.