Investigating the Role of Large-Scale Domain Dynamics in Protein-Protein Interactions

Intrinsically disordered linkers provide multi-domain proteins with degrees of conformational freedom that are often essential for function. These highly dynamic assemblies represent a significant fraction of all proteomes, and deciphering the physical basis of their interactions represents a considerable challenge. Here we describe the difficulties associated with mapping the large-scale domain dynamics and describe two recent examples where solution state methods, in particular NMR spectroscopy, are used to investigate conformational exchange on very different timescales.

Intrinsically disordered linkers provide multi-domain proteins with degrees of conformational freedom that are often essential for function. These highly dynamic assemblies represent a significant fraction of all proteomes, and deciphering the physical basis of their interactions represents a considerable challenge. Here we describe the difficulties associated with mapping the large-scale domain dynamics and describe two recent examples where solution state methods, in particular NMR spectroscopy, are used to investigate conformational exchange on very different timescales.
Keywords: multi-domain proteins, free-energy landscape, conformational dynamics, nuclear magnetic resonance (NMR), small angle scattering, chemical exchange saturation transfer (CEST), single molecule Förster resonance energy transfer (FRET) Over the last four decades, X-ray crystallography, NMR and increasingly electron microscopy have provided unique insight into the nature of functionally-essential interactions between a vast array of biologically active molecules. This remarkable success has often overlooked the importance of the molecular motions that are required for function, with structural biologists tending to focus their attention on highly stable, high affinity biomolecular interactions. Proteins are however intrinsically dynamic, they can exhibit conformational modes of vastly differing amplitudes, from local bond fluctuations to folding/unfolding transitions, on timescales varying from femtoseconds to days. Interactions between proteins are also often required to be weak, for example in signaling pathways where efficiency and reversibility of information transfer are of key importance. The resulting focus on interactions that are strong enough to allow structure determination in terms of a single set of three-dimensional coordinates thus provides a distorted perspective on the interactome.
In order to fully understand the molecular basis of interactions between physiological partners it is necessary to map the modulation of the free energy landscapes of the interacting molecules throughout the interaction trajectory. This aim is the focus of considerable interest, requiring experimental, theoretical and analytical development. For this reason solution state methods have been increasingly used to actively investigate the nature of transient interactions between biomolecules (Vaynberg and Qin, 2006;Sugase et al., 2007;Bashir et al., 2010;Salmon et al., 2011).
Such considerations are particularly relevant for a highly abundant class of proteins whose physical characteristics are defined by their dynamic nature. The unexpected discovery that a high fraction of eukaryotic, but also prokaryotic and viral genomes code for proteins domains whose functional state is natively unfolded (Dyson and Wright, 2005;Uversky and Dunker, 2010), has imposed a new perspective on our understanding of molecular recognition. In contrast to folded proteins, the primary sequence of intrinsically disordered proteins (IDPs) does not code for a single, energetically stable fold, but occupies a flatter free-energy surface, spanning a continuum of different conformations. The dynamic nature of IDPs allows access to functional modes that are otherwise inaccessible to folded proteins. For example, the dynamic intrinsically disordered domains that fill the nuclear pore (nucleoporins), exhibiting fast association and dissociation rates with nuclear transport receptors, thereby facilitating highly specific but rapid transport of cargoes into the nucleus via a continuum of ultra-weak interactions . Interactions between IDPs often show kinetics that are more complex than simple two-state mechanisms, remaining dynamic within the complex , exhibiting "fuzzy" interactions that involve for example local conformational funneling into partner-specific bound conformations (Schneider et al., 2015), or involving transient, non-specific interactions to enhance affinity (Fink, 2005;Dunker et al., 2008;Tompa and Fuxreiter, 2008;Wright and Dyson, 2009;Van Roey et al., 2012;Forman-Kay and Mittag, 2013;Kosol et al., 2013).
The highly dynamic nature of multi-domain proteins necessitates the development of analytical approaches for the interpretation of the available experimental data in terms of representative ensembles. Although the description of the available conformational space already represents a demanding task, due to the risk of over-fitting, a second, equally formidable requirement is the estimation of the populations of different sub-states and, if possible, their rates of interconversion. The time-scales of this dynamic exchange, and the conformational dynamics that occur within different sub-states also define the details of the analytical approaches that can be applied, rendering the task yet more daunting.
Here, we will discuss the application of NMR, SAXS, and single molecule FRET to the study of the conformational sampling of two highly flexible multi-domain proteins, the human U2AF65 protein that plays an essential role in the spliceosome assembly (Banerjee et al., 2003;Wahl et al., 2009;Mackereth et al., 2011), and the 627-NLS domain of the PB2 segment of influenza polymerase (Tarendeau et al., 2008;Delaforge et al., 2015). These two cases illustrate very different exchange regimes, impacting the appropriate interpretation of the experimental data.
NMR spin relaxation is potentially a very powerful means to describe both local and global dynamics of macromolecules in solution. Nevertheless, quantitative analysis of the conformational space sampled by two domains relative to each other is highly challenging (Baber et al., 2001;Wong et al., 2009;Ryabov et al., 2012;Xia et al., 2013). More often, paramagnetic relaxation enhancements (PREs) are used to detect weakly populated close contacts between domains, for example transient encounter contacts in protein-protein and proteinnucleic acid complexes (Iwahara et al., 2004;Tang et al., 2006;Volkov et al., 2006). The level of information can be enhanced considerably if the paramagnetic center exhibits anisotropic magnetic susceptibility, in which case pseudo-contact shifts are induced that are sensitive to the distribution of distances and orientations of vectors connecting the observed spins and the electron spin, relative to the susceptibility tensor. Under such conditions, or when dissolved in a dilute liquid crystalline solution, residual dipolar couplings (RDCs) can also be measured to provide information about the distribution of orientations of the structured domains relative to each other (Tolman and Ruan, 2006;Salmon and Blackledge, 2015). In addition, SAXS reports on the pairwise distribution functions of all conformations averaged over the ensemble (Bernadó et al., 2007).
Phenomenologically, large levels of conformational disorder are often manifest by the inability to interpret the experimental data in terms of a single conformation, so that it becomes necessary to invoke the presence of multiple conformations present simultaneously in solution. Substantial efforts targeting ensemble descriptions of flexible multi-domain proteins have been directed toward using NMR and SAXS data to account for this conformational heterogeneity. As mentioned above, care must be exercised in order to avoid over-fitting and to introduce some estimate of uncertainty of populations and representative conformers in the ensemble descriptions. Examples of recent analytical approaches include an estimation of the maximum occurrence of each possible protein conformation on the basis of experimental NMR or SAXS data (Bertini et al., 2007Ravera et al., 2014), or weighted-ensemble selection from molecular dynamics (MD) simulation or available crystal structures (Yang et al., 2010;Francis et al., 2011;Russo et al., 2013). Conformational space can also be sampled using replica exchange MD (Sgourakis et al., 2007;Wu et al., 2009;Terakawa and Takada, 2011;Knott and Best, 2012;Narayanan et al., 2012;Zhang et al., 2012;Mittal et al., 2013;Wang et al., 2013) using the experimental data as constraints to guide ensemble distributions that reproduce the experimental data on average (Im et al., 2012;Cavalli et al., 2013;Roux and Weare, 2013). Alternatively rigidbody modeling can be used to develop representative ensembles of conformers (Deshmukh et al., 2013).
Conformational sampling can be achieved with high efficiency using restraint-free Monte-Carlo approaches, exploiting for example statistical coil models to describe the backbone dihedral angle distributions of the inter-domain linker. The nature of the intrinsic conformational equilibrium can then be examined by generating representative ensembles of conformers, using an adaptation of the ASTEROIDS approach for ensemble representations of intrinsically disordered systems (Nodet et al., 2009;Salmon et al., 2010;Ozenne et al., 2012;Guerry et al., 2013). Comparison with the experimental data is then used to identify sub-ensembles of conformers that, in combination, represent the Boltzmann distribution in solution. Ensemble descriptions of highly disordered systems are faced with two problems; identification of representative conformational states, and determination of their relative populations. ASTEROIDS uses a genetic algorithm to identify combinations of conformational states that when considered together reproduce the experimental data within estimated experimental uncertainty. Different conformers (i) have populations given by p i = 1/n where n is the number of conformers in the ensemble. Populations are not optimized, so that if a given state requires a higher weight to fulfill experimental data, additional conformers with similar characteristics will be present. The optimal number of structures necessary to reproduce the complexity of the experimental data can be estimated by cross-validation of independent experimental data that are not used in the analysis (Salmon et al., 2010;Guerry et al., 2013;Huang et al., 2014). The single-step selection of ASTEROIDS avoids additional optimization of the weights of specific conformers, and is therefore compatible with robust statistical analysis allowing estimation of the confidence levels of both conformation and population.
This approach was adapted to map the free-energy landscape of the RNA Recognition Motifs RRM1 and RRM2 domains of the U2AF65 protein. The two RRMs are connected by a flexible linker and adopt multiple domain arrangements as indicated by a combined analysis of PRE, RDC and SAXS data (Huang et al., 2014). The analysis reveals a heterogeneous ensemble of states dominated by highly populated, but very different relative positions of RRM1 and RRM2, despite the fact that these conformations are not strongly represented in the unrestrained ensemble of states. These conformations were found to resemble previously proposed "closed" and "open" states Simon et al., 2010;Mackereth et al., 2011;Hennig et al., 2015), where the latter corresponds to the RNA-bound form of the protein (Figure 1), supporting the role of conformational selection from the free-state ensemble as a driving force of this interaction (Mackereth et al., 2011;Mackereth and Sattler, 2012). "Open" and "closed" conformations were found to lie within a continuous density of states, indicating that transition could occur between these states without invoking large conformational jumps. Inspection of the nature of the interacting surfaces dominating the ensemble suggested that the "closed" state was stabilized by electrostatic interactions. This prediction was supported by an observed weakening of transient contacts in this interface, as detected by a reduction of distinctive PREs with increasing ionic concentration, providing an independent support for the nature of the solution state ensemble that is derived uniquely from experimental data.
In the latter example the experimental data are assumed to report on an ensemble of conformations that are in rapid exchange with respect to the difference in NMR chemical shifts, RDCs and PREs. In effect this assumes that all conformations exchange on timescales faster than or equal to tens of microseconds. Interpretation of the experimental data calculated for each independent sub-state can then be averaged to assess the ability of the ensemble to reproduce the experimental data. This exchange regime will not always be respected, for example in the case of strong interactions between domains, where high activation energies of dissociation may result in much slower exchange between states. Such an exchange regime was recently studied in the 627-NLS multi-domain component of Influenza A polymerase.
The viral RNA polymerase complex is made up of three separate proteins, PA (acidic protein), and PB1 and PB2 (basic proteins 1 and 2) that are imported into the host nucleus following translation to assemble into new polymerase heterotrimers that further catalyze viral RNA replication and transcription. PB2 enters the nucleus by binding to Importin α (Impα) via its C-terminal domain, termed the 221 aminoacid 627-NLS domain (Tarendeau et al., 2008;Kuzuhara et al., 2009) containing a nuclear localization signal (NLS) peptide. The 627-NLS domain is of particular interest, as a large proportion of mutations characterizing adaptation of avian viruses to human hosts are located on its surface (Tarendeau et al., 2008;Mehle and Doudna, 2009). Crystal structures of 627-NLS have been determined, in isolation from both avian and human forms of Influenza (Tarendeau et al., 2008), and more recently in the context of the entire polymerase complex (Pflug et al., 2014), in all cases exhibiting effectively identical compact conformations. This conformation was, however, incompatible with binding of the NLS domain to Impα, suggesting that largescale conformational rearrangement of either 627-NLS or Impα would be necessary for successful interaction (Boivin and Hart, 2011).
A recent NMR study (Delaforge et al., 2015) of 627-NLS unambiguously revealed the presence of a more complex conformational equilibrium in solution, with approximately twice as many peaks present in the 1 H-15 N TROSY spectrum as expected, and one set of peaks exhibiting the same chemical shifts as the 627 and NLS domains in isolation. This spectrum Frontiers in Molecular Biosciences | www.frontiersin.org is characteristic of slow exchange (in this case k ex = 30 s −1 at 15 • C) between a closed form of the protein (the crystalline conformation) and a previously uncharacterized open form where the two domains retain their secondary and tertiary structure, but dislocate, and evolve independently of each other (Figure 2). One set of peaks therefore report on the ensemble of conformations that are in fast exchange, while the other set of peaks report on the closed state that is stabilized by a tripartite salt bridge implicating highly conserved basic and acidic amino acids in the interface between the 627 and NLS domains, and in the 13-amino acid inter-domain linker. The linker becomes flexible upon dislocation, and, as in the case of the U2AF65 RRM1-RRM2 domains, provides the degrees of freedom required for the large-scale domain dynamics characterizing the open form.
Single molecule FRET can also be used to probe the distance distribution and inter-domain dynamics of the two Exchange between open and closed conformations of 627-NLS was characterized using NMR lineshape analysis and Chemical Exchange Saturation Transfer (CEST), revealing a strong temperature dependence, ranging from 10 to 100 s −1 over the range 5-30 • C, with the population of the open form increasing significantly from predominantly closed at 5 • C (p = 0.2) to approximately equal populations at 30 • C. Small angle X-ray and neutron scattering data were also measured over the entire temperature range, resulting in very good agreement between expected population-weighted scattering curves from open and closed ensembles and the experimentally determined populations from NMR exchange analysis. Simultaneous analysis of CEST data from multiple different sites throughout the protein as a function of temperature, using an Eyring relationship, provides unique insight into the thermodynamics of the temperaturedependent equilibrium. The activation energy of opening and closing is dominated by enthalpic contributions, while the open state exhibits both entropic and enthalpic contributions compared to the closed state, suggesting additional non-specific non-bonding interactions between the surfaces, as in the case of RRM1/RRM2. Characterization of the temperature-dependence of the equilibrium is of functional interest because viral replication involves adaptation from the warmer bird intestine to the cooler human respiratory system (Massin et al., 2001). Comparison of the temperature dependence of reconfiguration dynamics of 627-NLS from human and bird-adapted proteins is currently underway in our laboratories.
The physiological interaction between 627-NLS and Impα was also investigated using solution methods. The population of the closed form of the protein disappears upon interaction, as shown using single molecule FRET, with the open form maintaining the characteristics of fast distance fluctuations between the attached dyes and therefore suggesting fast inter-domain dynamics in the bound state. Although SAXS measurements of the complex also indicate flexibility of 627 with respect to Impα when bound to NLS, they also reveal that the 627 domain is localized primarily in the vicinity of the C terminus of Impα, possibly stabilized by interactions with the 34 amino acid long intrinsically disordered C terminal tail of Impα.
These observations strongly suggest that while the closed form is known to be necessary for function within the polymerase-RNA complex (Pflug et al., 2014), the open form of 627-NLS is required for interaction with Impα. Interestingly, mutation of R650A in the 627 domain removes the possibility of forming the salt bridge, resulting in suppression of the closed form in solution. This same mutation was independently shown to abrogate polymerase activity in the nucleus, while still allowing for nuclear import (Kirui et al., 2014), substantiating our model whereby the open-form mediates nuclear import. Further structural investigations of different forms of the RNA polymerase complex suggest that an open 627-NLS may also play additional roles in remodeling the polymerase structure during the viral cycle (Hengrung et al., 2015;Thierry et al., 2016).
The two studies described above demonstrate the functional importance of large-amplitude dynamics mediated by the intrinsically disordered linker peptides in multi-domain assemblies. In both cases the analysis suggests a role for conformational selection from an intrinsic pre-existing equilibrium in the interaction with physiological partners. Both studies show how solution-state approaches can be used to understand the role of complex dynamic conformational equilibria in biomolecular function and, more specifically, that these mechanisms could not be understood without a detailed description of the ensemble of states sampled in solution.

AUTHOR CONTRIBUTIONS
All authors listed, have made substantial, direct and intellectual contribution to the work, and approved it for publication.