Integrating atomistic molecular dynamics simulations, experiments, and network analysis to study protein dynamics: strength in unity

In the last years, we have been observing remarkable improvements in the field of protein dynamics. Indeed, we can now study protein dynamics in atomistic details over several timescales with a rich portfolio of experimental and computational techniques. On one side, this provides us with the possibility to validate simulation methods and physical models against a broad range of experimental observables. On the other side, it also allows a complementary and comprehensive view on protein structure and dynamics. What is needed now is a better understanding of the link between the dynamic properties that we observe and the functional properties of these important cellular machines. To make progresses in this direction, we need to improve the physical models used to describe proteins and solvent in molecular dynamics, as well as to strengthen the integration of experiments and simulations to overcome their own limitations. Moreover, now that we have the means to study protein dynamics in great details, we need new tools to understand the information embedded in the protein ensembles and in their dynamic signature. With this aim in mind, we should enrich the current tools for analysis of biomolecular simulations with attention to the effects that can be propagated over long distances and are often associated to important biological functions. In this context, approaches inspired by network analysis can make an important contribution to the analysis of molecular dynamics simulations.

In the last years, we have been observing remarkable improvements in the field of protein dynamics. Indeed, we can now study protein dynamics in atomistic details over several timescales with a rich portfolio of experimental and computational techniques. On one side, this provides us with the possibility to validate simulation methods and physical models against a broad range of experimental observables. On the other side, it also allows a complementary and comprehensive view on protein structure and dynamics. What is needed now is a better understanding of the link between the dynamic properties that we observe and the functional properties of these important cellular machines. To make progresses in this direction, we need to improve the physical models used to describe proteins and solvent in molecular dynamics, as well as to strengthen the integration of experiments and simulations to overcome their own limitations. Moreover, now that we have the means to study protein dynamics in great details, we need new tools to understand the information embedded in the protein ensembles and in their dynamic signature. With this aim in mind, we should enrich the current tools for analysis of biomolecular simulations with attention to the effects that can be propagated over long distances and are often associated to important biological functions. In this context, approaches inspired by network analysis can make an important contribution to the analysis of molecular dynamics simulations.

Protein Dynamics and Conformational Changes and their Importance in Biology
It is now well-established that proteins, and biomolecules in general, are highly dynamic systems (Vendruscolo, 2007;Boehr et al., 2009;Zhuravlev and Papoian, 2010;Osawa et al., 2012). We moved, in the last decades, from the structure-function paradigm to the structure-dynamicsfunction triad, where not only the knowledge of the tertiary or quaternary assemblies is important to understand protein function, but also the dynamical behavior on different timescales (Henzler-Wildman and Kern, 2007;Klepeis et al., 2009;Villali and Kern, 2010;Wand, 2012). In this context, the knowledge of even protein states that account for a small population of the ensemble are important (Baldwin and Kay, 2009). Indeed, proteins undergo conformational changes even in their free and unbound/not modified state. Those minor conformations can resemble functional states, such as the structure that a protein assumes when it binds to another biological partner. The "perturbations" that affect the populations of the protein ensemble are caused by modifications, mutations, and interactions with other biomolecules and have impact on the related biological mechanisms (Popovych et al., 2006;Manley and Loria, 2012;Motlagh et al., 2012;Tsai and Nussinov, 2014).

An Atom-Scale View on Protein Structure and Dynamics by Molecular Dynamics Simulations
Molecular simulations applied to biology are a powerful technique to integrate experimental studies, as attested by the Nobel Prize in Chemistry awarded to Martin Karplus, Micheal Levitt, and Arieh Warshel in 2013 for "the development of multiscale models for complex chemical systems." Among the large array of simulation techniques that can cover the definition of a multiscale approach, I will here focus on atomistic classical molecular dynamics (MD) approaches. Other approaches that are important component of a multiscale approach are extensively covered in many review articles in the field (such as Rapaport, 1998;Koehl and Levitt, 1999;Schlick, 2002;Warshel, 2003;Snow et al., 2004;Xia and Levitt, 2004;Karplus and Kuriyan, 2005;Sherwood et al., 2008;Dror et al., 2012;Saunders and Voth, 2013;Salvatella, 2014;Field, 2015) and are not the focus of this perspective article.
Thirty-eight years are now passed since the first molecular dynamics (MD) simulation of a biomolecule, i.e., the bovine pancreatic trypsin inhibitor, was published (McCammon et al., 1977) and even if that was a short and in vacuum simulation, that work has a remarkable impact on the way in which we look at biomolecular structures. Thirty-eight years later, MD simulations are widespread tools that even experimentalists use to rationalize or guide experiments and a search in Pubmed with "molecular dynamics simulations" as a keyword provides more than 40,000 entries.
In MD methods, the system obeys to Newton's equation of motion and the interactions between atoms are described using, as physical models, molecular mechanics force fields (for a detailed discussion of the basis of MD see, for instance, Rapaport, 1998;Schlick, 2002;Kukol, 2015). Atomistic MD simulations have a unique strength of providing a description of biomolecules on several timescales from the femtosecond (fs) to the millisecond (ms), without renouncing to an atom-scale view for both the biomolecules and the solvent . They become over the last decades a useful method to integrate experimental research in structural biology and protein science, providing a "computational microscope" for proteins and their complexes .

The Importance of the Force Field Parameters: Improvements, Limitations and Experimental Validation
The way in which we describe biomolecules in MD simulations depends on the force field, that we use to describe the system.
Recently, new force fields have been developed that are in good agreement with many NMR-derived parameters, describing dynamics over different timescales Lindorff-Larsen et al., 2012;Huang and MacKerell, 2013;Reif et al., 2013). Despite the fact that the mathematical functional form of the classical force fields is similar, the new-generation force fields differ in parameters that are associated with a subset of very important torsional angles.
Providing the community with good and accurate force fields is an important task. Indeed, the risk to misinterpret the results from our simulations and, as a consequence, to provide biological models that are not meaningful is high. It is known that the even slight changes in force-field parameters have a large impact on the resulting conformational ensemble, as well as on the capability to reproduce experimental parameters (Lindorff-Larsen et al., 2012;Piana et al., 2014).
MD force fields, even if constantly improving, are still far from being perfect (Piana et al., 2014). Indeed, there are several aspects of protein dynamics that we are not able to simulate with accuracy, or even in a reasonable way. For example, currently employed classical MD force fields have been shown to overestimate salt bridges (Debiec et al., 2014). This observation can change how we interpret effects related to electrostatic interactions in proteins. Salt bridges, especially when they are in solvent-exposed positions, may be less populated in solution than expected from the analysis of MD simulations or X-ray structures, as convincingly argued by the NMR study of GB1 in solution (Tomlinson et al., 2009) or by the observation that mutations of residues involved in salt bridges in thermophilic or hyperthermophilic enzymes have often minor effects on thermal stability (Jónsdóttir et al., 2014). Moreover, classical force fields cannot accurately describe those interactions that need polarizable effects to be explicitly taken into account. This is crucial, for example, when we need to deal with metal-binding proteins (Banci, 2003). MD force fields also encounter the risk of over-compaction of the protein during the simulation, and this is especially important when we aim at studying systems as intrinsically disordered proteins (IDPs) or unfolded states (Knott and Best, 2012;Lambrughi et al., 2012;Invernizzi et al., 2013;Camilloni and Vendruscolo, 2014;Piana et al., 2014;Palazzesi et al., 2015). We encounter the risk, for example, to attribute a great importance to intramolecular interactions observed in MD-derived IDPs ensembles, which are a consequence of the highly compact states sampled during the simulation. Very recently, solutions have been provided for MD simulations of IDPs or unfolded states (Best et al., 2014;Piana et al., 2015). The authors of these works showed that it is possible to recover the compactness of IDPs or unfolded proteins observed, for example, by small-angle X-ray scattering thank to improved solvent models. This is an encouraging step toward a more accurate and reliable atomistic description of IDPs by simulations.
A continuous exchange between experimental biophysical techniques and MD is also needed.
Indeed, evaluation and validation of MD force fields cannot rely only on the study of few proteins, we need to move forward and test force fields on proteins with other folds. To do this, a large amount of experimental data are necessary for the comparison, along with methods to back-calculate Papaleo Integrating simulations, networks and experiments from the simulated ensemble properties that can be measured experimentally, for instance by NMR spectroscopy or SAXS (Lindorff-Larsen et al., 2005;Lange et al., 2008;Kohlhoff et al., 2009;Sahakyan et al., 2011;Li and Brüschweiler, 2012;Camilloni and Vendruscolo, 2014). In this context, the usage of NMR parameters to cross-validate a MD ensemble should become a routine practice in computational biophysics.

The Coverage of the Conformational Space in MD Simulations
On the other side, one of the main well-known issues of classical atomistic MD is the sampling of the conformational space. Indeed, during classical MD simulations, even if now we have the hardware (Friedrichs et al., 2009;Kohlhoff et al., 2014;Shaw et al., 2014) and software (Harvey et al., 2009;Pronk et al., 2013) to improve the performance and simulate even millisecond timescales, the protein encounters the risk to be trapped for a long time in a local basin of the conformational space, making the simulations not converging on even long timescales. Conventional MD simulations thus allow us to provide a limited description of protein dynamics since we neglect a major part of the conformational landscape. Also, we should keep in mind the risk to sample regions of the conformational space that are not relevant for the dynamics that we observe experimentally when a force field in bad agreement with the experimental data is used (Lindorff-Larsen et al., 2012).
The sampling problem has been recently overcome, for example, by enhanced-sampling techniques integrated to the atomistic force field description that MD provides. A rich portfolio of methods for enhancing sampling has been developed, each of them relying on different philosophies (Piana and Laio, 2007;Hritz and Oostenbrink, 2008;Sutto et al., 2012;Abrams and Bussi, 2013;Bernardi et al., 2014;Do et al., 2014;Papaleo et al., 2014;Spiwok et al., 2014;Barducci et al., 2015). Among them, metadynamics has the potential to enhance the sampling of rare transitions, which are often important in biology (Laio and Gervasio, 2008). Metadynamics approaches proved to be very accurate in recovering the conformational changes experimentally observed on the micro-millisecond time scale, which are the most important for many biological processes (Berteotti et al., 2009;Palazzesi et al., 2013;Sutto and Gervasio, 2013;Papaleo et al., 2014). In metadynamics, the sampling is enhanced along a properly selected set of reaction coordinates (i.e., collective variables) that have to account for the slowest degrees of freedom of the process of interest. It can be argued that it is limited to the usage of few collective variables at the time to ensure convergent simulations. The community, however, did extensive efforts in the last years to mitigate this problem, for instance, by reweighting procedures to calculate collective variables not used directly to bias the simulations (Bonomi et al., 2009) or by approaches as bias exchange metadynamics (Piana and Laio, 2007), in which the replicas exchange between different collective variables rather than in the temperature space.
The development of methods to predict NMR-derived parameters that are probes of dynamics over different timescales mentioned in the previous section, not only provides a solid strategy to evaluate the quality of MD ensembles but it also allow to simulate the protein integrating the atomistic description of the MD force fields with the experimental data. In this context, the usage of NMR parameters to restrain MD simulations is becoming a popular application and provided encouraging results toward a more complete picture of the complex dynamics of proteins (Lindorff-Larsen et al., 2005;Tang et al., 2007;Vendruscolo, 2007;Fenwick et al., 2011;Camilloni et al., 2012Camilloni et al., , 2013Camilloni and Vendruscolo, 2014). In some of these approaches, the restraints are generally averaged over multiple replicas to better reflect the intrinsic average nature of the experimental data. These computational methods, among the enhanced sampling techniques that can be integrated to the atomistic MD force fields, have also the capability to recover the dynamics observed experimentally in the millisecond regime with high accuracy and without limiting the analysis to a small subset of collective variables, as metadynamics does.

When the Structural Effects Come from Distal Site: Long-Range Communication in Protein Dynamics and Network Analysis Applied to MD Simulations
The network paradigm has been extensively used to describe structure, topology and dynamics of proteins (Vishveshwara et al., 2009;Atilgan et al., 2012;Collier and Ortiz, 2013;Csermely et al., 2013;Feher et al., 2014). The intramolecular non-covalent interactions in a protein are known to be crucial in determining the protein structure and they can be collectively represented in the form of a network, namely a Protein Structure Network (PSN), where the residues are the nodes of the network connected by edges that depend on their interaction strength. PSNs are "small worlds" (Vendruscolo et al., 2002;Atilgan et al., 2004) suitable for the fast transmission of conformational changes at distal sites. Indeed, in the small world of PSNs, the residues can communicate through the shortest paths available and multiple paths with common nodes are used (Del Sol et al., 2009). Several important issues, however, are, still unsolved when it comes to network analysis of protein ensembles. We can identify paths of communications between distal residues and we see that they are modified by perturbations. Very often, however, the way in which the native network of a protein is affected by a perturbation is not easy to predict. We do not exactly know if the paths that we calculate in a structural ensemble are really relevant for functional dynamics, or even for the conformational changes observed in the experiments. The community is collecting some examples in favor of a relationship between the networks observed in the protein structure and functional properties of proteins (Whitley and Lee, 2009;Mariani et al., 2013;Van den Bedem et al., 2013;Invernizzi et al., 2014;Papaleo et al., 2012bPapaleo et al., , 2014. We are, however, still far from being able to derive straightforward information and predictions out of the protein network. More efforts are certainly needed in this direction, also considering the potential that these approaches can have in applications such as drug discovery and protein engineering (Kar et al., 2010;Nussinov and Tsai, 2013).
Several methods are now available to identify networks and paths of communication in protein ensembles, such as the ones collected by MD simulations. The originally named PSN approach relies on the description of atomic contacts between residues that also feature correlated motions (Ghosh and Vishveshwara, 2007;Seeber et al., 2011;Papaleo et al., 2012a). It has become very popular also thanks to the availability of tools to derive PSN from both experimental and simulated ensembles (Seeber et al., 2011;Pasi et al., 2012;Bhattacharyya et al., 2013;Tiberti et al., 2014). The metrics to estimate correlated pairs of residues can range from Person correlation to Mutual Information and often suffer of the disadvantage in terms of convergence and statistical significance of the data. Caution has thus to be taken in deriving correlation maps from a MD ensemble and we should avoid in MD applications to biological target to use just a unique average correlation map from an entire MD trajectory. Other methods have been recently proposed and applied to MD simulations, relying on different philosophies, such as the usage of structural alphabets (Pandini et al., 2013), the integration of topology, correlated motions and communication propensity (Allain et al., 2014), the notion of energetic coupling between protein residues. (Ribeiro and Ortiz, 2014) or even the application of concepts inspired by engineering in which the protein is treated as a mechanical construct rather than a chemical molecule (Stacklies et al., 2011) with the possibility to trace communication even in stiff structures without evident atomic displacement.
The contribution of the solvent in the modulation of longrange communication in proteins should not be neglected and the community still lacks successful approaches to analyze the solvent-protein intermolecular networks, making this an appealing field for future research.
All the methods mentioned above often rely on the integration between network analysis and MD simulations. It is thus important to achieve a proper conformational sampling during the simulations, verify the reproducibility of the results over different replicates of the same system and o evaluate how long the simulations have to be extended to get meaningful results from network analysis. Indeed, long-range effects are often associated to conformational changes occurring on microsecondmillisecond timescales. Thus, the limitation inherent in the sampling achieved by classical MD has to be kept in mind. We recently showed that MD simulations of 100 ns or 1 µs of the same system provided similar average PSN results  but this account only for one case of study, making any generalization very hard.

The Strength is in Unity: Integration of Molecular Dynamics to Experimental Observable
In conclusion, we have reached now a point in which both from the computational and the experimental side, there are a large amount of information on protein dynamics and techniques to explore it on different timescale, as well as techniques to disclose effects that can be mediated by distal communicating sites and to actually describe these paths of communication at the atomscale. The strength is now in the unity, in the capability to tightly integrate the computational and experimental approaches combining their own capabilities and overcoming their own limitations for a more detailed and accurate understanding of protein dynamics at the atomic level.