Transfer Free Energies of Test Proteins Into Crowded Protein Solutions Have Simple Dependence on Crowder Concentration

The effects of macromolecular crowding on the thermodynamic properties of test proteins are determined by the latter's transfer free energies from a dilute solution to a crowded solution. The transfer free energies in turn are determined by effective protein-crowder interactions. When these interactions are modeled at the all-atom level, the transfer free energies may defy simple predictions. Here we investigated the dependence of the transfer free energy (Δμ) on crowder concentration. We represented both the test protein and the crowder proteins atomistically, and used a general interaction potential consisting of hard-core repulsion, non-polar attraction, and solvent-screened electrostatic terms. The chemical potential was rigorously calculated by FMAP (Qin and Zhou, 2014), which entails expressing the protein-crowder interaction terms as correlation functions and evaluating them via fast Fourier transform (FFT). To high accuracy, the transfer free energy can be decomposed into an excluded-volume component (Δμe−v), arising from the hard-core repulsion, and a soft-attraction component (Δμs−a), arising from non-polar and electrostatic interactions. The decomposition provides physical insight into crowding effects, in particular why such effects are very modest on protein folding stability. Further decomposition of Δμs−a into non-polar and electrostatic components does not work, because these two types of interactions are highly correlated in contributing to Δμs−a. We found that Δμe−v fits well to the generalized fundamental measure theory (Qin and Zhou, 2010), which accounts for atomic details of the test protein but approximates the crowder proteins as spherical particles. Most interestingly, Δμs−a has a nearly linear dependence on crowder concentration. The latter result can be understood within a perturbed virial expansion of Δμ (in powers of crowder concentration), with Δμe−v as reference. Whereas the second virial coefficient deviates strongly from that of the reference system, higher virial coefficients are close to their reference counterparts, thus leaving the linear term to make the dominant contribution to Δμs−a.


INTRODUCTION
It is now well-recognized that "bystander" macromolecules in cellular milieus may significantly influence the biophysical properties of proteins (Zhou et al., 2008;Zhou, 2013;Gnutt and Ebbinghaus, 2016). Such influences can be detected by many experimental observables, including equilibrium sedimentation gradient (Rivas et al., 1999), protein folding and binding stability (Batra et al., 2009a,b;Miklos et al., 2011Miklos et al., , 2013Phillip et al., 2012;Wang et al., 2012;Sarkar et al., 2013), light scattering intensity (Wu and Minton, 2013), small-angle neutron scattering profile (Goldenberg and Argyle, 2014;Banks et al., 2018), and fluorescence resonance energy transfer (FRET) efficiency (Soranno et al., 2014). Theoretically these influences are determined by the transfer free energies of test proteins from a dilute solution to a solution of the macromolecular crowders (Minton, 1983;Zhou et al., 2008;Qin and Zhou, 2009;McGuffee and Elcock, 2010). Since the transfer free energies in turn are determined by the effective protein-crowder interactions, in principle the experimental data contain information about these intermolecular interactions. However, apart from the work of McGuffee and Elcock at "very significant computational expense" (McGuffee and Elcock, 2010), until recently it was only possible to use relatively crude models of protein-crowder interactions for calculating transfer free energies (Minton, 1981;Qin and Zhou, 2010;Kim and Mittal, 2013) or quantitatively modeling crowding effects (Cheung et al., 2005;Minh et al., 2006), thereby limiting our ability to interpret and fully utilize the experimental data. To mitigate this problem, an FFT-based method for Modeling Atomistic Protein-crowder interactions, or FMAP, has been developed Zhou, 2013, 2014). Most recently FMAP was used to quantitatively interpret FRET efficiency data for disordered proteins in the presence of polyethylene glycol (Soranno et al., 2014), implicating mild attraction between the test proteins and the polymer crowder (Nguemaha et al., 2018). Here we used FMAP to calculate the transfer free energy ( µ) of folded and unfolded test proteins from dilute to crowded protein solutions, paying particular attention to the dependence of µ on crowder concentration. Even with an all-atom representation for both the test and the crowder proteins, the dependence of µ on crowder concentration was found to follow simple relations. We explore the physical reasons for this simple behavior.
As in our previous study (Qin and Zhou, 2014), we assumed a general, implicit-solvent energy function for protein-crowder interactions, consisting of hard-core steric repulsion, non-polar attraction, and solvent-screened electrostatic terms: Whenever r ij , the distance of any pair of protein-crowder atoms, is less than the sum of their hard-core radii, (σ ii + σ jj )/2, the steric term U st goes to ∞. When the test protein is free of such steric clashes with crowder atoms, U st vanishes and the two soft interaction terms come into play. Specifically, the non-polar attraction has the form of a Lennard-Jones potential: (2) where ε ij is the magnitude of the non-polar attraction between the i-j pair of atoms. The solvent-screened electrostatic term has the form of a Debye-Hückel potential: where q i are atomic charges, and λ and κ are the Debye screening length and the dielectric constant, respectively, of the crowder solution. FMAP finds the transfer free energy from an average of the Boltzmann factor of the protein-crowder interaction energy Zhou, 2013, 2014) More specifically, the test protein is fictitiously placed into the crowder solution, and the average is taken over the position ("R") and orientation (" ") of the test protein and configuration ("c") of the crowders. The average over R is taken care of by FFT, and is the core component of FMAP. The average over and c is realized by repeating MFAP calculations over different orientations of the test protein and different configurations of the crowder solution.
Here we obtained the transfer energies of eight test proteins over a wide range of concentrations of two crowder proteins. There are two main findings. First, the transfer free energy can be accurately decomposed into an excluded-volume component ( µ e−v ), arising from the hard-core repulsion, and a soft-attraction component ( µ s−a ), arising from non-polar and electrostatic interactions. Second, whereas the excludedvolume component has a complex dependence on crowder concentration, the soft-attraction component has a nearly linear dependence on crowder concentration. We explain this interesting result by a perturbed virial expansion of µ.
To obtain the crowder protein configurations, first hardsphere simulations were carried out and then the hard spheres in the final snapshots were replaced by randomly orientated protein molecules. The simulations were run using a C++ code (https://cims.nyu.edu/~donev/Packing/C++/), written by Skoge et al. (Skoge et al., 2006). In short, N spheres in a cubic box were grown from points at a steady rate and underwent ballistic collisions. The box had a side length of 1 and periodic boundary conditions were imposed. The simulations were terminated when the hard spheres grew to a desired radius. Specifically, for the simulations intended for LYS, the final radius was 0.1485, such that the hard-sphere volume fraction at N = 48 reached 0.658; for BSA, the final radius was 0.14 and the volume fraction at N = 48 was 0.552. Ten replicate simulations were run at each N for replacement into each of the two crowder proteins.
For replacing the hard spheres by protein molecules, the radii of the spheres were scaled to appropriate lengths to allow for the spheres to enclose the proteins. For the simulations intended for LYS, the unit length of the simulation box was scaled to 174 Å, and so the spheres were mapped to a radius of 25.84 Å. For BSA, the corresponding simulation box was scaled to a 300 Å side length, leading to a hard sphere radius of 42.0 Å. These spheres were sufficiently large to enclose the vast majority of the atoms in each crowder protein. The spheres were replaced by protein molecules one at a time. The protein molecules were assigned random orientations, by choosing a random direction for a unit vector attached to the protein and rotating the protein around the unit vector by a random angle between 0 and 360 • (Qin et al., 2011). When placing a new protein molecule, random orientations were repeatedly chosen until it did not clash with any of the protein molecules already placed (including their periodic images). The threshold for clash was 4.0 Å for any interatomic distance between two protein molecules. This process was repeated until all the hard spheres in the simulation box were successful replaced by protein molecules. The number, N, of crowder molecules in the simulation boxes ranged from 6 to 48, in increments of 6. At the highest number, the crowder concentrations were 217 mg/mL for LYS and 196 mg/mL for BSA.
FMAP entails fictitiously placing a test protein into the crowder box and calculating the interaction energy between the test protein and the crowder proteins. The interaction energy function is given by equations (1)-(3), and the parameters are those described in our previous study. Specifically, the Lennard-Jones parameters were taken from Autodock (Morris et al., 2009) and the partial charges were taken from Amber (Cornell et al., 1995). To achieve a better balance between U n -a and U elec (as judged by, e.g., salt and temperature dependences of second virial coefficients of proteins in unpublished work), we scaled down the former by a factor of 0.2 (for comparison, Autodock applied a scaling factor around 0.16), and scaled up the latter by a factor of 2.0. The temperature was 298 K (where the solvent dielectric constant was 78.4) and the ionic strength was 0.15 M.
At each crowder concentration, 10 independent configurations of crowders were generated; for each crowder configuration, 500 random orientations for each test protein were chosen. So altogether 5,000 FMAP calculations were carried out for each test protein at each crowder concentration, and the results were averaged to yield the transfer free energy. To test the additivity between the excluded-volume component and the soft-attraction component, we also carried out corresponding averaging to obtain these components (see below).
Two methods were used to do error analysis. The first was bootstrap. Here the 5,000 individual values of the transfer free energy (or a component thereof) were pooled to create the original sample. Bootstrap samples (with the same size, 5,000, as in the original sample) were then generated by randomly drawing from the original sample, and the standard deviation of the data in each bootstrap sample was calculated. The error of the FMAP calculation was finally estimated as the mean of the standard deviations of 10,000 bootstrap samples. The second was the block decorrelation technique of Flyvbjerg and Petersen (1989) (code downloaded from https://github.com/manoharanlab/flyvbjerg-std-err/). Here we treated the 5,000 data points as a time series. These data points were "blocked" in successive generations. Specifically, the data points in the first generation were the original ones; in the second generation, the first two data points, the next two data points, and so on were each "blocked, " i.e., merged and replaced by their averages. This blocking process continued until the total number of blocked data points went below a cutoff of 15. At each generation, the variance of the blocked data points was calculated. The variance reached a plateau before the cutoff, and the square root of the plateau value was taken as the error estimate.

Additivity Between Excluded-Volume and Soft-Attraction Components
As shown by equation (4), the transfer free energy µ is given by the average of the Boltzmann factor of the protein-crowder interaction energy U int ; the average needs to be taken over the position R of a fictitious placement of the test protein into the crowder box, the orientation of the test protein, and the configuration c of the crowders. For a given and a given c, FMAP calculates the average over R from values of U int at grid points within the crowder box. The grid points can be separated into ones with protein-crowder clash and ones that are clash-free. Note that exp −U st /k B T has value 0 at the clashed grid points and value 1 at the clash-free ones; the two soft interactions only operate at the clash-free grid points. Based on these considerations, we can write the average of exp −U int /k B T over R as where < · · · > R and < · · · > 1 signify averaging over all the grid points and clash-free ones, respectively. Note that < exp −U st /k B T > R is simply the clash-free fraction of grid points. Corresponding to the factorization in equation (5), we can write the transfer free energy, calculated without averaging over and c, as the sum of an excluded-volume component and a soft-attraction component: for a single and a single c (6) The excluded-volume component is given by the clashfree fraction, whereas the soft-attraction component is given by the combined soft interactions at the clash-free grid points: We further averaged < exp −U int /k B T > R over combinations of 500 test protein orientations and 10 crowder configurations to obtain the transfer free energy µ. Specifically, the algebraic average of 5,000 individual values of < exp −U int /k B T > R was calculated and then converted to µ. Similarly, we averaged < exp(−U st /k B T) > R and < exp [−(U n−a + U elec )/k B T] > 1 over the 5,000 /c combinations to obtain µ e−v and µ s−a , respectively. The sum of µ e−v and µ s−a provides a very accurate estimate of µ (Figure 2), demonstrating the additivity of these two components. The errors reported by two methods are very similar (Figure S1), and hence in Figure 2 and hereafter we only show errors determined by the bootstrap method.
If we have an infinitely large crowder box, then its different regions give a good representation for the configurations of a finite crowder box. Likewise, when the test protein with a single orientation is fictitiously placed into different regions of the infinite crowder box, it is as if many different orientations of the test protein are probed by a finite crowder box. Hence, for an infinitely large crowder box, a separate average over crowder configurations and test protein orientations is unnecessary; then the separation of µ into µ e−v and µ s−a is exact. That our results accurately conform to additivity provides an indication that our crowder box is sufficiently large. In particular, the clashfree fraction, < exp(−U st /k B T) > R , is highly constant among the 5,000 /c combinations, as indicated by very small µ e−v errors (<0.01 kcal/mol, except for polε in the most concentrated LYS solution, where the error is 0.02 kcal/mol; see also below) FIGURE 2 | Additivity of the excluded-volume and soft-attraction components of the transfer free energy.
Frontiers in Molecular Biosciences | www.frontiersin.org (Figure 2). The decomposition into µ e−v and µ s−a provides physical insight into the transfer free energy. µ e−v is necessarily positive, whereas µ s−a can be expected to be negative. As Figure 2 shows, these two quantities largely cancel each other, leading to a relatively modest magnitude for µ.
We also asked whether µ s−a could be further decomposed into the separate contributions of the two types of soft interactions. To that end, we carried out the averages of exp (−U n−a /k B T) and exp (−U elec /k B T), i.e., by including only one of the two types of soft interactions. For convenience we refer to the corresponding chemical potentials as µ n−a and µ elec , respectively. As shown in Figure S2, the magnitude of µ s−a is much larger than the sum of µ n−a and µ elec , indicating strong correlations between the two types of soft interactions. Indeed, one expects that the strongest electrostatic attractions occur when the test protein is apposed to crowder proteins with high charge and shape complementarity, but high shape complementarity also leads to strong nonpolar attraction.
For the largest test protein, polε, in the most concentrated LYS solution (217 mg/mL), the clash-free fraction in 2,086 of the 5,000 /c combinations was 0, i.e., not a single grid point was clash-free. In this concentrated crowder solution, the probability that voids large enough to accommodate polε is very small, which explains the high percentage of fully clashed /c combinations as well as the relatively higher error of µ e−v (calculated on the 2,914 /c combinations with clash-free grid points). The µ e−v value thus calculated was corrected by adding −k B T ln (2, 914/5, 000) to account for the fully clashed /c combinations. The same correction also applies to µ in this case. Values of µ, µ e−v , and µ s−a for the eight test proteins are presented in Table S1 for LYS crowding and Table S2 for BSA crowding.

Theoretical Modeling of Excluded-Volume Component
For calculating the excluded-volume transfer free energy, scaledparticle and other theories have been developed for test particles and crowder particles that have spherical and other simple shapes. Our generalized fundamental measure theory (GFMT) has enabled the test proteins to be represented at the all-atom level, though crowders still have to be modeled as spheres (Qin and Zhou, 2010). GFMT predicts the excluded-volume component as where v p , s p , and l p are the volume, surface area, and integrated mean curvature (with dimension of length) of the test protein; c is the osmotic pressure of the crowder solution, and γ c and κ c are the corresponding quantities for surface tension and bending rigidity; and φ is the total volume fraction of the crowders. The latter is given by φ = V c ρ c , where V c and ρ c are the volume and number density of the crowders. Two other quantities, ρ R = R c ρ c and ρ S = S c ρ c , with R c and S c denoting the radius and surface area of the crowders, are needed to define c , γ c , and κ c. The results are The osmotic pressure can be viewed as the energy to create a cavity with a unit volume in the crowder solution; the surface tension is the energy to create a unit-area interface between the crowder solution and a test protein; and the bending rigidity measures the energy arising from the curvature of the interface. Fitting our µ e−v data to GFMT meant that we modeled the crowder proteins as spheres; in so doing we needed to specify the radius, R c , for each crowder protein. Note that R c is the only free parameter; once R c is chosen, the volume, surface area, and linear size of the test protein (i.e., v p , s p , and l p ) are calculated by rolling a spherical probe of radius R c around the three-dimensional structure of the test protein.
We were able to achieve a good global fit for all the eight test proteins in either LYS or BSA after searching for an R c value that minimized deviations between the µ e−v data and the GFMT predictions (Figure 3). The resulting R c values are 21.4 and 35.4 Å, respectively, for LYS and BSA. These values are close to the hydrodynamic radii, 19.6 and 36.5 Å, calculated by HYDROPRO (Ortega et al., 2011). Using the preceding R c values, the volume fractions of the two crowders at their highest concentrations are 37.6% and 33.1%. The resulting values for v p , s p , and l p of the eight test proteins are presented in Table S3.

Quadratic Fitting of Soft-Attraction Component
The soft-attraction component, µ s−a , calculated by FMAP has a nearly linear dependence on the crowder concentration (Figure 4). We fitted the results to a quadratic function: The quadratic term makes a minor contribution in most of the 16 sets of results (eight test proteins pairs with two crowder proteins). In particular, at φ = 30%, the quadratic term is <20% of the linear term in 9 out of the 16 cases. Importantly, when the data at the first seven crowder concentrations were used for the fitting, the fitting function, by extrapolation, predicts well the µ s−a result at the eighth crowder concentration (comparing the red circle and the eighth blue circle in each case). This suggests that a function like equation (15) can be used to predict µ s−a at high crowder concentrations, where FMAP calculations become difficult because voids that can accommodate the test proteins are rare.

DISCUSSION
By using our FMAP, we have calculated the transfer energies of eight test proteins over a wide range of concentrations of two crowder proteins. We have shown that the transfer free energy can be accurately decomposed into an excluded-volume component, arising from the hard-core repulsion, and a softattraction component, arising from non-polar and electrostatic interactions. Our calculation results thus rigorously validate similar decompositions proposed previously (Petsev et al., 2003;Jiao et al., 2010;Minton, 2013). We have found that the excluded-volume component is predicted well by the generalized fundamental measure theory, which was developed for atomistic test proteins in the presence of spherical crowders that exert only steric repulsion. On the other hand, we have found that the soft-attraction component has a nearly linear dependence on crowder concentration. The latter result is interesting and has important implications.
Why does µ s−a have a nearly linear dependence on crowder concentration? To gain insight, we turn to the perturbed virial expansion for pure molecular fluids (Nezbeda and Smith, 2004). The expansion was originally applied to the pressure (P), where ρ is the number density, P ref is the pressure of a reference system, and B l are the "residual" virial coefficients, i.e., the differences in virial coefficients between the real and reference system. We can easily turn equation (14) into an expression for the excess chemical potential, using the relation (Qin and Zhou, 2016) The result is A protein-crowder system where the test protein and crowder protein are the same is equivalent to a pure molecular fluid. In that case, the transfer free energy µ and the excess chemical potential µ ex are equivalent. Furthermore, we may choose the reference system such that µ ex ref is equivalent to µ e−v , then the second term, an infinite sum and to be denoted as µ ex , on the right-hand side of equation (16) is equivalent to µ s−a . We can now recognize the quadratic function in equation (13) as a truncation of the infinite sum to the second order. The nearly linear dependence µ s−a on crowder concentration just means that the contributions of the second and higher orders are much less than that of the first order.
In Figure S3, we display the contributions of the first-, second-, and third-order contributions to µ ex for Lennard-Jones fluids, with the reference system chosen as hard-sphere fluids (diameter = σ ). Over a wide range of ε, the depth of the interaction potential, the dominant contribution comes from the first order. Virial coefficients B l are integrals of Mayer functions over the positions of l molecules. A Mayer function is the Boltzmann factor of the intermolecular interaction potential subtracted by 1; hence we expect that dominant contributions to residual virial coefficients B l come from clusters of l molecules in which all pairs are in the most attractive range of intermolecular distance. When the range of attraction is narrow, such molecular clusters become rare for l = 3 and higher. That would lead to small B l≥3 values and explain why µ ex is dominated by the first-order term, which is proportional to B 2 . The foregoing argument for pure molecular fluids largely applies to the protein-crowder systems studied in the present work, thus providing a rationalization for the nearly linear dependence of µ s−a on crowder concentration. An interesting future study would be to directly validate this argument by calculating virial coefficients of different orders for protein-crowder systems. It is straightforward to apply FMAP for B 2 calculations, but efficient B l≥3 calculations for protein-crowder systems will require careful algorithmic design. Minton modeled soft attraction as weak unsaturable binding (Minton, 2013), which leads to an approximately linear dependence on φ. Hoppe and Minton (Hoppe and Minton, 2016) used a perturbed virial expansion, similar to equation (16) and including B 2 and B 3 , for square-well crowders and found µ s−a to be linearly dependent on φ. The present work provides confirmation of these previous results and generalize them to atomistic models.
A nearly linear dependence of µ s−a allows us to extrapolate results obtained at lower crowder concentrations to higher ones, as we demonstrated here (Figure 4). As noted above, calculation of transfer free energies at high crowder concentrations becomes challenging for FMAP and likewise for other methods. For FMAP, the high percentage of fully clashed /c combinations for the largest test protein studied here in the most concentrated LYS solution gives an indication of this challenge. A similar situation was observed in our previous study of disordered proteins in the presence of polyethylene glycol (Nguemaha et al., 2018). Extrapolation to higher crowder concentrations may provide a means to finesse this challenge.
While our interaction potential is atomistic, it is based on implicit solvent modeling. As a result, the treatment of hydration effects may prove inadequate, thereby limiting the accuracy of model predictions. In future studies we will investigate the performance of our atomistic model in quantitatively predicting experimental observables, including second virial coefficients.
As a final note, for determining the liquid-liquid phase equilibria of protein (Qin and Zhou, 2016) and colloid (Lomakin et al., 1996) solutions, it has been found that at least a thirdorder fitting is needed for the soft-attraction component of the excess chemical potential. In that case, a precise dependence of the chemical potential on protein concentration is key to determining the phase boundary, and hence one must calculate the chemical potential over as wide a range of protein concentration and cover the concentration range with as many points as can be done. For our protein-crowder systems, we find that, in a cubic fit, the second and third order terms have opposite signs and hence largely cancel each other, still leaving the first-order-term dominant.

AUTHOR CONTRIBUTIONS
VN conducted research and analyzed data. SQ prepared methods and engaged in discussion. H-XZ supervised research and wrote the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb. 2019.00039/full#supplementary-material Three figures comparing two methods of error estimates (Figure S1), showing non-additivity of the contributions to µ s−a by the two types of soft interactions (Figure S2), and illustrating relatively small contributions to µ s−a by higher order terms in a perturbed virial expansion of µ ex ( Figure S3); and three supporting tables, presenting data on transfer free energies (Tables S1,S2) and geometric properties ( Table S3) of test proteins.