State-of-the-art and novel approaches to mild solubilization of inclusion bodies

Throughout the twenty-first century, the view on inclusion bodies (IBs) has shifted from undesired by-products towards a targeted production strategy for recombinant proteins. Inclusion bodies can easily be separated from the crude extract after cell lysis and contain the product in high purity. However, additional solubilization and refolding steps are required in the processing of IBs to recover the native protein. These unit operations remain a highly empirical field of research in which processes are developed on a case-by-case basis using elaborate screening strategies. It has been shown that a reduction in denaturant concentration during protein solubilization can increase the subsequent refolding yield due to the preservation of correctly folded protein structures. Therefore, many novel solubilization techniques have been developed in the pursuit of mild solubilization conditions that avoid total protein denaturation. In this respect, ionic liquids have been investigated as promising agents, being able to solubilize amyloid-like aggregates and stabilize correctly folded protein structures at the same time. This review briefly summarizes the state of the art in mild solubilization of IBs and highlights some challenges that have so far prevented these novel techniques from being adopted in industry. We suggest mechanistic models based on the thermodynamics of protein unfolding, aided by molecular dynamics simulations, as a possible approach to solve these challenges in the future.

protein of interest (PoI). Therefore, in this manuscript, "mild solubilization" describes attempts to solubilize aggregated protein in such a way that all existing misfolded structures are unfolded, whilst the highest possible amount of already existing correctly folded secondary structure is preserved.
Despite the advantages of mild solubilization, it is still necessary to add a sufficient amount of denaturing agent during solubilization to allow misfolded structures to unfold. Hall et al. (2018) showed that dimeric disulfide-linked recombinant human bone morphogenic protein-2 could be extracted from IBs without denaturation by using a buffer containing 4 M urea and 250 mM guanidinium hydrochloride (Gnd-HCl). However, the extracted protein showed no bioactivity. It was hypothesized that this was due to the incorrectly folded disulfide bridges and hydrophobic core. This was supported by refolding the protein after solubilization at 6 M Gnd-HCl, thereby producing a bioactive product. This example shows that successful mild solubilization has to balance preserving the correctly folded structures with the unfolding of any misfolded structures.
Several analytical methods have been established to evaluate the structural changes during the solubilization and subsequent refolding of IBs. Infrared (IR) and Raman spectroscopy can be used to track changes in the secondary and tertiary structure of proteins, as well as the formation of disulfide bonds (Pauk et al., 2021). FT-IR spectroscopy can even be used to differentiate and quantify intramolecular α-helix structures from amyloid-like β-sheet bridges of IBs within intact cells (Ami et al., 2006). By deconvoluting the IR spectra, the contributions of individual structure types, such as α-helices or β-sheets, can be quantified (Umetsu et al., 2005). However, amide signals of the protein commonly overlap with prominent water and urea signals. Therefore, high protein concentrations are required to use IR and Raman spectroscopy.
Another spectroscopic method for protein structure analysis is circular dichroism (CD) spectroscopy (Clarke, 2011), which measures the difference in absorption of right- and left-circularly polarized light. The resulting spectra show characteristic bands for the individual secondary protein structure types.
If the (un)folding occurs as a two-state reaction, differential scanning calorimetry (DSC) can be used to measure the enthalpy of unfolding (Ionescu et al., 2008). This is especially important to characterize the thermodynamics of the folding states.
Dynamic light scattering (DLS) is a widely used method to measure particle size distribution. This information can be used to track aggregation processes. Moreover, this technique can also be used to measure the hydrodynamic radius of proteins. As the protein unfolds, its hydrodynamic radius increases, thus enabling DLS to monitor the unfolding process (Yu et al., 2013).
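As an illustration of how a hydrodynamic radius is obtained from a DLS measurement, the following minimal sketch applies the Stokes-Einstein relation to a translational diffusion coefficient. The diffusion coefficients and the function name are hypothetical, and the viscosity is that of pure water at 25 °C; concentrated urea or Gnd-HCl solutions are more viscous, so real solubilization buffers would require a measured value.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def hydrodynamic_radius(diffusion_coeff, temperature=298.15, viscosity=8.9e-4):
    """Stokes-Einstein relation: R_h = k_B * T / (6 * pi * eta * D).

    diffusion_coeff in m^2/s, temperature in K, viscosity in Pa*s
    (default: approximately pure water at 25 degC). Returns R_h in m.
    """
    return K_B * temperature / (6.0 * math.pi * viscosity * diffusion_coeff)


# Hypothetical diffusion coefficients, chosen only for illustration:
# the unfolded chain diffuses more slowly than the compact native state.
d_folded = 1.0e-10    # m^2/s
d_unfolded = 0.5e-10  # m^2/s

r_folded = hydrodynamic_radius(d_folded)
r_unfolded = hydrodynamic_radius(d_unfolded)

print(f"R_h folded:   {r_folded * 1e9:.2f} nm")
print(f"R_h unfolded: {r_unfolded * 1e9:.2f} nm")
```

Because the radius is inversely proportional to the diffusion coefficient, a halved diffusion coefficient at constant temperature and viscosity corresponds to a doubled apparent radius.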
Finally, intrinsic tryptophan (Trp) fluorescence is a very robust method often used to track in situ refolding. The observed fluorescence maximum shifts as Trp residues in the protein become buried within the hydrophobic core during the folding process (Duy and Fitter, 2006). Additionally, acrylamide quenches Trp fluorescence via an entirely physical mechanism. This can be used to determine the Stern-Volmer constant and therefore quantify the amount of Trp residues located within the core of the protein versus those positioned towards the bulk medium (Tallmadge et al., 1989; Upadhyay et al., 2016).
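The determination of the Stern-Volmer constant mentioned above can be sketched as a simple linear fit. The example assumes purely collisional quenching, so that F0/F = 1 + K_SV [Q] holds over the whole quencher range; the data are synthetic and the function name is hypothetical.

```python
def stern_volmer_constant(quencher_conc, fluorescence, f0):
    """Estimate K_SV from F0/F = 1 + K_SV * [Q].

    Least-squares slope of (F0/F - 1) versus [Q], forced through the
    origin. quencher_conc in M, fluorescence in arbitrary units.
    """
    ys = [f0 / f - 1.0 for f in fluorescence]
    numerator = sum(q * y for q, y in zip(quencher_conc, ys))
    denominator = sum(q * q for q in quencher_conc)
    return numerator / denominator


# Synthetic titration generated with K_SV = 5 M^-1 (illustrative only).
acrylamide = [0.0, 0.1, 0.2, 0.3, 0.4]  # quencher concentration in M
f0 = 1000.0                             # fluorescence without quencher
signal = [f0 / (1.0 + 5.0 * q) for q in acrylamide]

k_sv = stern_volmer_constant(acrylamide, signal, f0)
print(f"K_SV = {k_sv:.2f} 1/M")
```

Under this assumption, a lower constant for a folded sample than for the denatured reference would indicate that a larger share of the Trp residues is shielded from the quencher.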
In many protein folding studies, CD spectroscopy is used as a secondary analysis method alongside intrinsic Trp fluorescence. As a larger number of structural groups (amide bonds, aromatic amino acids, disulfide bonds) contribute to the information gained (Clarke, 2011; Pauk et al., 2021), CD spectroscopy is able to give detailed information about the structure of the protein. Meanwhile, Trp fluorescence offers high sensitivity and is compatible with high solute concentrations. These traits also make Trp fluorescence spectroscopy an excellent option to be considered as an online PAT tool for solubilization and refolding processes.

Ionic liquids as mild solubilization agents
Ionic liquids (ILs) are salts in a liquid state at temperatures below 100 °C that have gained much attention in IB processing due to their adaptable physicochemical properties and environmental benefits. ILs have shown great potential as mild solubilization agents, dissolving aggregated protein whilst preserving native secondary structures of proteins (Fujita et al., 2016). Furthermore, they can help refold chemically denatured protein by displacing urea or Gnd-HCl from its solvation layer due to preferential interaction (Singh and Patel, 2018; Sindhu et al., 2020). Both of these properties could help to intensify refolding processes and lower their environmental burden by reducing the need to dilute the solubilizate. The influence of the most common IL ions on protein folding has been recently reviewed by Guncheva (2022).
Similarly, deep eutectic solvents (DES) are a subclass of ILs that has recently been investigated in protein stability studies (Yadav and Venkatesu, 2022). DES are mixtures of salts that are liquid at room temperature since the eutectic mixture has a significantly lower melting point compared to the individual components (Abbott et al., 2003). Although there have been several interesting studies concerning the conformational stability and folding state of proteins in these solvents (Niknaddaf et al., 2018; Kist et al., 2019; Sanchez-Fernandez et al., 2022), to our knowledge, there is no literature available yet regarding the solubilization of aggregated protein in DES. This emerging field of research is important for industrial applications, as many DES are composed of cheap and biocompatible bulk chemicals (Gonçalves et al., 2021; Jesus et al., 2023; Usmani et al., 2023). Furthermore, DES can be recycled, providing an economic and environmental advantage over traditional chaotropic agents (Prabhune and Dey, 2023).
Despite these potential benefits, ILs and DES are very challenging to fit into the currently established IB process design strategy. The early development of the chemical environment for solubilization and refolding is still carried out by empirical and elaborate screening experiments, as summarized in a recent review (Buscajoni et al., 2022). A schematic depiction of the currently established process development steps is shown in Figure 1.
The initial selection of buffer components is based on experience and reviewing the literature. Therefore, the design space is usually constrained to a short list of already established chemicals. Screening experiments are generally done in a DoE approach, maximizing responses, such as the solubilization efficiency, final product concentration after refolding, or the refolding yield, using the response surface method (Ahmadian et al., 2020). This approach can be iterated with adapted design spaces until an optimum for the desired response is found. The resulting model is then used to define the process parameters. Alternatively, data-driven models can predict optimal process parameters based on experimental data. A recent publication (Walther et al., 2022) has shown such a data-driven integrated process model for the solubilization, refolding, and purification steps in an industrial setting. This empirical, statistics-based approach efficiently optimizes a set of quantitative process parameters. Still, the choice of the initial design space, based on experience, is highly influential on the final process, and the results cannot be transferred between different PoIs. Furthermore, if a comprehensive list of chemicals and their combinations is considered as buffer components, the experimental designs become very expensive in time and resources.
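To make the growth of the experimental effort tangible, the following sketch enumerates a small full-factorial screening design and counts the runs required when it is repeated for every pair from a list of candidate additives. A full-factorial layout is used here only because it is easy to enumerate; it is a simplification of the response surface designs described above. All factor names, levels, and the number of candidates are hypothetical.

```python
from itertools import combinations, product


def full_factorial(levels_per_factor):
    """Return every combination of factor levels, one dict per experiment."""
    names = list(levels_per_factor)
    level_lists = [levels_per_factor[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical screening factors for one buffer system.
design = full_factorial({
    "urea_M": [2.0, 4.0, 6.0],
    "pH": [8.0, 10.0, 12.0],
    "dtt_mM": [0.0, 10.0],
})

# Hypothetical short list of ten additives, screened as pairs.
candidate_additives = [f"additive_{i}" for i in range(10)]
additive_pairs = list(combinations(candidate_additives, 2))

# Repeating the same design for every additive pair.
total_runs = len(additive_pairs) * len(design)

print(f"runs per buffer system: {len(design)}")
print(f"additive pairs:         {len(additive_pairs)}")
print(f"total runs:             {total_runs}")
```

Even this modest example, with three factors and ten candidates, multiplies into several hundred individual experiments, which is why the design space is usually restricted to established chemicals.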
Apart from some technical constraints (e.g., high pressure), this seems to be one of the reasons why, with the exception of alkaline pH, none of the developed mild solubilization techniques is yet applied in industrial processes. ILs in particular are very hard to integrate into the screening-based approach due to the sheer number of possible ion pair combinations. Another disadvantage of the current design method is the lack of generated platform knowledge. Therefore, this cumbersome and time-intensive process has to be repeated for each new PoI. The missing platform knowledge is also problematic when the push toward quality by design (QbD) principles is considered (ICH, 2017; Beg et al., 2019).
One approach to generate this knowledge is the formulation of mathematical models for each process step. However, for the solubilization and refolding steps, there is still a lack of mechanistic models describing the effects of the chemical environment. The refolding step is usually described by kinetic models, parametrizing reaction rates from the denatured to intermediate, native, or aggregated states (Jungbauer and Kaar, 2007; Pauk et al., 2021). The kinetics of solubilization has been shown to be predominantly dependent on pore diffusion resistance into the IB particles (Walther et al., 2013; Walther et al., 2014). However, while these kinetic models present an important tool to describe the influence of factors like protein concentration and process times, they do not help in choosing a suitable buffer composition for mild solubilization. To address these shortcomings of the currently established process design approach, we suggest the use of two theoretical tools: 1) thermodynamic unfolding models as a way to describe and predict the solubilization process more precisely, and 2) molecular dynamics simulations (MDS) to predict the interactions of the PoI with a wide array of chemicals.
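The kind of kinetic refolding model referred to above can be illustrated with a deliberately simple scheme: first-order folding of the denatured protein competing with second-order aggregation. This scheme is an assumption made for the sketch and is not the specific model of the cited studies; the rate constants and concentrations are hypothetical. For this scheme the final yield has a closed-form solution, which is used to check the numerical integration.

```python
import math


def refolding_yield_numeric(u0, k_fold, k_agg, dt=1e-3, t_end=200.0):
    """Integrate dU/dt = -k_fold*U - k_agg*U**2 and dN/dt = k_fold*U.

    Explicit Euler integration; returns the fraction of the initial
    denatured protein u0 that ends up in the native state.
    """
    unfolded, native = u0, 0.0
    for _ in range(round(t_end / dt)):
        folded_now = k_fold * unfolded * dt
        aggregated_now = k_agg * unfolded * unfolded * dt
        native += folded_now
        unfolded -= folded_now + aggregated_now
    return native / u0


def refolding_yield_analytic(u0, k_fold, k_agg):
    """Closed-form final yield of the same scheme for t -> infinity."""
    ratio = k_agg * u0 / k_fold
    return math.log(1.0 + ratio) / ratio


# Hypothetical parameters in arbitrary but consistent units.
k_fold, k_agg = 0.1, 1.0
yield_high_conc = refolding_yield_numeric(1.0, k_fold, k_agg)
yield_low_conc = refolding_yield_numeric(0.1, k_fold, k_agg)

print(f"yield at high protein concentration: {yield_high_conc:.3f}")
print(f"yield at low protein concentration:  {yield_low_conc:.3f}")
```

The sketch reproduces the qualitative concentration dependence such models are used for, namely that dilution favors folding over aggregation, but it contains no term for the buffer composition.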

Mechanistic models and molecular dynamics simulations for solubilization prediction
While no mechanistic models specifically describing the solubilization of IBs have been published so far, the thermodynamics of protein folding has been studied extensively. The chemical denaturation of a protein in an aqueous system occurs because a denaturant preferentially binds to the protein over water. Since the unfolded state provides a greater number of interaction sites for the denaturant, this conformation is energetically favored, and the protein unfolds. Early observations showed that the free energy of unfolding correlates linearly with the denaturant concentration (Pace et al., 2008). This linearity can be explained by the protein-solvent interaction behaving more similarly to a solvent exchange than to covalent binding (Schellman, 2002; Pace et al., 2008). The slope of this system-specific linear correlation is called the "m-value" (Greene and Pace, 1974) and can be used to estimate the free energy of unfolding in water by extrapolation to zero denaturant concentration. While this m-value helps quantify "denaturation power" in a system (Magsumov et al., 2020), it is still an empirical parameter without a clear mechanistic interpretation (Wakayama et al., 2019). To formulate a mechanistic model, physically defined parameters are required, as proposed, for example, by Hall et al. (2018). This model describes chemical denaturation via six parameters, representing a countable number of interaction sites for both the folded and unfolded state, distinctive binding constants for different groups of interaction sites, and the intrinsic stability of the native protein structure. However, this model was still built on insights gained from denaturation experiments using urea and Gnd-HCl as denaturants. Among other aspects, Wakayama et al. (2019) expanded the existing thermodynamic models by considering the possibility of stronger binding interactions and alternative denaturing factors, such as high pressure or temperature, thus potentially creating a mechanistic basis for physical methods of mild solubilization and new denaturants like ILs.
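The linear relation between the free energy of unfolding and the denaturant concentration can be written as ΔG = ΔG(H2O) − m·[D]. The sketch below combines it with a two-state assumption to obtain the fraction of unfolded protein and the transition midpoint. The stability and m-value are hypothetical, and the sign convention (positive m, ΔG decreasing with denaturant) is a choice made for this example.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)


def delta_g_unfolding(denaturant_molar, dg_water, m_value):
    """Linear extrapolation: dG = dG(H2O) - m * [D], in kJ/mol."""
    return dg_water - m_value * denaturant_molar


def fraction_unfolded(denaturant_molar, dg_water, m_value, temperature=298.15):
    """Two-state equilibrium: f_U = 1 / (1 + exp(dG / (R * T)))."""
    dg = delta_g_unfolding(denaturant_molar, dg_water, m_value)
    return 1.0 / (1.0 + math.exp(dg / (R_GAS * temperature)))


def transition_midpoint(dg_water, m_value):
    """Denaturant concentration at which dG = 0, i.e. f_U = 0.5."""
    return dg_water / m_value


# Hypothetical protein: dG(H2O) = 20 kJ/mol, m = 5 kJ/(mol M).
DG_WATER, M_VALUE = 20.0, 5.0
c_mid = transition_midpoint(DG_WATER, M_VALUE)

for conc in (0.0, 2.0, c_mid, 6.0, 8.0):
    f_u = fraction_unfolded(conc, DG_WATER, M_VALUE)
    print(f"[D] = {conc:.1f} M  ->  fraction unfolded = {f_u:.4f}")
```

In this picture, the concentration window around the midpoint is where partially folded populations coexist, which is the range of interest for mild solubilization. Real IB proteins may deviate from two-state behavior.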
However, this model still does not resolve the problem of choosing a suitable denaturant without screening all options, as the model parameters are always specific to a particular protein-solvent system. Instead of limiting the possibilities to urea and Gnd-HCl, it would be necessary to estimate a protein's solubility in a wide range of alternative solvents, which is especially relevant to ILs.
The solubility of protein can be partly estimated using the Hofmeister series (Hofmeister, 1888), which has developed into a series comparing individual cations and anions, qualitatively describing their influence on protein solubility. These Hofmeister effects were initially attributed to the ions increasing or decreasing the H-bond structuring of water. The H-bond structure was assumed to affect the hydration layer of the protein, explaining the influence on solubility. While this is partly true, extensive research revealed that the actual mechanisms are a far more complex mixture of Coulombic and dispersion interactions, excellently reviewed by Schröder (2017) within the context of ILs. These newer mechanistic insights into protein-solvent interactions are heavily based on computational science, especially MDS. These simulations have been used to quantify the effect of different ions on the solubility of hydrophobic solutes (Thomas and Elcock, 2007), thereby differentiating the term "Hofmeister effects" into three categories of individually quantifiable contributions: 1) water-water hydrogen bonding (or "water structuring" in the Hofmeister context), 2) the free energy of the hydrophobic interaction between solutes, and 3) the preferential interaction of ions and solutes.
To investigate the specific interaction of individual solvent molecules with proteins, MDS have become state-of-the-art (Schröder, 2017; Ferina and Daggett, 2019; Otzen et al., 2022; Sinha et al., 2022). In these simulations, the motions of a protein molecule and the surrounding solvent molecules are simulated under the influence of their respective force fields. MDS have been used to gain insight into the distance and orientation between proteins and other solutes/solvents (de Oliveira and Martínez, 2020; Otzen et al., 2022), protein-protein interactions, such as aggregation (Loureiro and Faísca, 2020), as well as preferred interaction tendencies between multi-component mixtures (Ghosh et al., 2017). These insights might be used to formulate an early prediction of suitable chemical environments to solubilize IBs, without the need to conduct a single experiment in the lab. Figure 2 shows a potential extension of the currently established process development approach using the suggested methods.
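The basic principle of propagating particles under a force field can be shown with a toy example. The sketch below integrates a single particle in a one-dimensional harmonic potential using the velocity Verlet scheme, a time integrator commonly used in molecular dynamics. It is not a protein simulation: the potential, the reduced units, and all function names are assumptions chosen to keep the example self-contained.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Force from the toy 'force field' U(x) = 0.5 * k * (x - x0)**2."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Propagate position and velocity; returns a list of (x, v) tuples."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Reduced units: mass = 1, k = 1, time step well below the oscillation period.
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=10000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])

print(f"energy at start: {e_start:.6f}")
print(f"energy at end:   {e_end:.6f}")
```

Production MDS packages apply the same idea to many thousands of atoms with bonded and non-bonded interaction terms, which is where the computational cost arises.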
MDS could be used to investigate the protein-solvent interactions for a comprehensive list of chemicals. These simulations could give a first estimation of the parameters that describe the denaturation curve, determining the concentration range in which the PoI is partially folded. Thus, novel solubilizing agents could be considered without additional screening experiments in the laboratory. Furthermore, basing experimental designs on physical parameters could be a key step to generate transferable process knowledge, leading to platform technology and QbD. While the significant computational cost of MDS must be considered (Sinha et al., 2022), there have been significant recent advances limiting this downside. Besides methodological improvements (Dominic et al., 2023), cloud-computing approaches (Zimmerman et al., 2021) and artificial neural networks (Tsai et al., 2022; Dominic et al., 2023) are significantly improving real-time efficiency. Furthermore, the optimization of simulation software for graphics processing units (GPUs) has made MDS feasible for consumer-grade hardware (Hollingsworth and Dror, 2018). The impact of GPU technology on the availability of MDS can be illustrated by comparing two studies that benchmark the hardware available at the time of their publication. In 2013, an NVIDIA GTX-TITAN was able to simulate dihydrofolate reductase, a 21.5 kDa protein system composed of 23,558 atoms, at 110.65 ns per day of computation (Salomon-Ferrer et al., 2013). Six years later, an NVIDIA RTX 2080 reached the same simulation rate for an 80,000-atom membrane protein embedded in a lipid bilayer, including the surrounding water and ion molecules (Kutzner et al., 2019). In comparison, a recent study by Piccoli and Martínez (2022) used MDS to predict the denaturing effects of four ILs on ubiquitin by generating 3D structures of denatured ubiquitin with two longer simulations of 50 and 100 ns, respectively. Then MDS of only 10 ns each could be used to investigate the interactions of the ILs with the protein in different folding states. Considering these simulation times, current consumer hardware is likely to be powerful enough for the proposed use of MDS.
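A rough back-of-the-envelope estimate shows what these figures imply for wall-clock time. The sketch below uses the 110.65 ns per day benchmark and the 50, 100, and 10 ns simulation lengths quoted above. The number of short interaction runs is a hypothetical value (for instance four ILs times three folding states), and applying a throughput benchmarked on a different protein system is itself an assumption, since throughput depends on system size and software.

```python
def wall_clock_days(simulation_ns, throughput_ns_per_day):
    """Computation time in days for a given amount of simulated time."""
    return simulation_ns / throughput_ns_per_day


# Benchmark throughput quoted in the text (2013 consumer GPU).
THROUGHPUT_NS_PER_DAY = 110.65

# Simulation lengths quoted in the text.
unfolding_runs_ns = [50.0, 100.0]
interaction_run_ns = 10.0

# Hypothetical number of short runs, e.g. 4 ILs x 3 folding states.
n_interaction_runs = 12

total_ns = sum(unfolding_runs_ns) + n_interaction_runs * interaction_run_ns
days = wall_clock_days(total_ns, THROUGHPUT_NS_PER_DAY)

print(f"total simulated time: {total_ns:.0f} ns")
print(f"estimated wall clock: {days:.1f} days on one GPU")
```

Under these assumptions, a campaign of this size would occupy a single GPU for a few days rather than weeks.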

Conclusion and outlook
The mild solubilization approach aims to preserve protein structures present in IBs to increase refolding yields by reducing the reaggregation of the solubilized PoI. This has been empirically done by reducing urea and Gnd-HCl concentrations during the solubilization while using alkaline pH, high pressure, detergents, organic solvents, and freeze-thaw cycles to increase protein solubility. Recently, ILs have been investigated both as a very promising means to solubilize protein aggregates without unfolding their native-like structure and as refolding additives that counteract the effects of the traditional denaturants. However, due to the number of possible ion combinations, the established process design method of empirically screening buffer components quickly leads to an overwhelming number of experiments. Furthermore, this approach does not generate platform knowledge and has to be repeated for each new PoI.
To improve the established design process, MDS could be used in combination with mechanistic models that describe the thermodynamics of protein (un)folding to base experimental designs on physical parameters and improve process understanding. The current advances in the field of machine learning, and algorithms like AlphaFold, have made significant progress toward sequence-based protein structure prediction (Nussinov et al., 2022). These novel tools could soon be used to obtain the main requirement for MDS, a detailed 3D structure of the protein, without resource-intensive protein structure analytics. Combining these knowledge-based simulation tools might even enable the prediction of an entire refolding process based on the sequence of the PoI.

FIGURE 1 Schematic representation of the established approach to solubilization process development. The factors of the first statistical design of experiments (DoE) are chosen based on experience and literature, while the ranges are limited by the technical feasibility of the screening experiments. After evaluating the results, the DoE is iterated with adapted design spaces until sufficiently optimized process parameters are found and transferred to larger scales.

FIGURE 2 Schematic description of solubilization process development including the proposed additions highlighted by green dashed lines. Molecular dynamics simulations (MDS) are used to determine a first estimation for the parameters of a mechanistic model describing protein denaturation. Based on these values, a range of promising denaturants and buffer components are picked for the initial experimental design. The iterative optimization loop feeds back into the denaturation model, which is used as an input for the next design space.