Mutation of Framework Residue H71 Results in Different Antibody Paratope States in Solution

Characterizing and understanding the antibody binding interface have become a pre-requisite for rational antibody design and engineering. The antigen-binding site is formed by six hypervariable loops, known as the complementarity determining regions (CDRs) and by the relative interdomain orientation (VH–VL). Antibody CDR loops with a certain sequence have been thought to be limited to a single static canonical conformation determining their binding properties. However, it has been shown that antibodies exist as ensembles of multiple paratope states, which are defined by a characteristic combination of CDR loop conformations and interdomain orientations. In this study, we thermodynamically and kinetically characterize the prominent role of residue 71H (Chothia nomenclature), which does not only codetermine the canonical conformation of the CDR-H2 loop but also results in changes in conformational diversity and population shifts of the CDR-H1 and CDR-H3 loop. As all CDR loop movements are correlated, conformational rearrangements of the heavy chain CDR loops also induce conformational changes in the CDR-L1, CDR-L2, and CDR-L3 loop. These overall conformational changes of the CDR loops also influence the interface angle distributions, consequentially leading to different paratope states in solution. Thus, the type of residue of 71H, either an alanine or an arginine, not only influences the CDR-H2 loop ensembles, but co-determines the paratope states in solution. Characterization of the functional consequences of mutations of residue 71H on the paratope states and interface orientations has broad implications in the field of antibody engineering.


INTRODUCTION
The rise of antibodies as important biotherapeutic proteins has sparked interest in characterizing antibody structures and investigating structure-function relationships (1–3). Understanding the structural determinants and the conformational transitions governing antibody-antigen recognition is critical for understanding antibody function, in particular antibody specificity, and consequently processes such as affinity maturation (4,5).
The antigen binding fragment (Fab) consists of a heavy and a light chain and can be divided into a constant and a variable domain. The variable fragment (Fv) exhibits the highest diversity of an antibody, as it is the focal point of somatic hypermutation and recombination events (6,7). This high diversity of the Fv is concentrated on six hypervariable loops, also known as the complementarity determining regions (CDRs), which form the antigen binding site, the paratope. To facilitate the structure prediction of antibodies, five of these six CDR loops have been assigned to so-called canonical clusters, assuming that they can only adopt a limited number of backbone conformations (5,8–11). Due to its unmatched diversity in length, sequence and structure, no canonical clusters can be assigned for the CDR-H3 loop. Thus, its structure prediction remains challenging. To capture the high flexibility and diversity of the CDR-H3 loop and to functionally characterize all CDR loops, they have to be described as conformational ensembles in solution (12,13). Within the obtained CDR loop ensembles in solution, transitions between the majority of canonical clusters, as well as additional dominant solution structures, were observed.
Together with the CDR loops, the relative VH-VL interdomain orientation plays an important role in determining the shape of the antigen binding site (4,14–16). Various studies observed that mutations in the framework regions, in particular in the VH-VL interface, can result in structural changes of the binding site and hence can influence antigen recognition. Additionally, allosteric effects during antibody-antigen binding have been reported, involving conformational rearrangements in the constant domains (CH1-CL) and the elbow angle (17–22).
The majority of VH-VL, CH1-CL and elbow angle dynamics have been shown to occur on the low nanosecond timescale, while the slower components of the movements are strongly correlated with conformational changes in the CDR loops, which occur on the micro-to-millisecond timescale.
Based on these observations, antibodies were shown to exist as ensembles of paratope states in solution, which are defined by a characteristic combination of correlated CDR loop conformations and interdomain orientations. These paratope states interconvert into each other in the micro-to-millisecond timescale by synchronous loop and interdomain rearrangements.
In this study we combine a well-established enhanced sampling technique with classical molecular dynamics simulations to kinetically characterize the influence of mutations of the prominent residue 71H (Chothia nomenclature) (8,23) on the conformational diversity of the CDR loops and the resulting paratope states in solution. Figure 1 illustrates the position (HV4 loop) and residue type of 71H with respect to the CDR loops, which are color-coded.

Structure Preparation
As starting structure for the simulations, we used the human germline antibody IGHV1-69/IGKV1-39 with the PDB accession code 5I15 (24). The mutant starting structure was prepared in MOE (Molecular Operating Environment, Chemical Computing Group, version 2020.01) by mutating residue 71H from an alanine to an arginine. The structure carrying this mutation at position 71H is referred to below as the mutant. Both structures were then protonated using the Protonate3D tool (25,26). Charge neutrality was ensured by utilizing the uniform background plasma approach in AMBER (27–29). Using the tleap tool of the AmberTools20 (27) package, the crystal structures were soaked in cubic water boxes of TIP3P water molecules with a minimum wall distance of 10 Å to the protein (30). The structures were described with the AMBER force field 14SB (31). The antibody fragments were carefully equilibrated using a multistep equilibration protocol (32).

Metadynamics Simulations
To enhance the sampling of the conformational space, well-tempered bias-exchange metadynamics (33–35) simulations were performed in GROMACS (36,37) with the PLUMED 2 implementation (38). We chose metadynamics as the enhanced sampling technique because it allows the enhanced sampling to be focused on predefined collective variables (CVs). The sampling is accelerated by a history-dependent bias potential, which is constructed in the space of the CVs (33,35,39). As collective variables, we used a well-established protocol, boosting a linear combination of sine and cosine of the ψ torsion angles of all CDR loops, calculated with the functions MATHEVAL and COMBINE implemented in PLUMED 2 (13,38,40–43). As discussed previously, the ψ torsion angle captures conformational transitions comprehensively (44). The underlying method presented in this paper has been validated in various studies against a large number of experimental results. The simulations were performed at 300 K in an NpT ensemble using the GPU implementation of the pmemd module (45) to be as close to the experimental conditions as possible and to obtain the correct density distributions of both protein and water. We used a Gaussian height of 10 kJ/mol. Gaussian deposition occurred every 1,000 steps and a bias factor of 10 was used. 500 ns of bias-exchange metadynamics simulations were performed for the prepared Fab structures. The resulting trajectories were clustered with the program cpptraj (28,46) using the average linkage hierarchical clustering algorithm with a distance cut-off criterion of 1.2 Å, resulting in a large number of clusters. For the 5I15 antibody, we obtained 256 cluster representatives, while for the mutant the clustering resulted in 279 cluster structures. The cluster representatives for the antibody fragments were equilibrated and simulated for 100 ns each using the AMBER 20 (27) simulation package. Thus, the aggregated simulation time is 25.6 µs for the 5I15 Fab and 27.9 µs for the mutant.
Additionally, Figure S2 illustrates the cluster representatives for both antibody fragments.
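The collective variable described in this section, a linear combination of the sine and cosine of backbone ψ torsion angles, can be illustrated with a minimal Python sketch. This is not the PLUMED input used in the study: the function name `torsion_cv` and the uniform unit coefficients are assumptions made for illustration, since the text does not state the coefficients. The sketch only shows why sine and cosine are combined: the resulting variable is periodic, so an angle of 179° and one of -181° map to the same value.

```python
import math

def torsion_cv(psi_angles, sin_weights=None, cos_weights=None):
    """Linear combination of sin(psi) and cos(psi) over a set of torsions.

    psi_angles are given in radians. The weights default to 1.0 for every
    term, which is an illustrative assumption and not taken from the study.
    """
    n = len(psi_angles)
    sin_weights = [1.0] * n if sin_weights is None else sin_weights
    cos_weights = [1.0] * n if cos_weights is None else cos_weights
    return sum(
        ws * math.sin(psi) + wc * math.cos(psi)
        for psi, ws, wc in zip(psi_angles, sin_weights, cos_weights)
    )

# Hypothetical psi angles (degrees) of three loop residues.
psi = [math.radians(a) for a in (-60.0, 135.0, 170.0)]
# The same conformation with every angle shifted by a full turn.
psi_shifted = [a + 2.0 * math.pi for a in psi]

cv_value = torsion_cv(psi)
cv_value_shifted = torsion_cv(psi_shifted)
```

Because the trigonometric terms remove the discontinuity at ±180°, a bias potential deposited along such a variable does not experience an artificial jump when a torsion crosses the periodic boundary.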

Molecular Dynamics Simulations
As mentioned above, we performed 100 ns of classical molecular dynamics simulations for each obtained cluster representative. Molecular dynamics simulations were performed in an NpT ensemble using the pmemd.cuda module of AMBER 20 (28). Bonds involving hydrogen atoms were constrained with the SHAKE algorithm (47), allowing a time step of 2.0 fs. Atmospheric pressure (1 bar) of the system was set by weak coupling to an external bath using the Berendsen algorithm (48). The Langevin thermostat (49) was used to maintain the temperature during simulations at 300 K.
With the obtained trajectories, we performed a time-lagged independent component analysis (tICA) using the python library PyEMMA 2 employing a lag time of 10 ns. tICA was applied to identify the slowest movements of the investigated Fab fragments and consequently to obtain a kinetic discretization of the sampled conformational space (50).
tICA is a dimensionality reduction technique that detects the slowest-relaxing degrees of freedom and thereby facilitates the kinetic clustering, which is crucial for building a Markov-state model (MSM) (51). It is a linear method, which transforms a set of high-dimensional input coordinates into a set of output coordinates by finding a subspace of good reaction coordinates.
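For orientation, the linear transformation behind tICA can be written in its standard textbook form. The notation below is an assumption introduced here and does not come from the text: x(t) denotes the mean-free input coordinates (for example, the sine and cosine of backbone torsions), τ the lag time (10 ns in this study), and u_i the transformation vectors.

```latex
\begin{align}
C(0)    &= \left\langle x(t)\, x(t)^{\top} \right\rangle_t, &
C(\tau) &= \left\langle x(t)\, x(t+\tau)^{\top} \right\rangle_t, \\
C(\tau)\, u_i &= \lambda_i\, C(0)\, u_i, &
y_i(t) &= u_i^{\top} x(t), \\
t_i &= -\frac{\tau}{\ln \lvert \lambda_i \rvert}.
\end{align}
```

The independent components y_i with the largest eigenvalues λ_i have the longest implied timescales t_i, which is why projecting onto the leading components isolates the slowest motions.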
Based on the tICA conformational spaces, thermodynamics and kinetics were calculated with a Markov-state model (52) by using PyEMMA 2, which uses the k-means clustering algorithm (53) to define microstates and the PCCA+ clustering algorithm (54) to coarse grain the microstates to macrostates. Markov-state models are network models which provide valuable insights for conformational states and transition probabilities between them, as it is possible to accurately identify the boundaries between two states (52). The states are defined based on kinetic criteria, which allow identification of the boundaries between free energy wells. Basically, MSMs coarse-grain the system's dynamics, which reflects the free energy surface and ultimately determines the system's structure and dynamics. Thus, MSMs provide important insights and enhance the understanding of states and transition probabilities and facilitate a quantitative connection with experimental data (55,56).
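The step from a discretized trajectory to state probabilities and free energies can be sketched in plain Python. This is a deliberately naive illustration and not the PyEMMA estimator used in the study: it row-normalizes transition counts instead of using a reversible maximum-likelihood estimate, and the function names and the toy trajectory are hypothetical.

```python
import math

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed at the given lag (in frames)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    return counts

def transition_matrix(counts):
    """Row-normalize counts into transition probabilities."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def stationary_distribution(T, n_iter=10000, tol=1e-12):
    """Equilibrium populations by power iteration (pi = pi T)."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

def free_energies_kT(pi):
    """Free energy of each state in units of kT, relative to the minimum."""
    p_max = max(pi)
    return [-math.log(p / p_max) for p in pi]

# Toy discrete trajectory over three states (illustrative only).
dtraj = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 0, 0, 1, 0, 0, 0]
T = transition_matrix(count_matrix(dtraj, n_states=3, lag=1))
pi = stationary_distribution(T)
F = free_energies_kT(pi)
```

In this picture, a population shift between two antibody variants corresponds to a change in `pi`, and therefore in the relative free energies of the states.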
We performed tICA analyses and calculated Markov-state models of both investigated variants for the whole paratope and for all individual CDR loops.
The sampling efficiency and the reliability of the Markov-state model (e.g., defining optimal feature mappings) can be evaluated with the Chapman-Kolmogorov test (57,58), by using the variational approach for Markov processes (59), and by monitoring the fraction of states used, since the network states must be fully connected to calculate probabilities of transitions and the relative equilibrium probabilities. To build the Markov-state model we used the backbone torsions of the respective CDR loop, defined 150 microstates using the k-means clustering algorithm and applied a lag time of 10 ns.
The canonical cluster representatives for each CDR loop, extracted from the PyIgClassify database (60), were projected into the free energy surfaces of all individual CDR loops. We then used the respective macrostate ensembles to investigate correlations between the different paratope states and the relative VH and VL orientations.
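The idea behind the Chapman-Kolmogorov test can be sketched as follows: a transition matrix estimated at lag τ, raised to the k-th power, should agree with a transition matrix estimated directly at lag kτ if the dynamics between the states are Markovian. The code is a minimal pure-Python illustration on a synthetic three-state chain. All function names, the matrix `TRUE_T` and the trajectory length are assumptions made here, and the sketch does not reproduce the PyEMMA implementation.

```python
import random

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag (naive estimator)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(T, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(T)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, T)
    return result

def simulate_chain(T, n_steps, seed=0):
    """Generate a synthetic discrete trajectory from a known matrix."""
    rng = random.Random(seed)
    state, traj = 0, [0]
    for _ in range(n_steps - 1):
        r, cumulative = rng.random(), 0.0
        nxt = len(T) - 1  # fallback against floating-point rounding
        for j, p in enumerate(T[state]):
            cumulative += p
            if r < cumulative:
                nxt = j
                break
        state = nxt
        traj.append(state)
    return traj

def ck_max_deviation(dtraj, n_states, lag, k):
    """Largest |T(lag)^k - T(k*lag)| element; small values pass the test."""
    predicted = mat_pow(estimate_transition_matrix(dtraj, n_states, lag), k)
    observed = estimate_transition_matrix(dtraj, n_states, k * lag)
    return max(abs(predicted[i][j] - observed[i][j])
               for i in range(n_states) for j in range(n_states))

# Hypothetical three-state model used only to generate test data.
TRUE_T = [[0.90, 0.08, 0.02],
          [0.10, 0.85, 0.05],
          [0.05, 0.05, 0.90]]
dtraj = simulate_chain(TRUE_T, n_steps=50000, seed=1)
deviation = ck_max_deviation(dtraj, n_states=3, lag=1, k=3)
```

For real simulation data the comparison is usually made for several multiples k and on the coarse-grained macrostates, and a systematic disagreement indicates that the lag time is too short or the state definition too coarse.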

Relative VH and VL Orientations Using ABangle
ABangle is a computational tool (14,15,61,62) to characterize the relative orientations between the antibody variable domains (VH and VL) using six measurements (five angles and a distance). A plane is projected on each of the two variable domains. To define these planes, the first two components of a principal component analysis of 240 reference coordinates were used for VH and VL each. The reference coordinate set consists of Cα coordinates of eight conserved residues for 30 cluster representatives from a sequence clustering of the nonredundant ABangle antibody data set. The planes were then fitted with those 240 coordinates, and consensus structures consisting of 35 structurally conserved Cα positions were created for the VH and VL domain. Between these two planes, a distance vector C is defined. The six measures are then four tilt angles, two for each plane (HC1, HC2, LC1, LC2), a torsion angle (HL) between the two planes along the distance vector C, and the length of C (dc). The ABangle script can calculate these measures for an arbitrary Fv region by aligning the consensus structures to the found core set positions and fitting the planes and distance vector from this alignment. This tool, which is available online, was combined with an in-house Python script to reduce computational effort and to visualize our simulation data over time. The in-house script makes use of ANARCI (63) for fast local annotation of the Fv region and pytraj from the AmberTools package (27) for rapid trajectory processing.
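The geometry behind the six measures can be sketched with a few vector operations. The code below is not the ABangle script or the in-house script mentioned above. It assumes that each domain plane is already described by an origin point and two in-plane vectors (named h1, h2 and l1, l2 here), that C points from the VH plane to the VL plane, and that all tilt angles are measured against C. The function names and these sign and direction conventions are illustrative assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def angle_deg(u, v):
    """Unsigned angle between two vectors in degrees."""
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def torsion_deg(u, v, axis):
    """Signed rotation from u to v about axis (right-handed), in degrees."""
    length = _norm(axis)
    a = tuple(x / length for x in axis)
    # Project both vectors onto the plane perpendicular to the axis.
    u_p = _sub(u, tuple(_dot(u, a) * x for x in a))
    v_p = _sub(v, tuple(_dot(v, a) * x for x in a))
    return math.degrees(math.atan2(_dot(_cross(u_p, v_p), a), _dot(u_p, v_p)))

def orientation_measures(h_origin, l_origin, h1, h2, l1, l2):
    """Five angles and one distance describing the VH-VL arrangement."""
    c = _sub(l_origin, h_origin)  # assumed direction: from VH to VL
    return {
        "HL": torsion_deg(h1, l1, c),
        "HC1": angle_deg(h1, c),
        "HC2": angle_deg(h2, c),
        "LC1": angle_deg(l1, c),
        "LC2": angle_deg(l2, c),
        "dc": _norm(c),
    }

# Idealized toy geometry: two parallel planes 16 Å apart, VL rotated by 90°.
example = orientation_measures(
    h_origin=(0.0, 0.0, 0.0), l_origin=(0.0, 0.0, 16.0),
    h1=(1.0, 0.0, 0.0), h2=(0.0, 1.0, 0.0),
    l1=(0.0, 1.0, 0.0), l2=(-1.0, 0.0, 0.0),
)
```

Evaluating such a function for every frame of a macrostate trajectory yields the angle distributions that are compared between paratope states.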

RESULTS
We use a well-established protocol combining enhanced sampling techniques and classical molecular dynamics to investigate the influence of residue 71H on the whole paratope, the individual CDR loop dynamics and the respective relative VH-VL orientations. We used the human germline IGHV1-69/IGKV1-39 antibody as starting structure for this study, which originally has an alanine at position 71H. We aim to kinetically and thermodynamically characterize the effect of mutating only alanine 71H to arginine on the resulting ensembles of paratope states. Figures 2 and 3 show the free energy surfaces of the paratope of the IGHV1-69/IGKV1-39 antibody and the mutant, respectively, in the same coordinate system. The calculated Markov-state models yield three macrostates for both investigated Fab variants, corresponding to three paratope states in solution, which are illustrated in Figure 2B and Figure 3B. The most striking difference between the IGHV1-69/IGKV1-39 antibody and the mutant is the substantial population shift. The obtained macrostate trajectories from the Markov-state models were further used to calculate the relative interdomain orientations upon conformational changes in the paratope. For the IGHV1-69/IGKV1-39 antibody we clearly see a significant [...]
To pinpoint the obtained global changes of the paratope to local CDR loop and interface rearrangements, we also calculated free energy surfaces of the individual CDR loops. Immediately noticeable is the substantial rigidification of the CDR-H1 loop conformational space, accompanied by a population shift, from the IGHV1-69/IGKV1-39 antibody (Figure 4A) to the mutant (Figure 4C).
Furthermore, within the CDR-H1 loop conformational space of the IGHV1-69/IGKV1-39 antibody (Figure 4A), the majority of available canonical clusters are present within the sampled conformational ensemble; however, especially in this example, other dominant solution structures have to be considered, which are not apparent from X-ray structures. The free energy surface of the mutated antibody, shown in Figure 4C, shows a substantial population shift towards the assigned canonical cluster and reveals a rigidification, which is also reflected in fewer sampled canonical clusters of the CDR-H1 loop. Figures 4B, D illustrate the resulting Markov-state models with the respective state probabilities. The thickness of the arrows corresponds to the obtained transition times, which occur on the micro-to-millisecond timescale.
The type of residue at 71H has previously been shown to co-determine the canonical structure of the CDR-H2 loop (9). Figure 5 shows the resulting free energy surfaces and the Markov-state models, including the respective state probabilities, for the CDR-H2 loop with and without the mutation at position 71H. While the conformational diversity of both CDR-H2 loop variants is comparable, mutating alanine at position 71H to an arginine results in a strong population shift. As described for the CDR-H1 loop, for the CDR-H2 loop, with a loop length of 10 residues, we again sample the majority of available canonical clusters. Especially interesting is that the assigned canonical cluster H2-10-1 (PDB accession code: 2BDN), colored in red, for the IGHV1-69/IGKV1-39 antibody lies in a local side-minimum, while H2-10-2 (PDB accession code: 1SEQ) is close to the dominant minimum in solution (Figure 5A). Figure 4C shows [...]
Figure 6 shows the free energy surfaces of the CDR-H3 loop with and without mutating residue 71H and reveals a higher conformational diversity for the mutant (Figure 6C).
This higher flexibility is also reflected in the number of resulting macrostates (four for the mutant and three for the IGHV1-69/IGKV1-39 antibody CDR-H3 loop) and is accompanied by population shifts. Thus, the CDR-H3 loop ensemble in solution is also strongly influenced by the type of residue at 71H. The Markov-state models for both variants are illustrated in Figures 6B, D and show conformational rearrangements on the microsecond timescale.

Light Chain CDR Loops
These conformational rearrangements and population shifts in the heavy chain as a consequence of mutating residue 71H from an alanine to an arginine can also be observed for the VL-CDR loops. Our results illustrated in Figure S1 show that for all VL-CDR loops additional minima in solution can be identified. As all CDR loops are strongly correlated with each other, conformational changes observed for the heavy chain CDR loops have an effect on the light chain CDR loops as well. The flexibility of the CDR-H3 loop increases substantially as a consequence of the mutation, and this higher variability is transferred to the VL-CDR loops.

DISCUSSION
In this study we thermodynamically and kinetically characterize the effect of a single point mutation at position 71H of a human germline IGHV1-69/IGKV1-39 antibody on the paratope states in solution and give a structural and mechanistic explanation of the observed conformational changes. Various studies have already investigated the role of framework mutations on the CDR loops and the relative VH-VL interdomain orientations based on X-ray structures (9,64,65). Even allosteric effects involving mutations in the CH1-CL and the elbow angle have been reported to influence the antibody binding site and consequently antibody affinity and specificity (17–22,66–68). In particular, residue 71H has been discussed to determine the canonical conformation of the CDR-H2 loop, according to whether a bulky residue or a small side chain is present, thus bringing the CDR-H1 and CDR-H2 loops closer to each other (16,23,69). Recently, it was also indicated that paratope states in solution, including the relative VH-VL orientation, could be influenced by the type of residue at 71H (68). Previous studies described this prominent role of residue 71H in determining the CDR-H2 loop structure by considering crystal structures and sequences of naturally occurring antibodies and their respective variations. However, these antibodies differed not only in the CDR-H2 loop sequence but also showed a high diversity in length, sequence, and structure of the other CDR loops. Thus, in the course of antibody humanization, various studies focused on understanding the function of residue 71H on structure, antigen-binding, and stability and engineered identical antibodies differing only in the type of residue at position 71H. In contrast to the natural variations in this residue, functional differences could now be pinpointed to a single residue (69–72).
From the earliest antibody engineering efforts, it has been observed that biases in the natural repertoire, which contribute to folding and stability, are selected and contribute successfully to the design of antibodies and synthetic libraries. Thus, different residue types at position 71H could also be used to fine-tune the functions of antibodies and to balance the benefits of functional diversity by combining features of natural and engineered repertoires (73,74). Especially interesting is that the 71H residue belongs to the Vernier-zone residues, which have been discussed to play a critical role in humanization and in the rational design of antibodies in general, as they can influence antibody specificity and affinity (65,75,76). Additionally, residue 71H is part of the DE loop, also called H4 loop, which joins strands D and E on the heavy chain variable domain (9). The H4 loop has traditionally been considered to be part of the antibody framework; however, it has been shown not only for antibodies but also for T-cell receptors that the H4 loop can directly interact with the antigen and thus influence antigen binding (9,40). The fact that one single residue in the H4 loop can determine different paratope conformations in solution strongly supports the idea of highly correlated CDR loop movements, which interconvert into each other on the micro-to-millisecond timescale and favor specific interdomain orientations (12,13,41,62,68,77,78). Considering only one single static structure might not be sufficient to fully understand the consequences of point mutations on the resulting conformational diversity (42,79). In line with these observations, we show for the human germline IGHV1-69/IGKV1-39 antibody strong population shifts towards different dominant paratope conformations in solution when substituting 71H from an alanine to an arginine (Figure 2 and Figure 3).
Additionally, we also identified shifts in the relative VH-VL orientations depending on the type of residue at 71H.
To get a better understanding of the global changes, we also analyzed the influence of mutating residue 71H on the individual CDR loops and their respective dynamics and were able to identify substantial differences in the obtained conformational ensembles in solution. For the CDR-H1 loop shown in Figure 4, we see not only a strong population shift but also substantial rigidification when mutating residue 71H to an arginine, which can be explained by strong hydrogen bond interactions of the arginine with the side chains of serine 245 (occurrence 20.24%) and tyrosine 247 (occurrence 16.66%), and by hydrogen bond and π-stacking interactions with the backbone of phenylalanine 244 (occurrence 22.74 and 10.3%). Figures 4A, C illustrate the free energy surfaces of the CDR-H1 loop without and with the substitution, respectively. In line with previous studies (12,13,78), we observe that different canonical clusters lie within the same dominant minimum in solution, which is especially true for the germline IGHV1-69/IGKV1-39 antibody CDR-H1 loop. Thus, these canonical clusters might be combined. Another interesting aspect is that the slowest movement of the CDR-H1 loop, described by TIC1, represents the conformational transition from the assigned canonical structure to all other available canonical structures with a CDR-H1 loop length of 13 residues.
Apart from sampling all available canonical structures, we are able to identify an additional solution structure, which represents the dominant minimum in solution and is not apparent from X-ray structures (Figures 4A, B). Upon substituting the alanine 71H with an arginine, the populations of this dominant minimum are shifted towards the originally assigned canonical structure (Figures 4C, D). The stabilization of the CDR-H1 loop as a consequence of the substitution of residue 71H allows the CDR-H3 loop more degrees of freedom, which is reflected in the resulting conformational space illustrated in Figure 6. The increase in the flexibility of the CDR-H3 loop in Figure 6C, as a result of the stabilization of the CDR-H1 loop when mutating the alanine in position 71H to an arginine, is also accompanied by a population shift (Figures 6C, D). For the CDR-H3 loop, due to its high diversity, no canonical structures were available, and in agreement with previous studies we see here as well that the CDR-H3 loop needs to be characterized as a conformational ensemble in solution (42,78). As already discussed in the literature, the CDR-H2 loop canonical conformation is strongly influenced by the bulkiness of the residue at position 71H. Even though we observe a similar conformational space for both variants, strong population shifts towards different canonical structures of the obtained CDR-H2 loop ensembles in solution can be observed (Figure 5).
Similar to the observations for the CDR-H1 loop, for the CDR-H2 loop the majority of canonical clusters are also present within the sampled conformational ensemble in solution, clearly following the concept of conformational diversity. The concept of conformational diversity was proposed by Pauling and revived by Milstein and Foote, who demonstrated that the same antibody sequence can adopt various different conformations, which not only influences the binding properties but also increases the effective size of the antibody repertoire (80–82). Our results show that the individual CDR loops and the whole paratope, including the relative VH-VL interdomain orientations, follow the concept of conformational diversity. Figure S1 illustrates the free energy surfaces of the CDR-L1, CDR-L2, and CDR-L3 loops, to investigate whether the mutation at position 71H also influences the conformational diversity of the light chain CDR loops. In all light chain CDR loops, the free energy surfaces for the mutant reveal a broader conformational space, similar to the increase in flexibility that was observed for the CDR-H3 loop. A potential explanation for this higher flexibility when substituting an alanine with an arginine could be that the introduction of arginine residues can enhance the promiscuity of antibodies (42,83).

CONCLUSION
In conclusion, we observe, in line with previous results, that the type of residue at position 71H not only influences the neighboring CDR-H2 loop but also induces conformational rearrangements in the whole paratope. Thus, having either an alanine or an arginine at the prominent residue 71H results in different paratope states in solution, which also favor specific relative VH-VL interdomain orientations. The results show that the antibody binding site exists as multiple paratope states in solution, with strongly correlated CDR loop and interdomain movements. This study raises awareness of the strong correlations between the CDR loops and shows that one single static structure is not sufficient to capture the conformational changes and population shifts that occur as a consequence of one single point mutation. Thus, we provide a new paradigm in the field of antibody engineering for the design of interconvertible paratope states in solution, which allows a full characterization of the antibody binding interface.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.