Using Global Analysis to Extend the Accuracy and Precision of Binding Measurements with T cell Receptors and Their Peptide/MHC Ligands

In cellular immunity, clonally distributed T cell receptors (TCRs) engage complexes of peptides bound to major histocompatibility complex proteins (pMHCs). In the interactions of TCRs with pMHCs, regions of restricted and variable diversity align in a structurally complex fashion. Many studies have used mutagenesis to attempt to understand the “roles” played by various interface components in determining TCR recognition properties such as specificity and cross-reactivity. However, these measurements are often complicated or even compromised by the weak affinities TCRs maintain toward pMHC. Here, we demonstrate how global analysis of multiple datasets can be used to significantly extend the accuracy and precision of such TCR binding experiments. Application of this approach should positively impact efforts to understand TCR recognition and facilitate the creation of mutational databases to help engineer TCRs with tuned molecular recognition properties. We also show how global analysis can be used to analyze double mutant cycles in TCR-pMHC interfaces, which can lead to new insights into immune recognition.


INTRODUCTION
T cell receptors (TCRs) are clonotypic membrane proteins on the surface of T cells. TCRs are responsible for recognizing peptide antigens bound and "presented" by major histocompatibility complex (MHC) proteins. TCR recognition of a peptide/MHC complex (pMHC) drives the initiation and propagation of a cellular immune response as well as the development and maintenance of the T cell repertoire. TCR recognition of pMHC is also involved in conditions such as autoimmunity and transplant rejection and is central to new biologic and cellular therapies for cancer and infectious disease. Given the central role these interactions play in human health, there has been significant interest in the physical mechanisms underlying TCR-pMHC recognition.
The TCR-pMHC interaction is marked by the structural and genetic complexity of its interface . Structurally, TCRs are similar to antibodies, comprising four immunoglobulin domains and an antigen binding site with complementarity determining region (CDR) loops generated through genetic recombination and nucleotide editing. However, TCRs and antibodies differ in the nature of the antigen they recognize. Whereas antibodies can be elicited against antigens of nearly unlimited structural diversity, TCRs recognize a composite surface consisting of the antigenic peptide bound in a groove formed of flanking α helices and a β sheet floor ( Figure 1A). The peptide contributes ∼30% of the recognized surface (Rudolph et al., 2006), indicating a significant contribution from the MHC to binding. This combined recognition of a highly diverse non-self-component (the peptide) in the context of a less-diverse self-component (the MHC) is a fundamental aspect of cellular immunity (Zinkernagel and Doherty, 1974).
To engage this combined ligand, the TCR often, although not exclusively, aligns its most diverse "hypervariable" CDR3 loops over the peptide, and the less diverse "germline" CDR1/CDR2 loops over the α helices of the MHC peptide binding groove. This alignment, however, does not always translate into interatomic contacts, as numerous contacts are typically formed to the peptide by germline loops and contacts to the MHC by hypervariable loops. Understanding the "roles" played by the different loops, the peptide, and the MHC protein in driving TCR selection of ligand and ultimately T cell biology has remained a long-standing immunological challenge, and has become even more relevant with the advent of engineered TCRs as therapeutic reagents.
Much of the current understanding of TCR recognition of pMHC has come from the growing collection of structures and subsequent interrogation by binding experiments . Mutational analysis is regularly used to probe the contributions of different components of the TCR and pMHC. Alanine scanning mutagenesis has revealed binding "hot spots" on TCR and MHC surfaces (e.g., Manning et al., 1998;Baker et al., 2001;Borg et al., 2005). More targeted mutagenesis has been used to explore particular hypotheses (e.g., Wu et al., 2002;Feng et al., 2007;Borbulevych et al., 2009;Piepenbrink et al., 2009a), or in conjunction with computational design, generate variant TCRs with engineered binding properties (Zoete et al., 2013;Malecek et al., 2014;Pierce et al., 2014;Riley et al., 2016). Recently, double mutant cycles have been used to explore TCR recognition of pMHC (Piepenbrink et al., 2013;Blevins et al., 2016). Common to these and dozens of other mutational studies is quantitative FIGURE 1 | TCR-pMHC structural overview and simulated binding data. (A) Structure of a TCR-pMHC complex, showing the TCR bound to the composite pMHC ligand. (B) Simulated TCR-pMHC binding data. The wild-type data is simulated with a K D of 2 µM and reaches 99% saturation, and mutant data with a K D of 100 µM reaches ∼50% saturation. assessment of the impact mutations have on binding energy, or the G • . A challenge in studying TCR-pMHC binding is that the affinities are generally weak, with K D -values in the micromolar range or weaker (Matsui et al., 1994;Davis et al., 1998;Cole et al., 2007). Weak affinities can lead to difficulties in measuring the effects of mutations, resulting in inaccurate and imprecise G • -values. Although "thresholds" can be helpful (i.e., the G • is greater than a cutoff; Borg et al., 2005), accurate measurements permit better comparison with structural information and relationships between structural properties and binding energies. With inaccurate measurements, incorrect conclusions about the functional roles positions play in TCR binding and T cell biology can even be made (Gagnon et al., 2005).
Fortunately, there are ways to design binding experiments that avoid the common pitfalls of low affinity interactions. A commonly applied method is to perform a global analysis of multiple data sets collected simultaneously (Beechem, 1992). This can be used with a range of experimental approaches, such as titration calorimetry and surface plasmon resonance (SPR), two common methods for studying TCR-pMHC interactions. Another approach is to fix an unknown experimental parameter by pre-determining it in a separate experiment. For example, in SPR, pre-determining the functional density of a sensor chip and constraining this parameter in subsequent experiments allows measurements with fixed surface densities that reach only 30% saturation to have the accuracy of those that reach 90% saturation when both G • and surface density are fitted parameters (Piepenbrink et al., 2009b). Combined, global analysis of multiple datasets with constrained (or shared) experimental parameters should be expected to allow further gains in accuracy. Thus, appropriate experimental design and analysis can be used to obtain accurate G • -values for very low affinity interactions, such as those seen with mutant TCRs and pMHC complexes.
Here, we used simulated and real TCR binding data to show how accurate and precise G • -values can be obtained for very low affinity TCR-pMHC interactions. We demonstrate that global analysis with shared parameters significantly increases accuracy and precision, and can easily be applied to low affinity interactions. We also show how this procedure can be extended in the design and execution of double mutant cycle experiments, which can be used to probe TCR-pMHC interfaces and determine the contributions of various regions in more detail than traditional single amino acid mutagenesis (Piepenbrink et al., 2013;Blevins et al., 2016).

RESULTS AND DISCUSSION
Global Analyses Yield More Accurate and Precise G • measurements for Perturbed Data Global analysis of multiple datasets with shared parameters has long been used to extend the accuracy and precision of binding experiments (Beechem, 1992). To establish a baseline of the applicability of this approach for studying TCR-pMHC interactions, we simulated TCR-pMHC binding data reflective of a TCR-pMHC binding experiment that uses SPR to determine the G • associated with a MHC or peptide mutation ( Figure 1B). The K D for the wild-type measurement was set at 2 µM ( G • = −7.77 kcal/mol), characteristic of a "high affinity" TCR recognizing a viral antigen. The K D for the mutant measurement was set at 100 µM ( G • = −5.45 kcal/mol). The TCR surface activity (RU max ) was set at 150 response units (RU). Data for both datasets included 10 pMHC injections over the concentration range of 0.25-100 µM. The wild-type dataset reached >99% saturation, whereas the mutant dataset reached only ∼50% saturation.
The simulated data was then fit individually to determine the G • for the wild-type and mutant data and thus the G • . Individual fits floated both RU max and G • . The same two datasets were also fit globally, in which the RU max was a global (or shared) parameter and the two G • -values were local parameters. The G • -values determined either way were indistinguishable, and identical to the actual value of 2.32 kcal/mol, with negligible error. This is expected as the simulated data were unperturbed, lacking the experimental noise that would arise in real experiments. To better test the robustness of global vs. individual analysis, a series of perturbations were added to the data to simulate noise/error arising from the instrument or user. To ensure that any introduced noise would be reflective of that seen in real experiments, the reduced χ 2 -values from 50 separate, individually fit actual TCR-pMHC SPR binding experiments (described below) were averaged, yielding a value of 22%. Based on this value, Gaussian-distributed random noise between 5 and 30% of the value of each data point was applied to the simulated data in a series of steps. This yielded 10 sets of perturbed wild-type and mutant datasets with increasing noise as shown in Table 1. Figure 2 shows the G • -values obtained from individual and global fitting of the pairs of perturbed wild-type and mutant simulated datasets. While the average G • -value for both individual and global fitting sets is similar (individual fitting average of 2.29 kcal/mol, global fitting average of 2.35 kcal/mol), global fits more faithfully reproduced the actual value of 2.32 kcal/mol and showed far less variance across the 10 datasets (standard deviation of 0.51 kcal/mol from individual fitting, almost 5-fold larger than the standard deviation of 0.11 kcal/mol from global fitting). The error associated with global fitting was substantially smaller than the error associated with individual fitting (average of 0.56 kcal/mol for individual fitting, average of 0.18 kcal/mol for global fitting). F-tests confirmed that the variances of the values and their error differed significantly. Thus, with appropriate experimental design, global fitting is both more accurate and precise for determining G • -values.

Discrepancies Between Individual and Global Fits for Experimental Determinations of G • -Values
We next examined individual vs. global fitting for determining G • -values in actual experimental data. We used the two fitting approaches with data for 25 pairs of wild-type/mutant  (Table S1 and Figure S1; Piepenbrink et al., 2013;Blevins et al., 2016). In some cases, TCR mutations or peptide variants were used that enhanced affinities (Piepenbrink et al., 2009a;Ekeruche-Makinde et al., 2012;Cole et al., 2014;Pierce et al., 2014). The datasets spanned a wide range of affinities, with individual fits giving K D -values ranging from nanomolar to micromolar. For the 25 pairs of data, we determined the G • -values using both individual and global fits as above, and calculated the absolute value of G • global fit -G • individual fit, referred to as | ( G • )|. As shown in Figure 3, there is wide variation in | ( G • )|. Some datasets showed essentially no difference between individual and global fitting (e.g., datasets 1, 3, 14), while others had much more substantial differences (e.g., datasets 10, 19). The average | ( G • )|-value among the 25 pairs of experiments was 0.48 kcal/mol. To determine cases of particular interest, a 90% confidence cutoff of the standard deviation, equal to a Z-score of 1.64, was used. Eight datasets fell outside of this range, which amounts to a | ( G • )| of 0.62 kcal/mol (asterisks in Figure 3).
There was no correlation between the eight datasets with | ( G • )| ≥ 0.62 kcal/mol and the apparent affinities of one or more of the binding experiments, as these sets included apparent affinities ranging from 100 nM to 200 µM. The range of the measured G • -values was similarly wide, ranging from small  Frontiers in Molecular Biosciences | www.frontiersin.org (0.05 kcal/mol) to large (3 kcal/mol). The remaining 17 datasets were also dispersed within these ranges. This suggests that the large differences in fitting methods for the nine outlier datasets result from the influence of noise and error that is unique to individual experiments, as opposed to being directly correlated with strength of binding or G • .

Effects of Additional Noise and Missing Data Points on Experimental Data
To further probe the differences seen between individual and global fitting, the eight experimental datasets with large G • discrepancies when fit with individual vs. global fitting were further perturbed. Additional Gaussian error was introduced in stepwise increments (5, 10, and 15%) to both wild-type and mutant datasets as described above, and partial datasets for either wild-type or mutant were generated by deleting three consecutive points at the beginning, middle, and end of the wild-type or mutant titrations (Table 1; missing points may arise, for example, when bubbles occur during a sample injection). For all eight experimental datasets, this generated nine additional pairs (three pairs with noise added to wild-type and mutant data, three pairs with deleted data in the wild-type experiment, and three pairs with deleted data in the mutant experiment). The resulting 72 highly perturbed datasets were fit individually or globally to determine G • -values. For all eight datasets, the G • -values for individual fitting were once again highly sensitive to perturbations, whereas the global fits were more robust. The individually fit G • -values differed from their original, non-perturbed values by anywhere from 51 to 123%. For global fitting the range was 0.1-38%. The standard deviation of the average G • -values for the individual fits was 0.65 kcal/mol, 5-fold greater than the value of 0.14 kcal/mol for global fits. Figure 4 illustrates this outcome for two datasets. In Figure 4A (dataset 17), the individually fit data shows much larger deviations from the original G • -value than the globally fit data. The standard deviation of the G • -values is three times larger for the individually fits compared to the global fits (0.3 kcal/mol for individual vs. 0.1 kcal/mol for global). Figure 4B (dataset 2) shows another example of discrepancies in G • -values from the two fitting methods. Two of the perturbed series fit individually have excessively large G •values (+8 and −7.9 kcal/mol). These are undoubtedly incorrect and highlight the propensity of individual fitting to yield gross inaccuracies with noisy data. In comparison, all nine of the global fitting perturbation series showed reasonable G • -values with a standard deviation of only 0.08 kcal/mol. Notably, the two datasets that were wildly inaccurate when fit individually were not distinctive when fit globally.

Inaccurate Single Fitting Stems from Overfitting of Lower Affinity Measurements
It is notable that the average G • for all 25 datasets was smaller for individual fitting than global fitting (0.81 kcal/mol for individual vs. 1.25 kcal/mol for global). This was also reflected in the average G • -values for the eight "outlier" datasets in  In most cases, there was good agreement between the wildtype K D (or G • ) values when obtained through individual or global fits as the average difference between individual and global fitting for wild-type measurements was 0.1 kcal/mol. In contrast, there was substantial variation for the lower affinity mutant datasets: the average difference in G • -values here was 0.8 kcal/mol. This discrepancy arose because when singly fit, the mutant datasets appeared to overestimate the saturation levels. This was indicated by the fitted RU max -values, which were reduced by nearly half in the eight outlier data sets in Figure 3 (RU max for individually fit mutant data was on average 59% of the RU maxvalue in the wild-type or globally fit data). As lower-than-true RU max -values indicated greater saturation, this led to stronger mutant K D -values. An example of this occurrence is shown in Figure 5 for dataset 17 of the experimental data. Dataset 17 involves a variant of the DMF5 TCR interacting with both wild-type and mutant HLA-A2 protein presenting the MART1 peptide. In the titration with wild-type HLA-A2, good saturation is achieved, yielding a RU max of 378 ± 14. However, when mutant HLA-A2 complex is injected over the same surface, the individual fit to the data yields a reduced RU max of 190 ± 15. The apparent increase in saturation yields a K D of 80 ± 13 µM, for a G • of −1.27 ± 0.15 kcal. With global fitting, the RU max for the mutant is determined at 377 ± 11. The K D for the mutant interaction is then determined instead to be 250 ± 15 µM, and the G • is −1.97 ± 0.09 kcal/mol. This tendency to overestimate the degree of saturation and thus affinities of mutant interactions was not observed with the individually fit, error-modified simulated data, despite varying the G • between high and low affinity data and adding additional noise. While individually fit simulated data did show greater fitting errors and larger ranges for the recovered mutant G • -values (off by 2-to 5-fold from the actual value value), there was an even distribution between values above and below the actual G • -value. The fitted RU max -values matched this trend: approximately half of the noisy data sets had RU max -values higher than the actual value while half were lower.
The reason for the apparent enhancement in saturation for the lower affinity experimental data could have multiple causes. As all data were collected using SPR, it is possible that the sensor chip surface is deteriorating during the experiment. However, this would be expected to impact both high and low affinity experiments, and should impact both global and individual fits, which was not observed. Further, control experiments altering the order of injections (i.e., mutant then wild-type) and TCR vs. pMHC orientations on the sensor surface did not alter the outcome. A more likely possibility is that injection of high protein concentrations leads to surface aggregation, blocking available receptor binding sites and altering the shape of the curve, while simultaneously leading to higher RU units with each injection. As the curve shape and saturation point is restricted by the greater information content (curvature) in the higher affinity data, global fitting would be less subject to this occurrence. This is related to the "pseudo-saturation" profiles discussed by Myszka and colleagues in their review of SPR literature Myszka, 2006, 2008). Regardless of the underlying causes, however, the results are clear that global fitting minimizes the impact of experimental artifacts present in SPR-based analysis of wild-type/mutant TCR-pMHC interactions.

Extending the Applicability of Global Analysis to Perform Double Mutant Cycle Experiments
Double mutant cycle experiments are frequently used to determine the interaction free energy between two sidechains (Horovitz, 1996). Double mutant cycles can be particularly helpful in studying TCR-pMHC interactions, as they allow investigators to isolate side-chain interactions and study interactions between separate components of the interface, such as the peptide, MHC α helices, or the various TCR CDR loops (Piepenbrink et al., 2013;Blevins et al., 2016). Double mutant cycles require three G • measurements: two with two single mutants (one on the TCR, one on the pMHC), and one with a double mutant (mutations on both molecules). As a construct of G • measurements emerging from four separate binding experiments, global analysis can therefore be used to enhance accuracy and precision of double mutant cycles.
Global analysis of SPR-based double mutant cycle experiments is simply an extension of the G • measurements described above. Both wild-type and mutant TCRs can be coupled to adjacent flow cells, over which wild-type and mutant pMHC will be injected. This design allows for parameter sharing, with the RU max constrained for both surfaces. A single global fit can thus be used for all four G • measurements that make up the cycle. An example is shown in Figure 6, which illustrates FIGURE 5 | Individual analysis of mutant datasets tends to overestimate degree of saturation, inflating K D and reducing G • . (A) Individual fits of the DMF5 TCR binding to wild-type (black) and mutant (red) pMHC. Although the exact same surface was used for both titrations, the fitted RU max differed between the wild-type and mutant experiments. The lower value for the mutant interaction led to an inflated K D and smaller G • . (B) When both datasets are fit globally with RU max as a shared parameter, the reported value is the same as the high affinity value in the wild-type single analysis. The mutant K D was thus weaker and G • smaller.
FIGURE 6 | Example of a double mutant cycle analysis in a TCR-pMHC interface. When the double mutant cycle exploring the interaction between Trp101 of the A6 TCR and Ala69 of HLA-A2 was fit globally, the reported interaction free energy was 0.0 ± 0.3 kcal/mol. If the four datasets making up the interaction were individually fit, the resulting interaction free energy was −1.0 ± 0.2 kcal/mol. a double mutant cycle measuring the interaction between Trp101of CDR3α of the A6 TCR and Ala69 of the α1 helix of HLA-A2 presenting the Tax peptide. The four data sets are fit simultaneously, and the free energy of interaction between the two side chains is measured as 0.0 ± 0.3 kcal/mol. Fitting each experiment independently yields a very different value of −1.0 ± 0.2 kcal/mol, but is subject to the errors and complexities described above.
Previously published data further illustrates the robustness of global analysis in double mutant cycles (Piepenbrink et al., 2013). The free energy of interaction between Glu30 of CDR1α of the A6 TCR and Tyr8 of the Tax peptide presented by HLA-A2 was measured as −1.7 ± 0.1 kcal/mol. In this case the G • for the double mutant interaction was very weak. This measurement was therefore repeated in the background of a variant of the A6 TCR that bound with 100-fold higher affinity (Haidar et al., 2009;Cole et al., 2013). The resulting free energy of interaction was −1.6 ± 0.2 kcal/mol, identical within error to the first measurement.

Concluding Remarks
Here, we demonstrated how global analysis of multiple binding datasets can be used to substantially extend the accuracy and precision of TCR-pMHC binding experiments. We emphasized SPR experiments, which continue to dominate quantitative assessments of TCR-pMHC binding. However, the approaches described herein can be applied to other experimental techniques, such as calorimetric and fluorescence experiments (Piepenbrink et al., 2009b). Beyond improving G • measurements, the approach facilitates more complex experimental designs such as double mutant cycles, which provide the opportunity for novel insight into TCR recognition properties. As experimental studies of TCR-pMHC binding segues into rational design for new biologics and therapeutics, which often require the generation of large training sets , the improved reliability available through global analysis should be expected to have a direct influence on progress.

Creation and Perturbation of Simulated Data
To determine the accuracy of G • -values obtained by either single or global fitting methods, a simulated data set was created. An RU max of 150 was selected along with K D -values of 2 and 100 µM for the wild-type and mutant interaction, respectively. Ten pMHC concentration points were selected ranging from 0.25 to 100 µM and response units calculated for the two TCR-pMHC interactions using a simple 1:1 binding model.
Simulated data was subjected to Gaussian-distributed noise ranging from 5 to 30% of the response values in 2.5% steps, applied to each data point. Error was added using the "white_noise" function in Origin 9.0. These noisy datasets were fit individually and globally to compare effects on G • -value and error. Averaging the reduced χ 2 -values from 25 experimental TCR-pMHC SPR fits allowed for an estimate of the noise typically seen in experimental data (amounting to 22% of the RU max ). To mimic this, the simulated data was perturbed by adding Gaussian-distributed noise ranging from 5 to 30% of the 150 RU max as shown in Table 1.
Additional simulated datasets were created to explore effects on affinity and the spread of G • . These sets included widening the spread between the high and low affinity interactions and using the RU max -and K D -values from an experimental dataset to develop a simulated set. This data was also subjected to both single and global fitting as well as noise and error propagation.

Fitting Data Using Individual and Global Fitting Methods
Datasets were fit using both individual and global fitting. For individual fitting, both RU max and G • were fitted parameters. For global fitting, RU max was a shared parameter between the high and low affinity interactions while G • -values were individually floated. For global fitting, both wild-type and mutant datasets were fit simultaneously using a custom function in Origin 9.0. Errors were calculated by standard error propagation using the fitting errors obtained from Origin. Student's t-tests were performed to ensure differences between individual and global fitting were statistically different. F-tests were used to quantify differences in distribution of G • -values and error.

Collection of Experimental Data and Perturbations
Twenty-five sets of TCR-pMHC binding experiments were utilized (Table S1). Ten of the datasets were from previously published work using double mutant cycles (Piepenbrink et al., 2013;Blevins et al., 2016), whereas 15 new datasets were collected. Pairs of these datasets were used for "wild-type" and "mutant" conditions to create G • -values. In addition to serving as primary data, these data were subjected to perturbations, including adding additional Gaussian-distributed noise and removing data points as shown in Table 1. For the eight data sets with G • differences outside the threshold of ±0.62 kcal/mol, averages and standard deviations for the whole set of 72 perturbed data sets were obtained by calculating the average G • -value and standard deviation for each experimental sample, and then averaging the eight resulting averages and standard deviations.

Surface Plasmon Resonance to Determine Binding Affinity
Soluble TCRs and pMHC complexes were produced from bacterially expressed inclusion bodies and purified as previously described . Experiments to determine binding affinity were performed as previously described unless otherwise noted . Briefly, steady state equilibrium binding experiments were performed on a Biacore 3000 instrument. Purified TCR was diluted in 10 mM sodium acetate pH 4.0 and immobilized on the surface of a CM5 chip to a density of ∼1000 RU. Purified pMHC was serially injected over the surface in concentrations ranging from 0.5 to 200 µM in duplicate. Experiments were carried out at 25 • C. Alternatively, pMHC was immobilized on the chip to densities of ∼2000 RU and TCR was serially injected over the surface in concentrations ranging from 0.5 to 300 µM. These experiments were performed at 10 • C. All data was first processed in BiaEvaluation 4.1 prior to fitting in Origin 9.0. Double-mutant cycles were performed analyzed as described previously (Piepenbrink et al., 2013;Blevins et al., 2016). Wild-type TCR and mutant TCR were immobilized on the surface of a CM5 chip. Soluble wild-type and mutant pMHC complexes were injected over the chip surface in a series of increasing concentration points until steady-state binding was attained. Each injection series was repeated twice. In any one experiment, three flow cells were used: a blank, wild-type TCR, and mutant TCR. The output of one single experiment thus consisted of two sets of all four measurements that comprise a double-mutant cycle. Data were processed in BiaEvaluation 4.1, and all eight datasets were simultaneously analyzed in Origin 9.0 using a custom function. The wild-type and mutant TCR surface densities and the four G • -values that make up the cycle were parameters in the global fitting function. Data sets that shared a common flow cell shared an RU maxvalue, with the K D -and G • -values being fit locally. Errors were propagated using standard statistical error propagation methods.

AUTHOR CONTRIBUTIONS
SB performed experiments and analyzed data. BB oversaw the project. Both authors wrote and edited the manuscript.

FUNDING
Supported by grant R35GM118166 from the National Institute of General Medical Sciences, National Institutes of Health.