Regulatory Approved Monoclonal Antibodies Contain Framework Mutations Predicted From Human Antibody Repertoires

Monoclonal antibodies (mAbs) are an important class of therapeutics used to treat cancer, inflammation, and infectious diseases. Identifying highly developable mAb sequences in silico could greatly reduce the time and cost required for therapeutic mAb development. Here, we present position-specific scoring matrices (PSSMs) for antibody framework mutations developed using baseline human antibody repertoire sequences. Our analysis shows that human antibody repertoire-based PSSMs are consistent across individuals and demonstrate high correlations between related germlines. We show that mutations in existing therapeutic antibodies can be accurately predicted solely from baseline human antibody sequence data. We find that mAbs developed using humanized mice had more human-like FR mutations than mAbs originally developed by hybridoma technology. A quantitative assessment of entire framework regions of therapeutic antibodies revealed that there may be potential for improving the properties of existing therapeutic antibodies by incorporating additional mutations of high frequency in baseline human antibody repertoires. In addition, high frequency mutations in baseline human antibody repertoires were predicted in silico to reduce immunogenicity in therapeutic mAbs due to the removal of T cell epitopes. Several therapeutic mAbs were identified to have common, universally high-scoring framework mutations, and molecular dynamics simulations revealed the mechanistic basis for the evolutionary selection of these mutations. Our results suggest that baseline human antibody repertoires may be useful as predictive tools to guide mAb development in the future.


INTRODUCTION
Monoclonal antibodies (mAbs) are now ubiquitous as therapeutics, with over $100 billion in sales worldwide in 2020 (1) and applications ranging from oncology (2) and inflammation (3) to infectious diseases (4). mAbs are engineered not only to have potent and specific binding to a given target but also to have favorable drug properties, including in vivo stability, manufacturability, immunogenicity, solubility, and polyspecificity (5). Identifying highly developable mAb sequences in silico could greatly reduce the time and costs of therapeutic mAb development.
Antibody sequences sourced from baseline human antibody repertoires could inform our ability to engineer therapeutic mAbs by 'borrowing' consensus mutations (6,7). This premise rests on the successful use of sequence conservation in protein engineering for improving the functional properties of enzymes (8)(9)(10), nanobodies (11), and membrane proteins (12). Antibodies in particular contain great potential for sequence optimization because every human body contains an estimated 10 11 B cells with highly diverse antibody sequences (13), providing a rich space from which to glean important insights that could be used to guide future engineering efforts.
Using sequence conservation for improving antibody properties was first explored by Steipe et al. (14), who used known antibody sequences from the Kabat Database (15) to identify consensus positions within mouse V K repertoires. Mutation to the amino acids at these consensus positions resulted in improved thermodynamic stability for the majority of the antibody sequences tested. However, the power of any sequence-based method relies on the size of the database. It is now possible to sequence tens of millions of antibodies from a single individual. Studies evaluating such human antibody repertoires have focused on cataloging the immune response to vaccination or infection (16)(17)(18)(19). Recently, the Great Repertoire Project conducted the most extensive attempt to sequence entire baseline human antibody repertoires to date, acquiring a total of 364 million antibody sequences by sequencing full Leukopaks from ten healthy, HIV-negative adults (20).
We revisited the idea that sequence conservation predicts developable antibody sequences using this much more comprehensive database of baseline human antibody sequences, applied towards the analysis of FDA-approved mAbs. Specifically, we sought to answer the following questions: (i.) are there mutations from germline (GL) sequences that are highly prevalent in baseline human antibody repertoires and if so, are these also found in FDAapproved mAbs, given the generally favorable developability properties of these mAbs?; and more broadly (ii.) can sequence information alone predict more developable from less developable mAbs? We restricted our analysis to the framework regions (FRs) of the variable heavy (V H ) domain as antibody FRs impact in vivo stability, solubility, and immunogenicity (6) while also contributing significantly less than complementarity determining regions (CDRs) for binding antigen. We also explored some of the dynamics of peptide-MHC-II interactions using computational binding predictions (21), as the MHC-II peptide epitope contained within antibodies and other protein drugs has been recognized as an important component of clinical success (22,23). As a result, sequence information for FR regions can be applied to a broad array of antibodies with varying applications.
In this study, we present position-specific substitution profiles (PSSM for position-specific scoring matrix) for antibody FR mutations using the most complete dataset of baseline antibody repertoire sequences to date (20). We show that antibody repertoire-based PSSMs are consistent across subjects and produce high correlations between GL VH genes with expected differences based on sequence similarity and familial relationships. Our analysis shows that mutations in existing therapeutic antibodies can be accurately predicted solely from repertoire sequence data. We then quantitatively assessed entire FRs of these therapeutic antibodies and compared them to their baseline human repertoire counterparts. These data suggest that there may be potential for improving existing therapeutic antibody properties through incorporation of additional mutations of high-frequency in human repertoires. In addition, we found that high frequency repertoire mutations tended to reduce the affinity of germline-encoded peptides that bound to MHC-II epitopes. Several therapeutic mAbs have common, universally high scoring FR mutations, and simulations revealed a mechanistic basis for the favorable drug properties of some mutations. Overall, our results suggest that human antibody repertoires are useful as predictive tools that will facilitate engineering mAbs by improving drug-like properties.

Selection of Human Repertoire Antibody Sequences
In our analysis, we considered only the FR regions of immunoglobulin G (IgG) V H antibody segments. IgM sequences are the other common isotype included in the database but were ignored because they typically have low levels of mutation due to their role in the early stages of the immune response (24). D and J segments were excluded since these segments are highly variable, which would inhibit our ability to achieve significant coverage of all possible mutations at these positions.
Out of the 51 total VH genes in the Great Repertoire Project database (20) (https://github.com/briney/grp_paper) (data accessed November 2020), we analyzed 25 VH genes that represented inferred precursors for the human/humanized FDAapproved mAbs that we were able to collect sequences for. Additionally, these 25 VH genes represent many of the most common VH genes in this dataset. A minimum of 100,000 repertoire IgG variable heavy segments were extracted from the database and analyzed for each GL. The cutoff value of 100,000 was deemed necessary for the creation of reliable scoring matrices and was determined by a random subsampling analysis where score differences were compared across multiple sampling depths ( Figure S1). It should be noted that the resulting compiled sequences span several different alleles within each GL gene, though these differences are later reconciled by adjusting the scoring matrices to remove this bias, as described in detail in a subsequent section. All GL VH genes excluded from this analysis are in Table S1. GL VH genes were excluded either due to low sequence counts or if no FDA-approved mAb precursors were represented in the panel.

Selection of FDA-Approved Antibody Sequences
FDA-approved antibodies included in the analysis were selected according to three criteria: (i.) the sequences had to be human/ humanized antibodies (name ending in -umab), (ii.) the sequences had to be available in the DrugBank database (25) at the time of sequence collection (accessed October 28th 2020), and (iii.) the Great Repertoire Project database (20) had to contain more than 100,000 sequences from the same inferred GL gene. It is critical that the analyzed antibody is human/ humanized since this allows for the antibody to be matched to human GL genes.
In addition, to determine whether we can effectively identify developability issues in antibodies, we selected sequences for a second panel of engineered antibodies that were determined to have biophysical properties that fall outside of historically accepted limits. We used data from Jain et al. (5), which identified 12 biophysical properties that were indicative of an antibody's likelihood of progressing through all three stages of FDA approval. From this dataset, we selected all antibodies that were found to have two or more biophysical properties that fell outside of the described acceptable ranges. In addition, the antibodies had to be human/humanized and the Great Repertoire Project database had to contain at least 100,000 sequences in the repertoire database from the same inferred GL gene. IMGT BlastSearch v1.2.0 (26) was used to infer the GL gene for the V H sequence of each of the analyzed FDA-approved antibodies, and the hit with the lowest E value was assigned as the GL V H gene. As described below, position-specific scoring matrices (PSSMs) for each of the 25 GL genes were then constructed ( Table S1).

Generation of Position-Specific Scoring Matrices for Human VH Genes
A multiple sequence alignment was performed on each set of sequences derived from a common GL gene using a lightweight version of the MAFFT algorithm (27) designed for large numbers of similar sequences and using the FFT-NS-2 progressive alignment method in low-memory mode with the gap opening penalty set to max (5.0). All other alignment parameters were set to default. Each alignment file was collapsed into a single table of mutational counts by tallying the number of observations of each amino acid at each position in the sequence. This count output was modified to remove instances of insertions, indicated by positions that contained under 10% occupancy (i.e., more than 90% gaps). In addition, individual counts of wild-type (WT) and allelic variants were removed (Table S4). Amino acid substitutions with zero counts were replaced with a pseudocount of one to circumvent an undefined score (acting as a lower bound for scores). Then, the tables were manually aligned to IMGT human GL V H FR reference sequences (26) to associate each column of the count table with a real position in the FR sequence.
The counts for all amino acid substitutions within the FR regions were log transformed into a score via the following equation: Here S ij is the score of amino acid substitution j at position i and a is a V H -specific constant (Range: 726-2228, s.d.: 320) that centers the mean of the score distribution for V H -specific FR mutations at zero. For comparisons across GL families, this a normalization term results in a max difference of 4.9 between GL with a difference of <1.0 at 1 s.d. N ij is the count of amino acid substitution j at position i, the sum of which, over all amino acids from alanine (A) to tyrosine (Y), is equal to the total number of sequences without gaps in the multiple sequence alignment at that position. We used a log base of 1.26 to adhere to previous convention used by Dayhoff et al. for log-odds matrices (28). PSSMs for the 25 analyzed GL VH genes can be found in Data S1. This same protocol was then applied to the FDAapproved antibody sequences, which were first aligned to their corresponding GL sequences and then nonsynonymous FR mutations and their respective scores were recorded.

Generation of Framework Scores for Individual Antibody Sequences
We then derived an overall FR region score that we term an "FR score", a metric which can be used to compare sequences of antibodies with different numbers of mutations across different GL VH genes. We assumed that each FR mutation has an additive effect and is independent of one another (no epistatic interactions). With this assumption, the FR score is then defined as: where S ijk is the score of the k th sequential FR mutation of amino acid substitution j at position i as determined by Eqn. 1. The sum of m (the number of FR mutations) mutation scores is normalized by constants c 0 and c 1 which are specific to that GL gene (Table S3), derived from a least squares regression. The denominator of the FR score is a normalization constant which gives an estimate of a typical score of a randomly sampled Ab from the repertoire with the same number of FR mutations ( Figure S6). The FR score as derived gives an estimation of how different an antibody's framework mutations are as compared to baseline human repertoire antibodies. As such, if an antibody has framework mutations with scores similar to what we would expect of a human antibody with an equivalent number of mutations, this ratio becomes one. Thus, we assign all antibodies with no framework mutations an FR score of one.

Prediction and Analysis of MHC-II Peptide Epitope Affinity
MHC-II peptide epitope affinity prediction was carried out for a set of 38 representative HLA-DRB1 alleles (21) using netMHCIIpan 4.0 (29). Predictions were performed using 15-mer peptide scans for each peptide containing mutations with an associated PSSM, and the netMHCIIpan flag -BA was selected to obtain predicted MHC-II binding affinities.
Predictions were also carried out for germline (GL) peptides using the same settings, for the 38 representative HLA-DRB1 alleles. Next, netMHCIIpan predictions for FDA-approved mAb peptides were matched to GL peptides and their own binding predictions. To track affinity changes from high affinity GL peptides, peptides matching a germline peptide with peptide: MHC-II K D < 1,000 nM were assigned to three different bins based on the ratio of (mutated peptide K D /GL peptide K D ). This was done for K D s predicted for the same allele. Three bins were created: K D fold-change < 0.5, 0.5 ≤ K D fold-change ≤ 2, and K D fold-change > 2. We also considered the total set of peptides matching any high affinity GL peptides (K D < 1,000 nM), and the total set of peptides matching any low affinity GL peptide (K D > 1,000 nM). Next, PSSM values were averaged by DRB1 allele and K D group using standard mean. This was done to reduce redundancy, since the same peptide can have different affinities for different HLA-DRB1 alleles and thus be in different K D groups at the same time. As a result, average PSSMs per DRB1 and K D group take into account peptide contributions based on differential DRB1 binding. Next, PSSM distributions were compared between K D bins using a Welch Two Sample t-test, and p-values were adjusted for multiple comparisons using the Benjamini-Hochberg method. All data processing steps were carried out in R.

Molecular Dynamics Simulations
Molecular dynamics (MD) simulations were carried out using the GROMACS 2021 (30, 31) MD engine and the TIP3P (32) and AMBER99SB-ILDN (33) force fields to explicitly model water and the antibodies, respectively. Simulations were initiated from crystal structures of the following antibodies: Atezolizumab (PDB code 5X8L) (34), Daratumumab (PDB code 7DUN) (35), and Omalizumab (PDB code 4X7S) (36), with all antigens/ ligands removed beforehand. Additional simulations were performed with a single germline reversion mutation incorporated into each mAb sequence (A54S for Atezolizumab and Omalizumab, and F103Y for Daratumumab), using Visual Molecular Dynamics (VMD) (37). Counterions in the form of Na + or Cl -, also described by the AMBER99SB-ILDN force field, were added to the systems as needed to neutralize any net charge. Each system contained between 67,000 and 94,000 atoms. The initial coordinates for each system were minimized for a maximum of 5,000 steps using a steepest descent energy minimization. This was followed by 0.5 ns each of NVT and then NPT equilibration simulations performed at 310 K using the Bussi−Donadio−Parrinello thermostat (38), and at 1.0 bar using the Berendsen barostat (39) (same temperature and thermostat) for the NPT simulations. The time constant for coupling in both the NVT and NPT simulations was 0.1 ps. NPT production simulations were then performed at the same temperature and pressure and using the same thermostat and Parrinello-Rahman barostat (40). Particle mesh Ewald (PME) summations (41) were used to calculate long-range electrostatic interactions with a cutoff of 1.0 nm, and Lennard Jones interactions were calculated over 1.0 nm and shifted beyond this distance. Neighbor lists were updated every 10 steps with a cutoff of 1.0 nm and all simulations utilized full periodic boundary conditions. Production simulations were carried out for 0.3 ms each, for a total of 1.8 ms across the six simulations.
Standard GROMACS tools were used to compute the root mean squared deviation (RMSD) of the antibodies from their respective energy-minimized structures as a function of simulation time. For all RMSD plots, data was plotted every 10 th frame. We also computed the root mean squared fluctuation (RMSF) of the heavy chains of the antibodies over the last half (150 ns) of the simulations, during which time all six simulations were deemed converged based on RMSD. All simulation images were rendered in VMD.

Statistical Calculations
Unless otherwise noted, all p-values were calculated using Welch's t-test. Correlations were analyzed using a Pearson product-moment correlation coefficient. Hierarchical clustering for GL comparisons was performed using SciPy in Python (www.scipy.org) using the unweighted pair group method with arithmetic mean (UPGMA).

Generation of Unbiased Framework PSSMs From Antibody Repertoires
We created an efficient method to score the set of individual FR mutations in an FDA-approved mAb, which is shown schematically in Figure 1 and is carried out as follows. First, we identified the set of human or humanized FDA-approved mAbs for which sequences were available in the DrugBank database (25). From this list, we used IMGT BlastSearch (26) to infer a GL V H gene for each mAb (Table S1). We restricted our analysis to the FR positions of V H only (IMGT Numbering FR16/24/25-26 depending on the V H ; FR39-55; FR66-104), as the Great Repertoire Project reported only heavy chain sequences. Additionally, we considered only IgG sequences, as IgM sequences typically have low levels of mutation due to their role in the early stages of the immune response (23).
The collected IgG sequences from the Great Repertoire Project come from the eight individuals available on GitHub (accessed November 2020); our analysis of this dataset shows that the extent to which a B cell population is mutated varies widely across VH genes and individuals (Tables S2 and S3, respectively). Our expectation is that this dataset is composed of a minimally-biased representation of the baseline human antibody repertoire that includes both naïve and mature B cell sequences contained in the peripheral circulation. This expectation is informed by an analysis of the dataset, which reveals an average mutation frequency per position of 9.8%. Additionally, we analyzed the entire set of IgG sequences for synonymous (dN) and synonymous (dS) substitutions. Taking the ratio of the total number of substitutions for each of these we observe an overall dN/dS ratio of 2.87, indicating that these sequences have undergone targeted mutagenesis and selection. Further evidence for mature B cell sequences exposed to antigen is found in comparison of mutation rates for IgG vs. IgM sequences. Of the collected IgG sequences, 17.4% were found to have >98% nucleotide identity with their corresponding germline precursor and 3.3% are unmutated. These numbers are considerably lower than those of the collected IgM sequences, for which 54.6% were found to have >98% nucleotide identity with their corresponding germline precursor and 14.0% are unmutated (Table S3). Another piece of evidence for this repertoire including B cells exposed to antigen result from examination of the mutational rate of the CDRs relative to FR. The frequency of mutation by position is plotted in Figure S2 and shows an average mutation frequency of 18.8% in the CDRs versus 7.4% in the FRs.
We aligned a set of relevant IgG sequences by multiple sequence alignment (27) in order to generate a FR positionspecific scoring matrix (PSSM) for each GL V H gene considered (see Methods). For each FR position, we counted the number of sequences observed to have a mutation at this position and logtransformed this frequency to a score term. This scoring term has one adjustable parameter per GL V H gene such that the mean score for a randomly selected mutation would be zero. According to how we defined this scoring term (see Methods), higher scores represent more commonly observed amino acid mutations and the gross majority of scores range between -10 and 10.
One complication arising from statistical analysis of antibody mutations is that humans possess significant allelic diversity in V H genes, even in FR regions. For example, consider V H 3-9 which has a T99M FR allelic variation (Table S4). In generating a PSSM at position 99, one could identify the appropriate allele for each patient (either T or M) and then generate separate PSSMs for each allelic variation. Alternatively, we chose the simpler approach of removing all allelic variants from our PSSM since these only affect a small minority of possible substitutions.
The actual number of IgG GL V H -specific sequences in the Great Repertoire Project database varies tremendously, from around 2,500 sequences (V H 1-45) to more than 2 million sequences (V H 3-7). Given this range, we asked what number of sequences would be sufficient for accurate recapitulation of GL PSSMs. We chose to perform this analysis on V H 5-51 by subsampling between 1,000 and 300,000 sequences. For each mutation, we computed the absolute difference between the score derived from the full dataset (~700,000 sequences) and the score from the subsample. Increasing the number of sequences from 1,000 to 25,000 led to increasing agreement with the full dataset, with essentially no difference observed in the median displacement at or above 25,000 sequences for the large majority of scores (i.e., scores greater than -15) ( Figure S1). However, minor differences in displacement at 2s (95%) were observed between 25,000 and 100,000 sequences. Thus, we conservatively restricted our analysis of GL V H segments to those with 100,000 or more IgG sequences present in the Great Repertoire Project database. The GL VH genes analyzed and resulting sequence counts are denoted in Table S1.
We then asked whether mutation frequencies would be highly correlated between individuals and if the site-specific preferences would be repeatable. To test this, we generated patient-specific V H 3-23 PSSMs for all 8 individuals for which data was available which contained the same V H 3-23*01 allele. The V H 3-23 GL was selected to ensure that a sufficient number of sequences was available for generating these PSSMs. We observed a high degree of correlation between patient scores with correlation coefficients between 0.86 and 0.91 for all pairwise comparisons ( Figure 2). We then repeated our analysis considering only amino acid mutations encoded by 1-nucleotide (nt) substitution, as these points contain the highest sequence coverage in our dataset and, therefore, are presumably most accurate. The results were essentially unchanged, with pairwise correlation coefficients between individuals ranging between 0.84 and 0.92 ( Figure S3A). These correlation coefficients are only slightly below the basal noise level (0.93-0.94) calculated by computing the correlation coefficient between two independent random samples for each of two patients ( Figure S3B). These results show that GL V H -specific FR mutations are largely repeatable between individuals. This finding is not new, as Sheng et al. showed in their analysis that affinity maturation somatic hypermutation (SHM) seems to produce highly consistent substitutions for antibody V segments across different donors, even in the absence of functional selection (16). However, our independent analysis on a unique dataset corroborates their results, which together suggest underlying biophysical mechanisms are responsible for the reproducibility of the selection of certain amino acid mutations at different FR positions. These results strongly suggest that the process of affinity maturation results in replicable frequency distributions of FR mutations at the amino acid level.
Such replicable frequencies can arise if a given FR mutation is selected during affinity maturation for a general, antigenindependent reason. For example, a given FR mutation may improve the surface density of B cell receptors or could remove an aggregation-prone patch on the surface of the protein. An alternative but non-exclusive explanation is as follows. Sequences from baseline human antibody repertoires tend to be enriched in mutations which can be generated by a single nucleotide substitution, while mutations that would require di-or trinucleotide substitutions tend to occur less frequently. This is because activation-induced cytidine deaminase (AID) primarily generates single nucleotide point mutations in B cell receptor sequences during affinity maturation (42), and as this mutational frequency is~10 -3 per base per generation (43), it is far less probable that the same codon will be targeted twice or thrice than once. Consistent with this notion, analysis of our PSSMs shows that FR mutation scores that are one nucleotide away from the GL sequence are significantly higher (mean score of 8.42) than those requiring two or three substitutions (mean scores of -2.37 and -8.18, respectively) ( Figure S4).

FDA-Approved mAbs Contain Framework Mutations That Can Be Predicted From Human Antibody Repertoires
We asked whether FR mutations in FDA-approved mAbs could be predicted from native antibody repertoires. 39 FDA-approved human/humanized mAbs were found to have sufficient depth of coverage (i.e., > 100,000 sequences) in their GL V H family in the Great Repertoire Project database and were thus subjected to further analysis ( Table 1). While the precise developmental pathways for all of these FDA-approved mAbs are not publicly available, it is known that most of the mAbs were discovered using either hybridoma technology, transgenic mice, or phage display.
On average, the 39 FDA-approved mAbs have a mean of six V H FR mutations, and a range from zero to 16 FR mutations. The degree to which these mutations might align with (and thus may be predicted by) those found in human antibody repertoires is difficult to know a priori. On the one hand, the use of phage display or transgenic mice could result in a sufficiently different selection environment than what exists during affinity maturation in human germinal centers, leading to an altogether different set of FR mutations than are evolved in human antibody repertoires. On the other hand, the clinical approval process eliminates any antibodies that may have been selected ineffectively for drug-like properties and failed in manufacturing or pre-clinical and clinical trials. The affinity maturation process may indirectly select for some of the same qualities of antibodies that have gone through regulatory approval, including high expression yield, thermodynamic stability, low non-specific binding, little to no aggregation propensity, and low immunogenicity (44,45). In this latter case, FR mutations in FDA-approved mAbs could indeed be predicted by baseline antibody repertoire abundance.
To answer this question, we used our GL V H -specific PSSMs to score each FR mutation contained in the FDA-approved mAbs. Strikingly, every one of the 16 GL V H families that form the set of inferred precursors for these mAbs showed significantly higher (at 95% confidence level) FR mutation scores for FDA-approved mAbs relative to the set of all possible mutation scores ( Figures 3A, B, and S5, Table 1). The statistical significance of this result ranged from a p-value of 1.8e-5 (V H 1-69) to 7.6e-8 (V H 3-66) ( Figure 3B). However, the distributions in Figure 3B of the set of all possible mutation scores do not reflect the number of repertoire antibody FR mutations with given scores. Indeed, generating random samples of scores for 1,000 repertoire antibody sequences from each of these distributions shows that the majority of observed antibody mutation scores in human repertoires are above 10 ( Figure 3C). Plotting the individual mutation scores for the FDA-approved mAbs on top of these new distributions, we see that the distributions match much more closely, with scores of the FDA-approved mAbs now slightly below those of the baseline antibody repertoires ( Figure 3C). The slightly lower scores of the FDA-approved mAbs largely result from the fact that in each case, there are at least two mutations below a score of zero, and for the entire dataset of FDA-approved FR mutations 10.3% had a score below zero. These results suggest that while repertoire datasets can predict mutations in mAb V H FR regions with reasonable accuracy, as indicated by the large overlap in the distributions, there is also considerable room for engineering mAb sequences to mimic baseline human repertoire antibodies with increased fidelity.

FDA-Approved mAbs Have Lower FR Scores Than Typical Repertoire Antibodies
Motivated by the previous results, we wondered how the cumulative effect of all FR mutations for a given FDAapproved mAb compares to that of baseline repertoire antibodies. To assess this, we assumed, for simplicity, that individual mutations are independent, such that the cumulative effect of FR mutations is best represented as a sum of individual mutation scores. Due to the V H -specific centering parameter included in the score function, individual mutation scores cannot be directly compared across V H families. To account for these differences, the score sum was normalized by a regression function which estimates the typical score sum of a randomly sampled repertoire antibody from the same V H family with the same number of FR mutations ( Figure S6 and Table S5). The resulting "FR score" is defined such that a score of one represents an antibody that has FR mutations similar to that of a typical repertoire antibody framework sequence (see Methods). Antibody sequences with no FR mutations were assigned a pseudo-score of one since the calculated FR score is otherwise undefined. We then compared FDA-approved mAbs to repertoire antibodies using FR scores. A simple random sample with replacement of 1,000 repertoire antibody sequences across all GL genes was generated and an FR score for each sampled antibody was computed (Data S2). As expected, FR scores for the repertoire antibodies center at a value of one ( Figure S7). Next, FR scores were calculated for all FDA-approved mAbs used in the analysis above and the results are plotted along with the repertoire averages in Figure 4A as a function of the number of FR mutations. All individual mutation scores and FR scores for FDA-approved mAbs are tabulated in Table S6.
Thirty of the 39 FDA-approved mAbs have FR scores below the baseline repertoire antibody average of one (p-value 1.4e-4). Figure 4A highlights the five mAbs with the highest and lowest FR scores. Denosumab and Dupilumab are the two highest scoring FDA-approved mAbs with FR scores of 1.45 and 1.23, respectively. Both mAbs have a small set of apolar/polar FR mutations (A55G, A25G, S40T, A55S), and interestingly, both mAbs share the same V H 3-23 GL and were produced from transgenic mice. Nivolumab is the lowest scoring FDA-approved mAb with an FR score of 0.12 and only one mutation requiring a 2-nt change from the GL sequence (A24K: score=2.0). Emicizumab and Pertuzumab, both from the V H 3-23 GL and both developed using hybridoma technology, also have low FR scores of 0.32 and 0.26, respectively. However, the low scores of these mAbs are only somewhat explained by the fact that 14 mutations (out of 17 total) are more than 1-nt away from the GL codon. The solved complex of Pertuzumab and human epidermal growth factor receptor 2 (ErbB2 or HER2) (46) partially reveals the source of Pertuzumab's low FR score. The complex is unusual in that FR positions surrounding CDRH2 are directly contacting ErbB2, including mutations for Y66I, A68N, D69Q, and N82R. Thus, rare FR mutations are necessary in this specific mAb for Erb2 recognition.
As shown above, the methods used to develop mAbs can influence the selective pressure and, therefore, the mutations that are incorporated into therapeutic mAbs. On average, we find that the number of FR mutations in FDA-approved mAbs discovered by hybridoma technology is higher than in FDA-approved mAbs discovered by transgenic mice or phage display (p-value ≤4.6e-5) ( Figure S8). Even though FR scores are independent of number of FR mutations, we find that mAbs developed using hybridoma technology scored significantly lower than transgenic mice (p-value 5.5e-3), although there is not a statistically significant difference between mAbs developed via hybridoma versus phage display (p-value 0.13) ( Figure 4B). Phage display and transgenic mice technologies produce mAb FRs with similar FR scores (p-value 0.29). Thus, for existing mAbs, those produced by transgenic mice are the most representative of the baseline human antibody repertoire in terms of their FR mutations. We next questioned whether FR scores could be used to diagnose mAb developability issues. Collecting an appropriate dataset for this analysis is difficult, as there are few databases of therapeutic antibodies with developability issues available in the publicly accessible literature. As a proxy, we chose to use the set of "flagged" antibodies originally described in Jain et al. (5). Jain et al. examined 12 different biophysical properties of a set of mAbs in various stages of clinical trials, with outliers in any of these properties considered "flags". Two or more of these flags identified a mAb as a developability risk. We compiled all human/humanized mAb sequences from this analysis that contained at least two flagged properties and sufficient repertoire sequence data (39 in total). It should be noted that five of the flagged Abs that we analyzed have since received FDA approval and are therefore also contained within the FDAapproved dataset. FR scores were calculated for flagged mAbs and compared to our previously described panel of FDAapproved mAbs ( Figures 4C, S9 and Table S7). Six of the flagged mAbs contained no FR mutations. As with the FDAapproved mAbs, we see that flagged mAbs also contain  significantly lower FR scores than a population of randomly selected repertoire Abs (p-value 7.6e-3). However, there is no statistically significant difference between the distribution means (p-value 0.37) for regulatory-approved vs. flagged mAbs. Thus, our FR score is too coarse-grained to differentiate between antibody groups at such a late stage of development.

Biophysical Basis of Highly Prevalent FR Mutations
Why is the frequency of a given FR mutation largely repeatable between individuals? It could be the case that AID preferentially encodes given FR mutations due to sequence hotspots (42,47), with minimal selection of the amino acid sequence. Alternatively, there may be underlying biophysical bases for the selection of certain sequences. However, uncovering the rationale for every FR mutation is challenging as any individual mutation is pleiotropic, thus impacting many qualities in vivo including aggregation propensity, proteolytic stability, B cell receptor expression, antigen-binding affinity, and polyspecificity. Beyond these traits, it has also been hypothesized that engineering antibodies with FR regions derived from GL genepreferred substitution patterns would result in lower immunogenicity as these antibodies would more closely reflect native antibody responses, which are readily recognized by the immune system (6,21). As an initial effort, we decided to focus on two properties which could reasonably explain the molecular basis behind these high scores: stability and MHC-II peptide epitope content. These biophysical properties can be addressed sufficiently using in vitro biochemical assays or in silico.
Other research groups have published the effects of FR mutations on stability and heterologous expression yields ( Table 2) (48)(49)(50)(51). Across these four studies, thermodynamic stability was measured (as DDG, DT 50 , DT m ) for nine point mutants. All seven point mutants that required a 1-nt substitution and had an individual mutation score above zero. These seven showed neutral or improved stability. Additionally, the two mutants with the highest mutation scores (V H 1-69: V76L; V H 6-1: S74G) also showed the highest thermodynamic stability (measured as DT 50 ). Together, these results hint at a relationship between stability and repertoire abundance of a given FR mutation, although the small size of this dataset does not allow for sufficient statistical power to definitively draw a link between these factors.
FR mutations could also be selected to decrease the MHC-II epitope content of a given antibody sequence. Previous research has shown that unmutated antibody sequences have germlineencoded peptides that bind to and are presented by MHC-II proteins. These peptide epitopes are targeted for removal by SHM during B cell development in vivo (21). We sought to determine whether individual FR PSSM scores calculated here correlate with differential peptide:MHC-II binding affinities. To address this question, MHC-II binding affinities (K D ) for all 15-mer peptides in a sliding window (i.e., fragments of 15 consecutive amino acids) sourced from the 39 FDA approved mAbs were calculated using a custom computational pipeline (see Methods). For each peptide containing FR mutations, mutation scores were binned by the K D fold-change from the corresponding unmutated peptide's K D . Interestingly, peptides with > 2 K D fold-change from their unmutated germline peptides showed higher PSSM scores than mutations with < 0.5 K D foldchange, and these data were statistically significant. This analysis suggests that prevalent FR mutations are more likely to decrease the affinity of mAb peptide fragments for the MHC-II groove than to increase the peptide:MHC-II affinity at a given mutation site; these data were consistent with prior reports of human repertoire antibodies using paired heavy and light chain human sequence datasets (21) ( Figure 5). As expected, mutations with moderate or no change in binding (0.5 ≤ K D fold change ≤ 2) showed the highest PSSM scores across all groups because the majority of mutations do not affect the~5-8 heavy chain variable region MHC-II peptide epitopes for a given MHC-II gene. Strikingly, we found that the high-affinity MHC-II peptide epitopes showed higher PSSM scores than low-affinity MHC-II peptide epitopes, and the differences were statistically significant. Together, these data demonstrate that high-affinity MHC-II peptide epitopes inside the antibody heavy chain variable Mean DT m was calculated over all genotypes presented in Table 3 in ref (48). *Yield is calculated as a relative percentage to the original antibody sequence. a Antibody 2C2.  region are preferentially targeted for mutations, in both human repertoires and in clinically approved mAbs ( Figure 5).

Position-Specific Framework Mutation Scores Are Highly Correlated Between Germline Families
It has been established that affinity maturation produces antibodies that have GL-specific substitution patterns (16). What is the exact correlation of substitutions between GL V H genes? To address this question, we calculated the Pearson correlation coefficient between each pair of GL PSSMs ( Figure 6). Overall, correlation coefficients were quite high between GL PSSMs, ranging from 0.52 to 0.93. Hierarchical clustering was performed, revealing that correlations within subfamilies were higher than those between subfamilies. For example, all of the V H 1 GL members clustered tightly together, as did those in V H 3, and in V H 4. The resulting clusters may, to an extent, be reflective of sequence similarity: GL members within a subfamily will have a higher pairwise amino acid identity than between subfamilies, which could skew the results. To account for this, we also calculated Pearson correlation coefficients only between GL PSSMs for which a common DNA codon is shared ( Figure S10). As expected, higher correlations were observed between subfamilies than before, but the overall clustering within subfamilies was conserved. Thus, FR mutations are largely, though not wholly, repeatable between GL families and the correlation is stronger within subfamilies than between subfamilies.

Certain Highly Prevalent FR Mutations Are Observed Across Many Germline Families
We next looked at 'universal' FR mutations, or highly abundant mutations that are seen across all subfamilies considered. 18 mutations at 14 FR positions were enriched across all GL members scanned in this work (Table S8). Overall, 'universal' mutations were largely chemically similar and/or polar substitutions at positions predominantly on the protein surface and distal to the CDRs. 'Universal' mutations also tended to be the WT residue for at least some of the GL families. For example, serine at FR85 is highly prevalent across GL families and is the FIGURE 5 | Position-Specific Scoring Matrix (PSSM) distributions for peptide mutations present in FDA-approved mAbs, binned by peptide: MHC-II KD changes. Peptides were matched to V-gene germline peptides after peptide K D prediction. For each group, the mean PSSM scores for each DRB1 allele with median and interquartile ranges are shown. Peptides mutated from high affinity germline peptides (K D < 1,000 nM) were grouped based on K D fold-change from germline. The complete set of high affinity and low affinity peptides (K D < 1,000 nM) is also shown. Differences between groups was determined by a Welch Two Sample t-test, and p-values were adjusted for multiple comparisons. A total of 3,135 peptides were analyzed. ***p-value < 0.001 for comparisons between all groups, except when noted. **p-value < 0.01. WT residue for several V H GL families (V H 1-2, V H 1-3, V H 1-46, V H 1-69, V H 5-51). We noticed a specific mutation (S54A) that shows up in several FDA-approved mAbs (Pertuzumab, Atezolizumab, Omalizumab, and Trastuzumab) with high scores across multiple GL V H s (V H 3-23 and V H 3-66 with scores of 20 and 17, respectively). This mutation, which was observed at a frequency of 8.1% in the analyzed VH3-23 sequences, has also been observed at similar frequencies in another antibody evolution study (19). Figure 7A shows the predicted scores for the mutation of residue 54 to five different amino acids (A, C, G, S, and T) across all 25 GL VH genes studied. We observe that this specific mutation also shows up in multiple other GL V H s for which there is no corresponding FDAapproved mAb (V H 3-20, V H 3-21, V H 3-48, V H 3-74, and V H 3-9, with scores ranging from 20-21). To understand the mechanistic basis for S54A selection, we analyzed the four FDA-approved mAbs listed above with this S54A mutation. To test for increased global stability of the mAbs, we utilized the online protein stability prediction algorithm PremPS (52) to determine the impact of reverting the alanine at position 54 in each of the four mAbs back to its identity of a serine in the respective germline sequences ( Figure 7B). The A54S reversion mutation was predicted by PremPS to decrease mAb stability (calculated as DDG) for all four mAbs, indicating the forward mutation (S54A) in the mature FDA-approved sequences was predicted to be stabilizing. In addition, we see on average a 1.9-fold reduction in peptide: MHC-II affinity for peptides from FDA-approved mAbs with this mutation. Thus, in silico methods predict that S54A reduces the potential for CD4+ T cell immunogenicity and increases global stability in these four mAbs.
To gain a more mechanistic picture of how the S54A mutation mediates stability in the mature FDA-approved mAbs, we performed molecular dynamics (MD) simulations of Atezolizumab both in its mature form (e.g., with A54) and with the germline-reverted mutation (e.g., with S54). Production simulations were performed for 0.28 s in the NPT ensemble, using the GROMACS-2021 (30,31) simulation engine and the AMBER99SB-ILDN (33) and TIP3P (32) force fields to describe the mAbs and solvent, respectively (see Methods). Figure 7C shows the root mean square fluctuation (RMSD) of the mature and mutant forms of Atezolizumab from the respective energyminimized crystal structures as a function of simulation time. The results show the RMSD of the mature mAb was consistently lower than that of the mutant mAb, indicating increased stability of the mature mAb due to the S54A mutation, in line with both our mutation scores and PremPS results.
Increased stability due to the S54A mutation is further supported by the higher root mean square fluctuation (RMSF) of residues in the mutant versus mature mAb during the simulations ( Figure 7D). In particular, we observe a distinct region in the V H where residues in the mutant mAb experienced increased flexibility (decreased stability) during the simulations, namely IMGT-aligned V H residues 26-72. This region corresponds to a -sheet on which residue 54 is situated ( Figure 7E), indicating the observed changes in RMSF are directly attributable to this mutation. In the mature mAb simulation, we observe A54 to be optimally situated within a hydrophobic pocket formed by the inward-facing side chains of surrounding residues ( Figure 7F, top). Favorable binding interactions between the methyl group on A54 with these hydrophobic side chains during the simulation results in relatively small fluctuations of nearby loops ( Figure 7F, bottom), which includes the CDRH2 loop that is often in direct contact with antigens. Conversely, substituting a serine into this hydrophobic pocket leads to S54 having an unsaturated hydrogen bond. The polar side chain of S54 is observed to rotate freely during the simulation, forming a hydrogen bond only transiently with a nearby residue ( Figure 7G, top) and causing increased fluctuations in the nearby loops ( Figure 7G, bottom). The decreased motion of the CDRH2 region due to the S54A mutation is consistent with previous studies showing Ab rigidification may selectively occur during affinity maturation for certain GL families upon binding antigen (53). Future computational studies could test this directly through MD simulations of FDA-approved mAbs in complex with their cognate antigens. Figure S11 demonstrates this mechanism of stability due to the S54A FR mutation is the same for V H 3-66-based mAbs as well, underscoring the 'universal' nature of this mutation.
We observed a second mutation (Y103F) with even higher prevalence among human repertoire antibodies. Figure 8A shows the predicted scores for the mutation across residues with similar hydrophobic side chains and/or aromatic structures (F, H, M, W and Y) for all of the 25 GL VH genes studied. Our results show that this mutation broadly occurs across many of the GL VH genes, with high scores ranging between 18 and 22. In addition, it should be noted that 103F is present in several mouse germlines (VH1, VH9, VH11, VH13) (26), providing evidence that this mutation should be structurally tolerated particularly for mousederived antibodies. Our collected data gives a frequency of 12% for the Y103F mutation, which is consistent with the~15% frequency found in a prior antibody evolution study (19).
To determine the impact of the Y103F mutation on mAb stability, we analyzed the single FDA-approved mAb (Daratumumab) containing this mutation, as well as three other mAbs identified through an IMGT BlastSearch to have this mutation. The three non-FDA-approved mAbs included the anti-influenza FluA-20 Fab from the V H 4-61 GL family, the anti-HIV 17b Fab from the V H 1-69 GL family, and the anti-malarial 092096 Fab from the V H 1-24 GL. As with the S54A mutation, we first utilized PremPS to assess the change in mAb stability upon reversion of the phenylalanine at position 103 in the mature sequences of these mAbs to its identity of a tyrosine in all of the respective germlines. Figure 8B shows that this mutation, in contrast with the S54A mutation, is predicted to be destabilizing to all four of the analyzed mAbs. This prompted us to explore this mutation further: why would an antibody repeatably select a destabilizing mutation during affinity maturation, particularly one far away from the binding site?
To address this question, we performed MD simulations of Daratumumab both with and without the Y103F mutation. Production simulations were carried out as described earlier for the simulations with Atezolizumab and Omalizumab (see also Methods). Over the converged, second half of the simulation, we observe the RMSD of the mature antibody to be consistently higher than that of the mutant mAb ( Figure 8C), suggesting this mutation is indeed destabilizing. This result aligns with our PremPS results but still contradicts our mutation scores.
Decreased mAb stability due to the Y103F mutation is also evidenced by the decreased RMSF values of residues in the mutant versus mature mAb, averaged over the converged portion of the simulations ( Figure 8D). The largest RMSF differences are observed for IMGT-aligned V H residues 19-80. This region corresponds to FR2, CDR2 and FR3 and directly contacts the Y103F mutation, which is situated between the V H and V L domains of the antibody Fab ( Figure 8E). In the simulation of the mutant mAb, the tyrosine is observed to form consistent hydrogen bonds with neighboring residues, whereas the phenylalanine in the simulation with the mature mAb is unable to do so ( Figures 8F, G, top). Consistent hydrogen bonds formed by Y103 essentially lock the tyrosine into place, leading to relatively small fluctuations in surrounding loops during the simulation ( Figure 8G, bottom). Conversely, large fluctuations in neighboring loops are observed in the simulation with the mature mAb ( Figure 8F, bottom).
Overall, our simulations support the PremPS result that Y103F destabilizes the structure of Daratumumab, which indicates that the effects of this particular mutation may be much more context specific. However, due to the mutation's unique location at the intersection between the V H and V L domains of the antibody, we hypothesize that Y103F may facilitate a favorable conformational transition of the antibody upon antigen binding. This could explain the high prevalence with which this mutation is observed in baseline human antibody repertoires, but would require data on the light chain sequences of the antibodies to make more firm conclusions. Regardless, the negative impact of the Y103F mutation on the stability of Daratumumab suggests it may be important to consider a wide variety of mutations -whether frequently observed or not -when analyzing or engineering new therapeutic antibodies.

DISCUSSION
In this work, we generated PSSMs using repertoire sequences from 25 GL genes. We used these scores to assess framework mutations of FDA-approved therapeutic antibodies and to determine the degree of similarity with human antibody sequences. We found that mutations in FDA-approved antibodies are also common in human antibody repertoires.
To understand why high frequency repertoire mutations are observed, we analyzed various high-scoring mutations for stability effects, as we expect thermostability to be a top priority to select for in both baseline human repertoire antibodies as well as antibody therapeutics. Due to scant evidence in literature, we were unable to make firm conclusions about this claim. A comprehensive study of stability would be necessary to resolve this issue by directly testing many antibody variants in vitro. The potential for CD4+ T cell immunogenicity is another critical factor that should be selected for in baseline repertoire and human-engineered antibodies. Our work shows that increasing PSSM scores correlate with decreasing peptide:MHC-II binding affinity, signifying removal of CD4+ T cell (MHC-II) epitopes and potentially decreasing immunogenicity, a critical quality in antibody therapeutics. In future therapeutic mAb development, we speculate that repertoire-based PSSMs could be utilized as early warning signals that a sequence may have developability flags or induce an immune response.
The high scoring FR mutation S54A was evaluated in silico and by simulations, where we found clear evidence supporting the claim that S54A confers benefits to both stability and reduced MHC-II peptide epitope content. While we anticipate the factors investigated (thermostability and MHC-II peptide epitope content) to be highly correlated with relative repertoire abundance, there are clearly many other factors to discover. This is made clear by the fact that some mutations that we would expect to be detrimental to stability, like Y103F, seem to have high scores. This must be because they were selected for other benefits not considered in this study. We expect that future research surrounding human antibody repertoires will continue to illuminate the poorly understood selection criteria that are implemented in affinity maturation with regards to framework mutations.
Our work highlights several important findings of baseline human antibody repertoires and the dynamics of affinity maturation that produce them. We found that framework mutations are remarkably consistent across individuals, suggesting that affinity maturation is selecting mutations in a consistent manner. This finding indicates that it may be wellsuited for computational modeling, allowing for accurate prediction of the types and frequencies of framework mutations produced in affinity maturation. In addition, our results independently confirm the notion that mutation profiles are GLdependent. We showed that the correlations between mutation scores of germline family members are much higher than those across families. This analysis denotes that mutational frequencies are highly dependent on sequence, likely due to AID's intrinsic mutational biases. We look forward to future research aimed at determining the degree to which baseline human antibody repertoire mutations are dictated by the precursor local sequence environment.
Previous work on humanization of mouse Abs showed the importance of FR Vernier positions that scaffold and impact affinity of specific CDRs. We do find that such Vernier positions close to CDR1 and CDR2 have increased mutational frequency relative to other FR positions ( Figure S2); future work could identify the covariation of specific FR and CDR mutations, and we expect that similar refinements may identify FR mutations that are context-specific for individual CDRs or, in the case of Y103F, different light chains.
With the current high costs of and increasing demand for mAb therapeutics, methods that can infer sequence to function relationships are very valuable. We speculate that incorporating high scoring mutations into mAb sequences will further enhance drug-like properties for existing therapies and could significantly reduce development costs for future mAbs.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.