Understanding the Significance and Implications of Antibody Numbering and Antigen-Binding Surface/Residue Definition

Monoclonal antibodies are playing an increasing role in both human and animal health. Different protein and chemical engineering strategies, including the humanization of non-human antibodies, have been applied successfully to optimize the clinical performance of antibodies. Despite the emergence of techniques that allow the development of fully human antibodies, such as transgenic Xeno-mice, antibody humanization remains a standard procedure for therapeutic antibodies. An important prerequisite for antibody humanization is a standardized numbering method that precisely defines the complementarity-determining regions (CDRs), the frameworks, and the residues of the light and heavy chains that affect the binding affinity and/or specificity of the antibody-antigen interaction. Recently generated deep-sequencing data and the increasing number of solved three-dimensional structures of antibodies from human and non-human origins have led to the emergence of numerous databases. However, these databases use different numbering conventions and CDR definitions. In addition, the large variation in the lengths of the variable chains, especially in CDR3 of heavy chains (CDRH3), greatly complicates the comparison and analysis of antibody sequences and the identification of antigen-binding residues. This review compares and discusses the different numbering schemes and CDR definitions established to date. Furthermore, it summarizes the concepts and strategies used for numbering antibody residues and identifying CDR residues. Finally, it discusses the importance of specific sets of residues in the binding affinity and/or specificity of immunoglobulins.


INTRODUCTION
In 1986, Muromonab-CD3 became the first monoclonal antibody (mAb) approved as a drug for human therapy. This murine antibody, directed against the CD3 complex of T lymphocytes, was widely used to prevent acute rejection in organ transplant recipients (1,2). To date, the Food and Drug Administration (FDA) has approved 71 mAbs¹. These antibodies are mostly used against cancers and immunological disorders (3). Furthermore, several mAbs have proven effective in the treatment of various pathologies such as bone loss (Denosumab), hypercholesterolemia (Evolocumab) or infectious diseases (Raxibacumab, Palivizumab).
Over the last decade, with the emergence of deep-sequencing techniques, a large number of new antibody sequences have been reported. In addition, numerous 3D structures of antibodies in complex with their target antigens have been solved, permitting the statistical identification of residues that are in direct contact with the antigen or that affect the binding affinity (4)(5)(6)(7). This has shed light on the molecular basis of antigen-antibody interactions. Although different structure-based bioinformatic tools have been developed to predict antigenic epitopes or the interaction surface between a known antibody and its antigen (docking), in silico approaches are currently unable to tailor the specificity of an antibody for a target antigen de novo. In contrast, in vivo immunization and the selection of antibodies from combinatorial libraries by phage display are commonly used and have proven efficient for obtaining specific antibodies against a given antigen of interest.
An immunoglobulin is a Y-shaped glycoprotein (∼150 kDa) composed of two identical heavy chains and two identical light chains. The heavy and light chains are encoded by genes that diverged from the same ancestral gene. The variable domains of the light and heavy chains are responsible for antigen binding, while the constant domains communicate with other components of the immune system. Notably, besides these "standard" immunoglobulins, camelids and some cartilaginous fish express another type of antibody that is devoid of light chains, referred to as heavy-chain antibody (HcAb) or immunoglobulin new antigen receptor (IgNAR), respectively. These homodimeric antibodies bind their antigens with affinities similar to those of conventional heterotetrameric antibodies. In all cases, the variable domain of each chain contains three hypervariable loops named complementarity-determining regions (CDR1, CDR2, and CDR3). The CDRs are separated by structurally conserved framework regions (FR1, FR2, FR3, and FR4) that form a "core" β-sheet structure displaying these loops on the surface of the variable domain. The length and composition of the CDR sequences are highly variable, especially in CDR3. This diversity originates from the complex genetic mechanisms that generate a highly variable pool of antibodies from a relatively small number of antibody genes. Variable regions are assembled from two gene segments (V and J for λ and κ light chains) or three gene segments (V, D, and J for heavy chains) through V(D)J recombination. The joined regions form part of CDR3. Further variability in CDR3 length and sequence is introduced by the addition or deletion of nucleotides at these junctions and by somatic hypermutation of the recombined genes.
The CDRs are often equated with the paratope, the part of the antibody that interacts with the antigen and therefore contains the antigen-binding residues. The present review will show that this definition of the paratope is an oversimplification that does not exactly reflect reality. Antibody engineering methods have attracted the attention of many research groups as well as pharmaceutical companies. Antibody engineering technologies are of increasing importance in drug development, and different biotherapeutics have been developed, including antibody fragments (8), bispecific antibodies (9), and antibody-drug conjugates (10). However, despite their increasing success in drug therapy, mAbs commonly induce adverse events when injected into patients, especially chimeric molecules that contain murine or rat sequences (11), and can lead to the appearance of human anti-globulin antibodies in the serum of patients (12)(13)(14).
To overcome these problems, different strategies have been developed that successfully reduce the immunogenicity of mAbs and therefore the risk of immune adverse events. The simplest approach to humanizing a mAb consists of replacing the IgG constant regions of animal origin with the corresponding constant regions of human immunoglobulins. These so-called chimeric antibodies still include the entire variable regions of animal origin that are responsible for antigen binding. However, in most cases, these variable regions contain immunogenic regions that are sufficient to trigger adverse effects, including anaphylaxis (14). Therefore, further methods for humanizing the variable region were developed. In this context, CDR grafting and Specificity Determining Residue (SDR) grafting have become widely used methods in the field. Briefly, these approaches consist of replacing the murine framework regions with homologous regions of human origin (15,16). These so-called "reshaped mAbs" show fewer immunogenic epitopes than chimeric antibodies. However, reaching a higher degree of humanization requires a complete and precise identification of immunogenic epitopes. Various approaches for this purpose have improved drastically over the past few years and have significantly refined humanization methods. These approaches include surface reshaping or veneering (17,18), superhumanization (19), human string content optimization (20), and combinatorial approaches using phage-display libraries (21).
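To make the role of region boundaries concrete, the following minimal Python sketch performs CDR grafting on sequences that are assumed to be already segmented into framework regions and CDRs by some numbering scheme. The region dictionary, the `graft_cdrs` function and its optional `back_mutations` parameter are hypothetical illustrations rather than an established tool, and the sequences are invented toy fragments. SDR grafting would differ in transferring only selected specificity-determining positions instead of whole CDRs.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical layout: a chain already split into regions by some numbering scheme.
REGION_ORDER = ("FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4")


def graft_cdrs(donor: dict[str, str], acceptor: dict[str, str],
               back_mutations: Optional[dict[tuple[str, int], str]] = None) -> str:
    """Return the acceptor (human) chain carrying the donor (non-human) CDRs.

    back_mutations maps (framework region, 0-based index) -> residue and restores
    selected donor framework residues that may be needed to preserve affinity.
    """
    grafted = dict(acceptor)
    for region in REGION_ORDER:
        if region.startswith("CDR"):
            grafted[region] = donor[region]  # antigen-binding loops come from the donor
    for (region, index), residue in (back_mutations or {}).items():
        if not region.startswith("FR"):
            raise ValueError(f"back-mutations target framework regions only: {region}")
        framework = list(grafted[region])
        framework[index] = residue
        grafted[region] = "".join(framework)
    return "".join(grafted[region] for region in REGION_ORDER)


# Toy, made-up segments (not real antibody sequences).
donor = {"FR1": "QVQ", "CDR1": "GYT", "FR2": "WVK", "CDR2": "IYP",
         "FR3": "KAT", "CDR3": "ARGD", "FR4": "WGQ"}
acceptor = {"FR1": "EVQ", "CDR1": "GFT", "FR2": "WVR", "CDR2": "ISS",
            "FR3": "RFT", "CDR3": "AKDR", "FR4": "WGQ"}
print(graft_cdrs(donor, acceptor))                     # EVQGYTWVRIYPRFTARGDWGQ
print(graft_cdrs(donor, acceptor, {("FR2", 2): "K"}))  # EVQGYTWVKIYPRFTARGDWGQ
```

Because the grafted segments are whatever the chosen scheme calls CDRs, a different CDR definition would transfer a different set of residues, which is one reason why the choice of numbering matters.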
Unfortunately, even though these humanization techniques produce mAbs with reduced immunogenicity, they frequently lead to a loss of antibody affinity and/or specificity (22). This affinity loss is mostly attributed to an imprecise definition of the CDR sequences (23), an inappropriate choice of the human framework scaffold used for loop grafting, and an erroneous identification of structurally corresponding residues in different species. Indeed, antibody engineering techniques require an accurate identification of CDRs, antigen-binding residues, and structurally corresponding residues. Therefore, an appropriate and standardized numbering scheme is crucial. Unfortunately, establishing a robust inter-species numbering convention is extremely challenging, especially given the high variability in CDR lengths and sequences.
The present review is divided into two parts. The first part describes and discusses the different numbering schemes for variable regions established to date. The second part compares the different CDR definitions and discusses the residues involved in antigen binding, as well as a number of framework residues that indirectly affect the binding affinity of immunoglobulins. Finally, we discuss and suggest a general approach for antibody humanization. Adequate antibody numbering and annotation are of crucial importance in the field of antibody engineering and will strongly advance the development of monoclonal antibody-based human drugs.

NUMBERING SCHEMES OF ANTIBODY VARIABLE DOMAINS
Antibody engineering methods require precise identification of the residues that affect the interaction of the antibody with its target antigen and/or its affinity for that antigen. For example, as mentioned above, CDR grafting aims to decrease the immunogenicity of non-human antibodies by transferring their antigen-binding loops onto human variable regions. This method requires an accurate identification of the CDRs and therefore an adequate alignment of antibody sequences from human and non-human species. Moreover, as discussed later in this review, residues from the framework regions have been shown to also exert a strong impact on antibody affinity (24). Thus, the precise identification of corresponding positions in human and animal immunoglobulin chains is essential. However, the variety of amino acid numbering schemes currently used in the literature is confusing and might lead to erroneous identification of framework and CDR residues. It is therefore crucial to understand the different numbering schemes and to be able to compare them. The following section describes the different numbering schemes, which are compared and summarized in Table S1.

Kabat Numbering Scheme
Over the past decades, the sequencing and crystallization of antibodies have led to a significant increase in sequence and structure databases, making it possible to compare the variable regions of human and animal immunoglobulins. In 1970, Kabat and Wu aligned 77 Bence-Jones protein and immunoglobulin light chain sequences in order to study the statistical variability of amino acid composition at each position of the variable region. They defined the "variability parameter" as the number of different amino acids at a given position divided by the frequency of the most common amino acid at that position. This analysis revealed three hypervariable regions in the variable region of the light chains. It also demonstrated the presence of highly conserved residues, such as the two cysteines that form a disulfide bridge in the inner core of the immunoglobulin domain and a tryptophan residue located immediately after CDRL1 (25). Likewise, three corresponding hypervariable regions were identified in the variable domain of the heavy chain (26,27). Kabat and Wu postulated that these hypervariable regions would cluster on one side of the folded domain to form a surface responsible for specific antigen recognition, and they referred to them as "Complementarity Determining Regions" (CDR1, CDR2, and CDR3). This hypothesis was later confirmed and further investigated to distinguish antigen-contacting residues from conformationally important residues within these CDRs (28).
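The Wu-Kabat variability parameter can be computed column by column over an alignment. The sketch below assumes equal-length, pre-aligned sequences and simply ignores gap characters ("-"), which is a choice made here rather than part of the original definition; the toy alignment is invented. By construction, the value ranges from 1 for an invariant position up to 400 when all 20 amino acids are equally frequent.

```python
from __future__ import annotations

from collections import Counter


def wu_kabat_variability(column: list[str]) -> float:
    """(number of different amino acids) / (frequency of the most common one)."""
    residues = [aa for aa in column if aa != "-"]  # gaps ignored (assumption)
    if not residues:
        raise ValueError("position contains only gaps")
    counts = Counter(residues)
    most_common_frequency = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_frequency


def variability_profile(alignment: list[str]) -> list[float]:
    """Variability at every position of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [wu_kabat_variability([seq[i] for seq in alignment]) for i in range(length)]


toy_alignment = ["CSAW", "CRAW", "CSTW", "CNDW"]
print(variability_profile(toy_alignment))  # [1.0, 6.0, 6.0, 1.0]
```

Plotting such a profile along the chain is how hypervariable stretches stand out against conserved positions such as the cysteines and the tryptophan mentioned above.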
In 1979, Kabat et al. were the first to propose a standardized numbering scheme for the variable regions of immunoglobulins (29). In their compilation "Sequences of Proteins of Immunological Interest" (30), the amino acid sequences of the variable regions of antibody light (λ, κ) and heavy chains, as well as the variable regions of T cell receptors (α, β, γ, δ), were aligned and numbered. They observed that the analyzed sequences had variable lengths and that gaps and insertions occurred only at precise positions. Interestingly, the insertion points were located inside all CDRs except CDRL2, but also at some positions inside the framework regions (30). In the numbering scheme, these insertions are annotated with letters (e.g., 27a, 27b...). Note also that position L10 is absent in all λ light chains; λ and κ chains are encoded by two different genes located on different chromosomes. Over the following decades, the accumulation of sequences led to the creation of the KABATMAN database (31).
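Because Kabat labels combine a position number with an optional insertion letter, software handling them needs a dedicated sort key: a plain string sort would, for instance, place "L100" before "L27". The helper below is a hypothetical sketch that assumes labels of the form chain letter, number, optional letter (upper or lower case, as in "27a").

```python
from __future__ import annotations

import re

# Chain letter, residue number, optional insertion letter (e.g. "L27", "L27a", "H100B").
LABEL = re.compile(r"^([HL])(\d+)([A-Za-z]?)$")


def position_key(label: str) -> tuple[str, int, str]:
    """Split a Kabat-style label into a sortable (chain, number, insertion) tuple."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a Kabat-style position label: {label!r}")
    chain, number, insertion = match.groups()
    # An empty insertion code sorts before "A", so L27 < L27A < L27B < L28.
    return chain, int(number), insertion.upper()


labels = ["L28", "L27B", "L27", "L29", "L27A"]
print(sorted(labels, key=position_key))  # ['L27', 'L27A', 'L27B', 'L28', 'L29']
print(sorted(["L100", "L27"]))           # ['L100', 'L27'] (naive string order)
```

A key of this kind lets sequences numbered with insertion codes be merged or compared position by position without losing the inserted residues.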
Although the Kabat numbering scheme is often considered the standard and is widely adopted for numbering antibody residues, it has some important limitations. Firstly, the scheme was built on alignments of a limited number of antibody sequences with the most common lengths. Sequences with unconventional insertions or deletions in the CDRs or framework regions were therefore not included, and the original Kabat scheme ignores antibody chains of unconventional lengths. However, a useful numbering tool named ABnum², which numbers the amino acid sequences of variable domains according to a much larger and regularly updated database (Abysis³), takes into account insertions of variable length, particularly in CDR2, by adding an insertion point at position L54. The second main limitation of the Kabat scheme is that it does not correspond well to the 3D structure of antibodies. Indeed, the hypervariable regions defined by Kabat do not exactly match the structural antigen-binding loops. The insertion points defined in CDR-L1 (L27) and CDR-H1 (H35) do not fit their corresponding positions in the structures (Figure 1). In other words, topologically aligned residues of CDR-L1 and CDR-H1 in crystal structures do not share the same number in the Kabat numbering scheme.

Chothia Numbering Scheme
In 1987, Chothia and Lesk introduced a structure-based numbering scheme for antibody variable regions. They aligned crystal structures of antibody variable regions, defined the loop structures that form the CDRs, and corrected the positions of the insertion points inside CDRL1 and CDRH1 so that they better fit their topological positions (Figure 1) (32). Furthermore, they classified the CDR loops of heavy and light chains into a small number of conserved structures, called "canonical" classes (32-34), which are discussed later. Based on the alignment of antibody structures, the Chothia numbering scheme moves the amino acid insertion point from position L27 to L30 and from position H35 to H31. The Chothia CDR definition thus corresponds better to the structural loops. The loop structure of CDRH3 identified by Chothia matches the Kabat hypervariable region well. In contrast, the other loops are shorter than the hypervariable sequences defined by Kabat, except for CDRH1, which extends from H26 to H32. In any case, some CDRs defined by hypervariability (Kabat) and by loop topology (Chothia) are shifted relative to each other and/or differ in length (Figure 2).
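The effect of moving an insertion point can be illustrated with a simplified numbering routine. The sketch assumes a contiguous stretch of heavy-chain residues covering reference positions H26 to H35 that is two residues longer than the ten reference positions, and places the extra residues either after H35 (Kabat) or after H31 (Chothia). Real numbering tools also handle deletions and length-dependent rules; `number_region` is a hypothetical helper that does not.

```python
from __future__ import annotations

from string import ascii_uppercase


def number_region(n_residues: int, first: int, last: int, insert_after: int) -> list[str]:
    """Label residues first..last, giving extra residues letter suffixes at one
    insertion point. Shorter-than-reference regions (deletions) are not handled."""
    base = list(range(first, last + 1))
    extra = n_residues - len(base)
    if extra < 0:
        raise ValueError("deletions are not handled in this sketch")
    if extra > len(ascii_uppercase):
        raise ValueError("too many insertions for single-letter codes")
    labels = []
    for position in base:
        labels.append(str(position))
        if position == insert_after:
            labels.extend(f"{position}{ascii_uppercase[i]}" for i in range(extra))
    return labels


# 12 residues spanning reference positions H26-H35 (two insertions).
kabat = number_region(12, 26, 35, insert_after=35)
chothia = number_region(12, 26, 35, insert_after=31)
for k, c in zip(kabat, chothia):
    print(f"Kabat H{k:<4} Chothia H{c}")
```

In this example the seventh residue of the stretch is labelled H32 by Kabat but H31A by Chothia, which is exactly why topologically equivalent residues can carry different numbers in the two schemes.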
The main advantage of the Chothia numbering scheme is that topologically aligned residues from different antibodies share the same position number and that the Chothia CDR definition corresponds, in most antibody sequences, to the structural antigen-binding loops. However, confusion can arise because this numbering scheme is less widely used than the Kabat or IMGT numbering schemes (see below). Furthermore, a later study by Chothia et al. moved the insertion point in CDR L1 from residue L30 to L31 (34). However, while investigating the conformations of the antigen-binding loops of antibodies in larger databases, they returned to the initial L30 position in 1997 (33). Similarly, they initially defined an insertion point at position L93 in λ light chains (32) that was moved to position L95 in their subsequent study (33). Finally, an important limitation of this numbering scheme is that, like the Kabat scheme, it relies on the most common CDR sequence lengths and therefore ignores sequences of unconventional length. However, as with the Kabat numbering scheme, this system could be optimized by defining new insertion points.

Martin Numbering Scheme
In a study published in 2008, Martin et al. focused on the structural alignment of framework regions of unconventional lengths (35). They highlighted residues that are absent in most sequences and structures and therefore defined these as deletion positions. By analyzing sequences and structures, they also proposed moving the insertion point within framework region 3 of the heavy chain from position H82 to H72. In addition, by analogy with CDRH2, they amended the position of the insertion point in CDRL2, which is now located at position L52. Finally, they used the ABnum numbering software mentioned earlier and recommended a new numbering scheme consisting of the Chothia numbering system corrected by ABnum. Indeed, this software uses the much larger Abysis database, which integrates sequences from the Kabat, IMGT, and PDB databases. For this reason, the ABnum program defines a new insertion/deletion position at position H6 in the Chothia and Kabat numbering schemes (6). The Martin numbering scheme corrects this insertion point and moves it to position H8.
The Martin numbering scheme can be considered the most recent version of the Chothia numbering. By analyzing

Gelfand Numbering Scheme
Another interesting numbering method, used in a few studies, is the one described by Gelfand et al. (36)(37)(38)(39). This numbering system results in a relatively complex nomenclature. The variable chain sequences are divided into 21 fragments termed "words", each of which matches a secondary structure element (a strand or a loop). Strands are designated by a letter in alphabetical order (e.g., A, B, C) and loops by the two letters of the neighboring strands (e.g., AB, BC...). There are, however, two exceptions to this terminology: the three N-terminal residues of the variable chain (named OA because they are not part of the first β-strand) and the loop connecting the B and C strands, which has a "two-span bridge" conformation with one residue deeply inserted into the structure (40) (Figure 3). This loop is divided into two words named BC and CB. This numbering system does not include gaps or deletion points but permits a precise comparison of secondary structures (loops and strands) between aligned sequences. Note also that the Gelfand definition of several loops does not exactly correspond to Chothia's loop definitions.
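The word nomenclature can be generated mechanically. The total of 21 words is reproduced if one assumes ten consecutively lettered strands (A to J): one OA word, ten strands, nine connecting loops and one extra word from splitting BC into BC and CB, i.e., 1 + 10 + 9 + 1 = 21. The strand letters beyond those cited in the text are this inference, not taken from Gelfand et al., and the hypothetical `gelfand_words` helper lists word names only, not their residue boundaries.

```python
from __future__ import annotations

from string import ascii_uppercase


def gelfand_words(n_strands: int = 10) -> list[str]:
    """Word names in sequence order: OA, then strands and the loops joining them."""
    strands = list(ascii_uppercase[:n_strands])
    words = ["OA"]  # N-terminal residues preceding the first strand
    for i, strand in enumerate(strands):
        words.append(strand)
        if i + 1 < len(strands):
            loop = strand + strands[i + 1]
            if loop == "BC":
                words.extend(["BC", "CB"])  # two-span-bridge loop split into two words
            else:
                words.append(loop)
    return words


words = gelfand_words()
print(len(words))  # 21
print(words[:8])   # ['OA', 'A', 'AB', 'B', 'BC', 'CB', 'C', 'CD']
```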

IMGT Numbering Scheme
In 1997, Lefranc et al. introduced a new standardized numbering system for all protein sequences of the immunoglobulin superfamily, including the variable domains of antibody light and heavy chains as well as T cell receptor chains from different species (41,42). Their numbering scheme was based on amino acid sequence alignments of germline V genes. Consequently, the amino acid sequence and its numbering stopped where CDR3 begins. Later, the authors extended their numbering scheme to entire variable domains and developed various tools to analyze full-length sequences (43). IMGT has its own definitions of framework regions (FR-IMGT) and CDRs (CDR-IMGT).
The IMGT numbering method numbers residues continuously from 1 to 128 based on the germline V sequence alignment. It thus avoids insertion codes, except between positions 111 and 112 for CDR3-IMGT loops longer than 13 amino acids. Conversely, when a residue is missing in a particular sequence, its number is simply not attributed. For example, in a 6-amino acid-long CDR1-IMGT, residue #27 is followed by residue #34 (residue numbers #28-#33 are absent). An example of alignment according to the Kabat, Chothia and IMGT numbering schemes is shown in Figure 4.
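Because IMGT skips numbers instead of using insertion codes, the unoccupied positions of a numbered sequence can be recovered by scanning for jumps. This minimal sketch assumes integer IMGT positions listed in order, with CDR1-IMGT spanning positions 27 to 38, and it ignores CDR3 insertion labels between 111 and 112; `missing_ranges` is a hypothetical helper.

```python
from __future__ import annotations


def missing_ranges(numbers: list[int]) -> list[tuple[int, int]]:
    """Return (first, last) ranges of IMGT positions left unoccupied."""
    gaps = []
    for previous, current in zip(numbers, numbers[1:]):
        if current <= previous:
            raise ValueError("positions must be strictly increasing")
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# The six-residue CDR1-IMGT example from the text: #27 followed directly by #34.
cdr1 = [27, 34, 35, 36, 37, 38]
print(missing_ranges(cdr1))  # [(28, 33)]
```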
IMGT is the primary reference in immunogenetics and immunoinformatics. Its conventions, including its amino acid numbering method, have been recognized and are currently used by the World Health Organization-International Union of Immunological Societies. The main advantage of this numbering method is that it is based on alignments of sequences from a complete reference gene database (44,45) covering the whole immunoglobulin superfamily. This has led to the development of highly useful tools. For example, amino acid alignment and numbering can be performed with IMGT/DomainGapAlign (46). This tool also makes it possible to analyze domain sequence polymorphisms by identifying the V, D and J genes coding for the variable region. It is coupled with another useful application, IMGT "Collier de Perles" (47), which visualizes at a glance the positions of the amino acids in a 2D representation and makes it easy to delineate FR-IMGTs and CDR-IMGTs.
However, because amino acids are numbered continuously along the sequence, the IMGT numbering scheme does not allow intuitive visualization of insertion positions, even for the most common ones. For the same reason, this numbering scheme is less flexible. Indeed, whereas new insertion points are easily incorporated into the Kabat and Chothia numbering systems, it is more difficult to adapt the IMGT scheme to sequences with new amino acid insertions. It should also be noted that IMGT places all such insertions at the end of the CDR, which does not correlate with the antibody structure. However, this problem has been corrected in the later V-Quest software, which places insertions in the middle of the CDR-IMGTs and thus better matches the available structural data (48).

Honegger's Numbering Scheme (AHo's)
The Honegger scheme numbers the variable domains of the immunoglobulin superfamily in a homogenized format. It is based on structural alignments of immunoglobulin variable-region 3D structures covering the observed length variation. This makes it possible to define structurally conserved Cα positions and, from these, to deduce appropriate framework regions and CDR lengths (Figure 5) (49).
The Honegger numbering scheme (AHo's) also defines conserved residues (C23, W43, C106, G140) and gaps at specific positions (#27-28, #36, #63, #123). CDR1 has a "two-span bridge" conformation created by a conserved hydrophobic residue at position #31, which is deeply inserted into the structure and therefore divides the loop into two distinct parts (40). The Honegger scheme places one gap region in each of these two parts: #27 and #28 in the first part and #36 in the second. This convention respects the variability of insertions on both sides of the loop. Furthermore, two other insertion points are located at positions #74 and #75 to reflect the shorter C-terminal branch of the CDR2 loop found in T cell receptor α chains. Additional gap positions are placed in the middle of the CDR2 and CDR3 loops (Figure 5). Based on further structural analysis, the authors proposed moving the insertion gap in Vκ chains from position L10 to L8.
The main advantage of the Honegger numbering system is that, like the Chothia numbering scheme, it is based on structural alignments and therefore better matches the 3D structural features of antibodies. In addition, as mentioned for the IMGT scheme, AHo's is well suited for numbering all proteins of the immunoglobulin superfamily, as it includes two gaps in CDR1 and CDR2. However, similarly to the IMGT scheme, AHo's can skip some numbers in the sequential residue numbering, which can be puzzling when analyzing numbered sequences. This numbering scheme is also less flexible and less easily adapted to immunoglobulins with new or larger insertions. Although the observed length variability was covered, new insertion positions or lengths might be found by taking a larger number of structures into account. Furthermore, the most structurally conserved positions were derived from only 28 different structures. Likewise, the framework regions could be defined more precisely by adapting the scheme specifically to the variable region of one type of immunoglobulin (e.g., antibodies).
Finally, sequences can be numbered according to AHo's by submitting them to the PyIgClassify server⁴. However, this server does not seem to take into account the two insertion positions defined in the original paper (49).
⁴ http://dunbrack2.fccc.edu/PyIgClassify/User/UserSequences.aspx

FINDING YOUR WAY AMONG THE CDR DEFINITIONS
CDR Definitions and Antigen Binding Residues
CDRs are commonly considered structured loops that are involved in antigen binding and have a hypervariable amino acid composition. However, defining a CDR based on antibody amino acid sequences can be complicated. Indeed, the numbering schemes presented in this review use different definitions of CDR lengths. In addition, as shown earlier, the Kabat (and IMGT) CDR definitions are based on sequence alignments, whereas the Chothia CDR definition better reflects the loop structure in the 3D architecture of antibodies. This lack of agreement on the precise lengths and positions of the CDRs is somewhat surprising, since these regions were shown to be responsible for antigen-binding activity a long time ago (16). To address the disparity in CDR definitions, several authors have taken the different possible lengths into account when defining CDR sequences. For example, North et al. used longer sequences in their recent structural analysis of CDR loop conformations (50).
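The disparity between CDR definitions can be quantified by treating each definition as an interval of positions. The boundaries below are assumptions taken from commonly tabulated heavy-chain definitions in Kabat numbering (Kabat, Chothia and Contact for CDRH2 and CDRH3) and should be checked against the original sources before use; insertion codes such as H52A are ignored for simplicity, and the helper names are hypothetical.

```python
from __future__ import annotations

# Heavy-chain CDR boundaries in Kabat numbering (assumed values; insertion codes ignored).
DEFINITIONS = {
    "Kabat": {"H2": (50, 65), "H3": (95, 102)},
    "Chothia": {"H2": (52, 56), "H3": (95, 102)},
    "Contact": {"H2": (47, 58), "H3": (93, 101)},
}


def positions(loop: str, scheme: str) -> set[int]:
    start, end = DEFINITIONS[scheme][loop]
    return set(range(start, end + 1))


def compare(loop: str) -> tuple[list[int], list[int]]:
    """Positions shared by all definitions (core) and in at least one (union)."""
    sets = [positions(loop, scheme) for scheme in DEFINITIONS]
    return sorted(set.intersection(*sets)), sorted(set.union(*sets))


core, union = compare("H2")
print(core)                 # [52, 53, 54, 55, 56]
print(union[0], union[-1])  # 47 65
```

Positions in the union but outside the core are those where the choice of definition decides whether a residue is treated as CDR or framework, for instance when deciding what to graft.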
Furthermore, high binding affinity reflects a very stable antibody-antigen complex, which results from multiple non-covalent bonds between amino acid residues of the paratope and the epitope. However, only 20 to 33% of the amino acids within the CDRs have been shown to make direct contact with the antigen (51). These residues, named "Specificity Determining Residues (SDR)", were first described by Padlan et al. (52). Their results show that SDRs are involved in the interaction with the antigen and, in most cases, correspond to the most variable positions in the CDRs. Building on this SDR concept, MacCallum and co-workers proposed a new method to define the CDRs and renamed the SDRs "contact residues". They also suggested that contact residues are more often located in the center of the paratope and that, as Chothia had noted before, non-contacting residues shape the conformation of the CDR loops and thereby orient the contact residues optimally for efficient and specific antigen binding. Finally, their study showed that small antigens tend to bind paratopes with a more concave interface (6), whereas long, convex paratopes are better suited to bind enzyme active sites and block their catalytic activity. In this context, nanobodies and bovine antibodies, which have extended CDR3 loops, are particularly interesting (53,54).
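Contact residues are usually identified from complex structures with a distance criterion. The sketch below assumes a 4.0 Å cutoff between any antibody atom and any antigen atom (published studies use various cutoffs and atom selections) and uses invented coordinates and residue labels; a real analysis would read atomic coordinates from a PDB file. `contact_residues` is a hypothetical helper.

```python
from __future__ import annotations

import math


def contact_residues(antibody: dict[str, list[tuple[float, float, float]]],
                     antigen_atoms: list[tuple[float, float, float]],
                     cutoff: float = 4.0) -> set[str]:
    """Residues with at least one atom within `cutoff` Å of any antigen atom."""
    return {
        residue
        for residue, atoms in antibody.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in antigen_atoms)
    }


# Invented coordinates (Å): three CDR residues and one antigen atom at the origin.
cdr = {
    "H33": [(3.0, 0.0, 0.0)],
    "H52": [(6.0, 0.0, 0.0), (3.5, 1.0, 0.0)],
    "H99": [(8.0, 8.0, 0.0)],
}
antigen = [(0.0, 0.0, 0.0)]
contacts = contact_residues(cdr, antigen)
print(sorted(contacts))           # ['H33', 'H52']
print(len(contacts) / len(cdr))   # fraction of CDR residues in contact
```

Applied over real CDRs, the fraction in the last line is the kind of quantity summarized by the 20 to 33% figure.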
Similarly, Ofran et al. used a multiple structural alignment approach to identify the antigen-binding residues of the variable regions (55). They revealed that antigen-binding residues have a particular amino acid composition, as previously suggested by other groups (25,56). In particular, tryptophan and tyrosine residues are highly over-represented, whereas almost all other residues are under-represented in all CDRs. Moreover, they showed that conventional CDR identification (Kabat, Chothia, IMGT) could miss about 20% of the antigen-binding residues, and they proposed a new method to identify the regions containing these residues. They named these regions Antigen Binding Regions (ABRs); ABRs can be identified using the Paratome⁵ online tool. This server identifies ABRs by comparing the antibody sequence with a set of antibody-antigen structural complexes (4,55,57). Another useful tool is proABC (http://circe.med.uniroma1.it/proABC/), which estimates the probability of each residue interacting with the antigen (58).
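Over-representation of residue types among antigen-binding residues can be expressed as a simple frequency ratio against a background set. This ratio is an illustrative statistic chosen here, not necessarily the one used by Ofran et al., and both residue strings are invented.

```python
from __future__ import annotations

from collections import Counter


def enrichment(binding: str, background: str) -> dict[str, float]:
    """Frequency among binding residues divided by frequency in the background.

    Values > 1 indicate over-representation. Residue types absent from the
    background are skipped to avoid division by zero.
    """
    bind_counts, back_counts = Counter(binding), Counter(background)
    return {
        aa: (bind_counts[aa] / len(binding)) / (back_counts[aa] / len(background))
        for aa in back_counts
    }


binding_residues = "YYWYSGDYWN"          # invented binding residues
cdr_background = "YSGASTDNKLRYWGSAVTEQ"  # invented background CDR residues
ratios = enrichment(binding_residues, cdr_background)
print({aa: round(r, 2) for aa, r in sorted(ratios.items()) if r > 1})
# {'D': 2.0, 'N': 2.0, 'W': 4.0, 'Y': 4.0}
```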
Another interesting approach was proposed by Robin et al. (5), who studied the binding free energy of antibody-antigen complexes using computational alanine scanning. They showed that 80% of the binding free energy in the studied complexes is concentrated on a very limited number of interacting residues (between 4 and 13). They highlighted 30 positions with a major contribution to the binding free energy, 27 of which are located within the CDRs (Kabat definition) and the remaining 3 in the framework regions. These positions are occupied by a restricted panel of amino acids (Y, G, S, W, D, N), with aromatic residues contributing most to the binding free energy. Based on the identity of these residues, the nature of the antigen-antibody interaction can be predicted, as discussed later (see Discussion section). In their analysis, CDRH2 and CDRH3 contained most of these residues, while CDRL2 did not contribute at all in more than half of the investigated complexes. Finally, they showed that Ab-Ag complex formation involves between 3 and 6 CDRs. They also confirmed their computational analysis with experimental data from case studies (5).
⁵ http://ofranservices.biu.ac.il/site/services/paratome/
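The observation that 80% of the binding free energy is concentrated on a few residues can be checked from per-residue alanine-scanning values by sorting contributions and accumulating them until the threshold is reached; taking the largest contributions first yields the smallest such set. The ΔΔG values below are invented, and the sign convention (positive ΔΔG meaning the alanine mutation weakens binding) is an assumption of this sketch.

```python
from __future__ import annotations


def hotspot_core(ddg: dict[str, float], fraction: float = 0.8) -> list[str]:
    """Smallest set of residues whose favorable ΔΔG values (kcal/mol) sum to at
    least `fraction` of the total favorable contribution."""
    favorable = {residue: value for residue, value in ddg.items() if value > 0}
    total = sum(favorable.values())
    selected: list[str] = []
    running = 0.0
    # Largest contributions first: for any set size k this maximizes the sum.
    for residue, value in sorted(favorable.items(), key=lambda item: item[1], reverse=True):
        if running >= fraction * total:
            break
        selected.append(residue)
        running += value
    return selected


# Invented alanine-scanning results; L50 disfavors binding and is excluded.
ddg = {"H33": 2.5, "H50": 2.0, "H99": 1.5, "L91": 0.5,
       "L32": 0.3, "H52": 0.2, "L50": -0.4}
print(hotspot_core(ddg))  # ['H33', 'H50', 'H99']
```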
Interestingly, Martin has defined a simple set of rules for identifying Kabat and Chothia CDRs, implemented in the ANTICALIGN software (59,60). Figure 6 illustrates the disparity between CDR definitions. This comparative alignment shows that CDRs defined by the classical definitions (Kabat, Chothia, IMGT) should be considered only approximations of the paratope. Based on the concepts and observations described above, a residue should be considered part of the paratope if (i) it is in close contact with the antigen and/or (ii) it significantly and specifically contributes to the negative Gibbs free energy change upon antibody-antigen complex formation. It should also be noted that (i) some contacting residues may contribute minimally to the binding free energy and may even disfavor complex formation; (ii) a residue that is energetically important for binding the cognate antigen may not be important for the difference in affinity between cognate and non-cognate antigens; and (iii) a residue crucial for antigen recognition may not be important for the binding free energy. Indeed, the paratope and its epitope have complementary shapes and chemistries. Shape complementarity mainly involves aromatic residues (W, F, Y) establishing van der Waals (π-π) or other hydrophobic interactions. In contrast, the stability of Ag-Ab complexes with low dissociation rate constants is mostly attributed to electrostatic interactions or hydrogen bonds involving residues with charged or polar side chains. Importantly, many residues within the CDRs are not in direct contact with the antigen but may still play an important role in the Ag-Ab interaction by maintaining the optimal conformation and orientation of nearby interacting residues (33,61).
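The two paratope criteria above combine naturally as a set union. In this sketch the energetic criterion is approximated by a ΔΔG threshold of 1.0 kcal/mol, an arbitrary assumption, and the specificity aspect of criterion (ii) is not modelled; residue labels and values are invented, and `paratope` is a hypothetical helper.

```python
from __future__ import annotations


def paratope(contacts: set[str], ddg: dict[str, float], threshold: float = 1.0) -> set[str]:
    """Residues that (i) contact the antigen and/or (ii) contribute at least
    `threshold` kcal/mol to binding (alanine-scanning ΔΔG, assumed convention)."""
    energetic = {residue for residue, value in ddg.items() if value >= threshold}
    return contacts | energetic


contacts = {"H33", "H52", "L91"}                         # e.g. from a distance cutoff
ddg = {"H33": 2.5, "H52": 0.1, "H71": 1.2, "L91": -0.3}  # invented values

members = paratope(contacts, ddg)
contact_only = contacts - {r for r, v in ddg.items() if v >= 1.0}
print(sorted(members))       # ['H33', 'H52', 'H71', 'L91']
print(sorted(contact_only))  # ['H52', 'L91']
```

Here the framework position H71 enters through energy alone, while H52 and L91 are contacts with little or even unfavorable energetic contribution, mirroring the caveats listed above.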

Structural Classification of the CDRs
Ideally, an adequate definition of the CDRs should include all the residues that form the paratope surface and interact with the epitope. A structural analysis of the CDRs is therefore important. Chothia's group classified the CDR loops of the heavy and light chains into a small number of conserved structures, called "canonical" classes (32-34). This classification indicates that the light-chain CDRs (CDR L1, L2, L3) and the first two heavy-chain CDRs (CDR H1, H2) adopt only a few different structures, which appear to be determined by the loop length and by the presence of key amino acids at hallmark positions. The authors found that only very few conserved residues (13 and 7 in the light and heavy chains, respectively), located within the CDR and FR regions, are responsible for the CDR conformations. Surprisingly, however, there seems to be no obvious correlation between the germline sequence of a CDR and its canonical class (62), most likely because small sequence differences, including in the framework regions, can affect the conformation of the CDRs (63). Besides Chothia's work, several other studies have refined the structural clustering of the CDRs and the identification of key residues affecting their conformations (50,62,64-66), including for the highly variable CDRH3 (67-69). A study published by De Genst et al. illustrates the relation between CDR sequence and structure: the part of the CDR3 loop encoded by the same D gene in two nanobodies adopts an identical structure and targets the same epitope on the antigen (70).
Taken together, these studies suggest that classifying CDRs structurally, using structure prediction from their sequences, can be a very useful tool for antibody engineering. A few online tools are available for this purpose (71,72) 6,7 . However, it is important to keep in mind that residues from the framework regions can also influence CDR conformation, as discussed repeatedly in this review.
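The idea that a canonical structure is largely determined by loop length plus a few key residues can be sketched as a grouping of loops by "signature". In the sketch below, the key positions are hypothetical loop-relative indices chosen for illustration; Chothia's actual rules refer to specific numbered positions (including framework positions) derived from crystal structures, so this code does not reproduce the canonical classes themselves.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, str]


def loop_signature(loop: str, key_indices: Sequence[int]) -> Signature:
    """(loop length, residues at the key indices); '-' marks an index beyond the loop end."""
    residues = "".join(loop[i] if i < len(loop) else "-" for i in key_indices)
    return len(loop), residues


def group_by_signature(loops: Dict[str, str], key_indices: Sequence[int]) -> Dict[Signature, List[str]]:
    """Group loops sharing length and key residues (a crude proxy for a structural class)."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for name, loop in loops.items():
        groups[loop_signature(loop, key_indices)].append(name)
    return dict(groups)


# Hypothetical CDR-L1 loops and hypothetical key indices (0-based, loop-relative).
loops = {"mAb1": "RASQSISSYLN", "mAb2": "RASQDVSTAVA", "mAb3": "KSSQSLLYSSNQKNYLA"}
print(group_by_signature(loops, key_indices=(1, 3)))
```

As noted above, such sequence-only signatures ignore framework effects on loop conformation, which is one reason why sequence and canonical class do not always correlate.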

INFLUENCE OF NON-CDR RESIDUES ON THE ANTIBODY BINDING AFFINITY
It is now well established and documented that non-CDR residues may play an important role in the binding affinity of an antibody for its antigen (73), either by making direct contact with the antigen (4, 61), by affecting the stability or flexibility of the antibody or its antigen-binding loops (74), or by structuring the CDR loops themselves (75). Indeed, residues from the framework regions can modulate the conformation of the CDRs and therefore affect binding affinity. Such residues, located in the framework regions in the immediate vicinity of the CDR loops, were named "Vernier zone residues" (75). Finally, non-CDR residues that influence the packing and orientation of the light and heavy chain variable domains are also critical for antigen-binding affinity and, surprisingly, have often been overlooked. The Vernier zone is therefore an incomplete description and should be extended to include all framework residues that affect the affinity of the antibody. The next paragraph focuses on the residues located at the VH/VL interface that alter the packing of these domains and, consequently, influence the topography of the paratope.
6 http://dunbrack2.fccc.edu/pyigclassify/
7 http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/WelcomeSAbPred.php
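In practice, Vernier-zone and other affinity-relevant framework positions are often used to shortlist candidate back-mutations once a human acceptor framework has been chosen. A minimal sketch, assuming both chains have already been numbered with the same scheme and stored as dictionaries keyed by position labels: it reports the listed positions at which the human acceptor differs from the donor. The position list and residues are hypothetical placeholders, not a curated Vernier zone definition.

```python
from typing import Dict, Iterable, List, Tuple


def backmutation_candidates(donor: Dict[str, str], acceptor: Dict[str, str],
                            positions: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (position, acceptor residue, donor residue) for each listed position
    where the human acceptor framework differs from the non-human donor."""
    candidates = []
    for pos in positions:
        d, a = donor.get(pos), acceptor.get(pos)
        # Positions absent from either chain (e.g., deletions) are skipped here.
        if d is not None and a is not None and d != a:
            candidates.append((pos, a, d))
    return candidates


# Placeholder position labels and residues; not a curated Vernier zone list.
watch_list = ["H27", "H28", "H71", "H94"]
donor = {"H27": "F", "H28": "N", "H71": "A", "H94": "R"}
acceptor = {"H27": "Y", "H28": "N", "H71": "R", "H94": "R"}
print(backmutation_candidates(donor, acceptor, watch_list))
```

Extending the watch list with VH/VL interface positions, as argued above, only requires adding their labels.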
Chothia et al. were the first to describe and analyze the packing of the VL and VH domains (76). They highlighted the aromatic side chains involved in this interface, but their analysis relied on only the three antibody structures available at that time. In 2010, Abhinandan and Martin performed a more comprehensive analysis of the diversity of VH/VL packing angles using 567 antibody crystal structures. They developed a method to predict the packing (orientation) of the heavy and light chain variable domains from the presence of specific amino acids at the interface between the two domains. In practice, they selected the Cα atoms of structurally conserved residues located at the VH/VL interface and used them to fit two regression lines, one on the light chain and one on the heavy chain. The packing angle corresponds to the angle formed by these two fitted vectors (Figure 7); it varies considerably from one antibody to another (from −31.0° to −60.8°), with a mean of −45.6°. Their results suggest that Chothia's key loop residues have only little influence on this angle. In contrast, the authors identified 13 residues at the VH/VL interface that influence this packing angle (77). A web tool named PAPS (Packing Angle Prediction Server), based on sequence homology, is available online for predicting the packing angle 8 . More recently, Dunbar et al. (78) further characterized the orientation of the variable domains. Because a single angle may correspond to more than one possible VH/VL orientation, they developed a new method that describes the VH/VL orientation more precisely using five angles and one distance. They also investigated the VH/VL interface positions and the residue identities influencing these angles and the distance, and concluded that a given residue at a specific position might lead to more than one VH/VL angle. They released a software tool named ABangle, available online 9 , to measure these angles from PDB structures (78).
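The geometric idea behind the packing angle can be sketched as follows: fit one axis through a set of Cα coordinates in each domain and measure the angle between the two axes. Here each axis is the principal direction of its point set (computed by power iteration on the covariance matrix), and the angle is reported as a signed torsion about the line joining the two centroids. This construction, the function names and the toy coordinates are assumptions for illustration; they are not the exact residue sets or conventions of Abhinandan and Martin or of ABangle.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)


def unit(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def fit_axis(points: Sequence[Vec], iterations: int = 100) -> Tuple[Vec, Vec]:
    """Centroid and principal direction of a point cloud (power iteration on the covariance)."""
    n = len(points)
    c = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n, sum(p[2] for p in points) / n)
    centered = [sub(p, c) for p in points]
    cov = [[sum(q[i] * q[j] for q in centered) for j in range(3)] for i in range(3)]
    first_to_last = sub(points[-1], points[0])
    v = unit(first_to_last)
    for _ in range(iterations):
        v = unit(tuple(dot(cov[i], v) for i in range(3)))  # multiply covariance matrix by v
    # A fitted line has no direction: orient it from the first to the last listed atom.
    if dot(v, first_to_last) < 0:
        v = scale(v, -1.0)
    return c, v


def packing_angle(vl_ca: Sequence[Vec], vh_ca: Sequence[Vec]) -> float:
    """Signed torsion angle (degrees) between the VL and VH axes about the centroid-centroid line."""
    c_l, v_l = fit_axis(vl_ca)
    c_h, v_h = fit_axis(vh_ca)
    b = unit(sub(c_h, c_l))
    # Project each axis onto the plane perpendicular to the centroid-centroid direction.
    p_l = sub(v_l, scale(b, dot(v_l, b)))
    p_h = sub(v_h, scale(b, dot(v_h, b)))
    return math.degrees(math.atan2(dot(cross(p_l, p_h), b), dot(p_l, p_h)))


# Toy coordinates: two straight "domains" 10 Å apart, the second rotated by -45° about z.
theta = math.radians(-45.0)
vl = [(float(t), 0.0, 0.0) for t in range(-2, 3)]
vh = [(t * math.cos(theta), t * math.sin(theta), 10.0) for t in range(-2, 3)]
print(round(packing_angle(vl, vh), 1))  # prints -45.0
```

Orienting each axis consistently matters: flipping one fitted line would shift the reported angle by 180°, which is why the residue order of the input Cα atoms must be fixed across antibodies.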
The VH/VL angulation also influences the relative position of the CDRs and, consequently, the shape of the paratope. This parameter can therefore have a strong impact on binding affinity. Indeed, the interaction energy between two atoms depends on their distance according to the Lennard-Jones relation, and a difference of a few ångströms can strongly affect the binding free energy. Using simple right-triangle trigonometry and assuming a variable region length of 37 Å, a 1° difference between the VL/VH domains displaces the atoms exposed on the CDR surfaces by about 0.6 Å. The choice of framework regions for humanization by CDR grafting is therefore crucial to maintain affinity. For example, Nakanishi et al. observed a severe affinity loss in a humanized antibody and restored the original affinity with two mutations at the VH/VL interface (79). Similarly, Bujotzek et al. humanized antibodies by selecting human frameworks based on the predicted VH/VL orientation and found a correlation between angle similarity and the affinity of the humanized antibodies (80).
9 http://www.stats.ox.ac.uk/~dunbar/abangle/
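The 0.6 Å estimate and its energetic consequence can be reproduced in a few lines. The sketch assumes the displacement is the side opposite the 1° angle in a right triangle whose adjacent side is 37 Å (tangent; the sine gives practically the same value at such small angles) and uses a 12-6 Lennard-Jones potential with hypothetical parameters ε and σ that are not taken from any force field.

```python
import math


def displacement(arm_length: float, delta_angle_deg: float) -> float:
    """Side opposite a small angle in a right triangle: shift of a point located
    `arm_length` away from the pivot when the domain tilts by `delta_angle_deg`."""
    return arm_length * math.tan(math.radians(delta_angle_deg))


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones pair energy, 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


shift = displacement(37.0, 1.0)  # 37 Å arm, as assumed in the text
epsilon, sigma = 0.2, 3.4        # hypothetical pair parameters (kcal/mol, Å)
r_min = 2 ** (1 / 6) * sigma     # bottom of the energy well, where E = -epsilon
print(f"1 degree -> {shift:.2f} Å shift")  # prints "1 degree -> 0.65 Å shift"
print(f"E(r_min) = {lennard_jones(r_min, epsilon, sigma):.3f}")
print(f"E(r_min + shift) = {lennard_jones(r_min + shift, epsilon, sigma):.3f}")
```

Even with these toy parameters, moving an optimally placed atom pair apart by the displacement caused by a single degree makes the interaction noticeably weaker, which illustrates why the VH/VL angle of the acceptor framework matters.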
The concept of packing angles is a critical aspect of antibody-antigen interaction, and the residues that modulate the VH/VL orientation should therefore be considered as elements that introduce further diversity into paratopes. Framework residues that significantly affect affinity are listed in Table S2.

DISCUSSION: IMPORTANCE OF AMINO ACID NUMBERING AND CDR DEFINITIONS IN ANTIBODY HUMANIZATION
To achieve high-affinity binding, the paratope and its corresponding epitope must have large, shape-complementary surfaces and, in addition, the contacting residues must establish interactions that stabilize the complex. The parameters that influence the shape diversity of the paratope are essentially the CDR lengths and conformations (canonical classes) (32,56,64,81), their relative orientations (77-79,82) and the hydration shell (solvation) of the binding interface (83). The binding of the two complementary surfaces is mostly driven by aromatic residues that establish van der Waals and hydrophobic interactions, whereas the complex is strengthened mainly by electrostatic interactions and hydrogen bonds between the side chains of adequately positioned charged and polar residues.
An antibody humanization experiment attempts to reconstitute the original paratope-epitope interactions, in most cases by grafting the CDRs of a non-human antibody onto a human antibody scaffold. This CDR-grafting or reshaping method often relies on a simplified view of the antigen-antibody interaction that reduces the paratope to the six CDRs of the antibody. Although there is no general protocol for antibody humanization, since each case must be handled individually, this section attempts to provide guidance for such an exercise. Figure 8 summarizes a suggested standardized protocol to humanize antibodies of animal origin. First, the CDRs of the donor antibody of animal origin must be identified. Under the simplifying assumption that the paratope corresponds to the CDRs, the Chothia CDR definition is recommended because it correlates very well with the structural loops of the variable regions. A few online software tools are available for CDR structure prediction 10,11 (72,84) and classification 12 (71). However, it is always best to choose the broadest CDR definition to ensure that all residues constituting the paratope are included. Second, as discussed repeatedly in this review, residues outside the classical CDR definitions can also be part of the actual paratope. Several studies can help to identify these residues, but unfortunately only one bioinformatics tool (Paratome) is currently available. Finally, the relative orientation of the CDRs is also critical to reconstruct the paratope surface and to position its antigen-interacting residues adequately. Hence, it is essential to choose the most appropriate human VH and VL framework scaffolds; selecting this human scaffold is probably as important as defining the CDRs.
The chosen human framework scaffolds should exhibit a VH/VL angle as close as possible to that of the animal antibody so that the CDRs are correctly positioned in the reshaped construct. In this context, several amino acid positions have been highlighted for their role in VH/VL angulation, and various angle prediction tools are currently available. Thus, reshaping an antibody for humanization should be an effective "paratope-grafting" rather than a "CDR-grafting" experiment. Notably, for all these purposes, it is important to align the antibody sequences correctly and to identify precisely the residues occupying superimposable positions in chains of different origins. Therefore, consistently using a single, adequate numbering scheme, one that assigns the same number to residues occupying the same structural position (with fixed insertion points), is a prerequisite for all antibody engineering tasks. In this respect, the enhanced Chothia (Martin) numbering system is somewhat easier to use because it precisely defines insertion points, although this choice remains subjective.
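Once two chains are numbered with the same scheme, aligning them reduces to pairing residues that carry the same position label. The sketch assumes Kabat/Chothia/Martin-style labels made of a chain letter, a number and an optional insertion letter (e.g., H82, H82A, H100B); IMGT labels use a different insertion convention and would need another sort key. The function names and example residues are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Chain letter, position number, optional insertion letter (Kabat/Chothia/Martin style).
LABEL = re.compile(r"^([HL])(\d+)([A-Z]?)$")


def label_key(label: str) -> Tuple[str, int, str]:
    """Sort key so that H82 < H82A < H82B < H83 and H100 < H100A < H101."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized position label: {label!r}")
    chain, number, insertion = match.groups()
    return chain, int(number), insertion


def align_by_numbering(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Pair residues carrying the same position label; None marks a gap in one chain."""
    labels = sorted(set(a) | set(b), key=label_key)
    return [(label, a.get(label), b.get(label)) for label in labels]


# Hypothetical numbered residues from a donor and a human acceptor.
donor = {"H99": "G", "H100": "Y", "H100A": "Y", "H100B": "F", "H101": "D"}
acceptor = {"H99": "G", "H100": "S", "H100A": "W", "H101": "D"}
for row in align_by_numbering(donor, acceptor):
    print(row)
```

Because equivalent structural positions share a label, no sequence-alignment scoring is needed; the quality of the result rests entirely on the numbering tool having placed the insertions correctly.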
Another critical point is that humanized antibodies should not induce any adverse immune reaction in the patient. Therefore, potential immunogenic epitopes in the protein sequence should be predicted. Briefly, this adverse immune response takes place after internalization of the antibodies by antigen-presenting cells. The antibodies are digested into oligopeptides that bind to HLA-DR, HLA-DQ, or HLA-DP molecules. These membrane protein complexes bind specific oligopeptides and present them to T helper lymphocytes, which activate the immune response. Databases reporting the polymorphisms of these MHC class II alleles are now available to predict the binding potential of oligopeptides (85-87). One such prediction tool is the Epivax web tool (88), which scans a protein sequence to identify putative T cell epitopes. The identified sequences are predicted to bind to HLA-II DR (or other HLA-II isotypes) proteins, but the program can also identify Treg epitopes that inhibit the immune response. The software can also propose mutating one or more amino acids to reduce the immunogenicity of a protein sequence. Sequence immunogenicity is calculated and presented as an "Epimatrix score," which predicts the immunogenicity of a given amino acid sequence (89). The software could be improved by highlighting residues that are important (for binding affinity/specificity, angulation, etc.) and therefore should not be mutated. It is important to realize that, in the case of antibodies, conflicts may arise between maintaining high affinity and achieving low immunogenicity. This is one of the reasons why antibody humanization remains challenging.
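Many epitope-prediction tools of this kind slide a 9-residue window along the sequence and score each putative HLA class II binding core with a position-specific matrix. The sketch below shows only that generic scanning step; the toy matrix, which simply rewards a few residue types at core positions 1, 4, 6 and 9, and the threshold are made up and do not reproduce the Epimatrix algorithm or any real HLA allele.

```python
from typing import Dict, List, Tuple


def scan_9mers(sequence: str, matrix: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, float]]:
    """Score every overlapping 9-residue frame with a position-specific matrix
    (one {amino acid: score} dict per core position; unlisted residues score 0)
    and return (0-based start, peptide, score) for frames at or above threshold."""
    if len(matrix) != 9:
        raise ValueError("expected one scoring dict per position of a 9-mer core")
    hits = []
    for start in range(len(sequence) - 8):
        peptide = sequence[start:start + 9]
        score = sum(column.get(aa, 0.0) for column, aa in zip(matrix, peptide))
        if score >= threshold:
            hits.append((start, peptide, score))
    return hits


# Made-up matrix: +1 for the listed residues at core positions 1, 4, 6 and 9 (indices 0, 3, 5, 8).
favoured = {0: "FILMVWY", 3: "AILMV", 5: "AGST", 8: "ILMVY"}
toy_matrix = [{aa: 1.0 for aa in favoured.get(i, "")} for i in range(9)]
print(scan_9mers("SSFSSLSASSVSS", toy_matrix, threshold=4.0))
```

A real tool would use allele-specific matrices and normalize scores across alleles; frames flagged in this way could then be cross-checked against affinity-critical positions before any de-immunizing mutation is proposed, in line with the suggestion above.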

CONCLUSION
This review describes the different amino acid numbering systems and CDR definitions that are currently available and highlights the importance of a standardized numbering system for antibody engineering strategies, especially for antibody humanization. Indeed, an effective numbering system should assign the same residue number to structurally equivalent positions in antibodies from different species. Although several numbering tools based on ever-growing databases are available online, comparing the results of different numbering systems is recommended, as inaccuracies are still possible, especially for variable domains of unconventional length.
Furthermore, the different CDR definitions and other concepts, such as contact residues and antigen-binding residues, have been reviewed. In antibody humanization methods, the paratope has very often been reduced to the CDRs. This approximation is useful insofar as it makes CDR grafting an easy and generalizable humanization tool. In contrast, grafting contacting residues (even those outside the CDRs) and/or residues that affect the binding free energy would allow a better reconstitution of the paratope. However, this approach is less convenient for antibody humanization because the paratope is discontinuous and because experimental approaches are required to determine these residues precisely. Moreover, we have described studies that analyzed the angle between the light and heavy chain variable regions. In CDR grafting and other humanization methods, residues affecting the VL/VH packing angle should be considered in order to restore full binding affinity. Finally, all of these crucial concepts should be integrated into the humanization process.
In summary, a precise identification of the paratope, established with an appropriate amino acid numbering scheme, is necessary to engineer humanized antibodies with high affinity, high stability and low immunogenicity. Designing a fully functional paratope on another framework should not be restricted to grafting the antigen-contacting and interacting residues; it should also include the amino acids that help hold the antigen-binding loops in place and the residues at positions that affect the relative orientation of the paired VL/VH domains. All these residues must occupy the same positions in the 3D structure.

AUTHOR CONTRIBUTIONS
MD and MV wrote the manuscript. PF, ES, BQ, SM, and MG provided regular intellectual input and proofread the manuscript.

ACKNOWLEDGMENTS
The authors would like to acknowledge the Fonds de la Recherche Scientifique, the Walloon Region and the funds from DGO6 for their support.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2018.02278/full#supplementary-material
Table S1 | Alignment of the numbering systems described by Kabat, Chothia, Martin (enhanced Chothia), Gelfand, IMGT and Honegger. VL stands for variable domains of light chains and VH for variable domains of heavy chains. The IMGT and Honegger (Aho) numbering systems do not take into account the nature (heavy or light) of the chain. Insertion positions are highlighted in yellow; in the Aho numbering scheme, these yellow residues correspond to gap positions. Deletion positions are in brackets (residue L10 in lambda light chains, Martin numbering scheme). CDR definitions are indicated in orange and framework definitions in blue (IMGT and Aho schemes). The Gelfand numbering scheme is divided into words (boxes) with the numbering below. For the Aho numbering scheme, structurally conserved positions are highlighted in green. Conserved amino acids are indicated below the IMGT numbering scheme (Cys 23, Trp 41, Leu 89, Cys 104, Phe/Trp 118, Gly 119).