
This article was submitted to Biological Modeling and Simulation, a section of the journal Frontiers in Molecular Biosciences

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Antibodies have the remarkable ability to recognise their cognate antigens with extraordinary affinity and specificity. Discerning the rules that define antibody-antigen recognition is a fundamental step in the rational design and engineering of functional antibodies with desired properties. In this study we apply the 3D Zernike formalism to the analysis of the surface properties of the antibody complementarity-determining regions (CDRs). Our results show that 3D Zernike descriptors (3DZD) of the shape and electrostatic potential of the CDR surface are predictive of antigen specificity, with a classification accuracy of 81% and an area under the receiver operating characteristic curve (AUC) of 0.85. Additionally, although antibody epitopes are typically indistinguishable from non-epitope, solvent-exposed regions of the antigen in terms of surface size, solvent accessibility and amino acid composition, the 3DZD detect significantly higher surface complementarity between epitope and paratope, and predict correct paratope-epitope interactions with an AUC of 0.75.

Antibodies, also known as immunoglobulins, are multimeric Y-shaped proteins that the immune system uses to recognize and neutralize foreign targets, called antigens. The antigen-binding site is located on the upper tip of the molecule and is formed by the pairing of two variable domains, VH and VL, each contributing three hypervariable loops, or complementarity-determining regions (CDRs). The remarkable ability of antibodies to recognize virtually any foreign antigen stems from the sequence and length variability of the CDRs, while the framework of the molecule is largely conserved.

Early studies, based on a handful of crystallographic structures, revealed that, despite the large sequence variability of CDRs, five of the six hypervariable loops adopt only a limited number of main-chain conformations, called “canonical structures”.

The study of molecular interactions in proteins, and in antibodies in particular, poses well-known challenges. Existing experimental methods, such as X-ray crystallography, mass spectrometry, phage display and mutagenesis analysis, are intrinsically expensive, laborious and time-consuming.

In this study we rely on a surface representation of antibodies and their cognate antigens based on 3D Zernike Descriptors (3DZD). The Zernike polynomials were first described by Fritz Zernike in 1934.

Additionally, we show that, although antibody epitopes are not distinguishable from non-epitope, solvent-exposed regions of the antigen in terms of surface size, solvent accessibility and amino acid composition, they display significantly higher surface complementarity to the antibody paratope, in terms of both shape and electrostatic 3DZD, yielding ROC AUC values of 0.75 and 0.61, respectively.
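A ROC AUC of this kind can be read as the probability that a randomly chosen native epitope receives a higher complementarity score than a randomly chosen decoy. The minimal Python sketch below computes this rank-based (Mann-Whitney) estimate; the function name `roc_auc` and the convention that higher scores mean stronger complementarity are illustrative assumptions, not the authors' code.

```python
import numpy as np

def roc_auc(native_scores, decoy_scores):
    """Rank-based ROC AUC: P(native score > decoy score), ties counted as 0.5.

    Assumes higher scores indicate stronger complementarity; negate
    distance-like scores before calling.
    """
    pos = np.asarray(native_scores, dtype=float)[:, None]  # natives as a column
    neg = np.asarray(decoy_scores, dtype=float)[None, :]   # decoys as a row
    # Broadcasting compares every native/decoy pair at once.
    wins = (pos > neg).mean()
    ties = (pos == neg).mean()
    return float(wins + 0.5 * ties)

# Toy scores, made up for illustration (not data from this study).
print(roc_auc([0.9, 0.4, 0.7], [0.3, 0.5, 0.1]))
```

The pairwise formulation is quadratic in memory, which is acceptable for a few hundred complexes; for much larger sets a sort-based rank computation would be preferable.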

We selected 326 antibodies with redundancy lower than 90% and resolution better than 3.0 Å from the SAbDab database.

For each antibody and protein antigen 3D structure, atomic partial charges and radii were assigned using PDB2PQR with default parameters.

The set of selected molecular surface points was scaled to the unit sphere and placed into a 3D grid of 128 × 128 × 128 voxels. To avoid boundary effects, the point cloud was scaled so that its bounding box is contained within 80% of the unit sphere.
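The scaling and voxelization step can be sketched in Python as follows. This is a minimal illustration, assuming that the cloud is centred on its bounding-box centre and scaled so that the bounding-box corners lie at radius 0.8, and that the 128-voxel grid spans the cube [-1, 1]³; the function name `voxelize_surface` is hypothetical.

```python
import numpy as np

def voxelize_surface(points, grid_size=128, box_fraction=0.8):
    """Place a surface point cloud into a cubic grid spanning [-1, 1]^3.

    Assumption: the cloud is centred on its bounding-box centre and scaled so
    that the bounding-box corners lie at radius `box_fraction` of the unit sphere.
    """
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    half_diag = np.linalg.norm(hi - lo) / 2.0
    scale = box_fraction / half_diag if half_diag > 0 else 1.0
    scaled = (pts - (lo + hi) / 2.0) * scale
    # Map coordinates in [-1, 1] to integer voxel indices 0 .. grid_size - 1.
    idx = np.floor((scaled + 1.0) / 2.0 * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1  # occupied surface voxels
    return grid, scaled, idx
```

Because every point lies inside its bounding box, scaling the box half-diagonal to 0.8 keeps all points within 80% of the unit sphere. The returned indices can be reused to map per-point values, such as the electrostatic potential, onto the same grid.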

Since the Zernike formalism does not differentiate positive and negative values, positive and negative electrostatic potentials were encoded as two separate functions.

In summary, in one grid, voxels with positive electrostatic potential were set to 1 and all other voxels to 0; in a second grid, voxels with negative electrostatic potential were set to 1 and all other voxels to 0. The resulting voxel grids, one for the solvent-accessible surface (SAS) and two for the positive and negative electrostatic (ES) potential, respectively, were treated as three different 3D functions, f(x), each expanded into 3DZD as described in the next section.
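The sign-based split described above can be sketched as follows. This is an illustration under stated assumptions: surface points are given as integer voxel indices with one potential value each, and the name `electrostatic_channels` is hypothetical.

```python
import numpy as np

def electrostatic_channels(voxel_idx, potential, grid_size=128):
    """Split per-point electrostatic potentials into two binary 3D functions.

    voxel_idx: (N, 3) integer voxel indices of the surface points (hypothetical input).
    potential: (N,) electrostatic potential sampled at each surface point.
    Returns (positive, negative) grids: 1 where the potential has that sign, 0 elsewhere.
    """
    idx = np.asarray(voxel_idx, dtype=int).reshape(-1, 3)
    phi = np.asarray(potential, dtype=float)
    positive = np.zeros((grid_size,) * 3, dtype=np.uint8)
    negative = np.zeros((grid_size,) * 3, dtype=np.uint8)
    pos, neg = idx[phi > 0], idx[phi < 0]
    positive[pos[:, 0], pos[:, 1], pos[:, 2]] = 1  # positive-ES function
    negative[neg[:, 0], neg[:, 1], neg[:, 2]] = 1  # negative-ES function
    # Points with zero potential stay 0 in both grids; a voxel receiving
    # points of both signs is set to 1 in both grids.
    return positive, negative
```

Encoding the two signs as separate non-negative functions avoids positive and negative regions cancelling out in the moment integrals.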

For the quantitative description of the binding sites, we rely on a representation based on the Zernike polynomials and their corresponding moments. Moment-based representations are a class of mathematical shape descriptors, originally developed for pattern recognition and subsequently generalized to three dimensions.

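A surface described by a function $f(\mathbf{x})$ defined within the unit sphere can be expanded into a series of 3D Zernike functions:

$$f(\mathbf{x}) = \sum_{n=0}^{\infty}\sum_{l=0}^{n}\sum_{m=-l}^{l} \Omega_{nl}^{m}\, Z_{nl}^{m}(\mathbf{x}),$$

where $n - l$ is even.

The Zernike polynomials can be written as:

$$Z_{nl}^{m}(r,\vartheta,\varphi) = R_{nl}(r)\, Y_{l}^{m}(\vartheta,\varphi),$$

where $R_{nl}(r)$ is a radial polynomial and $Y_{l}^{m}$ are the spherical harmonics.

The 3D Zernike moments of a surface described by a function $f(\mathbf{x})$ are:

$$\Omega_{nl}^{m} = \frac{3}{4\pi}\int_{|\mathbf{x}|\le 1} f(\mathbf{x})\,\overline{Z_{nl}^{m}(\mathbf{x})}\, d\mathbf{x}.$$

Their rotation-invariant norms, i.e. the 3DZD, are defined as:

$$\|\Omega_{nl}\| = \sqrt{\sum_{m=-l}^{l}\left|\Omega_{nl}^{m}\right|^{2}}.$$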

The level of detail of the Zernike representation can be tuned through the order of the expansion n. In our implementation, the function f represents either the geometric shape or the positive or negative electrostatic potential of the molecular surface, and the maximum order of expansion was set to 20, giving a total of 121 invariants.
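The number of invariants follows from counting the pairs (n, l) with 0 ≤ l ≤ n ≤ n_max and n − l even. The Python sketch below, assuming the standard 3DZD definition of Novotni and Klein, enumerates these pairs and builds the descriptor vector from a set of precomputed moments; the dictionary input format and both function names are hypothetical.

```python
import numpy as np

def invariant_indices(n_max):
    """(n, l) pairs with 0 <= l <= n <= n_max and n - l even."""
    return [(n, l) for n in range(n_max + 1)
            for l in range(n + 1) if (n - l) % 2 == 0]

def zernike_descriptors(moments, n_max):
    """Rotation-invariant norms sqrt(sum_m |Omega_nl^m|^2), ordered by (n, l).

    moments: dict mapping (n, l, m) -> complex moment Omega_nl^m
    (hypothetical input format; missing entries are treated as 0).
    """
    return np.array([
        np.sqrt(sum(abs(moments.get((n, l, m), 0.0)) ** 2
                    for m in range(-l, l + 1)))
        for n, l in invariant_indices(n_max)
    ])

print(len(invariant_indices(10)), len(invariant_indices(20)))
```

With n_max = 10 and n_max = 20 this enumeration yields 36 and 121 invariants, respectively, matching the vector lengths used in this study.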

Given the dataset of antibody-antigen complexes containing protein antigens, the native geometric epitope was defined as the set of antigen residues lying within 6 Å of any antibody residue. The pivot residue was defined as the residue with the lowest mean distance to the residues of the native geometric epitope. The native electrostatic epitope was defined as the set of antigen residues lying within 15 Å of any antibody residue. For the set of native geometric epitope residues, the Solvent Accessible Surface Area (SASA) was computed using GROMACS. The mean and standard deviation of the computed global and residue-based SASA values were used to generate an alternative set of surface patches, i.e. decoy epitopes. The algorithm first selects a decoy pivot residue by randomly choosing any solvent-exposed residue whose SASA lies within half a standard deviation of the mean SASA measured over all pivot residues of the native epitopes.
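The epitope, pivot and decoy-pivot definitions above can be sketched in Python as follows. This is an illustration under several assumptions not stated in the text: residue-residue distances are precomputed (e.g. as minimum interatomic distances), each residue is represented by a single coordinate (e.g. its Cα) when computing the pivot, "solvent-exposed" means SASA > 0, and native epitope residues are excluded from the decoy candidates. All function names are hypothetical.

```python
import numpy as np

def native_epitope(residue_distances, cutoff=6.0):
    """Antigen residues closer than `cutoff` (Å) to any antibody residue.

    residue_distances: (n_antigen, n_antibody) residue-residue distance matrix.
    """
    d = np.asarray(residue_distances, dtype=float)
    return np.flatnonzero(d.min(axis=1) < cutoff)

def pivot_residue(epitope_coords):
    """Index of the epitope residue with the lowest mean distance to the others."""
    xyz = np.asarray(epitope_coords, dtype=float)
    pairwise = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return int(np.argmin(pairwise.mean(axis=1)))

def decoy_pivot_candidates(sasa, epitope, pivot_sasa_values):
    """Solvent-exposed, non-epitope residues whose SASA lies within half a
    standard deviation of the mean SASA of the native pivot residues."""
    sasa = np.asarray(sasa, dtype=float)
    mu, sigma = np.mean(pivot_sasa_values), np.std(pivot_sasa_values)
    mask = (sasa > 0) & (np.abs(sasa - mu) <= 0.5 * sigma)
    mask[np.asarray(epitope, dtype=int)] = False  # keep decoys off the epitope
    return np.flatnonzero(mask)
```

A decoy pivot can then be drawn uniformly from the candidates, for instance with `np.random.default_rng().choice(candidates)`; the subsequent growth of the decoy patch around it is not covered by this sketch.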

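Given a pair of ordered sets of 3DZD, $\mathbf{x}$ and $\mathbf{y}$, their cosine distance $d_{c}$ is computed as:

$$d_{c}(\mathbf{x},\mathbf{y}) = 1 - \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}.$$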
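For reference, the cosine distance between two 3DZD vectors can be computed as in the minimal sketch below (the function name is illustrative). It returns the same quantity as `scipy.spatial.distance.cosine`, written out with NumPy to make the formula explicit.

```python
import numpy as np

def cosine_distance(x, y):
    """d_c(x, y) = 1 - x.y / (||x|| ||y||) for two 3DZD vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        # Descriptors must come from the same expansion order to be comparable.
        raise ValueError("3DZD vectors must have the same length")
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Since 3DZD are non-negative norms, the distance lies between 0 (parallel descriptor vectors) and 1 (orthogonal ones), and it ignores overall scale differences between the two vectors.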

Given two patches A and B, the similarity between their 3DZD is computed as:

The surface complementarity between A and B is defined as follows:

In this work we aim to provide a quantitative description of the geometric and electrostatic properties of antibody-antigen interactions through a mathematical representation of the interacting surfaces. To this end, we rely on a dataset of experimentally determined 3D structures of antibody-antigen complexes and on a moment-based representation of the interacting surfaces using 3D Zernike descriptors (3DZD).

The 3DZD descriptors provide a compact, roto-translationally invariant representation of 3D objects, thus enabling effective comparison of both global and local properties of molecular surfaces by standard pairwise similarity metrics. The order n of the series expansion determines the resolution of the descriptor. In this study, 3DZD were computed at different levels of truncation of the expansion, with n ranging from 10 to 20, which correspond to vectors of 36 and 121 invariants, respectively. The overall scheme of the procedure used in this work is shown in

Schematic workflow for the comparison of Ab-Ag interfaces based on 3DZD.

We have previously shown that a 3DZD-based description of the surface of the antibody CDRs provides an effective metric for classifying antibodies according to their specificity towards protein and non-protein antigens. Here, each antibody is characterised by the fraction of protein-binding antibodies ($f_{pb}$) in its neighbours set, which is compared with the value expected if protein-binding antibodies were distributed uniformly, $\mathrm{Ex}[f_{pb}] = N_{prot}/N_{tot}$, where $N_{prot}$ is the number of protein-binding antibodies in the dataset and $N_{tot}$ is the total number of antibodies in the dataset.

Fraction of protein-binding antibodies ($f_{pb}$) in the neighbours set of protein-binding (green curve) and non-protein-binding antibodies (orange curve), based on surface shape similarity and on electrostatic surface similarity.

We next analyzed the performance of each descriptor in classifying the CDRs according to antigen type, using a leave-one-out approach. For each CDR, if $f_{pb}$ was greater than $\mathrm{Ex}[f_{pb}]$, the CDR was labeled as protein-binding, and as non-protein-binding otherwise. The obtained classification accuracy for the shape and electrostatic descriptors at order … $f_{pb}$ computed based on the shape and electrostatic descriptors, respectively, and A is a weight ranging from 0 to 1. The results are shown in
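The leave-one-out labelling rule can be sketched as follows. Neither the size of the neighbours set nor the exact form of the weighted combination is specified in this section, so both are illustrative assumptions: the neighbours set is taken to be the k nearest antibodies under a precomputed 3DZD distance matrix, and the shape and electrostatic fractions are combined linearly as A·f_shape + (1 − A)·f_ES. Function names are hypothetical.

```python
import numpy as np

def protein_binding_fraction(dist, labels, k):
    """Leave-one-out fraction of protein binders among the k nearest neighbours.

    dist: (N, N) pairwise 3DZD distance matrix; labels: (N,) with 1 = protein-binding.
    Assumption: the neighbours set is the k closest antibodies, excluding self.
    """
    d = np.array(dist, dtype=float)      # copy so the input is not modified
    np.fill_diagonal(d, np.inf)          # leave the query antibody out
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    return np.asarray(labels)[nn].mean(axis=1)

def classify(f_pb_shape, f_pb_es, labels, weight=0.5):
    """Label protein-binding (1) when the weighted f_pb exceeds Ex[f_pb]."""
    labels = np.asarray(labels)
    expected = labels.sum() / labels.size  # Ex[f_pb] = N_prot / N_tot
    combined = weight * np.asarray(f_pb_shape) + (1 - weight) * np.asarray(f_pb_es)
    return (combined > expected).astype(int)
```

Setting `weight=1.0` or `weight=0.0` recovers the shape-only or electrostatics-only classifier, and accuracy follows as `(predictions == labels).mean()`.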

The sequence and structure analysis of antibodies, as well as antibody engineering experiments, crucially rely on the precise identification of the CDRs from the antibody sequence.

We then applied the same classification procedure as described previously, by fixing the order

As can be seen in

A key feature of the 3DZD description is that it is invariant under rotation and translation of the represented surface. This implies that two interacting protein regions with perfect surface complementarity yield identical sets of 3DZD descriptors.

Surface complementarity of antibody-antigen interacting surfaces based on shape

In this work we describe a computational protocol based on the 3D Zernike descriptor formalism, which allows fast, superposition-free comparison of molecular surfaces, and we apply it here to the study of the interacting regions of antibodies and their cognate antigens. The method represents a significant upgrade compared to our previous implementation.

As 3DZD are roto-translation invariant, they are also well suited to capturing and quantifying surface complementarity at protein-protein interfaces.

The original contributions presented in the study are included in the article/Supplementary Material.

RL contributed to conception and design of the study. LDR collected the datasets, wrote the software and performed the analyses. EM and GR contributed to analysis and interpretation of the results. RL and LDR wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

The Supplementary Material for this article can be found online at: