Large-Scale Analysis of B-Cell Epitopes on Influenza Virus Hemagglutinin – Implications for Cross-Reactivity of Neutralizing Antibodies

Influenza viruses continue to cause substantial morbidity and mortality worldwide. Rapid mutation of the genes encoding influenza virus surface proteins results in increasing resistance to current vaccines and available antiviral drugs. Broadly neutralizing antibodies (bnAbs) are candidates for prophylactic and therapeutic treatment of influenza. We performed a systematic bioinformatics study of the cross-reactivity of neutralizing antibodies (nAbs) against the influenza virus surface glycoprotein hemagglutinin (HA). This study used the available crystal structures of HA in complex with antibodies to analyze tens of thousands of HA sequences. The detailed description of B-cell epitopes, the measurement of epitope similarity among different strains, and the estimation of antibody neutralization coverage provide insights into the cross-reactivity of existing nAbs against influenza virus. We have developed a method to assess the likely cross-reactivity of bnAbs against influenza strains, whether newly emerged or existing. Our method catalogs influenza strains using a new concept, the discontinuous peptide, and then assesses cross-reactivity. Potentially cross-reactive strains are those whose epitopes share 100% identity with those of experimentally verified neutralized strains. By cataloging influenza strains and their B-cell epitopes for known bnAbs, our method provides guidance for selecting representative strains for further experimental design. Knowledge of HA sequences, their B-cell epitopes, and the differences between historical influenza strains enhances our preparedness and our ability to respond to emerging pandemic threats.


INTRODUCTION
Influenza epidemics result in substantial morbidity and mortality (1). The World Health Organization (WHO) Global Influenza Network provides annual recommendations on the antigenic variants to be included in influenza vaccine formulations. Influenza virus has low-fidelity polymerases that result in high mutation rates (2). As a consequence, seasonal influenza viruses efficiently escape acquired immunity in the human population through antigenic drift, increasing the impact of seasonal influenza. Antigenic shift in influenza A viruses (the reassortment of multiple viral genomes resulting in new strains with recombined antigens) leads to occasional worldwide pandemics that cause significant morbidity and, usually, high mortality. The high transmissibility of influenza combined with its rapid mutation rate makes the discovery of novel influenza therapeutics imperative (3). The main challenge in developing antibody-based prophylactics and therapeutic vaccines against influenza is to understand the variation generated by the virus and to develop means of eliciting broadly neutralizing antibody responses.
The majority of neutralizing antibodies (nAbs) generated during a normal immune response target hemagglutinin (HA) and block viral entry into host cells (4). However, significant sequence diversity among HA genes limits the protective breadth of these nAbs (5). The sequence diversity of influenza A virus is high: there are 17 HA subtypes that belong to two major groups, group 1 (Grp1: H1, H2, H5, H6, H8, H9, H11, H12, H13, H16, and H17) and group 2 (Grp2: H3, H4, H7, H10, H14, and H15) (6). C179, the first antibody reported to neutralize strains of both the H1 and H2 subtypes of influenza A virus, was isolated from mice immunized with the A/Okuda/57 (H2N2) strain (7). It was later found that C179 could cross-neutralize the H1, H2, H5, H6, and H9 subtypes (8-11). The next major advance in the field came about 15 years later (12), when a novel class of human antibodies encoded by the VH1-69 gene was discovered. Among these antibodies, a series of broadly neutralizing antibodies (bnAbs) have been described, such as CR6261 and F10 (13). Most bnAbs that neutralize influenza A virus have been reported to neutralize strains from either Grp1 or Grp2 exclusively. FI6v3 (14) and 39.29 (5) are the only antibodies reported to neutralize human influenza isolates from both Grp1 and Grp2. Influenza B viruses are classified as a single influenza type with two antigenically and genetically distinct lineages that co-circulate (15), represented by the prototype viruses B/Victoria/2/1987 (Victoria lineage) and B/Yamagata/16/1988 (Yamagata lineage) (16). Antibody CR8071 (17) is a bnAb against influenza B viruses that neutralizes both the Victoria and Yamagata lineages. The bnAb CR9114 (17) binds a conserved epitope on the HA stem and was shown to neutralize all tested influenza A viruses. However, it showed no in vitro neutralizing activity against influenza B viruses at the tested concentrations (17).
Generally, the neutralizing effectiveness of these bnAbs was evaluated using representative strains from the subtypes of influenza A virus or the lineages of influenza B virus. Because HA genes are highly variable, conclusions drawn from such evaluations may hold only for the tested viral variants. To map the landscape of nAbs and better understand their cross-reactivity, we performed a systematic study of the B-cell epitopes of a selection of nAbs against influenza virus. Antibodies recognize discrete sites on the surface of macromolecules, called B-cell epitopes (antigenic determinants). Some 10% of B-cell epitopes are linear peptides, while 90% are formed from discontinuous amino acids that are brought together into surface patches by the three-dimensional (3D) conformation of the protein (18). We defined a novel way of describing discontinuous motifs, using virtual peptides, to represent B-cell epitopes, and used this representation to estimate the potential cross-reactivity and neutralization coverage of these nAbs.
The functional characterization of an increasing number of nAbs, together with the crystal structures of these nAbs in complex with HA proteins, enables us to define their B-cell epitopes precisely. A large number of influenza variant sequences are available in public databases (19), enabling systematic bioinformatics analysis of the cross-reactivity of nAbs against influenza virus. Such systematic analysis improves our understanding of antibody/antigen interactions, facilitates mapping of the known universe of target antigens, and allows the prediction of cross-reactivity. These methods and tools are useful for the design of broadly protective vaccines against emerging pathogens. This article describes a study of influenza HA cross-reactivity, but the method is applicable to any viral pathogen for which information about nAbs and a collection of variant sequences of the target antigen are available.

NEUTRALIZING ANTIBODIES AGAINST HEMAGGLUTININ
The names and specificities of nAbs against influenza virus HA were collected from published papers. Twenty-two nAbs against influenza virus with crystal structures available in the PDB were identified (Table 1). Fifteen of these nAbs target the globular head of HA; the binding sites of the other seven are located in the HA stem region.
[Table 1 notes: nAbs specific for strain A/X-31 (H3N2) are shown in underlined italics; influenza A virus subtypes are designated as Grp1 or Grp2; representative sequences were selected for each subtype (34).]

Most of these nAbs were observed to bind or neutralize influenza A virus isolates from either Grp1 or Grp2. Antibodies FI6v3, CR9114, and 39.29 were shown to neutralize influenza strains in both Grp1 and Grp2 (5,14,27). Antibodies CR8059 and CR8071 (17) were the only two nAbs against influenza B virus in our collection. CR8059 is a light-chain D95aN variant of CR8071. Because this mutation is not located at the binding interface and does not affect binding, only CR8071 was used in the following study (17). Most of these nAbs were shown to neutralize more than one strain, and some of them neutralize broadly across subtypes of influenza A virus or lineages of influenza B virus. The antibodies BH151, HC19, HC45, and HC63 were shown to neutralize HA from the A/X-31 (H3N2) strain specifically. The available structures of nAb/HA complexes were downloaded from the PDB (37).

VALIDATED INFLUENZA STRAINS BY NEUTRALIZING ANTIBODIES
Results of binding and neutralization assays were collected from published materials. Binding and non-binding strains were classified according to their affinity measurements. The thresholds used to discriminate binding from non-binding strains were inconsistent across studies: the lowest detectable affinity values were set at 10⁻⁴ M (17), 10⁻⁵ M (33), and ~10⁻⁶ M (20). In some reports, nAbs showed positive binding but did not neutralize the same strains [e.g., nAb CR9114 against strain B/Florida/4/2006 (Yamagata) (17)]. Because of the lack of standardized thresholds and the ambiguous definition of binding, only results that indicate non-binding of antibodies were considered informative and were retained for the subsequent analysis as negatives.

[Figure 2 caption fragment: The structure is an HA trimer of three identical copies (one colored cyan and green; the other two in gray). Each copy contains the HA1 (cyan) and HA2 (green) chains, together with the heavy chain of F10 (red); the neutralized epitope is highlighted in pink. (B) Close-up view of the neutralized epitope identified on the structure (highlighted as a pink surface).]

[Table 2 note: The nAbs are classified as cross-reactive or X-31-specific. For each binding region, a representative nAb was selected (shown in bold) and its B-cell epitope was mapped on the structures shown in Figure 3.]
The HA sequences of strains that had been experimentally tested for neutralization by the studied antibodies ("validated strains") were retrieved from the literature or, if absent there, from the Influenza Knowledge Base (FLUKB)¹. All experimentally validated strains were grouped as either neutralized strains or escape strains. Neutralized strains were selected based on reported experimental evidence. Escape strains included true escape strains as well as strains reported not to bind the nAbs. We found no discrepancies in reported neutralizing properties across the different studies used to collect functional data.

GENERATION OF MULTIPLE SEQUENCE ALIGNMENT OF HEMAGGLUTININ SEQUENCES
The HA sequences of influenza strains from FLUKB were aligned using the MAFFT tool (42). The resulting multiple sequence alignment (MSA) provided a consistent numbering scheme for all further analyses. MSAs were generated both for the experimentally validated HA strains and for all entries from FLUKB. For each nAb, every HA sequence from the crystal structure and from the experimentally validated strains was searched individually against the FLUKB database using BLAST (43) to find the most similar strain. This procedure ensured that the residue position mapping in the following steps is consistent with the numbering scheme.
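A shared MSA lets a residue number in one sequence (for example, the FLUKB strain most similar to a crystal-structure chain) be translated to any other aligned sequence through its alignment column. The minimal Python sketch below illustrates that mapping. It assumes the sequences are already aligned (e.g., MAFFT output) and that '-' or '.' mark gaps; the function names are hypothetical and are not part of MAFFT or BLAST.

```python
from typing import Dict, Optional

GAP_CHARS = "-."  # assumed gap symbols in the aligned sequences


def residue_to_column(aligned_seq: str) -> Dict[int, int]:
    """Map 1-based residue numbers of the ungapped sequence to 0-based MSA columns."""
    mapping: Dict[int, int] = {}
    residue_number = 0
    for column, char in enumerate(aligned_seq):
        if char in GAP_CHARS:
            continue  # gaps occupy a column but are not residues
        residue_number += 1
        mapping[residue_number] = column
    return mapping


def map_position(src_aligned: str, dst_aligned: str, residue_number: int) -> Optional[int]:
    """Translate a residue number in src to the residue number at the same MSA column in dst.

    Returns None when dst has a gap at that column.
    """
    column = residue_to_column(src_aligned)[residue_number]
    if dst_aligned[column] in GAP_CHARS:
        return None
    # Residue number in dst = count of non-gap characters up to and including the column.
    return sum(1 for char in dst_aligned[: column + 1] if char not in GAP_CHARS)


# Toy example: residue 3 (L) of the first sequence sits in column 3,
# which is also residue 3 of the second sequence.
print(map_position("MK-LV", "M-QLV", 3))  # prints 3
```

Working through alignment columns rather than raw residue numbers is what keeps epitope positions comparable across strains with insertions or deletions.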

IDENTIFICATION OF B-CELL EPITOPES
B-cell epitopes were identified from antigen-antibody structures by combining two criteria: loss of accessible surface area (ASA) and interatomic distance. For each HA residue, the ASA was calculated with the Naccess software (44) for both free HA and HA in complex with an antibody. Residues $r_i$ whose ASA decreased by more than 20% upon antibody binding were selected as epitope residues.

Frontiers in Immunology | B Cell Biology

[Table 3 note: The epitope residue positions of nine nAbs were mapped to chain A of the 1EO8 structure. The symbol "+" indicates an epitope residue contacted by the corresponding nAb, and the symbol "−" indicates a non-epitope position. 2D1, whose epitope area differs from those of the other eight nAbs, is labeled in red.]
Most atomic contacts occur at separations of less than 5 Å (45). The Euclidean distance between atoms $a_i$ and $a_j$ was calculated from their coordinates $a_i(x_i, y_i, z_i)$ and $a_j(x_j, y_j, z_j)$ in the PDB structure data:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$

Hemagglutinin residues $r_i$ whose minimum atom distance to the closest nAb atom was within 4 Å were also included in the epitope. The minimum atom distance was defined as:

$$d_{\min} = \min d_{ij}, \quad a_i \in \text{antigen residue } r_i,\; a_j \in \text{antibody residue } r_j$$

Residues that satisfy either of these two conditions (ASA loss or minimum distance) are considered to constitute the B-cell epitope.
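The two epitope criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: per-residue ASA values (free and bound) and atom coordinates are assumed to have been parsed beforehand (e.g., from Naccess output and the PDB file), "ASA loss more than 20%" is interpreted as a relative loss, and all function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def min_atom_distance(antigen_atoms: Sequence[Coord], antibody_atoms: Sequence[Coord]) -> float:
    """d_min: smallest Euclidean distance d_ij between any atom of one antigen residue and any antibody atom."""
    return min(math.dist(a_i, a_j) for a_i in antigen_atoms for a_j in antibody_atoms)


def epitope_residues(
    asa_free: Dict[str, float],
    asa_bound: Dict[str, float],
    antigen_atoms: Dict[str, List[Coord]],
    antibody_atoms: List[Coord],
    asa_loss_cutoff: float = 0.20,
    distance_cutoff: float = 4.0,
) -> Set[str]:
    """Return antigen residues meeting either criterion: relative ASA loss > 20% or d_min <= 4 Å."""
    epitope: Set[str] = set()
    for residue, free in asa_free.items():
        # Relative ASA loss on binding (this interpretation of the 20% rule is an assumption).
        loss = (free - asa_bound[residue]) / free if free > 0 else 0.0
        close = min_atom_distance(antigen_atoms[residue], antibody_atoms) <= distance_cutoff
        if loss > asa_loss_cutoff or close:
            epitope.add(residue)
    return epitope


# Toy input: A1 loses 30% of its ASA; A2 is 3 Å from an antibody atom; A3 meets neither criterion.
print(sorted(epitope_residues(
    asa_free={"A1": 100.0, "A2": 50.0, "A3": 80.0},
    asa_bound={"A1": 70.0, "A2": 48.0, "A3": 80.0},
    antigen_atoms={"A1": [(20.0, 0.0, 0.0)], "A2": [(3.0, 0.0, 0.0)], "A3": [(10.0, 0.0, 0.0)]},
    antibody_atoms=[(0.0, 0.0, 0.0)],
)))  # prints ['A1', 'A2']
```

The brute-force distance loop is adequate for a single interface; a spatial index would be the natural choice for large-scale screening.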
Residues on HA that form hydrogen bonds, salt bridges, disulfide bonds, or covalent bonds with the nAb were also considered in defining the B-cell epitope. The antigen/antibody interaction was further analyzed using the PISA tool (46). This analysis showed that, in every studied structure, all hydrogen bonds, salt bridges, disulfide bonds, and covalent bonds between HA and the nAb involved residues already included in the B-cell epitopes defined in the previous step.

EXTRACTION OF DISCONTINUOUS MOTIFS FROM VALIDATED STRAINS
For each nAb, the residue positions of the B-cell epitope identified from the HA/antibody crystal structure were mapped, using the MSA and the standardized numbering, onto all HA sequences of the validated strains. Discontinuous motifs composed of the mapped residues were then extracted from these sequences. Each discontinuous motif was classified as either "neutralized" or "escape" according to the experimental validation status of the corresponding strain.
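Motif extraction amounts to reading the aligned residues at the epitope's MSA columns and grouping the resulting strings by validation status. The following Python sketch assumes epitope positions are already expressed as 0-based MSA columns and keeps gap characters as-is (the text does not specify gap handling); the strain names and sequences are toy values and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def extract_motif(aligned_seq: str, epitope_columns: Iterable[int]) -> str:
    """Concatenate residues at the epitope's MSA columns, in sequence order."""
    return "".join(aligned_seq[column] for column in sorted(epitope_columns))


def motifs_by_status(
    validated: Dict[str, Tuple[str, str]], epitope_columns: Iterable[int]
) -> Dict[str, Set[str]]:
    """validated maps strain name -> (aligned HA sequence, "neutralized" or "escape")."""
    columns = list(epitope_columns)  # materialize so the columns can be reused per strain
    motifs: Dict[str, Set[str]] = defaultdict(set)
    for aligned_seq, status in validated.values():
        motifs[status].add(extract_motif(aligned_seq, columns))
    return dict(motifs)


validated = {
    "strain_1": ("AC-DEF", "neutralized"),
    "strain_2": ("AC-DEY", "escape"),
    "strain_3": ("GC-DEF", "neutralized"),
}
print(motifs_by_status(validated, [5, 0, 3]))
```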

MAPPING OF DISCONTINUOUS MOTIFS TO HA SEQUENCE DATASET
For each nAb, the residue positions of the B-cell epitope identified from the HA/antibody crystal structure were mapped onto the HA sequence dataset using the MSA. A "discontinuous peptide", composed of the amino acids that form the B-cell epitope in the order in which they appear in the sequence, was extracted from each HA sequence. By comparing each discontinuous peptide with all neutralized and escape motifs from experimentally validated strains, the peptide was classified as neutralized (if it is 100% identical to a neutralized motif), escape (if it is 100% identical to an escape motif), or non-validated (if no validated motif matches it exactly). The term "discontinuous motif" refers to the B-cell epitope residues extracted from experimentally validated strains collected from publications, whereas the term "discontinuous peptide" refers to the specific B-cell epitope residues extracted from sequences in the HA sequence dataset.
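The exact-identity classification rule can be sketched as a set lookup. In this hypothetical Python helper, a peptide found in both the neutralized and escape sets is flagged as "conflicting"; the text reports no such discrepancies, so that branch is only a defensive choice of the sketch. Peptides and sequence IDs are toy values.

```python
from typing import Dict, Set


def classify_peptide(peptide: str, neutralized: Set[str], escape: Set[str]) -> str:
    """Classify a discontinuous peptide by exact (100%) identity with validated motifs."""
    in_neutralized = peptide in neutralized
    in_escape = peptide in escape
    if in_neutralized and in_escape:
        return "conflicting"  # defensive flag; not a category used in the text
    if in_neutralized:
        return "neutralized"
    if in_escape:
        return "escape"
    return "non-validated"


def classify_dataset(peptides: Dict[str, str], neutralized: Set[str], escape: Set[str]) -> Dict[str, str]:
    """peptides maps sequence ID -> discontinuous peptide for one nAb."""
    return {seq_id: classify_peptide(p, neutralized, escape) for seq_id, p in peptides.items()}


print(classify_dataset({"seq_1": "ADF", "seq_2": "ADY", "seq_3": "ADW"}, {"ADF", "GDF"}, {"ADY"}))
# prints {'seq_1': 'neutralized', 'seq_2': 'escape', 'seq_3': 'non-validated'}
```

Because matching is exact, hashing whole peptide strings keeps the classification of tens of thousands of sequences a linear-time pass.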

B-CELL EPITOPE REGIONS
For each nAb, the B-cell epitope was identified from the crystal structure as described in Section "Materials and Methods." The structure of the F10-H5 complex (13) and the identified epitope are illustrated in Figure 2. After the B-cell epitopes of all studied nAbs were mapped to the same template structure, overlapping binding sites were found among different nAbs, particularly at the receptor-binding site (RBS), which binds sialic acid receptors during virus infection.
For cross-reactive nAbs against influenza A virus, four major binding locations on the HA structure are apparent: two on the globular head of HA and two in the stem region (Table 2; Figure 3). The RBS is a heavily targeted area, with overlapping epitopes defined by eight nAbs. The only nAb that binds the HA head but not the RBS is 2D1 (21). 2D1 recognizes the Sa site of A/South Carolina/1/1918 (H1N1). The Sa site is one of the earliest known antigenic sites (47) and lies close to the receptor-binding pocket. A detailed comparison of epitope residue positions between 2D1 and the other HA head-targeting nAbs is given in Table 3. In contrast to the antibodies that interact with the HA head, a series of nAbs recognize a highly conserved helical region in the membrane-proximal HA stem. The epitopes on the F subdomain (CR6261, 39.29, etc.) and the stem base (CR8020) are adjacent to each other and share a small number of residues. CR8071, the only bnAb against influenza B virus in this study, binds the lower region of the globular head of HA, the "head base" (Figure 3C). All the remaining antibodies analyzed in our study bind specifically to HA of the A/X-31 (H3N2) strain. All X-31-specific nAbs bind the membrane-distal domain of HA. The nAbs BH151 and HC45 (22) recognize a single epitope located at the base of the eight-stranded antiparallel β-sheet structure. The HC19 binding site is adjacent to the RBS. The HC63 epitope shares several residues with the HC19 epitope, and the HC63 binding site spans the membrane-distal domains of two HA monomers.
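Once all epitopes are expressed as positions in the same standardized numbering, overlaps such as those described above can be quantified pairwise. The Python sketch below reports the number of shared positions and a Jaccard index for each antibody pair; the Jaccard index is an assumption chosen for illustration (Table 3 itself lists positions as +/−), and the antibody names and position sets are hypothetical, not real epitopes.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def epitope_overlap(epitopes: Dict[str, Set[int]]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each antibody pair: (number of shared epitope positions, Jaccard index)."""
    result: Dict[Tuple[str, str], Tuple[int, float]] = {}
    for a, b in combinations(sorted(epitopes), 2):
        shared = epitopes[a] & epitopes[b]
        union = epitopes[a] | epitopes[b]
        result[(a, b)] = (len(shared), len(shared) / len(union) if union else 0.0)
    return result


# Hypothetical position sets for three antibodies (not real epitopes).
toy = {"ab_1": {1, 2, 3, 4}, "ab_2": {3, 4, 5, 6}, "ab_3": {10, 11}}
for pair, (n_shared, jaccard) in epitope_overlap(toy).items():
    print(pair, n_shared, round(jaccard, 2))
```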

EXPERIMENTALLY VALIDATED DISCONTINUOUS MOTIFS
Discontinuous motifs were extracted from the validated sequences as described in Section "Materials and Methods," and visualized with WebLogo (48) and BlockLogo² (49). WebLogo figures consist of stacks of symbols: the overall height of each stack indicates the sequence conservation at that position, and the height of each symbol within the stack indicates the relative frequency of that amino acid (or nucleotide) at that position. BlockLogo is a web-based application for visualizing protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs, using block entropy calculated from MSAs. BlockLogo figures present the actual combinations of amino acids, and the height of each combination represents its relative frequency. Taking nAb F10 as an example, the neutralized and escape discontinuous motifs are shown in Figures 4A,C (WebLogo figures) and Figures 4B,D (BlockLogo figures). The WebLogos give a clear overview of per-residue conservation differences between neutralized and escape motifs. For example, 44N, 48T, 304R/D, 380L/Y, 391N, and 394E/A/L in the F10 epitope region are likely to contribute to escape. In the BlockLogo figures, the specific neutralized and escape B-cell epitopes of F10 are listed with their frequencies, which allows their direct comparison.
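The two views can be approximated numerically. For proteins, a WebLogo stack height is the information content log2(20) minus the Shannon entropy of the column (WebLogo additionally applies a small-sample correction, which this sketch omits). The BlockLogo-like part below only counts whole motif combinations and does not reproduce BlockLogo's block-entropy calculation. Function names and motifs are hypothetical toy values.

```python
import math
from collections import Counter
from typing import List, Tuple


def stack_heights(motifs: List[str], alphabet_size: int = 20) -> List[float]:
    """Per-position information content log2(N) - H_i (no small-sample correction)."""
    heights = []
    for i in range(len(motifs[0])):
        counts = Counter(motif[i] for motif in motifs)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        heights.append(math.log2(alphabet_size) - entropy)
    return heights


def block_frequencies(motifs: List[str]) -> List[Tuple[str, float]]:
    """Relative frequency of each whole motif combination, most common first."""
    counts = Counter(motifs)
    return [(motif, n / len(motifs)) for motif, n in counts.most_common()]


escape_motifs = ["AD", "AD", "AE", "GD"]  # toy two-position motifs
print(stack_heights(escape_motifs))
print(block_frequencies(escape_motifs))
```

The contrast between the two outputs mirrors the figures: per-position heights can hide which residues co-occur, whereas whole-motif frequencies preserve those combinations.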

ANALYSIS OF VARIATION OF DISCONTINUOUS PEPTIDES IN HA SEQUENCES DATASET
For each nAb, the residue positions of its B-cell epitope were mapped onto the complete HA sequence dataset collected from FLUKB. Amino acid strings representing discontinuous peptides were extracted from the HA sequence of each strain. The variability of the discontinuous peptides and their coverage by validated discontinuous motifs were analyzed for each nAb.
For example, for nAb F10, 589 distinct discontinuous peptide patterns were found among all 45,812 sequences in the HA sequence dataset, using the F10 B-cell epitope identified from the crystal structure. The discontinuous peptides were then sorted by frequency. The second most frequent peptide in FLUKB is identical to an escape motif, while the 6th, 8th, and 19th are each identical to one of the neutralized motifs. However, the most frequent F10 discontinuous peptide in FLUKB (see text footnote 1), along with 14 other discontinuous peptides, has not been experimentally tested (Figure 5). Comparing the most frequent discontinuous peptide with the neutralized and escape motifs was inconclusive. Therefore, future experimental studies should include a representative sequence containing the discontinuous peptide HHVLSLPTVDGWLTQITVNI, which is present in more than 10,000 FLUKB entries. We also recommend that motifs 1, 4, 5, 7, 9-18, and 20 be considered for experimental validation. The remaining peptides are less common, each occurring in fewer than 400 sequences in the dataset.
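The prioritization step described above (rank peptides by frequency, then flag frequent ones that match no validated motif) can be sketched as follows. This is a hypothetical Python helper on toy data, not the authors' pipeline; `top_n` is an assumed cutoff.

```python
from collections import Counter
from typing import List, Set, Tuple


def untested_frequent_peptides(
    peptides: List[str], neutralized: Set[str], escape: Set[str], top_n: int = 20
) -> List[Tuple[int, str, int]]:
    """Return (rank, peptide, count) for top-N most frequent peptides that match no validated motif."""
    validated = neutralized | escape
    ranked = Counter(peptides).most_common(top_n)
    return [
        (rank, peptide, count)
        for rank, (peptide, count) in enumerate(ranked, start=1)
        if peptide not in validated
    ]


# Toy data: P2 is an escape motif and P3 a neutralized motif; P1 and P4 are untested.
toy = ["P1"] * 5 + ["P2"] * 3 + ["P3"] * 2 + ["P4"]
print(untested_frequent_peptides(toy, neutralized={"P3"}, escape={"P2"}, top_n=4))
# prints [(1, 'P1', 5), (4, 'P4', 1)]
```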
Discontinuous peptides were generated and their variability investigated for all cross-reactive nAbs (Table 4). The B-cell epitope regions on the HA stem are less variable than the epitopes on the HA head. Results computed within each subtype of the HA sequence dataset show patterns similar to those for all subtypes combined (data not shown). This is consistent with the previous finding that the globular head of HA1 has a higher mutation rate than the stem (29), making the stem a more conserved target for bnAbs.
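One simple way to compare epitope variability is to count distinct discontinuous peptides (as in the 589 patterns reported for F10) and, as an additional summary, compute the Shannon entropy of the peptide distribution. The entropy measure is an assumption added for illustration; the text does not state which variability statistic underlies Table 4. The function name and peptides are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def peptide_variability(peptides: List[str]) -> Tuple[int, float]:
    """Return (number of distinct peptides, Shannon entropy in bits of the peptide distribution)."""
    counts = Counter(peptides)
    total = len(peptides)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return len(counts), entropy


# Toy comparison: a conserved (stem-like) epitope versus a variable (head-like) one.
print(peptide_variability(["HHV", "HHV", "HHV", "HHI"]))
print(peptide_variability(["KSE", "NSE", "KGE", "KSD"]))
```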
Motif coverage analysis of the 45,812 HA sequences was performed for all nAbs. Among the nAbs with available cross-reactivity data, motif coverage differed between nAbs targeting the HA globular head and those targeting the stem. The nAbs that bind the stem generally have higher neutralized motif coverage than those that bind the globular head (Figure 7). Motif coverage is shown as a heat map for each subtype and each nAb (Figure 8). The nAbs that target the stem region (such as CR6261, CR9114, F10, and FI6v3) are more cross-reactive: they cover more strains and more influenza subtypes.
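Per-subtype neutralized coverage, the quantity behind each heat-map cell, can be sketched as the fraction of sequences in a subtype whose discontinuous peptide exactly matches a neutralized motif. The Python sketch below assumes each sequence has been reduced to a (subtype, peptide) pair for one nAb; names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def coverage_by_subtype(records: Iterable[Tuple[str, str]], neutralized: Set[str]) -> Dict[str, float]:
    """records holds (subtype, discontinuous peptide) per HA sequence for one nAb.

    Returns, per subtype, the fraction of sequences whose peptide exactly matches a neutralized motif.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for subtype, peptide in records:
        totals[subtype] += 1
        if peptide in neutralized:
            hits[subtype] += 1
    return {subtype: hits[subtype] / totals[subtype] for subtype in totals}


# One heat-map row per nAb: {nAb: {subtype: coverage}} (toy data).
records = [("H1", "ADF"), ("H1", "ADY"), ("H3", "ADF"), ("H3", "ADF")]
heatmap = {"nab_A": coverage_by_subtype(records, {"ADF"})}
print(heatmap)
# prints {'nab_A': {'H1': 0.5, 'H3': 1.0}}
```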

COMBINING OF NEUTRALIZING ANTIBODIES
For each sequence in the HA sequence dataset, 22 strings (discontinuous peptides) were extracted, representing the B-cell epitopes of the 22 nAbs analyzed in this study. Most strains in FLUKB (82.62%) have at least one discontinuous peptide identical to a validated neutralized motif (Table 5). A small fraction (2.25%) of sequences may be neutralized by as many as seven nAbs.
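The per-sequence tally (how many nAbs have an exactly matching neutralized motif in each sequence) can be sketched as below. The nested-dictionary layout, function name, and toy antibody names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set


def antibodies_per_sequence(
    peptides: Dict[str, Dict[str, str]], neutralized: Dict[str, Set[str]]
) -> Counter:
    """peptides[seq_id][nab] is the discontinuous peptide of nab's epitope in that sequence;
    neutralized[nab] is the set of validated neutralized motifs for nab.

    Returns a Counter mapping k -> number of sequences matched by exactly k nAbs.
    """
    distribution: Counter = Counter()
    for by_nab in peptides.values():
        k = sum(1 for nab, peptide in by_nab.items() if peptide in neutralized.get(nab, set()))
        distribution[k] += 1
    return distribution


toy_peptides = {
    "seq_1": {"nab_A": "AAA", "nab_B": "BBB"},
    "seq_2": {"nab_A": "XXX", "nab_B": "BBB"},
    "seq_3": {"nab_A": "XXX", "nab_B": "YYY"},
}
dist = antibodies_per_sequence(toy_peptides, {"nab_A": {"AAA"}, "nab_B": {"BBB"}})
# Fraction of sequences with at least one neutralized match.
covered = sum(n for k, n in dist.items() if k >= 1) / sum(dist.values())
print(dist, covered)
```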
Here, we propose a combination of nAbs in which a small number of nAbs covers a large proportion of influenza strains. We selected the nAbs FI6v3, F10, CR9114, and CR8071 (Figure 9A): their individual neutralized coverages of 18.91% (F10), 4.06% (CR8071), 43.89% (CR9114), and 58.44% (FI6v3) increased to 78.45% when the antibodies were combined (Figure 9B). These nAbs also covered most subtypes of influenza A virus and both lineages of influenza B virus (Figure 9C).
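The coverage of a combination is the size of the union of the sequence sets each nAb covers, which is why it is smaller than the sum of the individual coverages. The Python sketch below computes that union and, as a swapped-in heuristic, shows a standard greedy set-cover selection; the text does not say the four nAbs were chosen greedily, so the greedy step is only one possible way to search for small covering sets. Names and data are hypothetical.

```python
from typing import Dict, List, Set


def combined_coverage(covered: Dict[str, Set[str]], combo: List[str], total: int) -> float:
    """Fraction of `total` sequences covered by at least one nAb in `combo` (union of covered sets)."""
    union: Set[str] = set().union(*(covered[nab] for nab in combo))
    return len(union) / total


def greedy_combination(covered: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k nAbs, each time adding the one that covers the most not-yet-covered sequences."""
    chosen: List[str] = []
    remaining: Set[str] = set().union(*covered.values())
    for _ in range(k):
        candidates = [nab for nab in covered if nab not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda nab: len(covered[nab] & remaining))
        if not covered[best] & remaining:
            break  # no further gain
        chosen.append(best)
        remaining -= covered[best]
    return chosen


covered = {"nab_A": {"s1", "s2", "s3"}, "nab_B": {"s3", "s4"}, "nab_C": {"s5"}, "nab_D": {"s1", "s2"}}
print(combined_coverage(covered, ["nab_A", "nab_B"], total=6))
print(greedy_combination(covered, k=2))  # prints ['nab_A', 'nab_B']
```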

DISCUSSION
This study presents an overview of the binding specificities of reported nAbs, as well as an estimate of their neutralization and escape coverage (neutralization effectiveness) across more than 45,000 HA sequences available in FLUKB. The variety and frequency of discontinuous peptides within the different B-cell epitopes were analyzed in the HA dataset. The results of this analysis inform further experimental design: strains carrying peptides that are frequent in the strain population should be prioritized for experimental validation of their neutralization status against specific nAbs.
Of note, additional sequence changes in HA outside the nAb epitope may cause local or quaternary structural alterations that impact antibody binding to the epitope per se. Likewise, changes to glycosylation sites caused by sequence variation may affect antibody access to the neutralization site, so that two virus strains sharing the same epitope sequence (as shown in BlockLogo) may still differ in neutralization outcome. The frequency of such occurrences will be important to determine. Neutralization assays of strains whose discontinuous epitopes are identical to validated B-cell epitopes will provide proof of cross-neutralization. Because experimental validation is costly and time-consuming, the extended B-cell epitope (see Supplementary Material) is introduced to help select representative sequences that differ in their extended B-cell epitopes. For each proposed neutralized or escape peptide (actual B-cell epitope), a small number of variants defined by changes in its environment (extended B-cell epitopes) account for the majority of strains carrying that peptide. On the other hand, until more experimental data are generated to fill the existing "non-validated gap," it is worthwhile to make reasonable estimates. The assumptions and methods in this paper are based on complete identity with discontinuous motifs in the B-cell epitope (and, additionally, in the extended B-cell epitope).

To check the validity of this assumption, the similarity between discontinuous motifs and discontinuous peptides could be used to estimate and predict neutralization and binding results in the future. For example, a discontinuous peptide whose residues differ from a neutralized motif only by substitutions with similar properties would be considered a "possibly neutralized peptide" for the specific nAb. Such estimates could also be validated in experimental assays and then used iteratively to guide further experimental design. [...] lymphocytic choriomeningitis virus and human immunodeficiency virus, among others. Insights from such bioinformatics analyses, coupled with antibody antigenicity data from crystallographic determinations, will facilitate electronic neutralization profiling that can be tested empirically in subsequent laboratory neutralization assays.
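The proposed "possibly neutralized peptide" estimate could be prototyped as follows. This Python sketch is hypothetical throughout: the amino acid grouping is a coarse physicochemical classification chosen only for illustration (a substitution matrix such as BLOSUM62 would be a natural alternative), and the default of at most one substitution is an assumed cutoff. Exact matches are deliberately excluded because they are already handled by the 100% identity rule.

```python
from typing import Dict, Optional, Set

# Hypothetical physicochemical grouping of amino acids, chosen only for illustration.
AA_GROUPS: Dict[str, str] = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "GP",
}
GROUP_OF: Dict[str, str] = {aa: group for group, members in AA_GROUPS.items() for aa in members}


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same hypothetical group."""
    return a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b))


def possibly_neutralized(peptide: str, neutralized: Set[str], max_substitutions: int = 1) -> Optional[str]:
    """Return a neutralized motif that differs from `peptide` only by 1..max_substitutions
    similar-residue substitutions, or None."""
    for motif in sorted(neutralized):
        if len(motif) != len(peptide):
            continue
        diffs = [(a, b) for a, b in zip(peptide, motif) if a != b]
        if 0 < len(diffs) <= max_substitutions and all(is_similar(a, b) for a, b in diffs):
            return motif
    return None


print(possibly_neutralized("AEKL", {"ADKL"}))  # prints ADKL (D -> E stays within "negative")
```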