Unveiling a Drift-Resistant Cryptotope within Marburgvirus Nucleoprotein Recognized by Llama Single-Domain Antibodies

Marburg virus (MARV) is a highly lethal hemorrhagic fever virus that is increasingly re-emerging in Africa, has been imported to both Europe and the US, and is also a Tier 1 bioterror threat. As a negative-sense RNA virus, MARV has error-prone replication, which can yield progeny capable of evading countermeasures. To evaluate this vulnerability, we sought to determine the epitopes of four llama single-domain antibodies (sdAbs or VHH) specific for nucleoprotein (NP), each capable of forming MARV monoclonal affinity reagent sandwich assays. Here, we show that all sdAb bound the C-terminal region of NP, which we produced recombinantly to determine X-ray crystal structures of the three best-performing antibody-antigen complexes. The common epitope is a trio of alpha helices that form a novel asymmetric basin-like depression, which accommodates each sdAb paratope via substantial complementarity-determining region (CDR) restructuring. Shared core contacts were complemented by unique accessory contacts on the sides and overlooks of the basin, yielding very different approach routes for each sdAb to bind the antigen. The C-terminal region of MARV NP could not be crystallized alone and required engagement with sdAb to form crystals, suggesting that the antibodies acted as crystallization chaperones. While gross structural homology is apparent between the two most conserved helices of MARV and Ebolavirus, the positions and morphologies of the resulting basins were markedly different. Naturally occurring amino acid variations in bat and human Marburgvirus strains all mapped to surfaces distant from the predicted sdAb contacts, suggesting a vital role for the NP interface in virus replication. Because NP is an essential internal structural component that potentially interfaces with a partner protein, the C-terminal epitope likely remains hidden, or “cryptic,” until virion disruption occurs.
Conservation of this epitope over 50 years of Marburgvirus evolution should make these sdAb useful foundations for drift-resistant diagnostics and therapeutics.

Keywords: filovirus, sdAb, VHH, nucleoprotein, crystallization chaperone, luciferase, Marburg, Ebola

Insight into MARV NP-sdAb Interaction | Frontiers in Immunology | www.frontiersin.org | October 2017 | Volume 8 | Article 1234

Introduction

Marburg virus (MARV) is a single-stranded, negative-sense RNA virus that first emerged almost half a century ago in Europe, causing transmissible hemorrhagic fever in vaccine production staff handling African green monkey tissues imported from Uganda (1). With its reservoir in Egyptian fruit bats (Rousettus aegyptiacus), which are native to large regions of Africa (2), MARV has re-emerged to spill over into human populations sporadically and with increasing severity (3-5). Although no vaccines or therapeutics are approved (several are in development (6)), diagnosis, quarantine, and contact tracing have so far been effective at containing outbreaks (5,7). However, as seen recently in West Africa with the related filovirus Ebolavirus (EBOV), outbreaks in highly mobile and populous areas can be difficult to extinguish, especially where healthcare infrastructure is limited (8).
Compared with other negative-strand RNA viruses such as influenza A, filoviruses appear relatively stable across years and geographies, suggesting a high degree of adaptation to the reservoir host(s). However, where extensive human-to-human transmission has occurred, as across West Africa, modest viral evolution is apparent for EBOV, and mutations that improve viral fitness have recently been defined (9). Although MARV outbreaks have so far been much smaller, with less extensive human-to-human transmission and sometimes multiple separate spillovers, genomic variation has been observed in the largest outbreaks, which occurred in Angola and the Democratic Republic of Congo (DRC) (4). Nucleotide changes can affect the performance of sequence-based therapeutics (10) and diagnostic assays (11), making it imperative to keep such countermeasures up to date with currently circulating strains (12). Non-synonymous nucleotide changes can also alter the performance of viral protein-based therapeutics (10), especially those targeting the glycoprotein (GP), since GP is the target of neutralizing antibodies generated by the host humoral response. Antibodies cloned from human survivors (13) and from murine hybridomas (14) can select MARV GP escape mutants in vitro, paralleling the situation for EBOV in vitro (15,16) and in vivo (17) and indicating a high degree of epitope plasticity for GP. Although internal viral antigens are not known to be overtly subject to antibody-based immune surveillance, they are subject to T-cell surveillance, which can select T-cell epitope variants. Such variations, along with enabling compensatory, stabilizing (18), and random mutations, can affect both sequence-targeted (19) and protein-targeted countermeasures.
With these factors in mind, a single monoclonal affinity reagent may at first appear a risky foundation for long-term viral recognition. However, we postulate that carefully selected non-neutralizing binders to highly conserved motifs of internal antigens should virtually eliminate the chance of antibody reactivity being diminished by natural viral evolution. We previously selected four unique sdAb specific to MARV by panning our semi-synthetic phage display library (20) on virus preparations at BSL-4 (21). Each sdAb recognized nucleoprotein (NP), a critical viral structural protein that envelops the RNA genome as part of the viral ribonucleocapsid (22) and is also a vital component of the viral assembly (23) and replication machineries, acting in concert with VP35, polymerase L (24), and VP30 (25). The sdAbs were capable of sensitive and specific recognition of MARV Musoke and Angola strains, plus the closely related Ravn virus (RAVV), in a monoclonal affinity reagent sandwich assay, where a single antibody acts as both captor and tracer against polyvalent antigen (26,27). To advance these sdAb further as diagnostic and transbody-based countermeasures, we needed to determine precisely where and how they bind NP, in order to gauge the likely impact of MARV and RAVV evolution on sdAb-NP recognition. Pursuing antibodies that are vulnerable to epitope erosion would be foolhardy in the long term, whereas identifying antibodies that target completely conserved epitopes would be advantageous.
Here, using mutagenesis and X-ray crystallography, we determine the region of NP recognized by the sdAb and, in so doing, discover a novel tertiary structure of MARV NP. Elucidating this cryptic epitope, or "cryptotope," allowed us to predict the likely impact of bat and human MARV variation on sdAb interactions, to compare and contrast local MARV and EBOV NP structures, and to speculate on its natural role in viral replication.

Results

Anti-MARV sdAb bind the NP C-terminus with nM EC50 and differing conformational sensitivities
Predicted amino acid sequences of the four anti-MARV sdAbs (Figure 1A) reveal three unique families, with sdAb C and D sharing complementarity-determining regions (CDRs) 1 and 3, and all four sdAbs sharing an aromatic residue in the middle of CDR3. Sandwich-based detection of Triton-lysed virus, employing each purified sdAb as captor and phage-displayed sdAb as tracer (21), was recapitulated on purified NP (Figures S1A,B in Supplementary Material). The trend shown in Figure 1B suggested that other MARV ribonucleoprotein components were unlikely to be involved in sdAb binding in this semi-quantitative polyvalent antigen capture format. In this assay, sdAb D was re-confirmed as the poorest performing clone, and because it was also a relatively poor expresser, it was studied only sparingly thereafter. That sdAb D shares CDRs 1 and 3 with sdAb C yet appears to bind poorly suggests that its framework region (FR) residues or CDR2 composition might be non-optimal. Prior to any structural work, we mutated the single CDR3 aromatic residue of sdAb A-C to alanine and purified the mutant proteins (Figure S2A in Supplementary Material) to explore the impact on NP binding, since aromatic R-groups, especially in CDR3, are known to perform critical antigen-binding roles (28). All three sdAb showed diminished antigen binding when amino acid 100 was replaced with alanine (Figure S2B in Supplementary Material). Wild-type sdAb A and C are equivalent binders, while sdAb B is a relatively poor performer in this format, in which immobilized polyvalent NP captures monomeric sdAb that is then revealed with bivalent anti-His6 IgG-horseradish peroxidase (HRP).
Figure 1 | … (F) The sdAb-gluc fusions were titrated over passively immobilized mbp-NP600 fusion protein to determine the monovalent EC50 values. Each titration was performed in duplicate wells, with a negative control mbp-only binding curve subtracted from each mbp-NP600 curve. The experiment was repeated three times, and the plots represent the mean values with error bars ± SD. The EC50 values were determined for each curve and are shown in the legend for each sdAb-gluc fusion protein ± SD. (G) The nluc-NP600 fusions were titrated over oriented, immobilized sdAb to determine the EC50 values in the reverse orientation. Each titration was performed in duplicate wells, with a negative control nluc-only binding curve subtracted from each nluc-NP600 curve. The experiment was repeated three times, and the plots represent the mean values with error bars ± SD. The EC50 values were determined for each curve and are shown in the legend for each sdAb ± SD.

[…] Material) over immobilized polyvalent NP defined the EC50 values for each antibody in the low nanomolar range (Figure 1C), with no statistically significant difference between them (P-value > 0.05). Deletion mutagenesis of E. coli-expressed NP, followed by probing with bivalent sdAb-alkaline phosphatase (AP) fusion proteins (to leverage avidity) and detection with a precipitating chemiluminescent product, revealed loss of binding for all four sdAb when the last 95 amino acids were absent (Figure 1D). Nineteen anti-Ebola virus sdAb, previously isolated from the same phage-displayed sdAb library using similar selections on four species of EBOV, also recognized the NP C-terminal regions and performed as both captor and tracer (27), suggesting that a particularly attractive epitope for sdAb resides in this region.
The C-termini of both MARV and EBOV NP are known to be displayed repetitively along the ribonucleocapsid (22,32), explaining why our anti-MARV sdAb behave like our anti-Ebola sdAb in sandwich immunoassays, where polyvalent antigen enables a single sdAb clone to serve as both captor and tracer.
When the last 95 amino acids of MARV Musoke NP were overexpressed and purified with an N-terminal His tag (termed NP600) (Figures S4A-C in Supplementary Material), the isolated antigen was recognized by all three bivalent sdAb-AP fusions tested by Western blot, though sdAb C was a relatively poor binder (Figure 1E; original blots are shown in Figure S5 in Supplementary Material). Because passively immobilized NP600 was also a poor substrate for the sdAb C glucibody (data not shown), we immobilized purified fusions of maltose-binding protein (mbp) with NP600 (Figures S6A,B in Supplementary Material) to determine monovalent EC50 values for the sdAb as glucibodies (Figure 1F). The sdAb B glucibody differed significantly from both the sdAb A and sdAb C glucibodies (P-value < 0.05). To reconfirm these findings using solution-phase NP600 antigen, we reversed the assay orientation by immobilizing sdAb via a single biotin acceptor peptide (BAP) on a neutravidin coat. The sdAb were probed with purified fusion proteins of NP600 attached to the C-terminus of nluc (Figures S7A,B in Supplementary Material), another small monomeric luciferase heavily engineered for brightness (33). The nluc protein is highly soluble in the cytosol of E. coli, though in our hands it is poorly secreted to the periplasm, making it an ideal fusion partner for NP600, which we were also unable to secrete efficiently. Titration of nluc-NP600 fusions over the oriented sdAb revealed EC50 values comparable to those above (Figure 1G), with sdAb B significantly different from sdAb A but not from sdAb C (P-value < 0.05). As expected, the monovalent EC50 values for the sdAb in both assay formats were higher than those obtained with polyvalent NP antigen, yet the ranking of sdAb A, then sdAb C, then sdAb B tended to be preserved.
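The titration analysis described for Figures 1F,G (duplicate wells, subtraction of a negative-control curve, one EC50 per curve, then mean ± SD over three independent experiments) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the analysis used here: the curve model and fitting software are not given in the text, so the half-maximal concentration is estimated by linear interpolation on a log10 concentration axis instead of by the nonlinear four-parameter logistic (4PL) regression commonly used for such data. All function names and the synthetic dilution series are hypothetical.

```python
import math
from statistics import mean, stdev


def four_pl(conc, bottom, top, ec50, hill=1.0):
    """Four-parameter logistic curve, used here only to simulate data."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def subtract_control(signal, control):
    """Subtract the negative-control curve (e.g. mbp only) point by point."""
    return [s - c for s, c in zip(signal, control)]


def ec50_by_interpolation(concs, responses):
    """Concentration at half-maximal response, interpolated on a log10 axis.

    Bottom and top are taken from the observed minimum and maximum, so the
    estimate is only reliable when the titration spans both plateaus.
    """
    points = sorted(zip(concs, responses))
    low, high = min(responses), max(responses)
    half = low + (high - low) / 2.0
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        # First pair of adjacent points that brackets the half-maximum.
        if r1 != r2 and (r1 - half) * (r2 - half) <= 0:
            x1, x2 = math.log10(c1), math.log10(c2)
            x = x1 + (half - r1) * (x2 - x1) / (r2 - r1)
            return 10 ** x
    raise ValueError("responses never cross the half-maximal level")


def summarize(ec50s):
    """Mean and sample standard deviation across independent experiments."""
    return mean(ec50s), stdev(ec50s)


# Hypothetical two-fold dilution series (nM) simulated with a true EC50 of 20 nM.
concs = [1000 / 2 ** i for i in range(12)]
signal = [four_pl(c, bottom=0.1, top=2.1, ec50=20.0) for c in concs]
control = [0.1] * len(concs)
estimate = ec50_by_interpolation(concs, subtract_control(signal, control))
print(f"EC50 estimate: {estimate:.1f} nM")
```

With a two-fold series spanning both plateaus, the interpolated value lands close to the simulated 20 nM. With real, noisy data a nonlinear 4PL fit would generally be preferable, because interpolation uses only the two points bracketing the half-maximum.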
Linear peptide arrays spanning NP600 failed to identify any reactivity with sdAb C (data not shown), and sdAb C reacted poorly with NP600 on Western blots; together, these results indicated dependence on a conformational epitope. Nonetheless, classifying epitopes as conformational or non-conformational solely on the basis of Western blotting is ill-advised, as immunoblotted antigens can retain sufficient structural information for at least some sdAb binding (34). To define the epitope(s) further, we chose X-ray crystallography, since it would also yield the structure of the MARV NP C-terminus, which has so far eluded tertiary structural assignment (35).

Difficulty generating crystals of sdAb-antigen complexes mirrored reliance on conformation
Both sdAb A and sdAb B were straightforward to crystallize alone and in complex with NP600, simply by equilibrating approximately 1:1 molar ratios of the purified sdAb and NP600 proteins overnight before dispensing them into crystallization screening experiments. However, sdAb C was highly refractory to crystallization alone and yielded a single polycrystalline cluster from thousands of screening trials. Although further attempts to improve this crystal form were unsuccessful, the structure was determined and revealed that the C-terminal His6 tag provided fortuitous packing interactions. We were unable to co-crystallize sdAb C with NP600 by simple equilibration after mixing or following size exclusion chromatography (SEC) of the complex. We therefore used a bait-prey strategy to allow facile production and purification of large amounts of pre-formed antibody-antigen complex. We removed the C-terminal His6 tag from sdAb C (prey), captured it from crude osmotic shock fluid using partially purified His6-tagged NP600 (bait), and then purified the complex by immobilized metal affinity chromatography (IMAC) followed by SEC, which yielded occasional small, poorly diffracting crystals. We repeated the strategy using a trimmed version of NP600 beginning at Trp632 (termed NP632), to avoid potentially flexible regions not visible in the sdAb A or B complex structures, resulting in pure sdAb C/NP632 complex (Figure S8 in Supplementary Material). Within the first screen, two wells containing small, irregularly shaped, poorly diffracting crystals were found that, upon further optimization, yielded crystals that diffracted satisfactorily.
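Setting up the approximately 1:1 molar sdAb:NP600 mixtures described above only requires converting mass concentrations into molar ones. The sketch below assumes stock concentrations in mg/mL and molecular masses in Da; the example masses, concentrations, and function names are hypothetical and are not values reported in this study.

```python
def molar_conc_um(mg_per_ml, mw_da):
    """Convert a mass concentration (mg/mL, i.e. g/L) to micromolar."""
    return mg_per_ml / mw_da * 1e6


def volume_ul(nmol, conc_um):
    """Volume (µL) that contains `nmol` nanomoles at `conc_um` micromolar.

    1 µM = 1 nmol/mL, so the volume in mL is nmol / conc_um.
    """
    return nmol / conc_um * 1000.0


def equimolar_mix(nmol, sdab_mg_ml, sdab_mw, antigen_mg_ml, antigen_mw):
    """Volumes of sdAb and antigen stocks that give a 1:1 molar ratio."""
    sdab_um = molar_conc_um(sdab_mg_ml, sdab_mw)
    antigen_um = molar_conc_um(antigen_mg_ml, antigen_mw)
    return volume_ul(nmol, sdab_um), volume_ul(nmol, antigen_um)


# Hypothetical stocks: masses and concentrations are illustrative only.
v_sdab, v_antigen = equimolar_mix(
    nmol=10.0, sdab_mg_ml=1.5, sdab_mw=15_000, antigen_mg_ml=0.6, antigen_mw=12_000
)
print(f"sdAb: {v_sdab:.0f} µL, NP600: {v_antigen:.0f} µL")
```

Working in nanomoles keeps the ratio explicit; scaling `nmol` changes the total amount without altering the 1:1 stoichiometry.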
The EC50 values of the sdAb A, B, and C glucibodies for mbp-NP632 were 15.4 ± 3.9, 189.1 ± 55, and 22.3 ± 3.2 nM, respectively, while the EC50 values of nluc-NP632 for sdAb A, B, and C were 12.8 ± 4.3, 28.5 ± 3.6, and 26.6 ± 3.4 nM, respectively (Figures S9A,B in Supplementary Material). In both cases, the sdAb B EC50 value was significantly different from those of sdAb A and sdAb C (P-value < 0.05). The overall similarity between EC50 values determined using NP600 or NP632 suggests that the first 31 amino acids of NP600, which are absent from NP632, are not critical for sdAb binding, though sdAb B shows variation depending on the assay format. We were unable to generate suitable crystals of NP600 alone, and NP632 was poorly soluble unless produced as a fusion protein. To date, we have also been unable to generate crystals of mbp-NP600 or mbp-NP632, suggesting that our semi-synthetic sdAb had a chaperone effect on the ability of the C-terminal domain to crystallize, as seen previously for a protein refractory to crystallization by itself (36). X-ray diffraction data collection details and statistics for the bound and unbound crystal structures of sdAb A-C are shown in Table 1.

sdAb employ common and unique approaches to engage the MARV NP C-terminus
All three sdAb-NP complexes are shown in Figure 2A, revealing the different approach angles used by the antibodies to interact with the MARV NP C-terminal domain; the pivotal CDR3 aromatic side chains are shown in stick form. Distinct VH and VL domains that bind the same epitope through overlapping but non-identical footprints, and hence with different approach angles, have been resolved at atomic resolution for broadly neutralizing IgG against the viral envelope proteins of influenza A (37) and HIV-1 (38). Epitopes that can elicit a wide diversity of antibodies, which can now be mined through various repertoire selections, are dubbed supersites (39). An sdAb's-eye view of our more modest NP bijousite is shown in cartoon form in Figure 2B, where the main chains of the three NP C-termini overlay with one another within 0.4-0.7 Å RMSD for all NP structures in the crystallographic asymmetric units. The last 64 residues of NP visible in the crystal structures consist primarily of three alpha helices. The two most C-terminal helices (arbitrarily named 1 and 2, counting back from Leu695) associate to form an upper V-like shelf, while the third descends between them, reappearing after a turn as a beta sheet positioned under the C-terminus. Contact mapping analysis using the Weizmann server, which runs part of the SPACE suite (40), identified NP residues potentially involved in binding each sdAb, with different combinations of CDRs engaging the three helices (Figure S10 in Supplementary Material). When the side chains of all potential sdAb contacts are displayed as sticks on the epitope backbone (Figure 2C), minor differences are apparent in the disposition of R-groups (e.g., Asn694 and Glu687), though the epitope appears fairly constrained.
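The 0.4-0.7 Å figure quoted above is a root-mean-square deviation (RMSD) over paired main-chain atoms. A minimal sketch of the quantity itself follows, assuming the two coordinate sets have already been optimally superposed (for example, by a Kabsch-type fit in a structural alignment program) and paired atom by atom; the superposition step is not shown, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired, pre-superposed atom sets.

    RMSD = sqrt((1/N) * sum_i |a_i - b_i|^2), with coordinates in Å.
    No superposition is performed here; the inputs must already be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty atom lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical main-chain coordinates (Å) for the same three atoms in two chains.
chain_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
chain_2 = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (3.0, 0.5, 0.5)]
print(f"RMSD = {rmsd(chain_1, chain_2):.2f} Å")
```

Because RMSD depends on the superposition, values are only comparable when the same atom selection (here, main chain) and fitting procedure are used for every pair of structures.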
Electrostatic surface rendering (Figure 2D) reveals an asymmetric basin-like depression between helices 1 and 2, with helix 3 forming the basin floor. A hydrophobic core of Leu676, Val691, and Met683 lies at the closed end, while Leu663, Leu695, and Tyr667 reside at the more open upper end. Single-domain antibodies are well known to target concave active sites of enzymes (41), recessed epitopes of parasite variant surface GPs (42), and canyons of virus particles (43), and the MARV NP C-terminal basin appears to constitute a similarly attractive cryptotope. The basin overlook also offers potential for alternative modes of interaction; a crescent of negatively charged residues toward the closed end (Glu675, Asp679, Asp682, Glu687, and Asp686) is noteworthy for its salt-bridge potential.
Figures 3A-C summarize the shape and charge complementarity between the NP epitope and sdAb A, B, and C, respectively. The top panels give the sdAb's-eye view of the NP C-terminus as an electrostatic surface potential occupied by key hydrophobic paratope residues. The pivotal CDR3 aromatic residue of each sdAb appears nestled toward Asp679 on helix 2, close to Leu676 and Val691 on helix 1 and Met683 on the turn between helices 1 and 2. sdAb A and B orient their Trp100 side chains almost at right angles to each other, while sdAb C employs Tyr100. Because Tyr100 of sdAb C sits slightly closer to the open end of the basin, Phe29 can engage Asp679, Met683, and Val691 toward the closed end. Secondary hydrophobic areas in the basin, formed by Leu663, Tyr667, Leu695, and again Val691, accommodate Ile31 and Trp55 of sdAb A, Gly101-103 of sdAb B, and Met102 plus Leu105 of sdAb C.
The electrostatic surface potentials of the undocked sdAb, flipped 180° from their NP-binding orientation (middle panels of Figures 3A-C), give an epitope's-eye view of each paratope, clearly showing the prominence of the hydrophobic CDR residues that engage the basin. Both sdAb A and C appear to exhibit the classical convex paratope, with relatively large contiguous hydrophobic regions, while this feature is less pronounced in sdAb B. Differences in the number and distribution of positively charged paratope residues engaging the negatively charged basin overlook are apparent, and PDBePISA analysis (44) revealed salt-bridge and hydrogen-bonding potential. Whereas only Arg106 of sdAb A forms a salt bridge (with Asp679), sdAb B employs Arg98, Arg50, and Arg58 to engage Glu675 plus Asp679, Asp679, and Asp682, respectively. While sdAb C also shares the Asp679 salt-bridge route (via Arg30), this antibody is highly unusual in employing Lys1 of FR1 to engage Asp686 in a second salt bridge. This alternative approach to binding may partially compensate for the lack of CDR2 involvement, a feature shared with only one other sdAb to date (45). Glu675, Asp679, and Glu687 also hydrogen bond with all three sdAb, with Asp682 additionally hydrogen bonding with sdAb B. Hydrogen bonding is also predicted for Asn669 with sdAb A, Ser684 and Ala678 with sdAb B, His690 and Ser672 with sdAb A and C, and Tyr667 with sdAb B and C. The lower panels of Figures 3A-C show space-filling representations of all predicted paratope residues, indicating the potential breadth of interactions. Here, the different approach angles shown in Figure 2A are also reflected in the differential visibility of conserved framework areas. The paratope residues of sdAb C appear more concentrated than those of sdAb A or B, forming an oval focused on the basin interior.
Together with the absence of the additional helix crosslinking mediated by CDR2 and Tyr100 (Figure S10 in Supplementary Material), these deficits may help explain the conformational sensitivity of sdAb C. An additional view of the docking of all three sdAb is shown in Figure S11 in Supplementary Material. The diverse potential for protein-protein interactions within the MARV C-terminus is striking: it is leveraged by all three sdAb in both unique and overlapping ways, while each complex still preserves the rule of a hydrophobic core with hydrophilic surroundings (46,47).
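The split between shared core and antibody-specific accessory contacts can be made explicit with a small lookup table. The sketch below transcribes only the salt bridges listed above from the PDBePISA analysis; hydrogen bonds and hydrophobic contacts are omitted, and the table layout and function names are illustrative assumptions rather than PDBePISA output.

```python
# Salt bridges reported in the text, as (sdAb residue, NP residue) pairs.
SALT_BRIDGES = {
    "sdAb A": {("Arg106", "Asp679")},
    "sdAb B": {("Arg98", "Glu675"), ("Arg98", "Asp679"),
               ("Arg50", "Asp679"), ("Arg58", "Asp682")},
    "sdAb C": {("Arg30", "Asp679"), ("Lys1", "Asp686")},
}


def np_partners(bridges):
    """NP residues engaged by one antibody's salt bridges."""
    return {np_res for _, np_res in bridges}


def shared_and_accessory(table):
    """Split NP partners into those shared by every sdAb and the remainder."""
    per_sdab = {name: np_partners(pairs) for name, pairs in table.items()}
    shared = set.intersection(*per_sdab.values())
    accessory = {name: partners - shared for name, partners in per_sdab.items()}
    return shared, accessory


shared, accessory = shared_and_accessory(SALT_BRIDGES)
print("shared:", sorted(shared))
for name in sorted(accessory):
    print(name, "accessory:", sorted(accessory[name]))
```

Applied to these pairs, the only NP salt-bridge partner common to all three antibodies is Asp679, consistent with the shared Asp679 route described above, while Glu675 and Asp682 (sdAb B) and Asp686 (sdAb C) emerge as accessory partners.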
Additional PDBePISA analysis of the crystal structures compared the antibody-antigen interfaces by buried surface area, the solvation free energy gain on forming the interface (ΔiG), and the P-value of ΔiG, which can be read as a measure of interface specificity (values below 0.5 correlate with higher specificity). The buried surface areas are similar: 686, 653, and 663 Å² for the sdAb A, B, and C complexes, respectively. The interfaces have values for ΔiG and the P-value of −7.4 kcal/mol […]

[…] (Figure 3, middle), the hydrophobic side chains destined to occupy the NP basin appear more diffuse in sdAb A and sdAb C, while sdAb B has equivalent density but a more glancing, side-on CDR3 disposition. […] sdAb B might be due in part to the presence of only one large hydrophobic group in the basin, accompanied by three small Gly side chains, whereas sdAb A and C have two aromatic side chains plus bulkier hydrophobic Ile and Leu residues, respectively.

When free and bound sdAb are compared (Figure 4), it becomes clear that each antibody still undergoes substantial restructuring to improve antigen recognition (51). In sdAb A, Trp100 and Trp55 each flip by 180°, and Ile31 also adjusts to present the more tightly knit array of hydrophobic side chains evident in the bound electrostatic surface shown in Figure 3. In sdAb B, CDR3 extends and flattens on binding, giving Trp100, Gly101-103, and Ile104 better access to the basin interior. Arg58 and, to a lesser extent, Arg50, at the landing and take-off sites of CDR2, also shift to reach their salt-bridging partners on the basin overlook.

Figure 5 | … Figure 1A. The amino acids prone to change are shown as sticks and labeled, revealing side chains that do not overlap the sdAb epitope and lie beneath and to one side of the domain. The electrostatic representation of the domain derived from the sdAb C complex is rendered on the right.
sdAb C is unusual among the three antibodies in that its CDR3 already appears to be a reasonable fit, with most of the fitting occurring in CDR1. Here, the main chain undergoes an S-curve reversal (i.e., S to mirror-image S), moving Phe29 toward the basin with a maximal repositioning of ~11 Å and displacing the neighboring Thr28, which shifts by ~6 Å. In its final position, Phe29 plays an almost supporting role to Tyr100, but it does make modest contacts of its own. Arg30 of CDR1 and Lys1 of FR1 also move, each by ~9 Å, to meet their respective salt-bridge partners on the overlooks. Free and bound forms of a highly unusual human broadly neutralizing antibody, capable of neutralizing all serotypes of influenza A, have recently been shown to exhibit dramatic CDR restructuring (52). The movements enable better accommodation of aromatic and hydrophobic residues within a hydrophobic groove of HA, with a key CDR3 Phe shifting by ~5 Å. Given the missing electron density in CDR3 of its free form, a cross-clade neutralizing anti-HIV gp120 sdAb from an immunized llama may also restructure to fit (53), though a structure of the bound form will be required to confirm this. It may well be that the potency of antibody repertoires against cryptic viral antigens relies not only on the total number of unique clones but also on the ability of the CDRs to accommodate such dramatic tertiary changes on transitioning from free, soluble forms to bound and potentially insoluble forms.

[…] When residues prone to drift are mapped onto the C-terminal structure, all reside on helix 3 or just beyond it, with their side chains disposed away from the epitope (Figure 5B). The relaxed contact mapping analysis (Figure S10 in Supplementary Material) also did not predict any of these amino acids to engage the sdAb.
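Shifts such as the ~11 Å repositioning of Phe29 or the ~9 Å movements of Arg30 and Lys1 are, in essence, per-atom distances between free and bound models. The text does not state how they were measured; one plausible sketch, assuming both models have first been superposed on a common reference such as the framework regions and that atoms are matched by name, reports the atom that moves furthest. The coordinates and the function name are hypothetical.

```python
import math


def largest_shift(free_atoms, bound_atoms):
    """Return (atom name, shift in Å) for the atom that moves furthest.

    Both dicts map atom names to (x, y, z) coordinates and are assumed to be
    already superposed on a common reference (e.g. the framework regions).
    """
    shared = free_atoms.keys() & bound_atoms.keys()
    if not shared:
        raise ValueError("no atoms in common")
    return max(((name, math.dist(free_atoms[name], bound_atoms[name]))
                for name in shared), key=lambda item: item[1])


# Hypothetical coordinates (Å) for one residue in free and bound sdAb.
free_res = {"CA": (0.0, 0.0, 0.0), "CZ": (1.0, 1.0, 1.0)}
bound_res = {"CA": (0.0, 0.0, 3.0), "CZ": (1.0, 1.0, 12.0)}
name, shift = largest_shift(free_res, bound_res)
print(f"{name} moves {shift:.1f} Å")
```

Superposing on the framework rather than on the whole domain keeps the reference fixed, so large CDR movements are not partly absorbed into the fit.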
We had previously shown that all four sdAb gave equivalent responses in sandwich capture of Triton-lysed RAVV compared with Musoke and Angola viruses [Figure 1 of Ref. (21)], demonstrating experimentally that Asn654Ser alone, at least, did not appear to affect binding. Furthermore, any subtle effects of these mutations on affinity are likely to be overcome by avidity within the sandwich assay format, as indicated by […]

Figure 6 | … Note that while the V-shelf is at the extreme C-terminus of MARV NP, it is internal to the EBOV NP C-terminus. (B) Electrostatic surface potential reveals a much shallower and more compact basin for EBOV. (C) Reduced basin width in EBOV is primarily due to Phe648 and Tyr652 from one helix stacking with Tyr667 from the other to form a wall-like structure that fills in the cavity, as opposed to the shorter side chains lining the MARV basin (cf. Figure 2D).

The only other anti-MARV NP antibody we are aware of that has been mapped to the MARV NP C-terminus is a mouse monoclonal antibody (mAb) shown by deletion mutagenesis to require amino acids 643-695 (55). Without structural information, it is difficult to assess exactly where and how this antibody binds, and whether it is likely to be affected by MARV variation. It would be of great interest to compare the footprints of our sdAb with that of this conventional IgG to determine whether they share similar approaches to binding the NP C-terminus.
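The drift analysis above, like the relaxed contact mapping in Figure S10, partly reduces to asking whether any variable position coincides with a predicted contact. A minimal sketch of that bookkeeping follows. The contact set is transcribed from the NP residues named in this section and is not the full Figure S10 map; Asn654Ser is the only substitution named explicitly here; and sequence separation is shown only as a crude proxy, since the spatial conclusion in the text rests on the crystal structures rather than on residue numbering. Function names are hypothetical.

```python
# NP residues named in this section as potential sdAb contacts.
# Transcribed from the prose; this is not the full contact map.
CONTACT_RESIDUES = {663, 667, 669, 672, 675, 676, 678, 679,
                    682, 683, 684, 686, 687, 690, 691, 694, 695}


def overlapping_variants(variant_positions, contacts=CONTACT_RESIDUES):
    """Variant positions that coincide with predicted contact residues."""
    return set(variant_positions) & set(contacts)


def nearest_contact_gap(position, contacts=CONTACT_RESIDUES):
    """Sequence separation to the nearest contact (not a spatial distance)."""
    return min(abs(position - c) for c in contacts)


# Asn654Ser is the only substitution named explicitly in this section.
print(overlapping_variants({654}))
print(nearest_contact_gap(654))
```

An empty overlap agrees with the contact mapping result, but a residue that is distant in sequence can still be close in space, so a structural distance check on the coordinates remains the decisive test.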

Similarities and differences between the MARV and EBOV C-termini
That our sdAb epitope appears resistant to natural evolutionary variation suggests a critical function in viral replication, such as an interface with host proteins or other viral proteins. Such protein-protein interfaces are generally more conserved than non-interface surfaces (56), since a mutation in one surface may require a compensatory mutation in the other and is therefore less likely to occur. If the interface becomes part of the virion, as would be the case if it were between two viral structural proteins, it will only be exposed upon virion dissociation (57). A 3D structural homology search using the Dali server (58) identified the C-terminal structures of Zaire (59), Bundibugyo, and Tai Forest (35) viruses as homologous to our MARV domain through the last two alpha helices. Perhaps surprisingly, overlaying the MARV and EBOV (Zaire) structures (Figure 6A) reveals that the EBOV motif is not at the C-terminus but 66 residues upstream, indicating plasticity in where the motif needs to be in order to function. Secondary structure prediction using JPred (60) did not identify the preceding residues as prone to alpha helix formation, suggesting that in EBOV the basin may rely on just the V-shelf helices, without a third helix forming the basin floor. Indeed, the EBOV basin is comparatively shallow (Figure 6B) and smaller than that of MARV, with a wall of stacked aromatic side chains between the helices occupying potential inter-helix space (Figure 6C). The more open end of the shallow EBOV basin appears to lie across the axis of one of the helices, between Ala664 and Val665, which create a dip rather than the route out over Tyr667 seen in MARV (cf. Figure 2D). The basin overlooks of EBOV are not highly negatively charged, with only Asp663 appearing to share a position similar to that of Glu687 in MARV.
The differences between the MARV and EBOV motifs imply that, if they do have similar roles in protein-protein interactions, they may use alternative approaches to engage their respective partner protein(s). The differences also explain why our anti-MARV sdAb do not cross-react with members of the Ebolavirus genus [Figure 1 of Ref. (21)], since the shape and charge complementarity required for sdAb binding is absent.

Discussion
To our knowledge, this is the first high-resolution structural study of an antibody binding a filoviral NP. The information can therefore guide structure-based design to improve the performance of the sdAb by focused in vitro evolution or educated mutagenesis. NP is an important biomarker for Marburg hemorrhagic fever, and high-performance antibodies to conserved epitopes that push the limits of detection toward nucleic acid test levels would be a significant step forward for point-of-care tests. The innate thermal stability of the sdAb format may make the resulting assays more suitable for resource-poor environments where cold chains are lacking. That the sdAb epitope appears to be conserved because it plays a vital role in viral replication bodes well for its long-term utility in enabling sdAb to recognize MARV and RAVV strains yet to emerge.

Figure 7 | (A) Marburg virus (MARV) VP40 appears to have loops that resemble complementarity-determining regions (CDRs). Loops in the region distal to the membrane-binding patches (out of view) of the MARV VP40 dimer show a striking similarity to antibody CDRs, stemming from a scaffold crudely resembling frameworks. While one loop is visible in the crystal structure, the other is not, which implies a flexibility that might be employed for restructuring. (B) A summary of our working hypothesis that the transition from disorder to order, and vice versa, within the nucleoprotein (NP) C-terminus is a molecular switch for virus assembly and disassembly by being able to host or release VP40.
While crystal structures of constructs bearing amino acids 19-370 (61) and 552-579 (25) of MARV NP have been resolved, the remaining C-terminal region has proved more challenging, existing as a molten globule (35). Herein, by engaging the MARV C-terminal region with sdAb, we overcame this roadblock. While two of the sdAb performed well as crystallization chaperones, the third (sdAb C) required extensive optimization, suggesting the approach is still somewhat empirical. However, since we were unable to generate any crystals of NP600, NP632, or their fixed-arm maltose-binding protein (mbp) fusion equivalents, chaperoning by sdAb in trans rather than by mbp in cis appeared essential for success in this case. While we cannot rule out contributions to crystal packing from the hydrophilic surface of the sdAb, it is more likely that their role was to reduce conformational heterogeneity (62) of the MARV C-terminus to allow crystals to form. We do not know the precise choreography of the transition between free and bound sdAb, only the end-points. It could be that the sdAb architectures were encouraged to form a more focused hydrophobic apical core around which the basin could form from the molten state, with the overlooks subsequently crosslinked to "fix" the MARV C-terminus. Alternatively, the molten state may transition through a folded C-terminal structure that was then selectively captured by the sdAb over time. Since all of these recombinant fragments express at high yield and are relatively small, it should be possible to further explore the relative contributions of induced fit and conformational selection using biophysical techniques.
It is tempting to speculate that, as in EBOV (63), the MARV C-terminus engages the VP40 matrix protein for virus particle assembly, resulting in a layer of matrix between the polyvalent NP of the ribonucleocapsid and the viral membrane (22,32). If we consider portions of the sdAb paratopes as mimics of VP40, much as some broadly neutralizing antibodies against influenza A virus mimic portions of the receptor for the influenza A virus HA (64,65), then the loops revealed in the crystal structure of the MARV VP40 dimer (66) could potentially play this role (Figure 7A). The loops appear to be borne on scaffold-like structures, uncannily resembling CDRs borne on antibody frameworks. While one set of loops is visible in MARV VP40, electron density is missing for the other set (Ser156, Thr157 and Ala71, Tyr72), indicating enough flexibility to undergo restructuring if required. Although no definitive conclusions can be drawn from the complete VP40 loop that is visible, because it is involved in crystal packing, the occurrence of Phe, Thr, Tyr, and Arg residues may indicate involvement in protein-protein interaction, since these residues are all highly favored at interfaces (47,56). The fit between VP40 and NP need be neither perfect nor high affinity, since the "unusual, flexible Velcro-like" interaction (22) that occurs when polyvalent nucleocapsids laterally meet VP40 lattices for assembly at the membrane (67) could capitalize on avidity. The NP C-terminus is displayed several thousand times at regular intervals on the outer face of the nucleocapsid and would be an ideal candidate to lie proximal to the loop regions of VP40. Furthermore, during disassembly following virus entry and fusion, a weak interaction between VP40 and NP would favor rapid dissociation, allowing the nucleocapsid to be delivered efficiently to the cytoplasm.
The strong prediction of disorder at the MARV NP C-terminus (68), combined with prior observations of a molten globule with three alpha helices present (35), suggests that our crystal structure may represent the more ordered end of a dynamic molecular switch for virus assembly and disassembly (Figure 7B).

Gluc-Based EC50 Determination
The sequence encoding an E. coli codon-optimized Gaussia luciferase (gluc) gene within pUC19 from the NanoLight™ Technology website (Pinetop, AZ, USA) was used as the basis for designing overlapping oligonucleotides encoding the open reading frame plus a His6 sequence, flanked by unique NcoI- and HindIII-compatible overlaps. Following kinasing, the oligonucleotides were annealed by heating and slow cooling in Taq DNA ligase buffer, and the enzyme was then added to ligate them into gel-purified pecan22 from which a resident sdAb gene had been removed with NcoI and HindIII. A faithful clone was used to confirm that active gluc enzyme could be expressed and purified at 500 mL scale as above, and the gene was then re-engineered to enable insertion of recombinant antibody fragments. Hingeless sdAb A-D genes from pecan73 were subsequently inserted via NcoI and NotI to generate the pecan35 sdAb-gluc gene fusions. The resulting glucibodies were expressed in Tuner + pRARE and purified as for sdAb above. Recombinant NP of either MARV or Bundibugyo ebolavirus (negative control) at 1 μg mL−1 in 100 µL of PBS was used to coat duplicate wells of ELISA plates at 4°C overnight. Plates were washed three times with PBS and each well was filled to the brim with MPBS to block for 1 h. Wells were then probed with 100 µL of the gluc control or glucibody dilutions in MPBS for 1 h without agitation. Probe was removed and plates were washed three times with PBS containing 0.1% Tween-20 (PBST) and two times with PBS. Signals were developed by injection of coelenterazine (NanoLight™ Technology) in lucky buffer (10 mM Tris, 1 mM EDTA, 500 mM NaCl, pH 7.4) and collected using the luminometer with a 2 s integration. Duplicate wells of each dilution were averaged and the Bundibugyo NP signals subtracted from the MARV NP signals. The titrations were repeated twice, with the final plots representing the mean of three experiments and the error bars representing ± SD. The EC50 y-value was calculated for each curve as RLUmin + (RLUmax − RLUmin)/2.
The corresponding x-value (the EC50 concentration) for each curve was calculated with the TREND function in Excel from the two observed points bracketing the EC50 y-value (one above and one below it); the three values were averaged and presented ± SD. Statistical significance was determined using a paired two-sample Student's t-test with an alpha value of 0.05 in the Excel Analysis ToolPak.
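The curve arithmetic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' spreadsheet: each curve is taken to be a list of background-subtracted mean RLU values ordered by ascending concentration, the signal is assumed to rise with concentration, and interpolation is done on the linear concentration axis (the text does not say whether concentrations were log-transformed). With exactly two points, Excel's TREND reduces to the straight line through them, which is what the hypothetical `ec50_from_curve` computes. The paired t-test returns the t statistic and compares it against a hard-coded table of two-tailed critical values rather than computing a p-value; all function names and example numbers are illustrative.

```python
import math
import statistics

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom (n - 1).
T_CRIT_0_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def subtract_background(target_wells, control_wells):
    """Average duplicate wells per dilution, then subtract the negative-control mean."""
    return [statistics.mean(t) - statistics.mean(c)
            for t, c in zip(target_wells, control_wells)]


def ec50_from_curve(conc, rlu):
    """Concentration at half-maximal signal, interpolated between bracketing points."""
    y50 = min(rlu) + (max(rlu) - min(rlu)) / 2  # RLUmin + (RLUmax - RLUmin)/2
    for i in range(len(conc) - 1):
        x_lo, y_lo = conc[i], rlu[i]
        x_hi, y_hi = conc[i + 1], rlu[i + 1]
        if y_lo <= y50 <= y_hi:
            if y_hi == y_lo:  # flat segment lying exactly at y50
                return x_lo
            # Line through the two bracketing points, solved for x at y50.
            return x_lo + (y50 - y_lo) * (x_hi - x_lo) / (y_hi - y_lo)
    raise ValueError("no adjacent pair of points brackets the half-maximal signal")


def paired_t_test(a, b):
    """Paired two-sample t statistic and whether |t| exceeds the alpha = 0.05 cutoff."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, abs(t) > T_CRIT_0_05[n - 1]


if __name__ == "__main__":
    # Illustrative numbers only, not data from this study.
    conc_nm = [0.1, 1.0, 10.0, 100.0]
    curves = [[100, 1000, 5000, 9100], [150, 1200, 5600, 9000], [90, 900, 4800, 9300]]
    ec50s = [ec50_from_curve(conc_nm, c) for c in curves]
    print(f"EC50 = {statistics.mean(ec50s):.2f} ± {statistics.stdev(ec50s):.2f} nM")
```

Interpolating only between the two bracketing points keeps the estimate model-free, at the cost of sensitivity to noise in those two points; a four-parameter logistic fit would be the usual alternative.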
The malE gene from XL1-Blue was amplified to encode a modified N-terminus of MetLysIleHis6 (70) and a C-terminal fixed arm of Ala3 encoded by a NotI restriction site (71), and was inserted into pE (see below) via NdeI and HindIII. An oligonucleotide bridge encoding Ala3GlySer was then inserted between the NotI and HindIII sites to create a control maltose-binding protein (mbp) gene, while NP600 and NP632 were amplified and inserted between the NotI and HindIII sites to create the mbp-NP600 and mbp-NP632 fusion protein expression vectors. Proteins were expressed, purified, quantified, and analyzed by SDS-PAGE and then substituted for recombinant NP as the immobilized antigen in the glucibody EC50 determination above. Signals on the mbp control protein were subtracted from the mbp-NP600 and mbp-NP632 signals, and the experiments were repeated three times to generate plots representing the means, with error bars representing ± SD. Statistical significance was determined using a paired two-sample Student's t-test with an alpha value of 0.05 in the Excel Analysis ToolPak.

NP Deletion Mutagenesis
Phagemid pecan42, a tac promoter-based vector harboring the MARV Musoke NP gene with a C-terminal His6 tag (21), was first used as a template for introducing an N-terminal FLAG tag by splice-overlap extension (SOE) PCR. Stepwise N-terminal deletions of 100 amino acids, counted from the authentic NP initiation codon, were then made using SOE-PCR, the final construct retaining only the 95-amino-acid C-terminal region. Clones were mobilized to Tuner + pRARE, and 20 mL expression cultures were used to generate lysates from 20 OD units in 2 mL tubes using a Mini-beadbeater 16 (Biospec Products). Lysates (10 µL) were Western blotted to Immobilon P (Millipore) for probing with anti-FLAG M1-HRP conjugate (Sigma), anti-His6-HRP (Sigma), or the hyperactive AP fusions of each sdAb from pecan16 described previously (21) at 100 nM in MTBS (in which Tris-HCl replaces phosphate buffer). Signals were developed with Lumi-Phos WB (Thermo Fisher), with development adjusted for each clone to reveal as much signal as possible without saturation.
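The deletion series is easier to follow when enumerated. The sketch below assumes a full-length MARV Musoke NP of 695 residues (an assumption here, though one consistent with 100-residue steps leaving a 95-residue C-terminal remainder) and numbers residues from the authentic initiation codon, ignoring the FLAG and His6 tags. `truncation_series` is a hypothetical helper, not part of any published workflow.

```python
# Assumption: full-length MARV Musoke NP is 695 residues, which is consistent with
# 100-residue N-terminal steps leaving a 95-residue C-terminal region.
NP_LENGTH = 695
STEP = 100


def truncation_series(length=NP_LENGTH, step=STEP):
    """(first, last) NP residues retained by each construct, full-length parent first.

    Tags are not counted: each construct keeps its N-terminal FLAG and C-terminal
    His6 tag, so anti-FLAG and anti-His6 probes can both report on each product.
    """
    return [(start + 1, length) for start in range(0, length, step)]


for first, last in truncation_series():
    print(f"NP {first}-{last}: {last - first + 1} residues")
```

Because both tags flank every construct, a band seen with anti-FLAG and anti-His6 but not with an sdAb-AP probe would point to loss of the epitope rather than failed expression.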

Production of NP600 for Crystallization
Phagemid pE is a T7 promoter-based vector built on the high-copy-number backbone of pecan but bearing a T7 cassette assembled from overlapping oligonucleotides. The high copy number gives high DNA yields from mini-preps, affording facile sequencing and manipulation, as well as high gene dosage for expression. The perfectly symmetrical lac operator (72) ensures tight regulation within expression hosts like BL21 (DE3) despite the high copy number. The MARV Musoke NP C-terminus was amplified from pecan42 MARV NP and inserted into pE such that a MetGlyHis6GlyGlyGlySer sequence preceded the NP sequence. Cultures were centrifuged, and the pellets were drained of excess medium and stored at −80°C until ready for bead beating. Once thawed, the pellets were resuspended in 40 mL 1× IMAC buffer plus a complete protease inhibitor tablet (Roche) and added to a 50 mL chamber filled halfway with 0.1 mm glass beads. The chamber was topped off with 1× IMAC buffer to remove any air bubbles, and the cell/bead mixture was blended on ice within a 4°C fridge for a total of 12 min, in 2 min bursts separated by 2 min of cooling on ice. Once the contents had settled, the lysate was transferred to a 50 mL conical tube and centrifuged at 3,000 rpm for 15 min at 4°C (Beckman Allegra 6R, swing out). The supernatant was decanted into a new 50 mL tube and centrifuged at 9,500 rpm for 15 min at 4°C (Sorvall RC 6+, F13 FiberLite rotor). The supernatant was filtered through a 32 mm diameter 0.8/0.2 μm filter (Pall) and applied to a 5 mL HisTrap HP column equilibrated in 1× IMAC buffer. Protein was eluted with a 0-500 mM imidazole gradient in 1× IMAC buffer. The fractions were pooled, dialyzed into 20 mM Tris-HCl pH 7.4, 5% glycerol, and loaded onto a column (20 mL bed volume) of High-Performance Q-Sepharose resin (GE Healthcare) equilibrated in 20 mM Tris-HCl pH 7.4. The protein was eluted with a 0-500 mM sodium chloride gradient, pooled, and concentrated to 2 mL.
The sample was further purified on a Superdex 75 16/60 column in 10 mM Tris pH 7.4, 150 mM NaCl. Protein was quantified by UV absorption and analyzed by SDS-PAGE to assess purity. For crystallography, preparations were diluted to 12 mg mL−1, aliquoted, and stored at −80°C.
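The quantification and dilution steps rest on two standard relations: the Beer-Lambert law, c = A / (ε·l), and the dilution identity C1V1 = C2V2. The sketch below is illustrative only: the absorbance, extinction coefficient, molecular weight, dilution factor, and volume are placeholders rather than measured properties of NP600, and measurement at 280 nm is assumed because the text does not state the wavelength or coefficient used. Both function names are hypothetical.

```python
def mg_per_ml_from_a280(a280, epsilon_m_cm, mw_da, path_cm=1.0, dilution=1.0):
    """Beer-Lambert: molar c = A / (epsilon * l); times MW (g/mol) gives g/L = mg/mL.

    `dilution` corrects for a sample diluted before reading (e.g. 10 for 1:10).
    """
    molar = a280 / (epsilon_m_cm * path_cm)
    return molar * mw_da * dilution


def buffer_to_add(current_mg_ml, volume_ml, target_mg_ml=12.0):
    """C1V1 = C2V2: volume of buffer (mL) that brings a stock down to target_mg_ml."""
    if current_mg_ml < target_mg_ml:
        raise ValueError("stock is already below the target concentration")
    return current_mg_ml * volume_ml / target_mg_ml - volume_ml


# Placeholder values only: epsilon 15,000 M^-1 cm^-1, MW 12,000 Da, read at 1:10.
stock = mg_per_ml_from_a280(1.75, 15000, 12000, dilution=10)
print(f"stock {stock:.1f} mg/mL; add {buffer_to_add(stock, 2.0):.3f} mL buffer to 2 mL")
```

In practice the extinction coefficient would be computed from the construct's own sequence, since an error in ε propagates proportionally into the final concentration.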
Western blotting of tenfold dilutions of NP600 employed 100 nM of the sdAb-AP fusions in MTBS with each probed membrane subsequently aligned side-by-side for simultaneous development to ensure accurate comparison across the sdAb clones.

Nluc-Based EC50 Determination
A pE variant (pENCO1) was first engineered in which the ATG start codon lies within an NcoI site rather than an NdeI site, so that genes from pelB leader constructs could be conveniently shuttled over. A synthetic nluc gene based on the sequence from the Promega website (Madison, WI, USA), with and without its single Cys, had previously been tested for secretion from pecan73 (26) (a tac promoter, pelB leader vector) as a C-terminally His6-tagged protein and found to be secreted very poorly. The Cys-minus nluc gene was therefore moved from the periplasmic to the cytosolic system, creating pENCO9 for control protein production. MARV Musoke NP600 and NP632 were separately fused to nluc using SOE-PCR such that the His6 tag was sandwiched between the nluc and NP domains. Proteins were expressed, purified, and quantified as for NP600, except that the dramatic solubility enhancement afforded by the nluc fusions made ion exchange unnecessary. ELISA plates were coated overnight at 4°C with 100 µL of 1 μg mL−1 neutravidin in PBS. Plates were washed three times with PBS and then blocked by filling to the brim with Bioplex buffer (2% bovine serum albumin, 0.05% Tween-20 in PBS) for 1 h. Next, 100 µL of 100 nM sdAb, as a BAP fusion purified from pecan126 as described above, was applied to each well in Bioplex buffer for 1 h.
Wells were filled to the brim and washed three times with PBST and two times with PBS. Wells were then filled to the brim with MPBS for 1 h to further block the sdAb-coated surface, after which dilutions of nluc, nluc-NP600, or nluc-NP632 in MPBS were added to duplicate wells for 1 h. Following washing, the same substrate and buffer as used for gluc were added to the wells and signals captured as above. The experiment was repeated two more times; curves plot the mean of the three experiments for nluc-NP600 or nluc-NP632 RLU minus the corresponding nluc-alone mean, with error bars representing SD. The EC50 values were determined from individual curves as above and statistical significance determined likewise.

Production of sdAb for Crystallization
Genes encoding sdAb A, B, and C were first mobilized to pecan73 using PCR to delete the flexible llama Ig hinges and fuse the His6 tag closer to FR4. Expression and harvesting at 500 mL scale were performed as above, and the shockate was made to 100 mM NaCl, 10 mM imidazole, and 5% glucose and frozen at −80°C prior to purification. sdAb was captured using a 5 mL HiTrap Sepharose column (GE Healthcare) charged with nickel and equilibrated with TIGS buffer (100 mM Tris-HCl pH 7.4, 100 mM NaCl, 10 mM imidazole, and 5% glycerol). Bound protein was washed with three column volumes of TIGS buffer, eluted with a 10-270 mM imidazole gradient over 18 column volumes, pooled, and dialyzed into 50 mM sodium phosphate, pH 7.0 with 5% glycerol. The protein was further purified on a HiLoad 26/600 SP Sepharose column (GE Healthcare). Bound protein was eluted with a 0-500 mM sodium chloride gradient, pooled, and concentrated to 1 mL using a Centricon concentrator. Final purification of the sdAb A and B samples was carried out on a HiLoad 16/60 Superdex 75 prep grade column (GE Healthcare) equilibrated in 10 mM Tris-HCl pH 7.4, whereas sdAb C required an additional 150 mM NaCl to prevent precipitation. Complexes of sdAb A and B with NP600 were obtained by overnight equilibration of 1:1 mixtures.
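If the 1:1 mixtures are molar ratios (an assumption; the text does not specify), mixing volumes depend on each protein's molecular weight as well as its mass concentration. The hypothetical helper below converts mg mL−1 stocks to micromolar and returns the volume of each stock that delivers the same number of nanomoles. The molecular weights and concentrations in the example are placeholders, not values for these constructs.

```python
def equimolar_volumes(target_nmol, stocks):
    """Volume (µL) of each stock that delivers target_nmol of each protein.

    stocks maps a name to (concentration in mg/mL, molecular weight in Da).
    """
    volumes = {}
    for name, (mg_per_ml, mw_da) in stocks.items():
        # mg/mL equals g/L; dividing by g/mol gives mol/L, and x 1e6 gives µM.
        micromolar = mg_per_ml / mw_da * 1e6
        # 1 µM = 1 nmol/mL, so nmol / µM gives mL; x 1e3 converts to µL.
        volumes[name] = target_nmol / micromolar * 1e3
    return volumes


# Placeholder stocks only: a ~14 kDa sdAb and a ~12 kDa NP fragment.
print(equimolar_volumes(100, {"sdAb": (7.0, 14000), "NP600": (12.0, 12000)}))
```

Mixing by mass instead of by moles would skew the ratio whenever the two partners differ in size, which is why the conversion goes through molecular weight.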
Bait-Prey Strategy to Generate the sdAb C/NP632 Complex
Splice-overlap extension PCR was used to re-amplify the sdAb C gene from pecan73 to delete an internal NcoI site and terminate the ORF immediately after FR4 with no His6 tag. The product was inserted back into pecan73 via NcoI and HindIII to create pecan219 sdAb C. The first 31 amino acids of the pE-NP600 construct were deleted by PCR and back cloning to create pE-NP632, which was used to drive expression of NP632 as for NP600 above. Cultures (2 L) yielding two wet pellets of approximately 28 g each were bead-beaten, and each lysate was partially purified on the 5 mL HiTrap IMAC column and gradient eluted. The peak fractions were combined and applied to the Q-Sepharose column as before, then combined with osmotic shockate derived from 4 × 500 mL pecan219 sdAb C cultures adjusted to 1× TIGS, and the mixtures were stirred at 4°C overnight. The complex was batch IMAC purified and eluted as for sdAb, and then purified on the S75 16/600 column in 10 mM Tris pH 7.5, 150 mM NaCl. The final sample was concentrated to 2 mL, quantified by micro-BCA assay (12.8 mg mL−1), and evaluated for purity by SDS-PAGE.
Crystallization, Structure Determination, and Refinement