VectorMOD: Method for Bottom-Up Proteomic Characterization of rAAV Capsid Post-Translational Modifications and Vector Impurities

Progress in recombinant AAV gene therapy product and process development has advanced our understanding of the basic biology of this critical delivery vector. The discovery of rAAV capsid post-translational modifications (PTMs) has spurred interest in detailed rAAV-specific methods for vector lot characterization by mass spectrometry, given the unique challenges presented by this viral macromolecular complex. Recent concerns regarding immunogenic responses to systemically administered rAAV at high doses have highlighted the need for investigators to catalog and track potentially immunogenic vector lot components, including capsid PTMs and PTMs on host cell protein impurities. Here we present a simple step-by-step guide for academic rAAV laboratories and Chemistry, Manufacturing and Control (CMC) groups in industry to perform an in-house or outsourced bottom-up mass spectrometry workflow to characterize capsid PTMs and process impurities.


INTRODUCTION
Recombinant adeno-associated virus (rAAV) is becoming the most widely used viral vector for gene delivery and genome editing, as it is naturally replication-incompetent, non-lytic, non-pathogenic, and largely non-immunogenic. It exhibits high transduction efficiency in nearly all tissues in vivo and can express payloads stably from unintegrated episomes in non-dividing tissues (1). Additionally, it can target integration in actively dividing tissues when homology arms are included in the payload construct (2). Despite decades of use in the clinic, new basic biology of this virus continues to be uncovered. In the last two years alone, we have seen two dogma-changing papers re-shape the AAV textbooks. First, despite the genome's notoriously small size, yet another new rAAV gene, MAAP, has been found using machine learning (3). MAAP codes for a protein that appears to be involved in rapid extracellular secretion from host cells during production. Second, we reported that during vector production, rAAV genomes are methylated and capsids acquire PTMs (4,5). These PTMs include acetylation, O-linked glycosylation, phosphorylation, and methylation (in addition to potential degradation products like deamidation and oxidation). Numerous groups have now also independently identified and validated our AAV capsid PTM discoveries (6,7).
It is known that PTMs on any therapeutic protein can elicit immune responses by inducing aggregates (8) (a known problem with high concentration rAAV), altering stability or functional activity, or by altering antigen processing and presentation. Despite historically being considered largely non-immunogenic, recent systemically-administered rAAVs at high doses have led to various unwanted immunogenic responses (9-12). It remains unclear what critical quality attributes of rAAV vectors may be contributing to these effects; thus, additional product characterization and preclinical modeling are warranted. Among PTMs, glycosylation specifically can act as a strong modulator of immunogenicity (13). A concern for the rAAV manufacturing space is the potential risk for immunotoxicity from producing vector within insect cells like Spodoptera frugiperda in the baculovirus-Sf9 platform. Humans can have acute allergic responses to non-mammalian N-glycans (14), as well as any N-glycan with an α1,3-fucose or β1,2-xylose linkage on the basal N-acetyl glucosamine (GlcNAc), both of which are modifications found on insect glycoproteins (15). Thus, insect glycoprotein process impurities in baculovirus-Sf9 produced rAAV vector lots could pose potential immunogenicity risks. As we demonstrated previously (4,5), rAAV vectors produced in human cells are more potent than vectors produced in insect cells, allowing for lower, potentially safer, doses to be administered. Interestingly, while insect glycoforms can trigger negative human immune responses, conserved mammalian glycoforms (like those that would occur when producing vector in HEK293, HeLa, Vero, or BHK cells) generally enhance product solubility and reduce undesirable immune reactions and aggregation. Glycosylation can also shield potentially immunogenic protein epitopes from the immune system (16).
The FDA has existing guidance on which protein PTMs it focuses on when reviewing other therapeutic protein products (8,17,18), and while this guidance does not yet apply to rAAV, it suggests where the agency may focus as we learn more about the impact these chemical modifications have on rAAV. At present, the FDA lists rAAV capsid PTM assessment as a recommended extended characterization assay (19). Thus, here we provide a detailed protocol for assessing rAAV vector lot PTMs on capsids and host protein impurities (Figure 1) based on our years of developing these methods.

Proteolytic Digests of rAAV Samples for LC-MS/MS
Time Required: 2.5 Days
1. Determine the protein concentration of your rAAV samples.
We strongly recommend using the colorimetric bicinchoninic acid (BCA) assay as it is fast, accurate, and has the largest relevant dynamic range (20-2,000 µg/ml) suitable for AAV concentrations compared to other common assays like Bradford, Lowry, and NanoOrange. We prefer the Pierce BCA kit and recommend following the manufacturer's instructions for the 'Standard' assay rather than the 'Enhanced' assay for a more relevant working range of the bovine serum albumin controls. Read all assay results simultaneously on a microplate reader, rather than one-by-one on a spectrophotometer, given the rapid colorimetric change and large number of controls, samples, and replicates. *Stop point: you can keep your thawed AAV at 4°C until you are ready to continue. Although AAV is stable at 4°C short term, we recommend preparing your samples for digestion as soon as possible after thawing and performing the BCA assay.
2. The desired amount of total protein per rAAV sample is 50 µg, but we've successfully run samples with as little as 10 µg.
Whatever concentration you choose, if you are running multiple rAAV samples with the intention of comparing the results, prepare an equal amount of total protein for each sample and place each in a labeled protein low-binding 1.5 mL microcentrifuge tube.
3. Precipitate each rAAV sample in 4X the volume of LC-grade acetone overnight at -80°C with the tube wrapped in parafilm.
4. To separate out the precipitated rAAV proteins, centrifuge at 12,500 x g for 15 min at 4°C. Decant and discard the supernatant and dry the rAAV protein pellet within a chemical safety hood for 30 min with the tube lid open.

5. Resuspend the dried pellet in 100 µL of 0.2% Protease Max surfactant trypsin enhancer*, 50 mM ammonium bicarbonate, and reduce the disulfide bonds with dithiothreitol to a final concentration of 10 mM at 55°C for 30 min in a thermomixer. This step loosens the protein secondary structure to make the full-length polypeptide accessible for enzymolysis in the next step.
*Note: Protease Max surfactant helps solubilize proteins. This will help your AAV stay in solution, but it is not strictly necessary for digestion, should it be difficult to acquire.
7. To digest full-length rAAV polypeptides into short peptides, add 500 ng of Trypsin, wrap the tubes in parafilm to prevent evaporation, and allow the solution to digest overnight for 18 h at 37°C in a thermomixer. Other digestion enzymes can be used in place of or in combination with Trypsin for varied digestion specificity (e.g. Chymotrypsin, etc.).
8. To stop the protease activity, add 1% total volume formic acid to the reaction in a chemical fume hood.
*CAUTION: Formic acid is a harmful chemical and should be handled with appropriate safety precautions in a chemical fume hood. When handling formic acid at any concentration, wear appropriate PPE including a lab coat, goggles/face shield, closed toe shoes and gloves. Dispose of materials that contact formic acid in a labeled solid waste container (weigh boats, pipette tips, etc.). Store formic acid at 4°C.
*Stop point: you can keep your quenched AAV peptides at -20°C for long term storage or 4°C for clean up the next day.
9. Purify the digested peptides by running the solution through a C18 MonoSpin SPE column, and then dry the peptides to completion in a speed vac. The speed vac time needed will vary with the elution volume. For example, a 300 µL solution with 80% acetonitrile can take 4-6 h.
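The volume arithmetic behind steps 2, 3, and 8 can be sketched as follows. This is a minimal illustration rather than part of the protocol itself: the function names are hypothetical, the target protein amount is assumed to be in micrograms (matching the BCA working range), and "1% total volume" is read as a formic acid volume equal to 1% of the digest volume.

```python
TUBE_CAPACITY_UL = 1500.0  # 1.5 mL low-binding tube named in step 2


def prep_volumes(conc_ug_per_ml, target_ug=50.0, acetone_fold=4.0):
    """Volumes (in µL) needed to precipitate one rAAV sample.

    conc_ug_per_ml: total protein concentration from the BCA assay.
    target_ug:      protein amount to carry forward (assumed µg).
    acetone_fold:   acetone volume as a multiple of the sample volume (step 3).
    """
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    sample_ul = target_ug / conc_ug_per_ml * 1000.0
    acetone_ul = acetone_fold * sample_ul
    total_ul = sample_ul + acetone_ul
    return {
        "sample_ul": sample_ul,
        "acetone_ul": acetone_ul,
        "total_ul": total_ul,
        # Flags samples too dilute to precipitate in a single tube.
        "fits_in_tube": total_ul <= TUBE_CAPACITY_UL,
    }


def formic_acid_ul(digest_ul, percent=1.0):
    """Formic acid volume (µL) to quench a digest of digest_ul (step 8)."""
    return digest_ul * percent / 100.0


if __name__ == "__main__":
    print(prep_volumes(200.0))    # a 200 µg/mL lot
    print(formic_acid_ul(100.0))  # quench for a 100 µL digest
```

When several lots are compared, calling `prep_volumes` with the same `target_ug` for each lot keeps the total protein equal across samples; a `fits_in_tube` value of `False` suggests the lot should be concentrated first.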

Liquid Chromatography Tandem Mass Spectrometry
Time Required: 1.5 h
1. Reconstitute your dried rAAV peptides in 0.1% formic acid in ultrapure water and then inject onto an HPLC system such as a Dionex Ultimate 3000.
*CAUTION: Formic acid is a harmful chemical and should be handled with appropriate safety precautions in a chemical fume hood. When handling formic acid at any concentration, wear appropriate PPE including a lab coat, goggles/face shield, closed toe shoes and gloves. Dispose of materials that contact formic acid in a labeled solid waste container (eppendorf tubes, pipette tips, etc.).
*CAUTION: Acetonitrile is a highly dangerous chemical that is flammable in both liquid and vapor phases and can ignite with moist air or water. It is harmful if swallowed, inhaled, or absorbed through the skin, and may cause skin and respiratory tract irritation. It is metabolized to cyanide in the body, which may cause headache, dizziness, weakness, unconsciousness, convulsions, coma and possibly death. Use only in explosion-proof chemical fume hoods equipped with proper grounding procedures to avoid static electricity.

2. We recommend running rAAV samples on a Thermo Fisher Fusion™ Tribrid™ Series or Q-Exactive™ mass spectrometer to improve proteome coverage, detection limits, peptide spectra acquisition, and identification rates. Unlike a linear hybrid model, which combines an ion trap and an Orbitrap, a parallelized tribrid combines a quadrupole, Orbitrap, and linear ion trap. We prefer the Orbitrap Fusion™ Tribrid™ or the Orbitrap Fusion™ Lumos™ Tribrid™ to acquire PTM data. Set up the instrument to acquire data in a data-dependent fashion using higher-energy collisional dissociation (HCD). If you are going to be analyzing labile modifications such as phosphorylation or glycosylation, we recommend also using electron-transfer dissociation (ETD) (20). The instrument parameters we suggest are as described below.
3. Generate HCD data with the precursor mass resolution set to 60,000 (full width at half maximum at 400 m/z), a mass range of 350-1,500 m/z, and sample charge states 2-6. Set the precursor automated gain control (AGC) settings to 3e5 ions, and use the "fastest" mode in the MS/MS ion trap. Set the isolation window for HCD to 1.6 Da and the collision energy to 30. Enable dynamic exclusion with a repeat count of 3, repeat duration of 10 sec, and an exclusion duration of 10 sec. MS2 spectra should be generated at top speed for 3 sec.
[Table residue, column headings not recoverable: 0, 3; 3, 3; 93, 35; 103, 42; 104, 98; 109, 98; 110, 3; 140, 3]
8. Following your final rAAV sample, re-run your calibration standards as before (see Step #1) to ensure the instrument is still within tolerance at the end of your rAAV sample run.
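The precursor settings in step 3 (a 350-1,500 m/z window and charge states 2-6) determine which peptide ions are eligible for fragmentation. The short sketch below, with hypothetical function names, shows the underlying arithmetic for a peptide of a given neutral monoisotopic mass. It assumes simple protonation only and ignores adducts and isotope peaks.

```python
PROTON_MASS = 1.007276466  # Da, mass of a proton


def precursor_mz(neutral_mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def selectable_charge_states(neutral_mass, mz_min=350.0, mz_max=1500.0,
                             charges=range(2, 7)):
    """Charge states whose m/z falls inside the acquisition window.

    Defaults mirror the window (350-1,500 m/z) and charge states (2-6)
    given in step 3; returns a list of (charge, m/z) tuples.
    """
    selected = []
    for z in charges:
        mz = precursor_mz(neutral_mass, z)
        if mz_min <= mz <= mz_max:
            selected.append((z, round(mz, 4)))
    return selected


if __name__ == "__main__":
    # A 2,000 Da peptide versus a 600 Da peptide.
    print(selectable_charge_states(2000.0))
    print(selectable_charge_states(600.0))
```

Short peptides whose doubly charged ion falls below 350 m/z return an empty list, which is one reason a modified residue on a very short tryptic peptide can go unobserved.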

ANTICIPATED RESULTS
Mass Spectra Data Analysis Using Byonic
(21,22) that typically originate from the user (e.g. keratin from hair and skin), and from common reagents (e.g. Trypsin) used in sample preparation.
2. Search your raw files with 10-12 ppm mass tolerances for precursor mass ions, with 10-12 ppm or 0.1-0.4 Da fragment mass tolerances for HCD and ETD fragmentation, respectively. Allow up to two missed cleavages per peptide and semi-specific, N-ragged tryptic digestion. Use a 1% false discovery rate using standard reverse-decoy techniques. Methionine oxidation (common 2), asparagine deamidation (common 2), and N-term acetylation (rare 1) should be set as variable modifications with a total common max of 3, rare max of 1. Depending on the goals of your experiment, you may also want to add phosphorylation, methylation, acetylation, and/or glycosylation as variable modifications. Note that the search time will increase exponentially with additional modifications, so it may be advantageous to search these modifications separately.
3. After the search is complete, open the .RAW file in the ProteinMetrics Preview software to view the trypsin digestion efficiency (Figure 2). The resulting identified peptide spectral matches and assigned proteins should then be exported for further analysis and validated using custom tools to provide visualization and statistical characterization.
4. PTM mass spectra should be manually validated by an expert. Do not simply trust PTM reports produced by the software. If you are not skilled in MS analysis of PTMs, seek help from an expert. Here we refer readers to an excellent manuscript summarizing common errors (23) we've seen among groups incorrectly interpreting PTMs in rAAV vector lots:
a. Assigning the wrong PTM to a peptide
b. Assigning the correct PTM to the wrong protein
c. Assigning a PTM to the wrong residue on a correctly identified peptide
d. Missing modified peptides because of a flawed database search strategy

De Novo Glycan Identification
1. All potential glycopeptide sequences should be validated by de novo manual interpretation of HCD and ETD mass spectra. For a thorough guide on this subject, please see Malaker et al. (24). Briefly, generate extracted chromatograms for all MS2 spectra containing the "HexNAc fingerprint," which consists of a 204.0867 m/z ion and 5 additional fragment ions.
2. First, use the HCD spectrum containing the HexNAc fingerprint to identify glycan structures. Distinguish whether the glycopeptide is modified by an N- or O-glycan by analyzing the ratio of 138 m/z to 144 m/z ions (Figure 3). Then, calculate the intact mass of the glycopeptide using the high-resolution MS1 spectrum. From the intact mass, you will see sequential glycan losses that will lead to the largest peak in the spectrum. For N-glycopeptides, this will be the mass of the peptide modified by one HexNAc. For O-glycopeptides, this will be the mass of the naked peptide backbone. From this, you can calculate the glycan monosaccharide composition. Finally, sequence the peptide backbone using techniques described in detail elsewhere (25).
3. Next, use the ETD spectrum to site-localize glycan modifications. This is especially important for O-glycopeptides, due to the labile nature of this modification. A detailed tutorial for manual interpretation of ETD spectra is available here (26).
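As a rough aid to the manual interpretation described above, the sketch below screens an HCD peak list for the HexNAc fingerprint and reports the 138/144 intensity ratio. It is an illustration under stated assumptions, not a substitute for expert validation. The text specifies only the 204.0867 m/z ion and "5 additional fragment ions"; the five m/z values used here are the commonly reported HexNAc oxonium fragments and are an assumption, as are the function names and the 10 ppm matching tolerance. No N- versus O-glycan cutoff is applied, because the text does not give one.

```python
HEXNAC_ION = 204.0867
# Assumed values: commonly reported HexNAc oxonium fragments.
OTHER_FINGERPRINT_IONS = (186.0761, 168.0655, 144.0655, 138.0550, 126.0550)


def peak_intensity(peaks, target_mz, tol_ppm=10.0):
    """Intensity of the strongest peak within tol_ppm of target_mz, else 0.0.

    peaks: iterable of (m/z, intensity) pairs from one MS2 spectrum.
    """
    tol = target_mz * tol_ppm * 1e-6
    hits = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
    return max(hits, default=0.0)


def has_hexnac_fingerprint(peaks, tol_ppm=10.0, min_other=5):
    """True if 204.0867 and at least min_other companion ions are present."""
    if peak_intensity(peaks, HEXNAC_ION, tol_ppm) <= 0:
        return False
    found = sum(1 for mz in OTHER_FINGERPRINT_IONS
                if peak_intensity(peaks, mz, tol_ppm) > 0)
    return found >= min_other


def ratio_138_to_144(peaks, tol_ppm=10.0):
    """Intensity ratio of the 138 to the 144 m/z ion, or None if 144 is absent."""
    i138 = peak_intensity(peaks, 138.0550, tol_ppm)
    i144 = peak_intensity(peaks, 144.0655, tol_ppm)
    return None if i144 == 0 else i138 / i144


if __name__ == "__main__":
    spectrum = [(126.0550, 10.0), (138.0551, 80.0), (144.0655, 20.0),
                (168.0655, 15.0), (186.0761, 12.0), (204.0867, 100.0)]
    print(has_hexnac_fingerprint(spectrum), ratio_138_to_144(spectrum))
```

Spectra that pass this screen are candidates for manual inspection only; the ratio should be interpreted against Figure 3 and the cited guide rather than against a fixed threshold.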

TROUBLESHOOTING
1. Your purified input rAAV is not sufficiently concentrated to achieve 10-50 µg in 50 µL volume.
If your purified rAAV production has a low titer, below 1e12 vg/mL, you will likely not have sufficient rAAV capsid protein concentration for the initial digestion and subsequent steps. You can concentrate your rAAV vector with an Amicon™ 50 kDa spin column. We recommend a 20-30 minute spin at 1,500 x g for ~10-20X concentration as suggested by the manufacturer for purified AAV (27). You will need to remeasure your protein concentration by BCA assay after completing the spin concentration; do not assume you will get a certain fold improvement. Make sure to use the 50 kDa Amicon™ columns for optimal rAAV retention, as intact rAAV capsids are 20-24 nm in diameter (28).
3. AAV sample degradation from freeze/thaw.
For long term native AAV storage, samples should be kept at -80°C; digested rAAV peptides, however, can be stored at -20°C. This will prevent rAAV VP1/2/3 peptide degradation and help to retain labile PTMs. For short-term storage once thawed, samples can be kept at 4°C; however, it is important to avoid repeated freeze/thaw cycles, as the extreme temperature fluctuation can lead to more rapid sample degradation. We typically avoid more than 1-2 freeze/thaw cycles per sample prior to analysis.

4. Contaminating peptides obscuring your results.
Perhaps one of the biggest complicating factors in analyzing rAAV samples is the potential presence of contaminating peptides in your sample. Notably, these peptides might also have PTMs that can interfere with the analysis of your rAAV sample. Contaminant peptides can come from a variety of sources, including sample handling, host cell impurities, and vector purification inefficiencies. Therefore, it is crucial to manually validate all search results to ensure the correct site-localization of PTMs on the proper protein. We recommend searching for not only the sequence of interest (i.e. rAAV of the appropriate serotype), but also the proteomes of potentially contaminating species, such as Sf9 or other sources involved in vector production. In our hands, upon manual validation we found several hundred N-glycopeptides from ferritin proteins that were initially assigned as N-glycopeptides of rAAV. This underscores the importance of carefully manually validating automated search results by the methods outlined above.

5. If you think your sample has glycopeptides but you are having trouble detecting them.
Following the reduction of disulfide bonds in Step 6, treat with Endo-H. Heat your reduced rAAV sample to 95°C for 5 min, then briefly chill on ice to reduce the temperature and add 5 µL of the supplied Endo-H reaction buffer and 5 µL Endo-H. Deglycosylate for 4 h at 37°C in a thermomixer.
*Note: Endo-H is a recombinant glycosidase which hydrolyses the bond connecting the two GlcNAc groups modifying Asn within the chitobiose core, leaving a single GlcNAc covalently bound to Asn for mass spectrometry detection.

6. Sample analysis challenges and limitations.
With all PTM assignments using LC-MS/MS, it is important to remember that absence of a peptide with a PTM does not mean that no PTM was present. Labile modifications like phosphorylation and glycosylation can be lost during sample preparation or ionization prior to detection. It is also important to note that quantifying PTM frequency is challenging. In many cases, PTMs are not site-localizable, which can dramatically alter the accuracy of quantitative site occupancy evaluations. Additionally, different peptides from an individual protein can differ in cleavage efficiency.
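To make the site occupancy caveat concrete, the sketch below computes a naive occupancy estimate from the summed signal intensities of the modified and unmodified forms of the same peptide. The function name and the choice of intensity-based quantitation are assumptions for illustration only. The estimate inherits every limitation noted above: labile modifications lost before detection lower it, and modified and unmodified forms may differ in cleavage and ionization efficiency, so the value is best treated as a relative indicator across lots rather than an absolute stoichiometry.

```python
def naive_site_occupancy(modified_intensities, unmodified_intensities):
    """Fraction of signal carried by the modified peptide forms.

    Both arguments are iterables of non-negative intensities (for example,
    extracted ion chromatogram areas) for peptide forms covering the same
    site. Returns None when no signal is observed for either form.
    """
    modified = float(sum(modified_intensities))
    unmodified = float(sum(unmodified_intensities))
    if modified < 0 or unmodified < 0:
        raise ValueError("intensities must be non-negative")
    total = modified + unmodified
    if total == 0:
        return None
    return modified / total


if __name__ == "__main__":
    # One modified form and one unmodified form of the same peptide.
    print(naive_site_occupancy([3.0e6], [9.0e6]))
```

A return value of `None` corresponds to the case emphasized above: no observed peptide means no estimate, not zero occupancy.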

DISCUSSION
Preclinical investigators and existing clinical stage gene therapy companies should catalog and track potentially immunogenic vector lot components. These include capsid PTMs and PTMs on host cell protein impurities, especially given the recent concerns around immunogenicity with systemic high dose rAAV. The FDA currently lists rAAV capsid PTM assessment as a recommended extended characterization assay (19). The methods detailed above provide a powerful platform to easily interrogate the proteomic and PTM landscapes of rAAV vectors. As improvements to both vector process/product development and analysis by mass spectrometry continue, the quality, complexity, and utility of these data will continue to grow. These methods can be extended to other viral gene therapy vectors by changing the appropriate search parameters and optimizing the initial proteolytic digest conditions (for example, enveloped viruses will need a detergent step to remove the envelope to allow proteases to digest the capsid proteins). While we regularly detect common PTMs on rAAV capsids and host impurities (N-terminal start methionine acetylation, serine/threonine/tyrosine phosphorylation, lysine acetylation, arginine methylation, O-linked glycosylation, and asparagine deamidation), other modifications, if present, can also be detected with this method.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: exemplary datasets analyzed for this method are available from the corresponding authors on reasonable request. Requests to access these datasets should be directed to NKP, nicole.paulk@ucsf.edu.

AUTHOR CONTRIBUTIONS
NKP conceived the study. NGR, SAM, and NKP designed experiments. NGR, SAM, and NKP generated reagents, protocols, performed experiments, and analyzed data. NGR, SAM, and NKP wrote the manuscript. SAM and NKP generated the figures. All authors contributed to the article and approved the submitted version.