The Antibody Genetics of Multiple Sclerosis: Comparing Next-Generation Sequencing to Sanger Sequencing

We previously identified a distinct mutation pattern in the antibody genes of B cells isolated from cerebrospinal fluid (CSF) that can identify patients who have relapsing-remitting multiple sclerosis (RRMS) and patients with clinically isolated syndromes who will convert to RRMS. This antibody gene signature (AGS) was developed using Sanger sequencing of single B cells. While potentially helpful to patients, Sanger sequencing is not an assay that can be practically deployed in clinical settings. In order to provide AGS evaluations to patients as part of their diagnostic workup, we developed protocols to generate AGS scores using next-generation DNA sequencing (NGS) on CSF-derived cell pellets without the need to isolate single cells. This approach has the potential to increase the coverage of the B-cell population being analyzed, reduce the time needed to generate AGS scores, and improve the overall performance of the AGS approach as a diagnostic test in the future. However, no investigations have focused on whether NGS-based repertoires will properly reflect antibody gene frequencies and somatic hypermutation patterns defined by Sanger sequencing. To address this issue, we isolated paired CSF samples from eight patients who either had MS or were at risk to develop MS. Here, we present data showing that antibody gene frequencies and somatic hypermutation patterns are similar in Sanger and NGS-based antibody repertoires from these paired CSF samples. In addition, AGS scores derived from the NGS database correctly identified the patients who initially had or subsequently converted to RRMS, with precision similar to that of the Sanger sequencing approach. Further investigation of the utility of the AGS in predicting conversion to MS using NGS-derived antibody repertoires in a larger cohort of patients is warranted.


INTRODUCTION
Diagnosing diseases that affect the central nervous system (CNS) is inherently challenging. Multiple sclerosis (MS) is an autoimmune-mediated disease that exemplifies this challenge since clinicians must use multiple diagnostic tools to obtain the required evidence of dissemination of disease separated in time and space according to the current McDonald criteria (1). This includes radiological tests that detect lesions in the brain and spinal cord by magnetic resonance imaging (MRI) and is supported by biological tests that detect a unique pattern of oligoclonal banding (OCB) in the cerebrospinal fluid (CSF).
Due to the complexity associated with the current standard of care for MS diagnosis, patients who suffer an initial acute onset of "MS-like" symptoms [referred to as a clinically isolated syndrome (CIS)] often have to wait before a diagnosis of MS is confirmed and treatment is initiated (2). Steps to shorten this time frame are an urgent matter in the field, considering that patients have a better prognosis if treated early (3). Radiological testing (i.e., MRI) has been instrumental in the diagnosis of MS, but the most frequently used biological test that supports MS diagnosis is the OCB test, which has relatively low diagnostic specificity when comparing test performance for MS vs. other neuro-inflammatory diseases (about 61%) (4)(5)(6).
The standardization of the OCB test to support an MS diagnosis led many neuroimmunologists in the field to focus on determining the role of B cells and their antibodies in the pathogenesis of MS (7)(8)(9)(10)(11). Early work by our group and others demonstrated that CSF-derived B cells from MS patients and CIS patients that convert to MS undergo extensive clonal expansion, skew toward heavy chains of the fourth family, and accumulate somatic hypermutations (SHM) at an advanced rate (12)(13)(14). These features of antibody genetics are suggestive of a hyperresponse to CNS antigens, but the targets of these CSF-derived B cells from MS patients remain elusive (15). More recently, however, our laboratory has discovered that the fourth family of heavy-chain antibody genes of CSF-derived B cells from MS patients accumulates replacement mutations at six codon positions more frequently than in patients with other neurological diseases (OND) (16). B cells in MS lesions also display this pattern (17).

www.frontiersin.org
Using a custom algorithm to indicate the extent of mutation accumulation at these six codons in antibody gene repertoires, we developed a new biological test called the antibody gene signature (AGS), which demonstrated promise in a small pilot cohort in identifying patients who had one demyelinating event and who would convert to MS (16). However, these initial studies on the utility of AGS were based on Sanger sequencing, which is too laborious and expensive for routine use if this technology is developed as a clinical diagnostic test for MS in the future.
Next-generation DNA sequencing (NGS) might provide a useful alternative for acquiring antibody gene repertoires to use for AGS calculations and is becoming routine in the field, as evidenced by its commercial availability as a fee-for-service offering (Life Technologies, Illumina, and Seqwright among many others). The most common application of NGS to antibody genetics has focused on VDJ recombination gene selection for the purpose of analyzing lymphocyte clonality (18)(19)(20)(21), and is now being utilized in the MS field (22). Since gene and SHM distributions are at the core of antibody genetics analysis (as well as AGS scoring), careful scrutiny of this platform and its ability to properly represent the antibody gene repertoire is warranted.
Our primary goal was to confirm that the antibody gene repertoires generated by NGS would sufficiently represent the CSF-derived B-cell pool from MS patients. The data presented here demonstrate for the first time that antibody gene repertoires from individual CSF-derived B cells of MS patients and those at high risk to convert, generated by the gold standard Sanger method, are reliably reflected in NGS-generated antibody gene repertoires from paired CSF-derived B-cell pools of the same patients. Furthermore, we confirmed that AGS scoring, generated using a high-throughput NGS approach of pooled CSF cells, also identified MS patients and those that would convert to MS with the same accuracy as AGS scoring using Sanger DNA sequencing of individual CSF B cells. This NGS approach provides a new method for measuring the biological changes observed in MS patients and demonstrates its potential as a diagnostic tool.

MS/CIS PATIENT DESCRIPTION AND CSF SAMPLE PREPARATION
Cerebrospinal fluid B cells from six CIS and two relapsing-remitting multiple sclerosis (RRMS) patients were used for this paired analysis. All CSF samples were collected in accordance with a protocol approved by the UT Southwestern Medical Center (UTSWMC) Institutional Review Board (IRB). CSF samples selected for comparative analysis were collected from eight patients who were either diagnosed with RRMS or CIS at the time of collection or who were subsequently diagnosed with RRMS (Table 1). Single CD19+ B cells were sorted into individual wells of a 96-well microtiter plate for single-cell Sanger DNA sequencing. At the same time, a pool of sorted CD19+ B cells from each patient was collected for NGS analysis. One pooled B-cell sample (C8) did not produce a detectable PCR product after nested PCR and thus was removed from the NGS cohort.

NEXT-GENERATION SEQUENCING CONTROLS
Naïve (CD19+ CD27−) and memory (CD19+ CD27+) peripheral blood B-cell pools were isolated from three healthy control samples and used as process controls to evaluate batch-to-batch variation and to aid in the evaluation of potential sequence errors generated during processing. Peripheral blood from healthy control donors was collected in blood tubes containing heparin as an anti-coagulant (BD, Franklin Lakes, NJ, USA). Peripheral blood mononuclear cells (PBMCs) were isolated by centrifugation through Ficoll-Paque (GE Healthcare, PA, USA). PBMCs were washed, counted, and stained before being used to isolate naïve and memory B cells as described previously (23). The naïve NGS sequences had average nucleotide mutation frequencies (MF) of 1.3% and average replacement mutation frequencies (RMF) of 1.4% for over 10,000 sequences, thus indicating a low frequency of mutation errors due to PCR amplification and NGS sequencing. The memory NGS sequences had average MFs of 8.7% and average RMFs of 17.7% for roughly 3,700 sequences, which is similar to Sanger sequencing calculations (24). Previous work examining base-specific error rates identified a skewing toward the following order: A ≥ T > G > C (25) in sequences that had been PCR amplified prior to NGS. We also observe an overall increase in A and decrease in G mutations in the paired samples as expected from this earlier work, even though we used a different polymerase for PCR amplification (we used Phusion High-fidelity DNA polymerase from New England Biolabs, while Shao and colleagues used Hi Fidelity Platinum Taq from Invitrogen) (Figure S1 in Supplementary Material).

SINGLE B-CELL RECEPTOR DATABASE GENERATION USING SANGER DNA SEQUENCING
Sanger fourth family of variable heavy-chain region (VH4) sequence databases were generated at UTSWMC by nested PCR of single-sorted CD19+ CSF B cells using degenerate PCR primers, Taq DNA Polymerase (Promega, Madison, WI, USA), and Sanger DNA sequencing as previously described (12,26,27).

PCR OF ANTIBODY GENES FROM CSF-DERIVED B-CELL POOLS
Details of this method are provided in the Supplementary Material. All PCR reactions were performed using Phusion High-fidelity DNA polymerase (New England Biolabs, Ipswich, MA, USA) to minimize amplification errors.

NEXT-GENERATION SEQUENCING OF CSF-DERIVED B-CELL POOLS
Details of this method are provided in the Supplementary Material. Sequencing was done on the 454 GS FLX DNA Sequencer using the 454 Titanium chemistry (Roche/454, Branford, CT, USA) according to the manufacturer's recommended protocols.

NGS 454 DATA PROCESSING
Details of this method are provided in the Supplementary Material. In total, we analyzed 212 Sanger-generated sequences from single B cells and 16,984 unique NGS-generated sequences. Sanger sequencing produced an average of 30 unique VH4 sequences per patient, although fewer than 20 sequences were obtained from three of the patients (C1, C4, and C7) (Table 2). Although NGS sequencing produced an average of 2,426 unique VH4 sequences per patient, fewer than 1,000 sequences were obtained from two of the patients (C3 and C4), and one of these patients yielded only 14 unique VH4 sequences. The large number of unique sequences in the NGS database relative to the number of B cells in the cell pellet is a consequence of the accumulation of PCR- and NGS-generated errors in the sequence database. Our focus here is to examine how well the sequence characteristics of the original patient template pools are maintained through NGS sequencing by comparing each patient's NGS database with that patient's Sanger database.

MUTATION ANALYSES
Sequence and mutation information was available and calculated from Chothia codons 31-92 (28,29). This region includes complementarity determining region (CDR) 1 through framework regions (FR) 3 as originally defined by Kabat (30). Analyses were done for both nucleotide mutation frequency (MF) and amino acid RMF. CDR and FR region mutation data were obtained by separating mutations in CDR1 and CDR2 from those in FR2 and FR3 and normalizing based on the lengths of the specific region.
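The length normalization described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the dictionary layout, and the choice to pool CDR1 with CDR2 and FR2 with FR3 before dividing are assumptions, and the region lengths in the usage note are made-up numbers rather than Kabat or Chothia values.

```python
def normalized_mf(mutation_count, region_length_nt, n_sequences):
    """Percent of mutated positions, normalized by region length and sequence count.

    Assumed definition: mutations / (region length x number of sequences) x 100.
    """
    if region_length_nt <= 0 or n_sequences <= 0:
        raise ValueError("region length and sequence count must be positive")
    return 100.0 * mutation_count / (region_length_nt * n_sequences)


def cdr_fr_mf(counts, lengths, n_sequences):
    """Pool CDR1+CDR2 and FR2+FR3, then normalize each pool by its total length.

    counts and lengths are dicts keyed by region name (hypothetical layout).
    """
    groups = {"CDR": ("CDR1", "CDR2"), "FR": ("FR2", "FR3")}
    result = {}
    for name, regions in groups.items():
        total_mutations = sum(counts[r] for r in regions)
        total_length = sum(lengths[r] for r in regions)
        result[name] = normalized_mf(total_mutations, total_length, n_sequences)
    return result
```

For example, with illustrative counts `{"CDR1": 3, "CDR2": 7, "FR2": 2, "FR3": 8}`, illustrative lengths `{"CDR1": 20, "CDR2": 30, "FR2": 40, "FR3": 60}`, and 10 sequences, the pooled CDR frequency is 2.0% and the pooled FR frequency is 1.0%.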
At the codon level, mutations were characterized as either replacement or silent mutations (RM or SM) and R:S ratios were calculated as RM divided by SM. AGS scores were calculated as previously described (16): for each of the six AGS codons (31b, 40, 56, 57, 81, and 89), the average RMF (1.6) in a healthy control peripheral blood database is subtracted from the RMF at that codon, the difference is divided by the standard deviation (0.9) of the average RMF of the same healthy control database, and the six resulting values are summed. Patients with AGS scores above 6.8 are identified as "RRMS."
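The AGS calculation described above can be sketched in a few lines. This is an illustrative reading of the formula as a sum of z-scores, not the authors' custom algorithm. The constants 1.6, 0.9, and 6.8 and the six codon names come from the text; the function names, the input dictionary, and the "not RRMS" label for scores at or below the cut-off are assumptions.

```python
AGS_CODONS = ("31b", "40", "56", "57", "81", "89")
HC_MEAN_RMF = 1.6   # average RMF in the healthy control database (from the text)
HC_SD_RMF = 0.9     # standard deviation of that average RMF (from the text)
AGS_CUTOFF = 6.8    # scores above this are identified as "RRMS" (from the text)


def ags_score(rmf_by_codon):
    """Sum, over the six AGS codons, of (RMF - control mean) / control SD.

    rmf_by_codon maps codon name -> RMF in percent (hypothetical input layout).
    """
    return sum(
        (rmf_by_codon[codon] - HC_MEAN_RMF) / HC_SD_RMF
        for codon in AGS_CODONS
    )


def classify(score):
    """Apply the cut-off; the 'not RRMS' label is an assumption for illustration."""
    return "RRMS" if score > AGS_CUTOFF else "not RRMS"
```

Under this reading, a repertoire whose RMF equals the healthy control average at all six codons scores 0, and each codon contributes one unit for every 0.9 percentage points of RMF above 1.6.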

STATISTICAL ANALYSES
VH4 and JH gene frequencies, mutated nucleotide frequencies, and AGS-contributing codon frequencies were grouped by platform and compared by Chi-squared analysis. MF, R:S ratios, and AGS scores were evaluated as patient-specific data points and their distributions between platforms were compared by Wilcoxon matched-pairs signed rank test. Statistical significance for all methods was attributed to p-values ≤0.05. Using the follow-up diagnosis as the basis for evaluation, specificity, sensitivity, and accuracy were calculated for OCB, Sanger AGS, and NGS AGS. Specificity was calculated as (no. of correct CIS assessments)/(no. of CIS samples); sensitivity was calculated as (no. of correct RRMS assessments)/(no. of RRMS samples); and accuracy was calculated as (no. of correct assessments)/(no. of samples).
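The three performance measures defined above reduce to simple counts. The sketch below follows the definitions in the text, with CIS as the negative class and RRMS as the positive class. The function name and the label lists are hypothetical, and the code assumes at least one sample of each class.

```python
def diagnostic_metrics(predicted, actual):
    """Return (specificity, sensitivity, accuracy) for aligned label lists.

    Labels are the strings "RRMS" or "CIS"; actual holds the follow-up diagnosis.
    """
    pairs = list(zip(predicted, actual))
    calls_for_rrms = [p for p, a in pairs if a == "RRMS"]
    calls_for_cis = [p for p, a in pairs if a == "CIS"]

    # (no. of correct CIS assessments) / (no. of CIS samples)
    specificity = sum(p == "CIS" for p in calls_for_cis) / len(calls_for_cis)
    # (no. of correct RRMS assessments) / (no. of RRMS samples)
    sensitivity = sum(p == "RRMS" for p in calls_for_rrms) / len(calls_for_rrms)
    # (no. of correct assessments) / (no. of samples)
    accuracy = sum(p == a for p, a in pairs) / len(pairs)
    return specificity, sensitivity, accuracy
```

As an illustration with made-up labels, five RRMS samples all called correctly plus two CIS samples with one called correctly give a specificity of 0.5, a sensitivity of 1.0, and an accuracy of 6/7 (about 0.857).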

RESULTS
Sanger sequencing has been the gold standard to define the antibody repertoires of patients with autoimmune diseases such as MS (26)(31)(32)(33)(34)(35)(36)(37). Such findings have provided necessary information to further our understanding of the role of B cells and their antibody products in the pathology of MS, the application of new targeting therapeutics, and the development of new diagnostic tools. NGS represents an advanced sequencing method to query even massive B-cell pools, and has already been applied to defining B-cell clonality in MS patients (22). However, it is critical to evaluate whether this new sequencing technology properly represents the unique features that were previously established by Sanger sequencing for antibody genetics in B cells from the CSF of MS patients.
Thus, we compared the antibody gene repertoires generated from single CSF-derived B cells using Sanger sequencing and those generated from CSF B-cell pools using NGS in a cohort of MS/CIS patients. There were significant differences in the frequency of individual VH4 gene usage between the platforms, although the relative abundance of individual VH4 gene segments by rank was globally consistent (Figure 1A). In the comparison of the Sanger and NGS databases, VH4-30, VH4-34, and VH4-39 sequences show significant differences in abundance. VH4-39 was the most abundant gene segment in the Sanger database, but was the third most abundant gene segment in the NGS database. All the other VH4 gene segments remained in the same ranked order of abundance in both databases. The rank order of the VH4-b, VH4-4, and VH4-61 gene segments does not vary significantly between platforms. VH4-59 shows a significant increase in NGS (from 15 to 24%; p = 0.004), which does not alter its rank. One noticeable difference is the lower abundance of long VH4 gene segments (VH4-30, VH4-31, VH4-39, and VH4-61) in the NGS database (23%) compared with the Sanger database (54%).
JH usage is important because skewing from the normal distribution of dominant JH4 usage (38) can be evidence of self-reactivity (39). JH4 remained the most abundant gene segment in both the Sanger and NGS databases (compare 38-40%; p = 0.53) and JH3 remained the fourth most abundant gene segment in both databases (compare 11-9%; p = 0.18) (Figure 1B). JH5 and JH6 were significantly decreased in the NGS database, whereas JH1 and JH2 were significantly increased and resulted in significant differences in frequencies of these four JH genes between the platforms.
Skewing of mutation frequency and/or placement of mutations in antibody genes from the CSF of MS patients is well established (12)(13)(14)(26). It is important, therefore, that the identification of the mutation accumulation and distribution is similar regardless of the platform by which it was generated. With regard to the accumulation of mutations, the overall nucleotide MF for individual patients by Sanger and NGS were similar (5.4-7.1%; p = 0.16) (Figure 2A; Table S1 in Supplementary Material). The RMF was also consistent between platforms (Figure 2B; Table S1 in Supplementary Material), again with a non-significant increase in NGS (9.7-12.5%; p = 0.11). With regard to the distribution of mutations, the MF and RMF were also appropriately highest in the CDRs, which are the antigen-contacting sites. The FRs, which are the structural support regions of the antibody genes, had relatively few MF and RMF accumulations as expected (Figures 2A,B). The replacement to silent mutation ratios (R:S ratios) in the CDR regions increase from patient to patient (average 4.4-7.3; p = 0.58) in the NGS platform, but without a significant trend emerging (Figure 2B). The R:S ratios in the FR regions were not significantly altered across platforms (1.4-1.5; p = 0.94).
Antibody gene signature scoring by Sanger sequencing showed initial success on a pilot cohort in identifying MS patients or CIS patients who will convert to MS (16), which has been confirmed in larger sample cohorts (Figure 3A). To understand how antibody repertoire generation by NGS might affect AGS scoring calculations, we analyzed and compared the RMF at each codon   codons 31B and 81 were significantly decreased in comparison to the Sanger repertoires.
Despite these fluctuations in mutation distributions among the six AGS codons, we observed a non-significant change (14.9-12.2; p = 0.22) in the paired samples of the average AGS score with the NGS platform (Figure 3C) (16). Two patients who have not yet received a confirmed RRMS diagnosis (patients C1 and C2) did not have consistent AGS scores between the Sanger and NGS databases (Figure 3D). However, all of those patients who did have RRMS or converted to RRMS after sampling showed consistent classification of disease by both Sanger sequencing and NGS. In addition, the specificity (50%), sensitivity (100%), and accuracy (85.7%) of properly identifying patients that have MS or would convert to MS in the future in this small cohort were the same for NGS-based, Sanger-based, and OCB biological testing. However, the small size of the cohort precludes any conclusion regarding the utility of NGS-based AGS scoring as a viable diagnostic test.
Finally, to understand these fluctuations in AGS scores between the two platforms, we show the distribution of AGS codon RM frequency and how it affects AGS scores for three representative samples. For example, in the Sanger repertoire of patient C2, approximately 21% of all RMs are within the AGS codons (Figure 3E), resulting in an AGS score of 13.07. In the NGS repertoire of this same patient, only 14% of all RMs are within the AGS codons, resulting in a decreased AGS score of 4.43. Conversely, the NGS repertoire of patient C1 had an increased AGS score compared to the Sanger repertoire because of an increased percentage of RMs in AGS codons relative to all codons (compare 15% in Sanger vs. 22% in NGS). Patient C4 had similar percentages of RM in AGS codons on both platforms (26 vs. 25%), and thus had similar AGS scores on both platforms (17.90 vs. 17.55).

DISCUSSION
Radiological testing to support MS diagnosis has excelled and is indispensable in the diagnosis of MS, whereas development of biological tests to support MS diagnosis has been more challenging. One type of biological testing that is on the horizon is next-generation DNA sequencing (NGS), which can be used to query the antibody genetics of even massive B-cell pools (18)(19)(20)(21)(22). Historically, this technology has been very successful in tracking minimal residual disease in cancer patients (18). More recently, the power of this technology has been used to demonstrate that focused B-cell clones in the CSF of MS patients are identifiable in the vast peripheral B-cell pools of the same patients (22). Thus, the use of NGS to pursue biological questions in MS has become a reality.
Our goal for this study was to advance beyond clonality queries and address whether the features of antibody genetics that we had observed in CSF-derived B cells from MS patients with regard to antibody gene distribution (i.e., skewing toward VH4 family usage) and somatic hypermutation accumulation (i.e., AGS) could be confirmed using this deeper sequencing method. This is important because NGS is now readily available commercially, and its possible limitations must be understood to best translate the information that we obtain from it. To do this, we compared paired antibody repertoires generated from single CSF-derived B cells using Sanger sequencing and antibody repertoires generated from CSF B-cell pools using NGS. This is the first time that there has been a direct comparison of this new technology to Sanger sequencing, which is the gold standard in the field.
Overall, we found that NGS and Sanger sequence data were similar with regard to general mutational profiles but differed somewhat in the distribution of VH4 sub-family members recovered. Due to the similarity between the sequences of the VH4 sub-family gene segments, the divergence in VH4 distribution may be partially due to an increase in sequencing errors in the NGS database, the most common of which are insertion and deletion (indel) errors, particularly in regions that contain homopolymers or stretches containing two or more identical nucleotides (40). The reported frequency of indels generated by the Roche/454 platform is in the range of 3.8 to 5 × 10⁻³ (41,42). Indels are easily detected by alignment of NGS-generated sequences to published VH4 sequences using the IMGT/High V-Quest tool (43). Since we remove all non-productive (with stop codons or frameshift mutations) or misaligned (<85% homology) antibody sequences, our NGS databases should contain very few sequences with indels. In order for a sequence with indels to pass our filters, it would have to contain multiple complementary indel events in close proximity, an extremely unlikely scenario. Nucleotide substitution errors can also occur (44), but we used a very high-fidelity DNA polymerase to generate our NGS-based antibody repertoires so that the MF between the Sanger and NGS databases would be similar.
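The filtering rule described above (discard non-productive sequences and those with less than 85% homology) can be sketched as a simple predicate. This is a hypothetical illustration only: the `AlignedRead` record and its field names are invented for the example and do not correspond to IMGT/High V-Quest output columns or to the authors' pipeline.

```python
from dataclasses import dataclass

MIN_HOMOLOGY_PCT = 85.0  # sequences below this are treated as misaligned (from the text)


@dataclass
class AlignedRead:
    """Hypothetical summary of one aligned NGS read."""
    read_id: str
    has_stop_codon: bool
    has_frameshift: bool
    homology_pct: float  # percent homology to the closest published VH4 sequence


def passes_filter(read):
    """Keep a read only if it is productive and sufficiently well aligned."""
    productive = not (read.has_stop_codon or read.has_frameshift)
    return productive and read.homology_pct >= MIN_HOMOLOGY_PCT


def filter_reads(reads):
    """Return the reads that survive both filters, preserving input order."""
    return [read for read in reads if passes_filter(read)]
```

A single uncompensated indel shifts the reading frame, so such a read would be flagged as a frameshift and dropped; only a read whose indels cancel out in frame could slip through, which matches the reasoning in the paragraph above.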
All five patients who had or converted to RRMS were properly identified using our AGS biological test method by Sanger sequencing or NGS. There was some fluctuation in the AGS scores obtained for these paired samples between the two platforms, which could be due to a decreased representation in NGS of the long VH4 genes that contain codon 31b. The AGS scoring system is based on RMF at six codons, which include 31b. Thus, if genes containing 31b are not properly represented in the NGS repertoire database in comparison to the Sanger database, a decrease in AGS scores would be a natural consequence. Despite these differences in Sanger and NGS repertoire generation, identification of MS patients or CIS patients that would convert to MS remained the same between the two platforms.
The two CIS patients who did not convert to RRMS at follow-up are representative of biological testing complications due to patient care received. CIS patient C1 was at high risk to develop MS. The Sanger-based AGS score was below the 6.8 cut-off point, but the NGS-based AGS score was above the cut-off point, suggesting that this patient would convert to RRMS in the future. CIS patient C2 was OCB positive at the time of sampling, with a single brain lesion noted by MRI, and was thus considered at low risk to develop MS. The Sanger-based AGS score was above the 6.8 cut-off point, but the NGS-based AGS score was below the cut-off point. In both of these cases, the patients were placed on disease-modifying therapy shortly after sampling, making it difficult to determine what the natural progression of their demyelinating event may have been. Of note, patient C5, who was on steroids at the time of sampling and converted to RRMS, had an AGS score above the 6.8 cut-off point by both platforms.
This study suggests that the transition from single B-cell Sanger sequencing to high-throughput NGS of pooled B cells is feasible with the application of appropriate sequence filtering methods to efficiently remove sequences containing errors generated during sample processing and sequencing. The implementation of appropriate quality metrics to identify and remove as many process-generated errors as possible will be critical for the successful use of NGS to better understand the antibody genetics of MS and for the future development of a clinically useful NGS diagnostic test based on the AGS scoring algorithm. These results will need to be confirmed in a larger cohort of patients using NGS-based antibody repertoire generation before consideration as a diagnostic tool can be made.

AUTHOR CONTRIBUTIONS
William H. Rounds participated in data processing and filtering, wrote the database comparison programs, performed the data analysis, and drafted the manuscript. Ann J. Ligocki collected the samples, generated the Sanger data, and participated in NGS database generation. Mikhail K. Levin carried out sequence alignment and processing. Benjamin M. Greenberg participated in patient recruitment, study design, and helped draft the manuscript. Douglas W. Bigwood participated in data processing and filtering, as well as data analysis and study design. Eric M. Eastman participated in study design, data analysis, and helped to draft the manuscript. Lindsay G. Cowell participated in study design, sequence alignment and processing, and data analysis. Nancy L. Monson conceived of and coordinated the study, participated in patient recruitment, study design, data analysis, and helped to draft the manuscript. All authors read and approved the final manuscript.