Maturation of the Human Immunoglobulin Heavy Chain Repertoire With Age

B cells play a central role in adaptive immune processes, mainly through the production of antibodies. The maturation of the B cell system with age is poorly studied. We extensively investigated age-related alterations of naïve and antigen-experienced immunoglobulin heavy chain (IgH) repertoires. The most significant changes were observed in the first 10 years of life, and were characterized by altered immunoglobulin gene usage and an increased frequency of mutated antibodies structurally diverging from their germline precursors. Older age was associated with an increased usage of downstream IgH constant region genes and fewer antibodies with self-reactive properties. As mutations accumulated with age, the frequency of germline-encoded self-reactive antibodies decreased, indicating a possible beneficial role of self-reactive B cells in the developing immune system. Our results suggest a continuous process of change through childhood across a broad range of parameters characterizing IgH repertoires and stress the importance of using well-selected, age-appropriate controls in IgH studies.


INTRODUCTION
B cells play a central role in physiological adaptive immune processes and exert their main effector function through production of antibodies (1). B cells also contribute to the pathogenesis of autoimmune disease via generation of auto-reactive antibodies and modulation of T cell responses (2,3). The heavy and light chains of the B cell receptor (BCR) are generated in the bone marrow by recombining individual variable (V), diversity (D), and joining (J) genes through a process called VDJ recombination. Upon antigen recognition, immunoglobulin heavy (IgH) and light chains of a BCR are further diversified through rounds of somatic hypermutation (SHM) leading to affinity maturation whereby B cells with improved antigen-binding properties are selected in the germinal center. Class switch recombination (CSR) is also initiated following antigen encounter, causing a change in the IgH constant region of the BCR and in its effector function.
Detailed characterization of B cells and their respective BCR sequences offers important information on B cell generation and selection as well as immune competence in health and disease. High-throughput sequencing of antibody genes (AIRR-seq) has become a widely used tool in human translational research (4,5). Abnormal B cell responses can be explored by investigating IgH repertoires from patients and comparing their characteristics to those of healthy controls. The limited data already available suggest that significant changes occur in the properties of IgH repertoires with age (6). It is therefore important to establish robust data on normal IgH repertoires within sufficiently narrow age-bands to fully understand the process of IgH maturation. This will facilitate the use of AIRR-seq to understand changes of relevance to childhood disease. Given the high burden of infectious diseases in childhood and the importance of effective immune response to vaccines to prevent infection, pediatric individuals constitute an important group from which to have normative data. There are very few studies that have used AIRRseq to investigate the healthy IgH repertoire, and these studies include a limited age range of participants (7)(8)(9)(10). In a more detailed study, Ijspeert et al. reported on the antigen-experienced (i.e., IgA and IgG) IgH repertoires of 38 healthy control (HC) samples with their ages ranging from newborn to 74 years (11). The authors found several characteristics of the studied IgH repertoire varying with age and identified patterns that are specific for isotype subclasses. However, their study was limited by the number of samples from children, the low depth of sequencing, and the small number of B cell subsets analyzed.
We aimed to assess in detail the naïve and antigenexperienced IgH repertoires in children and young adults using isotype-resolved barcoded RNA-based AIRR-seq technology and extensive bioinformatic analysis. This approach allowed us to comprehensively address the age effect on the IgH repertoire in healthy individuals and also provides a robust data set that can serve as a future reference for studying IgH repertoires in children as well as young adults with disease.

Study Participants and Cell Isolation
Healthy individuals who did not have an immunologically relevant disease or a current infection were recruited to the study. Written informed consent was obtained from study participants or their legal guardians including any potentially identifiable data included in this article under ethical approval (KEK-ZH 2015-0555 and EKNZ 2015-187). Blood samples (5-9 mL) were collected at a single time point from 53 healthy participants aged 6 months to 50 years (Supplementary Table 1). Peripheral blood mononuclear cells (PBMC) were isolated by centrifugation of PBS-diluted blood over Ficoll-Paque Plus (Sigma-Aldrich). Either PBMC or B cells magnetically sorted using the AutoMACS Pro cell separator and CD19+ microbeads (both Miltenyi Biotec), were lysed in RLT buffer (Qiagen), snap frozen on dry ice and then stored at −80 • C prior to use. Cells were counted using an optical microscope and an improved Neubauer chamber. The B cell number was recorded based on actual counts or estimated using PBMC counts and either B cell frequencies from flow cytometry performed on the same blood sample or the median percentage of age-dependent reference values (12) if the former was not available. No B cell subpopulations were isolated.

RNA Isolation and Library Preparation
RNA was extracted from stored samples using the RNeasy Mini Kit (Qiagen). Reverse transcription was performed using SuperScript III/IV (Invitrogen) according to the manufacturer's instructions and IgH constant region primers that included 14 nt unique molecular identifiers (UMI), and partial p7 adaptors. Two reverse transcription reactions were carried out for each sample: one with a mix of IgM and IgD-specific reverse primers and another with a mix of IgA, IgG, and IgE-specific reverse primers. For 6 samples, one mix with all constant region primers was used in a single reaction. Primer sequences with concentrations are included in Supplementary Table 2. IgH gene rearrangements were amplified in a two-round multiplex PCR; the first round using a mix of FR1 V family specific forward primers with partial p5 adaptors, and the second round to complete the adaptor sequences. PCR conditions for the first round were 95 • C for 5 min, either 8 cycles (IgD/IgM) or 12 cycles (IgA/E/G) of 98 • C for 20 s, 60 • C for 45 s and 72 • C for 1 min, and 72 • C for 5 min. The PCR conditions for the second round were 95 • C for 5 min, 22 cycles of 98 • C for 20 s, 69 • C for 20 s and 72 • C for 15 s, and 72 • C for 5 min. PCR amplicons were gel-extracted, purified and quantified using the Illumina qPCR library quantification protocol. Individual libraries were normalized based on concentration and then multiplexed in batches of 24 for sequencing on the Illumina MiSeq platform (2 × 300 bp paired-end chemistry), offering a read length to sequence far enough into the IgH constant region to allow accurate distinction between isotype subclasses.

Sequence Processing, Annotation, and Somatic Hypermutation
Samples were demultiplexed via their Illumina indices, and initially processed using the Immcantation toolkit (13,14). Briefly, raw fastq files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMI (i.e., originating from the same mRNA molecule) were collapsed to a consensus sequence. Reads with identical full-length sequence and identical constant primer but differing UMI were further collapsed resulting in a dataset containing a set of unique sequences per sample and isotype. Sequences were then submitted to IgBlast (15) for VDJ assignment and sequence annotation using the IMGT germline database as a reference, and unproductive sequences were removed. Constant region sequences were mapped to germline using Stampy (16) for isotype (subclass) annotation, and only sequences with a defined constant region were kept for further analysis. The number and type of V gene mutations was calculated using the shazam R package (14). Levels of somatic hypermutation (SHM) were determined by calculating V gene mutations in individual sequences, and mean values were calculated across samples and cell subsets.

Sequence Clustering, Clonal Lineages, and Antigen-Driven Selection
Sequences were independently clustered for each sample to group together those arising from clonally related B cells. The clustering required identical V and J gene use, identical complementarydetermining region (CDR) 3 length, and allowing a 1 in 15 nucleotides mismatch in the CDR3 as previously determined (7). Lineages were constructed from clusters using the alakazam R package (17). To account for read depth variation, lineage trees were constructed on subsamples of the original data. Specifically, we randomly sampled 25,609 unique collapsed sequences (corresponding to the lowest number of reads available for a sample) from every HC sample. For calculation of selection pressure of samples, individual sequences within clusters are not independent events, so an effective representative sequence of each clonal group was determined using the default settings of shazam. Selection pressure was calculated using BASELINe (18) implemented within shazam. The statistical framework used to test for selection was CDR_R/(CDR_R + CDR_S), which normalizes for the observed increase in the total number of mutations with age. The replacement/silent (R/S) mutation ratio was measured separately in framework regions (FWRs) and CDRs. In sequences with replacement but no silent mutations, the number of silent mutations was set to 1.

From Sequence to Structure
The SAAB+ pipeline was employed to annotate IgH repertoires with structural information (19). Briefly, IgH repertoires were numbered with the IMGT scheme (20) and filtered for structural viability using "ANARCI parsing" (21) as per the first steps of the ABOSS algorithm (22). Sequences were filtered out that (i) could not be aligned to the human Hidden Markov Model (HMM) profile of an IMGT germline (ii) had a J gene sequence identity of <50% to a human IMGT germline or (iii) contained non-amino acid entries in CDRs. Since the primer masking step in pRESTO (13) can remove the first framework region and positions 127 and 128 of some sequences, ANARCI parsing was customized to account for these exceptions. To retain as many sequences as possible for structural annotation, we substituted undetermined residues in the framework region with the residues from their respective parent germline genes.
To annotate the numbered sequences with canonical loop class information, SAAB+ employs SCALOP (23) with the IMGT CDR definition (20). The expected coverage of canonical loop class sequences with SCALOP is 93%, where 89% of predicted templates will have root-mean-square deviation (RMSD) values for the backbone atoms within 1.5 Å of the correct structure. The SCALOP database dated July 2018 was used in this study. SAAB+ employs FREAD (24) to annotate CDR-H3 loops with the Protein Data Bank (PDB) code (25) of the closest crystallographically-solved CDR-H3 structure (template). Only CDR-H3 sequences with loop lengths between 5 and 16 were investigated. The expected average RMSD of CDR-H3 template prediction for the human IgH repertoire data is 2.8 Å, with an expected coverage of 48% (19). PDB templates within a 0.6 Å RMSD radius were clustered together (19), reducing 2,943 PDB templates to 1,169 CDR-H3 PDB clusters.

Statistical Analysis and Graphing
To test for statistical significance, age groups were defined with the following ranges: 0-3, 4-8y, 9-16y, 17-25y, 26-39y and older than 40y. The number of individuals by age group, the age range and mean number of sequences are detailed in Supplementary Table 3. Statistical analysis and plotting were performed using R (26); all plots were produced using the ggplot2 and ggpubr packages (27,28). Heatmaps were visualized using the ComplexHeatmap R package (29). PCA plots were created using the R package factoextra (30). The Wilcoxon test was used to evaluate the differences between older age group and the youngest 0-3y age group. When comparing variables among the groups, the Kruskal-Wallis test was used. The significance level was set at 0.05. In cases where a model was fitted to the data, the R squared of the model and the p-value of the chi-squared goodness-of-fit test are shown in the bottom right of the graphs. Other specific tests used are detailed in the figure legends.

Classification of Sequences Into Cell Subsets Using Isotype and Number of Mutations
Since no B cell populations were isolated, we used constant region annotation and mutation number to group individual sequences into biologically different subsets based on known B cell subpopulations. Based on the frequency distribution of mutations for IgD and IgM sequences, those with up to 2 nt mutations across the entire V gene were considered "unmutated" (naïve) to account for allelic variance (31) and remaining PCR and sequencing bias (Supplementary Figure 1). All class-switched sequences were defined as antigen-experienced irrespective of their V gene mutation count. Because of very low sequence numbers, IgE and IgG4 transcripts were excluded from most analysis. The number of sequences of the different subsets among total transcripts by individual are found in Supplementary Table 1.

Data Availability
Raw sequence data used for analysis in this study are available at the NCBI Sequencing Read Archive (www.ncbi.nlm.nih.gov/ sra) under BioProject number PRJNA527941 including metadata meeting MiAIRR standards (32). The processed and annotated final dataset is available in Zenodo (https://doi.org/10.5281/ zenodo.3585046) along with the protocol describing the exact processing steps with the software tools and version numbers.

RESULTS
We obtained 78,702,939 raw sequences from samples of 53 healthy study participants. Processing, filtering and collapsing resulted in a final dataset of 8,341,669 unique IgH sequences used for downstream analysis. The numbers of unique sequences were significantly reduced after UMI-based collapsing resulting in a stronger correlation with the B cell numbers per sample (Supplementary Figure 2).

V Family and J Gene Usages Change With Age
Although previous work has observed common patterns of gene usage and has suggested a strong dependence on an individual's germline genetic background (33,34), the relative contributions to variance from age remained unclear. Proportions of sequences assigned to the different V gene families and J genes were calculated for each sample and B cell subset. The overall distribution of V family and J gene usage were different in older individuals compared with younger age groups. In particular, frequencies of V1 family sequences significantly decreased with age in naïve and mutated IgD and IgM sequences. This decrease was also observed in IgG and IgA transcripts although with higher individual variation in older age groups ( Figure 1A). No clear pattern was found in the usage of the other V families by age (Supplementary Figure 3A). Such changes in V1 family genes were due to age-related alterations in several V genes, particularly VH1-8 (Supplementary Figure 4). In order to assess whether the decreased usage of V1 genes with age was due to a lower number of copies of these genes in older individuals, we calculated the copy numbers of V1 genes and looked for deleted genes using Rabhit (35). Double chromosome deletions in V1 genes were found to be equally distributed across age groups (Supplementary Figure 5A). Haplotyping was inferred for 30 heterozygous individuals for J6 and/or D2-21 genes and copy numbers of V1 genes were not significantly different across age groups (Supplementary Figure 5B).
There were also changes in the overall J gene usage over the first 10 years of life marked by a significant decrease in the frequencies of sequences assigned to J6 in IgG transcripts ( Figure 1B). Frequencies of the other J genes by age group are shown in Supplementary Figure 3B. In line with previous work (36,37), we find that IgH sequences with rearranged J6 gene have longer junctions (Supplementary Figure 3C). Along with a declining J6 usage with age, a significant decrease in junction length was observed in IgG subsets of older individuals ( Figure 1C). However, even within IgG J6 transcripts, junction length significantly decreased with age indicating that shorter junctions in older individuals are not solely the result of altered J gene usage (Supplementary Figure 6).

Somatic Hypermutation Exponentially Increases in the First 10 Years of Life
There was a significant increase in SHM in all mutated subsets with age, which was most prominent in the first 10 years of life (Figure 2A). Substantial changes in mutation counts were found in all IgA and IgG subsets with exponential increases in children under 10 years and more linear progression between 10 and 50  Figure 2B). An age-related increase in the proportion of mutated sequences was also seen for IgA and IgG although at a higher level ( Figure 2B).

Sequences With Predicted Antibody Structures Diverging From Germline Increase With Age
Crystallographic studies have shown that antibody CDR-H1 and CDR-H2 loops can adopt a very limited number of structural conformations, known as canonical loop classes (38,39). These canonical classes are considered to be separate and distinct structures of the CDRs and can be rapidly and accurately annotated by SCALOP (23). The proportion of sequences in which either CDR-H1 and CDR-H2 had switched from the canonical class of their germline increased with age for most mutated subsets, similar to the increasing mutation number with age ( Figure 2C).
Structures of CDR3 were predicted by mapping sequences to antibody structures in the PDB and annotated with a PDB code identifier. The proportion of every PDB cluster within individual and repertoire was calculated and normalized to zero mean and unit variance across individuals. PDB cluster usages were similar across individuals and age with a small number of positive outliers (frequent usage) that were private to each individual (Supplementary Figure 7).

Structural but Not Sequence Clusters Are Commonly Shared Between Individuals
For each of the 53 individuals in this study, we calculated the frequency of sequence clusters (i.e., clonally related sequences) that are unique to the individual, the frequency of clusters that are shared with two, three or more subjects. Overlap with n subjects was quantified as the number of clusters shared with only n individuals divided by the total number of clusters in an individual's repertoire. We found that on average, 84.1% of clusters were unique to the individual, while 2.5, 0.2, 0.04, and 0.02% of clusters were shared with 2, 10, 25, and 45 or more other individuals, respectively ( Figure 3A). Sharing of structural clusters, however, was much more frequent with the majority of clusters (57%) shared by all 53 individuals and on average only 0.03% of clusters unique to the individual (Figure 3B). Neither sequence nor structural cluster sharing showed agerelated changes.

Older Individuals Display More Mature Clonal Lineages and Transcripts With Antigen-Driven Selection
Lineage trees were constructed from clusters of clonally related sequences and used to determine the evolutionary relationship within clusters (Figure 4A). The mean trunk length, representing the distance between the most recent common ancestor and germline sequence as a measure of the maturity of a lineage (40), greatly increased with age ( Figure 4B). There was no relationship between age and the Gini index, which predicts whether lineages are dominated by a single clone (high index) or has a broad branching structure (low index) (Figure 4C). To account for differences in read depth, these characteristics were calculated on subsampled data so that the numbers of sequences were similar between individuals. Insights into the process of antigen-driven selection can be gained by analyzing the mutational pattern in antigen-experienced repertoires. The R/S ratio in CDRs showed a marked increase in all mutated subsets between 0 and 10 years of life ( Figure 5A). In samples from study participants older than 10 years, the R/S ratio was largely constant with values of around 3-3.5 in all B cell subsets. In contrast, the R/S ratio was less variable and lower in FWRs compared with CDRs and no association with age was found (Supplementary Figure  8). Next, we determined selection pressure using a Bayesian estimation of antigen-driven selection (BASELINe), which calculates selection by comparing the observed mutations to expected mutations derived from an underlying SHM targeting model (18). In CDRs, there was a general trend toward an age-associated decrease in selection strength for IgA and IgG1 transcripts whereas this was constant across age for IgD or IgM sequences ( Figure 5B). The statistical framework used to test for selection was CDR_R/(CDR_R + CDR_S), which normalizes for the observed increase in the total number of mutations with age.

Usage of IgG2 and IgA2 Subclasses Increase With Age
Subclass usages were calculated within IgA and IgG repertoires to explore age-dependent class-switching patterns. In most age groups, IgG1 sequences were the most commonly detected, followed by IgG2, IgG3, and IgG4 sequences. However, the proportion of IgG2 sequences increased with age (p=0.0140, Kruskal-Wallis by age group) at the expense of lower usage of IgG1 (p = 0.0086, Kruskal-Wallis) and IgG3 (p = 0.1900, Kruskal-Wallis) sequences in older individuals. Similarly, IgA1 was most commonly used in all age groups and there was a non-significant trend toward a higher proportion of IgA2 sequences with age (p = 0.0960, Kruskal-Wallis) (Figure 6).

Repertoires From Older Individuals Contain More Self-Tolerant Sequences
Although self-reactivity can't be predicted with certainty from AIRR-seq data yet, sequence characteristics known to be associated with self-reactivity can be explored. These include an increased usage of certain V genes, mainly VH4-34, and usage of longer CDR3 with positively charged or hydrophobic residues (41)(42)(43). We investigated how these metrics vary with age in healthy individuals. Apart from the decreasing junction length in IgG subsets (Figure 1C), we found that age has no impact on charge or hydrophobicity of IgH repertoires (Supplementary Figure 9). VH4-34 usage was also unrelated to age whereas a more detailed SHM analysis including self-reactive motifs of VH4-34 sequences revealed an age-specific pattern. The VH4-34 germline contains an Ala-Val-Tyr (AVY) hydrophobic patch in FWR1 that is not present in other V genes and is thought to contribute to the self-reactive property of this gene (44,45). Another feature of the VH4-34 germline associated with autoimmunity is the presence of an Asn-X-Ser N-glycosylation sequon (NHS) in CDR2 that modulates antibody avidity (46). Previous research has shown that mutating one or both of these motifs drives specificity of these sequences away from self, thereby contributing to peripheral tolerance. Lower frequencies of both unmutated AVY and NHS were present in healthy older individuals while there was a relative accumulation of single and double-mutated motifs in VH4-34 with age (Figure 7). This pattern was observed across all mutated subsets but was only statistically significant for IgA and IgG transcripts (p = 0.0110

Combining Age-Related Repertoire Features Distinguishes Between Children and Adults
Principal component analysis (PCA) based on the age-driven variables including mutation, R/S ratio, junction length, gene usage and proportion of sequences structurally divergent from germline clearly showed distinct grouping of children younger than 9 years old and individuals older than 10 years old in mutated repertoires (Figure 8). This distinction was most clearly observed in the class-switched IgG and IgA repertoires. In IgD/IgM mutated sequences, children <3 years old were separate from other individuals whereas the repertoire characteristics in older age categories overlapped.

DISCUSSION
In this study, we found an extensive maturation of B cell responses in the first 10 years of life consistent with what would be expected with cumulative antigen exposure and a generally more developed and stable B cell compartment in older individuals. Further antibody repertoire alterations continue to be made thereafter, although at a lower rate. Although sample sizes in our study were relatively small in some age groups they were adequate for detecting age-related changes and providing insight into the developing IgH repertoire. The age distribution of our participants, number of samples, number of sequences analyzed and the results presented here constitute the most indepth evaluation of the IgH repertoire with age. This study also provides a detailed reference data set of isotype and subclassspecific IgH repertoires of healthy individuals across a relevant age range and stresses the importance of using well-selected, age-appropriate controls in future studies. Development and maturation of B cell repertoires throughout human life have been the subject of prior research. Studies of the immune system in ontogeny have shown that fetal repertoires are characterized by short CDR3 junctions, decreased diversity and a lack of advanced maturation features (47,48). Evidence of fetal B cells undergoing class-switch recombination and somatic hypermutation has been found in cord blood samples taken as early as 12 weeks' gestation (49,50). It has been shown that B cell repertoires develop throughout gestation to reach a maturity equivalent of those of postnatal infants by the end of the second trimester (51). However, the B cell repertoire at birth remains considerably different compared with those of adults. Our results reveal the underlying mechanisms of B cell maturation and how the repertoire continues to be shaped from childhood to adulthood. The oldest participant in our cohort was 50 years old, therefore our data does not inform about immunological changes that occur later in life. However, according to previous studies in B cell repertoire aging and immunosenescence, no evidence of an altered mechanism of somatic hypermutation is seen in the elderly. Furthermore, aged repertoires are skewed toward an increased usage of longer CDR3 junctions and a general trend toward the IgG2/IgM memory repertoire is observed (52)(53)(54)(55).
Previous studies have suggested that immunoglobulin gene usage is strongly genetically determined as it was conserved between monozygotic twins and across multiple time points within a given individual (7,33). Both heritable and stochastic mechanisms have been suggested to be involved in shaping the antibody repertoire V gene usage (56). We found age-dependent alterations in both V family and J gene usage in antigenexperienced repertoires suggesting either polyclonal negative selection of V1-and J6-containing B cells or positive selection of non-V1/J6-bearing B cells during maturation of the adaptive immune system. However, here we also saw that V family gene usage changed in naïve repertoires that are supposedly unaffected by antigen exposure and not subject to antigen-driven selection pressure, indicating preferential development and/or survival of V1-bearing B cells in young children. However, without longitudinal data it remains unclear whether this observed change is a result of genetics, differences in antigen exposure or a variation in the V gene usage over time. The potential benefit and mechanism behind these age-related V family gene alterations also remain uncertain.
In line with earlier findings (11,57,58), we observed extensive maturation of antigen-experienced repertoires characterized by accumulation of somatically hypermutated B cell antibody with evidence of strong positive selection in older individuals. IgH repertoire sequencing in a longitudinal birth cohort showed that IgM and IgD transcripts reach adult SHM frequencies by 2 years of age and class-switched IgA and IgG reach about 60-75% of adult SHM frequencies by the age of three (59). We showed that the mutation rate progressively increases in the first years of life and reaches the adult level at the age of 10 consistently in mutated IgM/D and class-switched transcripts. The observed decrease in selection pressure in some class-switched subsets indicates that young individuals show accelerated dynamics to achieve highly selected sequences compared with older individuals. Of note, detailed analysis allowed to investigate characteristics of mutated IgM/D transcripts separately, which were observed at a higher frequency and with a greater number of mutations in older individuals. These findings indicate that the pool of circulating peripheral blood naïve B cells is continuously diminishing with age, possibly contributing to a decreasing capacity to effectively respond to novel antigens in older individuals (60). We also observed a substantially higher proportion of unmutated IgA/G transcripts in young children compared with adults (61), in agreement with previous findings of unmutated IgG transcripts in intrauterine development and in neonates (62). These results are also in line with previous in vitro studies (63) demonstrating that class-switch recombination and somatic hypermutation can occur independently and suggest class-switching to be an important element of B cell responses in young children.
Along with other characteristics indicative of antigen-driven maturation we found that the proportion of sequences with structures differing from germline increased with age, which was most pronounced for IgG1 and IgA1 subsets. To date, there is limited information on predicted antibody structures derived from high-throughput adaptive immune receptor repertoire sequencing data (64,65). In line with measures of antigendriven selection, there was a positive linear relationship between number of mutations and structural alterations of antigenexperienced sequences indicating that alteration of the threedimensional structure is important to achieve high specificity and affinity of the antibody. By annotating individual sequences with PDB codes, we were able to investigate commonalities of CDR3 structures between individuals. In particular, in contrast to sharing on the sequence level, the majority of PDB clusters were public while only a very small percentage of PDB clusters were private to the individual. Although this comparison is influenced by the much smaller number of potential PDB clusters, the use of common PDB clusters indicates that a large number of different sequences can underlie similar antibody structures. Future work, such as the investigation of PDB usage in patients with immune disorders, will help determine how antibody structures can be used to assess global immune responses.
We found an increase in the usage of IgA2/IgG2 transcripts with age, similar to what has been seen in a recent study on the isotype subclasses surface expression of peripheral blood B cells (66). While human IgG subclasses have been extensively studied (67), there is limited information on the functional difference between the two IgA subclasses, whose structures mainly differ in the length of the hinge region (68). IgG2 has been implicated in the immune responses to capsular polysaccharides of bacteria such as S. pneumoniae that are commonly colonizing the oropharynx of young children and thereby induce polysaccharide-specific serum antibody (69). Our findings also match the sequential model proposed for CSR: with age, and after multiple encounter with the same antigen, classswitched memory B cells re-enter the germinal center to undergo a second round of CSR and switch toward more downstream constant region genes (70).
The majority of early immature human B cells display selfreactivity and although most of these are removed during B cell development, a substantial proportion of mature B cells may still be directed against autoantigens (41). Antibodies encoded by germline VH4-34 are intrinsically self-reactive antibodies mediated by a hydrophobic patch and a glycosylation sequon (44,46). Unmutated VH4-34 antibody are more common in naïve than antigen-experienced repertoires as receptor editing of these antibodies drives specificity away from self (45,71). In contrast to adults, we found that a substantial proportion of VH4-34 IgG and IgA transcripts from children are unmutated, with frequencies gradually decreasing with age. Previous work has shown that germline VH4-34-expressing IgG B cells recognized antigens from commensal gut bacterial (71) and hence, the higher frequency of these cells in children may be related to ongoing immune responses against gut pathogens in this age group.
This study used AIRR-seq technology coupled with bioinformatic methods to study in detail the IgH repertoires of healthy individuals and investigate the effect of age on repertoire characteristics. We chose a cross-sectional study design andalthough unlikely-can therefore not exclude that longitudinal assessment of maturation on an individual basis may differ from the presented findings. We performed bulk sequencing on total PBMC or B cells with constant region primers that allow isotype subclass resolution. Although individual B cell subpopulations were not isolated prior to downstream processing, we used careful bioinformatic analysis to still inform about features of B cell subgroups in this healthy control cohort. For practical reasons, the number of input cells was variable between study participants, which resulted in variable sequence numbers per sample. For analysis where sequence number variability was considered to be of major relevance, such as constructing lineage trees, subsampling to an equal number of sequences per individual was performed.
We were able to map in detail the characteristics, magnitude and rate of age-dependent maturation of IgH repertoires. Combining age-related variables using a PCA allowed clear separation of individuals younger than 10 years from older study participants, which was most pronounced in IgG repertoires. Our analysis now allows comparisons to be made in the IgH repertoires of healthy individuals to patients with altered immune states such as primary or secondary immunodeficiency (4) or infectious disease (72,73). By elucidating patterns that are associated with cumulative antigen exposure and an evolving immune system, this research offers important insight into adaptive immune system responses in humans. The mechanisms behind the development of clinically relevant autoimmunity is still poorly understood and the findings in this study show a substantial intrinsic capacity to produce self-reactive B cells, which may be essential to achieve the diversity needed for the defense against commensal pathogens in early life.
In summary, by studying the maturation of the healthy IgH repertoire with age, we found characteristics indicative of a maturing B cell system consisting of alterations in immunoglobulin gene usage, increased levels of SHM associated with strong positive selection, and canonical class usage that differed considerably from germline structures. Repertoires from older individuals more frequently contained antibody using more downstream constant region genes that are involved in the immune response to polysaccharide antigens. With accumulating mutations, germline-encoded self-reactive antibody were seen less with advancing age indicating a possible beneficial role of self-reactive B cells in the developing immune system. Finally, this study provides a reference data set of isotype subclassspecific IgH repertoires and stresses the importance of using well-selected, age-appropriate controls in future studies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Zurich ethics committee (KEK-ZH 2015-0555) and the Basel ethics committee (EKNZ 2015-187). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
JT designed and supervised the study, oversaw analyses, had full access to all the data in the study and takes responsibility for the integrity of the data, and the accuracy of the data analysis. The