Accumulation of VH Replacement Products in IgH Genes Derived from Autoimmune Diseases and Anti-Viral Responses in Human

VH replacement refers to RAG-mediated secondary recombination of the IgH genes, which renews almost the entire VH gene coding region but retains a short stretch of nucleotides as a VH replacement footprint at the newly generated VH–DH junction. To explore the biological significance of VH replacement for the antibody repertoire, we developed a Java-based VH replacement footprint analyzer program and analyzed the distribution of VH replacement products in 61,851 human IgH gene sequences downloaded from the NCBI database. The initial assignment of the VH, DH, and JH gene segments provided a comprehensive view of the human IgH repertoire. The overall frequency of VH replacement products is 12.1%, and the frequencies of VH replacement products in IgH genes using different VH germline genes vary significantly. Importantly, the frequencies of VH replacement products are significantly elevated in IgH genes derived from different autoimmune diseases, including rheumatoid arthritis, systemic lupus erythematosus, and allergic rhinitis, and in IgH genes encoding various autoantibodies or anti-viral antibodies. The identified VH replacement footprints preferentially encode charged amino acids that elongate the IgH CDR3 regions, which may contribute to their autoreactivity or anti-viral functions. Analyses of the mutation status of the identified VH replacement products suggest that they have been actively involved in immune responses. These results provide a global view of the distribution of VH replacement products in human IgH genes, especially in IgH genes derived from autoimmune diseases and anti-viral immune responses.


INTRODUCTION
To protect the body from various infectious agents, the adaptive immune system has evolved the capability to generate a vast number of antibody (Ab) specificities through somatic rearrangement of previously separated variable (V), diversity (D) (heavy chain only), and joining (J) gene segments to form the variable-domain exons of immunoglobulin genes (1)(2)(3). V(D)J recombination is catalyzed by the products of a pair of recombination activating genes (RAG1 and RAG2) (4)(5)(6). Specific joining of the V, D, and J gene segments is directed by the recombination signal sequences (RSS) flanking each rearranging gene segment (7). The RSS is composed of a highly conserved heptamer (5'-CACTGTG-3') and a nonamer (5'-ACAAAAACC-3') separated by a non-conserved spacer of either 12 or 23 bp (7)(8)(9). There are 44 functional VH genes, 27 DH genes, and 6 JH genes within the human IgH locus. IgH repertoire diversity is generated at several levels, including random recombination of the V, D, and J gene segments, imprecise processing of the coding ends, addition of non-templated nucleotides by terminal deoxynucleotidyl transferase (TdT), random pairing of IgH with Igκ or Igλ light chains, and later somatic hypermutation and class switch recombination during the antigen-dependent germinal center reaction (2). Previous analyses of the IgH repertoire have provided important information regarding the developmental process and function of B lineage cells (10,11). For example, earlier studies on the expression and rearrangement status of IgH genes demonstrated that IgH genes are rearranged sequentially during early B lineage cell development: DH to JH rearrangements occur prior to VH to DJH rearrangements, followed by rearrangement of the Igκ and then the Igλ light chain genes (12,13).
Analyses of the Ig gene repertoires in different autoimmune diseases such as rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) revealed skewed usage of specific germline VH genes (14)(15)(16), unusually long CDR3 regions within the IgH and IgL genes (17,18), and accumulation of somatic hypermutation in the variable regions of IgH and IgL genes (15,19).
www.frontiersin.org
The random process of V(D)J recombination is essential for generating a diverse IgH repertoire; however, it also produces non-functional IgH genes or IgH genes encoding autoreactive antigen receptors (2,20). Early B lineage cells carrying non-functional IgH rearrangements must re-initiate the V(D)J recombination process to generate functional B-cell receptors (BCRs) for subsequent development, whereas B-cells expressing autoreactive receptors are removed from the repertoire through receptor editing, clonal deletion, or anergy to establish central tolerance (1,21,22). Receptor editing refers to RAG-mediated secondary recombination of previously rearranged IgH or IgL genes (1,21,22). The organization of the Igκ and Igλ loci allows continuous secondary recombination by joining an upstream VL gene to a downstream JL gene segment. The previously formed VLJL joints are deleted during secondary recombination, leaving no trace in the newly formed VLJL junctions; the only indications of extensive light chain gene editing are the elevated usage of the 3′ Jκ or Jλ genes and the deletion of the Igκ locus (23,24).
Unwanted IgH genes can also be changed through a RAG-mediated VH replacement process that uses the cryptic recombination signal sequences (cRSSs) embedded within the framework-3 regions of previously rearranged VH genes (21,22,25). The concept of VH replacement was originally proposed to explain the observation that functional IgH genes were generated in mouse pre-B-cell leukemia lines initially harboring non-functional IgH rearrangements (26)(27)(28). Comparison of the functional IgH genes with the non-functional IgH rearrangements suggested a VH to VHDJH recombination process mediated by the cRSS sites (26,27). Subsequently, the occurrence of VH replacement was demonstrated in mouse models carrying knocked-in IgH genes encoding anti-DNA Abs, anti-NP Abs, or non-functional IgH genes on both alleles (29)(30)(31)(32)(33)(34). Despite these findings, the natural occurrence of VH replacement during early B-cell development in mice remains to be determined (35,36).
Ongoing VH replacement in human B-cells was found in a human leukemia cell line, EU12, by detection of RAG-mediated double-stranded DNA breaks (DSBs) at the cRSS and by amplification of different VH replacement excision circles (37). The detection of DSBs at the VH3-cRSS borders in human bone marrow immature B-cells provided the first evidence for the natural occurrence of VH replacement during normal B-cell development in humans (37). The occurrence of VH replacement in bone marrow immature B-cells is consistent with the observation that the RAG1 and RAG2 genes can be re-induced in these cells to catalyze IgL gene editing (24,38,39). Our recent studies showed that VH replacement occurs in newly emigrated immature B-cells in the peripheral blood of healthy donors and can be further induced through BCR-mediated signaling (40). cRSS-mediated VH replacement is of particular interest because cRSS motifs are found in 40 of the 44 human VH germline genes and in the majority of mouse VH germline genes (22,41). VH replacement renews almost the entire VH gene coding region but retains a short stretch of nucleotides as a VH replacement footprint at the VH–DH junction (37). Such footprints can be used to identify VH replacement products through analysis of IgH gene sequences. The initial analysis of 417 human IgH gene sequences estimated that VH replacement products contribute about 5% of the normal IgH repertoire (37). Interestingly, analysis of the amino acids encoded by the VH replacement footprints revealed that these footprints preferentially contribute charged amino acids to the IgH CDR3 regions, unlike human germline DH genes or N-region sequences added by TdT, which encode charged amino acids at low frequency (37).
To explore the biological significance of VH replacement, we developed a Java-based computer program and analyzed 61,851 human IgH gene sequences from the NCBI database to determine the distribution of VH replacement products.

DEVELOPMENT OF THE VH REPLACEMENT FOOTPRINT ANALYZER PROGRAM
The VH replacement footprint analyzer (VHRFA) program was developed using the NetBeans 7.01 IDE with the Java Development Kit (JDK) and tested under Windows, Mac OS X, and Ubuntu Linux (42). Reference human VH germline gene sequences were downloaded from the IMGT database to generate libraries of VH replacement footprints of different lengths. For the initial test of the VHRFA program, we used 417 IgH sequences that had been manually analyzed in our previous study to identify potential VH replacement products (37,43). The 61,851 human IgH gene sequences were downloaded from the NCBI database on April 20, 2011.

ANALYSIS OF IgH GENE SEQUENCES AND IDENTIFICATION OF POTENTIAL VH REPLACEMENT PRODUCTS USING THE VHRFA PROGRAM
The IgH gene sequence files from the NCBI database were first converted into FASTA files and uploaded to the VHRFA program. VH, DH, and JH germline gene usage was assigned by automatic submission of sequences in batches to the IMGT/V-Quest program (http://www.imgt.org/IMGT_vquest/share/textes/) (44), and the results were exported as Microsoft Excel files to a local computer. Identical IgH gene sequences in the original NCBI database were removed based on their VH–DH–JH junctions, and the remaining 39,438 unique human IgH gene sequences with identifiable VH, DH, and JH genes were used to identify potential VH replacement products and to calculate their frequencies. Briefly, IgH gene sequences with clearly identifiable VH, DH, and JH genes were searched for 7-, 6-, 5-, 4-, and 3-mer VH replacement footprint motifs in their VH–DH junction (N1) and DH–JH junction (N2) regions. The frequency of VH replacement products was calculated by dividing the number of IgH genes with VH replacement footprints in their N1 regions by the total number of unique IgH gene sequences. IgH genes with 7-, 6-, 4-, and 3-mer VH replacement footprint motifs within their N1 regions were also analyzed and are discussed. The positive predictive values, calculated with 95% confidence intervals, of the 6-, 5-, 4-, and 3-mer VH replacement footprint motifs for assigning VH replacement products are 68, 59, 54, and 52%, respectively. In the following comparisons, VH replacement products mainly refer to IgH genes with a 5-mer VH replacement footprint within their N1 regions.
Frontiers in Immunology | B Cell Biology
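The deduplication and frequency calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the VHRFA implementation: the `IgHRecord` type and its fields, the use of (VH, DH, JH, junction) as the uniqueness key, and the toy motifs and sequences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IgHRecord:
    """Hypothetical per-sequence record built from VH/DH/JH assignments."""
    accession: str
    vh: str
    dh: str
    jh: str
    junction: str  # VH-DH-JH junction nucleotides, used as the uniqueness key
    n1: str        # VH-DH junction (N1) nucleotides


def deduplicate(records):
    """Keep the first record for each (VH, DH, JH, junction) combination."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.vh, rec.dh, rec.jh, rec.junction)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def vh_replacement_frequency(records, motifs):
    """Fraction of unique sequences whose N1 region contains any footprint motif."""
    unique = deduplicate(records)
    if not unique:
        return 0.0
    hits = sum(any(m in rec.n1 for m in motifs) for rec in unique)
    return hits / len(unique)


# Toy example: the two identical junctions collapse to one record.
motifs = {"GAGAG", "CGAGA"}  # hypothetical 5-mer footprint motifs
records = [
    IgHRecord("A1", "IGHV3-23", "IGHD3-10", "IGHJ4", "GCGAGAGATTAC", "GAGAGT"),
    IgHRecord("A2", "IGHV3-23", "IGHD3-10", "IGHJ4", "GCGAGAGATTAC", "GAGAGT"),
    IgHRecord("A3", "IGHV1-69", "IGHD2-2", "IGHJ6", "GCGAAAGGGTAT", "CCTTA"),
]
print(vh_replacement_frequency(records, motifs))  # 0.5
```

Deduplicating before counting keeps repeated GenBank entries of the same rearrangement from inflating either the numerator or the denominator.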
The distribution of VH replacement products in IgH genes from different keyword sub-categories was analyzed based on the information linked to each sequence in the NCBI GenBank files. The frequencies of VH replacement products with pentameric footprints were used for all these comparisons. For mutational analysis, only IgH gene sequences with ≥80% nucleotide similarity to the assigned germline VH gene sequences were included.
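A minimal sketch of the ≥80% similarity filter and a per-sequence mutation frequency is shown below. It assumes that each query and its assigned germline VH sequence have already been aligned without gaps to equal length, and it treats similarity as simple percent nucleotide identity; real alignments with insertions or deletions need more care, and the function names are hypothetical.

```python
def percent_identity(query: str, germline: str) -> float:
    """Percent nucleotide identity of two pre-aligned, equal-length sequences."""
    if not query or len(query) != len(germline):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(q == g for q, g in zip(query.upper(), germline.upper()))
    return 100.0 * matches / len(query)


def eligible_for_mutation_analysis(query: str, germline: str,
                                   min_identity: float = 80.0) -> bool:
    """Inclusion criterion: at least 80% identity to the assigned germline VH."""
    return percent_identity(query, germline) >= min_identity


def mutation_frequency(query: str, germline: str) -> float:
    """Mismatches per aligned nucleotide relative to the germline VH."""
    return 1.0 - percent_identity(query, germline) / 100.0


germline = "CAGGTGCAGCTGGTG"  # illustrative fragment, not a reference sequence
query = "CAAGTGCAGCTAGTC"     # three substitutions
print(percent_identity(query, germline))                # 80.0
print(eligible_for_mutation_analysis(query, germline))  # True
```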

STATISTICAL ANALYSIS
Statistical significance was determined using the two-tailed chi-square test with Yates' correction, the two-tailed Fisher's exact test, or the unpaired t-test. p < 0.05 was considered statistically significant and p < 0.0001 was considered extremely statistically significant.
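For readers who want to reproduce the 2×2 comparisons, the Yates-corrected chi-square statistic can be computed with the Python standard library alone, as in the sketch below (the function name and table layout are our own). It uses the fact that, for one degree of freedom, the chi-square tail probability equals erfc(√(χ²/2)); the continuity correction is capped at zero, which is one common convention.

```python
import math


def chi2_yates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-square test with Yates' continuity correction for a 2x2 table.

    Layout (counts):
                    footprint+  footprint-
        group 1         a           b
        group 2         c           d
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all marginal totals must be positive")
    # Continuity correction, capped at zero so it never over-corrects.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 60 of 300 vs. 30 of 300 sequences with footprints.
chi2, p = chi2_yates(60, 240, 30, 270)
print(round(chi2, 1), p < 0.001)  # 11.0 True
```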

DIFFERENTIAL USAGE OF GERMLINE VH, DH, AND JH GENES IN HUMAN IgH GENE SEQUENCES
We have developed a Java-based VHRFA computer program to analyze large numbers of IgH gene sequences and to identify potential VH replacement products (42). In the current study, 61,851 human IgH gene sequences were downloaded from the NCBI database. The initial analysis showed that 54,970 IgH genes have identifiable VH, DH, and JH gene segments. After removal of duplicate IgH sequences, the remaining 39,438 unique IgH genes with identifiable VH, DH, and JH genes were further analyzed. The usage of the VH, DH, and JH germline genes in these sequences represents a combined view of the human IgH repertoire from many studies (Figure 1). All 44 functional human germline VH genes were used in this dataset (Figure 1A), but the frequencies of individual VH germline gene usage varied considerably. Among VH gene families, the VH3 family was predominantly used, followed by the VH4 and VH1 families (Figure 1A), consistent with previous analyses of smaller sets of IgH gene sequences. Among individual VH genes, the VH3-23 gene was used most frequently, in 9536 IgH genes (25%), whereas the VH4-28 gene was found in only 13 IgH rearrangements (0.03%). The differential usage of individual VH germline genes did not seem to correlate with their relative location within the IgH locus (Figure 1A). Within the IgH locus, the VH1-24, VH2-26, and VH3-30 genes are located very close to the VH3-23 and VH4-28 genes. However, VH3-23 gene usage is only 4-fold higher than that of the VH3-30 gene, but 50- and 80-fold higher than that of the VH1-24 and VH2-26 genes, respectively (Figure 1A).
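The usage percentages and fold differences quoted above are simple tallies over the per-sequence VH assignments. A minimal sketch follows, assuming allele-free IMGT-style gene names such as `IGHV3-23` with the family taken as the text before the first hyphen; the helper names and toy assignments are hypothetical.

```python
from collections import Counter


def gene_usage(vh_calls):
    """Fraction of sequences assigned to each VH gene."""
    counts = Counter(vh_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def family_usage(vh_calls):
    """Fraction of sequences per VH family, e.g. 'IGHV3-30-3' -> 'IGHV3'."""
    return gene_usage(call.split("-")[0] for call in vh_calls)


def fold_difference(usage, gene_a, gene_b):
    """How many times more often gene_a is used than gene_b."""
    return usage[gene_a] / usage[gene_b]


# Toy assignments (not the study data).
calls = ["IGHV3-23"] * 4 + ["IGHV3-30", "IGHV4-34", "IGHV4-34", "IGHV1-69"]
usage = gene_usage(calls)
print(fold_difference(usage, "IGHV3-23", "IGHV3-30"))  # 4.0
print(family_usage(calls)["IGHV3"])                    # 0.625
```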
Among the DH genes, the DH3 gene family was predominantly used, in 35% of IgH genes, with the DH3-10, DH3-3, and DH3-22 genes used frequently; the DH1 gene family was used less frequently (Figure 1B). Among the JH germline genes, the JH4 gene was predominantly used, followed by the JH6 gene (Figure 1C). These results are consistent with previous individual reports based on smaller numbers of IgH sequences. Taken together, this analysis provides a comprehensive view of the existing human IgH gene sequences in the NCBI database.

IDENTIFICATION OF VH REPLACEMENT PRODUCTS USING THE VHRFA PROGRAM
To identify potential VH replacement products in a large number of IgH gene sequences, the VHRFA program first generated libraries of potential VH replacement footprint motifs of different lengths based on the VH gene 3′-end sequences downstream of the conserved cRSS sites of all functional human VH germline genes (Tables S1 and S2 in Supplementary Material). The VHRFA program then used these libraries to search for VH replacement footprint motifs of specified lengths in the VH–DH junction (N1) or DH–JH junction (N2) regions of IgH genes. As an initial test of the newly developed VHRFA program, we reanalyzed the 417 human IgH gene sequences that had been manually analyzed in a previous study to identify potential VH replacement products (37). The VHRFA program efficiently identified VH replacement footprint motifs of 3, 4, 5, 6, or 7 nucleotides in both the N1 and N2 regions (…).
Within the large set of IgH genes, there are 3818 non-functional IgH gene sequences, and 687 of them contain 5-mer VH replacement footprint motifs in their N1 regions and can be assigned as potential VH replacement products. The frequency of VH replacement products in non-functional IgH genes (18%) is extremely statistically significantly higher than that in functional IgH genes overall (p < 0.0001, two-tailed chi-square test with Yates' correction). The identification of VH replacement products among non-functional IgH genes fulfills the prediction that VH replacement is a random process that can generate both functional and non-functional IgH rearrangements. Taken together, these results uncover a previously unrecognized contribution of VH replacement products to the diversification of the human IgH repertoire.
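The two steps described above, building a footprint library and scanning junction regions, can be sketched as follows. This is an illustration under stated assumptions, not the VHRFA code: we assume the cRSS is the heptamer TACTGTG near the 3′ end of each VH gene, that the footprint tail is every nucleotide 3′ of its last occurrence, and that any contiguous k-mer of that tail counts as a motif. The germline fragments are illustrative only and are not IMGT reference sequences; the actual libraries are given in Tables S1 and S2.

```python
CRSS_HEPTAMER = "TACTGTG"  # assumed cRSS heptamer near the VH 3' end


def footprint_tail(vh_germline: str) -> str:
    """Nucleotides 3' of the last cRSS heptamer in a germline VH sequence."""
    seq = vh_germline.upper()
    pos = seq.rfind(CRSS_HEPTAMER)
    if pos == -1:
        return ""  # no cRSS found: this gene contributes no footprints
    return seq[pos + len(CRSS_HEPTAMER):]


def footprint_library(vh_germlines: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer footprint motif to the VH genes that could leave it behind."""
    library: dict[str, set[str]] = {}
    for gene, seq in vh_germlines.items():
        tail = footprint_tail(seq)
        for i in range(len(tail) - k + 1):
            library.setdefault(tail[i:i + k], set()).add(gene)
    return library


def find_footprints(region: str, library: dict[str, set[str]]) -> list[tuple[int, str]]:
    """All (offset, motif) hits of library motifs in an N1 or N2 region, overlaps included."""
    region = region.upper()
    hits = []
    for motif in library:
        start = region.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = region.find(motif, start + 1)
    return sorted(hits)


# Illustrative 3' VH fragments: cRSS heptamer followed by a 7-nt tail.
germlines = {
    "VH_a": "GCCGTATATTACTGTGCGAAAGA",
    "VH_b": "GCCGTGTATTACTGTGCGAGAGA",
}
lib5 = footprint_library(germlines, 5)
print(find_footprints("ttgagagcc", lib5))  # [(2, 'GAGAG')]
```

Keeping the set of source VH genes for each motif makes it easy to report which upstream genes could have produced a given footprint.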

DISTRIBUTION OF VH REPLACEMENT PRODUCTS IN IgH GENES USING DIFFERENT VH GENES
Using the VHRFA program, we further analyzed the distribution of VH replacement products in IgH genes using different VH genes. The frequencies of VH replacement products differ among IgH genes using different VH germline genes (Figure 2). For example, the frequencies of VH replacement products in IgH genes using the VH2-5, VH3-30, VH3-30-3, VH1-69, and VH3-34 genes are 23.88, 19.12, 16.64, 14.28, and 13.13%, respectively, which are extremely statistically significantly higher than that in IgH genes using the VH6-1 gene (p < 0.0001, two-tailed Fisher's exact test) (Figure 2). VH6-1 serves as an internal control: because it is the most DH-proximal VH gene, IgH genes using it cannot be generated by VH replacement, so footprint-like motifs in their N1 regions reflect chance matches. In total, 7.56% of IgH genes using the VH6-1 gene have 5-mer VH replacement footprints within their N1 regions, which is statistically significantly lower than that in the overall IgH gene sequences (p = 0.0004, two-tailed Fisher's exact test).
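The comparisons above use a two-tailed Fisher's exact test. A minimal standard-library sketch follows (the function name is hypothetical); it sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, which is the usual two-tailed definition. Exact big-integer arithmetic keeps it correct, but it becomes slow for tables with tens of thousands of sequences, where an established statistics package is a better choice.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


# Tea-tasting style example: [[3, 1], [1, 3]].
print(round(fisher_exact_two_tailed(3, 1, 1, 3), 4))  # 0.4857
```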

VH REPLACEMENT PRODUCTS ARE HIGHLY ENRICHED IN IgH GENES DERIVED FROM PATIENTS WITH AUTOIMMUNE DISEASES OR VIRAL INFECTIONS
The overall frequency of VH replacement products in the 39,438 unique IgH genes from the NCBI database (12.1%) is much higher than the frequency previously estimated from 417 IgH genes obtained from healthy donors. We reasoned that the majority of IgH gene sequences deposited in the NCBI database were derived from diseased subjects, who may have higher frequencies of VH replacement products. We therefore investigated the distribution of VH replacement products in IgH genes derived from different disease sub-categories. Using the keyword analysis function within the VHRFA program, we correlated the frequencies of VH replacement products with different sub-categories of IgH gene sequences from the NCBI database. For example, the frequency of VH replacement products in 558 IgH genes derived from healthy donors is 8.6% (Figure 3), which is similar to the result obtained from the previous analysis of the 417 IgH gene sequences from healthy donors. Interestingly, the frequencies of VH replacement products in IgH genes derived from subjects with different autoimmune diseases, such as allergic rhinitis, RA, and SLE, are statistically significantly higher than that in healthy donors (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction; Table S4 in Supplementary Material). VH replacement products are further enriched in IgH genes derived from RA synovium and in IgH genes encoding rheumatoid factors, suggesting that B-cells expressing VH replacement products are positively selected in the RA synovium to encode rheumatoid factors (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction; Table S4 in Supplementary Material). Similarly, VH replacement products are highly enriched in IgH genes derived from SLE plasmablasts (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction; Table S4 in Supplementary Material), suggesting that these enriched VH replacement products contribute to the production of autoAbs in SLE.
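The keyword sub-category analysis amounts to grouping sequences by terms found in their GenBank annotations and computing a footprint frequency per group. The sketch below assumes each sequence has already been reduced to its annotation text and a footprint flag; plain case-insensitive substring matching is a simplification (for example, it cannot distinguish negated terms), and the function name and toy annotations are hypothetical.

```python
from collections import defaultdict


def frequency_by_keyword(records, keywords):
    """VH replacement frequency per keyword sub-category.

    records: iterable of (annotation_text, has_footprint) pairs.
    Returns {keyword: (n_with_footprint, n_total, frequency)} for matched keywords.
    A record is counted under every keyword its text contains.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, has_footprint in records:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                totals[kw] += 1
                hits[kw] += int(bool(has_footprint))
    return {kw: (hits[kw], totals[kw], hits[kw] / totals[kw])
            for kw in keywords if totals[kw]}


# Toy annotations (not GenBank data).
records = [
    ("Rheumatoid arthritis synovial B cell", True),
    ("rheumatoid factor heavy chain", False),
    ("healthy donor peripheral blood", False),
    ("healthy donor", True),
    ("Healthy adult", False),
]
print(frequency_by_keyword(records, ["rheumatoid", "healthy", "lupus"]))
```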
The accumulation of VH replacement products in IgH genes derived from patients with different autoimmune diseases suggests that VH replacement products may contribute to the production of autoAbs. Indeed, further analyses showed that VH replacement products are statistically significantly enriched in IgH genes encoding rheumatoid factors, anti-Rh (D) Abs, and anti-acetylcholine receptor Abs (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction; Table S4 in Supplementary Material).
To our surprise, the frequencies of VH replacement products are also significantly elevated in IgH genes derived from patients with different viral infections. For example, the frequencies of VH replacement products in IgH genes derived from HIV- and HCV-infected patients are statistically significantly higher than that in healthy donors (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction; Table S4 in Supplementary Material). Further analyses showed that VH replacement products account for about 30% of IgH genes encoding anti-HCV glycoprotein E2 Abs or anti-HBsAg Abs. These frequencies are statistically significantly higher than that in healthy donors (Figure 3, p < 0.05, two-tailed chi-square test with Yates' correction). Taken together, these results show that VH replacement products are highly enriched in IgH genes derived from patients with different autoimmune diseases and viral infections.
FIGURE 3 (partial legend) | Statistical significance was determined using a two-tailed chi-square test with Yates' correction. *p < 0.05 is considered statistically significant and **p < 0.0001 is considered extremely statistically significant.

VH REPLACEMENT ELONGATES THE IgH CDR3
VH replacement renews almost the entire VH coding region. Because of the location of the cRSS site, a short stretch of nucleotides remains as a VH replacement footprint at the newly formed VH–DH junction after VH replacement (37). Such VH replacement footprints can contribute up to two amino acids to the IgH CDR3, thereby elongating it. The average CDR3 length of the identified VH replacement products is 18.2 ± 5.0 aa (n = 4417), which is extremely statistically significantly longer than that of non-VH replacement products (15.4 ± 4.4 aa; p < 0.0001, unpaired t-test) (Figure 4). This result confirms that VH replacement elongates the IgH CDR3 region.
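The CDR3 comparison can be reproduced from summary statistics alone with Student's unpaired t-test using a pooled variance, as sketched below (function names are hypothetical). With tens of thousands of sequences the t distribution is practically normal, so the sketch approximates the two-tailed p-value with the normal tail; that approximation is our assumption, not the authors' stated method.

```python
import math


def unpaired_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's unpaired t-test (pooled variance) from summary statistics.

    Returns (t statistic, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def normal_approx_two_tailed_p(t):
    """Two-tailed p-value via the normal tail; adequate only for very large df."""
    return math.erfc(abs(t) / math.sqrt(2))


# Means and SDs from the text; group sizes as given in the Figure 4 legend.
t, df = unpaired_t_from_summary(18.2, 5.0, 4788, 15.4, 4.4, 34650)
print(round(t, 1), df)  # 40.6 39436
```

A t statistic of roughly 40 lies far beyond the p < 0.0001 threshold used in this study.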

THE VH REPLACEMENT FOOTPRINTS PREFERENTIALLY ENCODE CHARGED AMINO ACIDS
Our previous analyses showed that VH replacement footprints preferentially encode charged amino acids in the IgH CDR3 regions (37,45). This is likely predetermined by the conserved amino acid sequence at the 3′ ends of VH germline genes.
Here, analysis of the amino acids encoded by the identified pentameric VH replacement footprints in the 4417 VH replacement products showed that 57% of them are charged. This frequency is extremely statistically significantly higher than that in the N1 regions of non-VH replacement products (25%) (Figure 5A, p < 0.0001, two-tailed chi-square test with Yates' correction). Detailed analyses showed that the frequencies of K, R, D, and E residues encoded by the VH replacement footprints are statistically significantly higher than their usage in the N1 regions of non-VH replacement products (Figure 5B, p < 0.05, two-tailed chi-square test with Yates' correction). These results confirm our previous prediction that VH replacement footprints preferentially contribute charged amino acids to the IgH CDR3 regions.

VH REPLACEMENT PRODUCTS ARE POSITIVELY SELECTED DURING AUTOIMMUNE OR ANTI-VIRAL RESPONSES
Although charged amino acids within the IgH CDR3 are not well tolerated during Ab repertoire development, they are frequently found within the IgH CDR3 regions of autoreactive or anti-viral Abs, where they may play important roles in binding charged self or viral antigens, respectively. Further analyses of VH replacement products derived from different autoimmune diseases or viral infections showed that the identified VH replacement footprints predominantly encode charged amino acids (Figure 6A). Detailed analyses showed that the VH replacement footprints in IgH genes encoding anti-DNA/histone Abs or rheumatoid factors encoded significantly lower frequencies of the acidic residues D and E and of their uncharged amide counterparts N and Q (Figure 6B, p < 0.05, two-tailed chi-square test with Yates' correction). In healthy donors and in patients with autoimmune diseases or viral infections, the identified VH replacement products have mutation rates similar to those of non-VH replacement products (Figure 6C). As negative controls, both VH replacement products and non-VH replacement products in neonatal IgH gene sequences have much lower mutation rates (Figure 6C). The accumulation of mutations within these VH replacement products indicates that the VH replacement products enriched in autoimmune diseases or viral infections had been positively selected.

DISCUSSION
To determine the distribution of VH replacement products in human IgH genes and to explore their biological significance in antibody diversification and disease, we developed a Java-based computer program, VHRFA, to analyze large numbers of IgH gene sequences and to identify potential VH replacement products (42). Previous analyses of the … human IgH repertoire. In this dataset, the usage of every functional VH germline gene was confirmed, although their usage frequencies differ dramatically.
FIGURE 4 | The average CDR3 length of identified VH replacement products is significantly longer than that of non-VH replacement products. The distribution of IgH genes with different CDR3 lengths is shown in the bar graph; the average CDR3 length of VH replacement products (black bars, n = 4788) was compared to that of non-VH replacement products (white bars, n = 34650). Statistical significance was determined using an unpaired t-test; **p < 0.0001 is considered extremely statistically significant.
FIGURE 5 (partial legend) | The usage of different amino acids in the N1 regions of non-VH replacement products (white bars) or encoded by the VH replacement footprints (black bars) was analyzed and is shown in the bar graph. The total number of amino acids analyzed for each population is indicated. Statistical significance was determined using a two-tailed chi-square test with Yates' correction; *p < 0.05 is considered statistically significant and **p < 0.0001 is considered extremely statistically significant.
Using the VHRFA program, we identified VH replacement products and analyzed their distribution in the 39,438 unique IgH sequences. Based on the identification of pentameric VH replacement footprint motifs within the VH–DH junctions, 12.1% of the IgH genes can be assigned as potential VH replacement products. These results confirm our previous estimate that VH replacement products contribute to the diversification of the human IgH repertoire. Interestingly, the frequencies of VH replacement products in IgH genes using the VH2-5, VH3-30, VH3-30-3, VH3-49, VH1-69, and VH3-34 genes are statistically significantly higher than that in the overall IgH genes. In contrast, the frequency of VH replacement products in IgH genes using the VH6-1 gene is statistically significantly lower than that in the overall IgH genes. Among the non-functional IgH genes, 18% contain pentameric VH replacement footprints and can be assigned as potential VH replacement products. These results confirm the prediction that VH replacement is a random process that can generate both functional and non-functional IgH rearrangements. Moreover, the high frequency of VH replacement products in non-functional IgH genes suggests that VH replacement products are negatively selected during B-cell development. Following this reasoning, the frequency of VH replacement products in non-functional IgH genes may represent the true frequency of VH replacement during early stages of B-cell development, because these non-functional IgH rearrangements cannot encode BCRs and have not been subject to selection during B-cell development.
Because of the location of the cRSS site, a short stretch of nucleotides can remain as a VH replacement footprint at the VH–DH junction following VH replacement (25,37,46). These leftover VH replacement footprints elongate the IgH CDR3 regions (25,37,46). Analysis of the 4788 identified VH replacement products showed that their average CDR3 length is 2.8 aa longer than that of non-VH replacement products. We previously found, unexpectedly, that the identified VH replacement footprints preferentially encode charged amino acids within the IgH CDR3 regions (22,37,46). Recent analyses showed that the position of the cRSS and the high frequency of charged amino acids encoded by the downstream nucleotides are highly conserved in IgH genes from different vertebrates (47). In the current study, 57% of the amino acids encoded by the identified VH replacement footprints were charged. Normally, charged amino acids within the IgH CDR3 are not well tolerated during antibody repertoire development, probably because charged residues can confer autoreactivity. Indeed, our analysis revealed that VH replacement products are significantly enriched in IgH genes derived from patients with different autoimmune diseases, including RA, allergic rhinitis, and SLE, and in IgH genes encoding different autoAbs such as rheumatoid factors, anti-rhesus D antigen Abs, and anti-acetylcholine receptor Abs. Our recent analyses of large numbers of mouse IgH genes also showed that VH replacement products are enriched in IgH genes derived from autoimmune-prone mice (48). These results suggest that VH replacement products contribute to the generation of autoantibodies in both humans and mice.
Another important finding from this analysis of a large number of IgH gene sequences is that the frequencies of VH replacement products are significantly elevated in IgH genes derived from patients with viral infections, including HIV and HCV, and in IgH genes encoding Abs against HCV glycoprotein E2 or HBV surface antigen. Our recent studies showed that VH replacement products are highly enriched in IgH genes encoding different subgroups of anti-HIV antibodies, especially CD4i and PGT antibodies (49). These results suggest that VH replacement products may contribute to the generation of anti-viral Abs. The majority of the VH replacement footprints identified in anti-viral Abs also encode charged amino acids, which may be important for binding charged viral antigens. Moreover, the accumulation of mutations in these VH replacement products indicates that the VH replacement products enriched in patients with viral infections are positively selected during anti-viral responses. The identification of VH replacement products in both autoimmune diseases and anti-viral responses suggests a potential link between viral infections and the pathogenesis of autoimmune diseases. It has long been postulated that chronic viral infections contribute to autoimmunity; however, clear examples of Abs against viral antigens that cross-react with self-antigens have been found in only a few cases (50,51). Here, our results reveal a shared pattern of accumulation of VH replacement products in IgH genes derived from autoimmune diseases and anti-viral responses.
VH replacement was originally proposed as a receptor editing mechanism to change unwanted IgH genes that are either non-functional or encode autoreactive Abs. The enrichment of VH replacement products in IgH genes derived from autoimmune diseases or encoding autoAbs is therefore particularly puzzling. There are several possible mechanisms to explain this finding. First, we have recently shown that crosslinking cell surface BCRs induces VH replacement in human immature B-cells (40). Thus, VH replacement recombination might be induced in immature B-cells during either anti-viral immune responses or autoimmune disease because of persistent antigen stimulation or chronic inflammation. In support of this hypothesis, the number of newly emigrated immature B-cells in the peripheral blood increases during inflammatory responses, and these mobilized immature B-cells may continue to undergo VH replacement recombination ectopically. Second, an intrinsic feature of VH replacement is that it elongates the IgH CDR3 with charged amino acids. VH replacement products may frequently encode autoAbs and are normally deleted efficiently during B-cell development; the elevated frequencies of VH replacement products in different autoimmune diseases may therefore reflect defective negative selection in these patients. Moreover, ectopic VH replacement may bypass the stringent negative selection in the bone marrow and release VH replacement products into the periphery. Finally, because VH replacement generates IgH genes with long and charged CDR3 regions, VH replacement products may be positively selected by viral antigens during anti-viral responses to produce specific anti-viral Abs. In support of this notion, the identified potential VH replacement products encoding anti-HIV antibodies all have very long CDR3 regions with multiple charged amino acid residues (49).
The mutations accumulated within the VH genes of the VH replacement products identified in the current study also indicate positive selection. However, leftover VH replacement products generated during a chronic viral infection may encode Abs that cross-react with self-antigens and later contribute to autoimmunity. In fact, many cell surface antigens and viral antigens are negatively charged, which may be one reason for the selection of VH replacement products with long, charged CDR3 regions.
In our sequence-based analysis, the assignment of VH replacement depends on the identification of VH replacement footprints within the VH–DH junctions. Any deletion at the 3' end of the VH gene during primary IgH gene recombination, or at the 5' end of the VH replacement footprint motif during secondary recombination, may destroy the pentameric VH replacement footprint. Therefore, sequence-based analysis may still underestimate the frequency of VH replacement products. Using the VH RFA program, we extended our analysis of VH replacement products to include potential VH replacement footprint motifs of different lengths. For example, 33.9% of the IgH genes contain tetrameric VH replacement footprint motifs and 58.8% of IgH genes contain trimeric VH replacement footprint motifs. These results reveal a significant contribution of VH replacement products to the IgH repertoire. Recent studies in mice carrying non-functional IgH genes on both IgH alleles demonstrated that VH replacement occurs efficiently and generates almost normal numbers of B-cells with diversified IgH repertoires (52). However, only about 20% of the IgH gene sequences from that study contained residual VH replacement footprints. Therefore, the majority of IgH genes generated through VH replacement recombination have no residual VH replacement footprints. Theoretically, about two-thirds of IgH rearrangements will be out of reading frame, so about 44% ((2/3)^2) of developing B-cells may carry non-functional IgH rearrangements on both alleles. If all of these B-cells are rescued by VH replacement, a minimum of 44% of IgH genes might be generated through VH replacement recombination. Under this assumption, IgH genes containing tetrameric or trimeric VH replacement footprint motifs in their N1 regions should also be considered potential VH replacement products.
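The idea of searching the N1 region for footprint motifs of decreasing length (pentameric, tetrameric, trimeric) can be illustrated with a minimal Java sketch. This is not the VH RFA source code. The class and method names, the rule of building the motif library from the last `window` nucleotides of each VH germline gene, the 8-nt window, and the toy sequences are all hypothetical choices made for illustration only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of footprint-motif scanning; NOT the published VH RFA code.
class FootprintScanSketch {

    // Collect every k-mer from the 3'-terminal `window` nucleotides of each VH gene.
    // The window rule is an illustrative assumption, not the paper's definition.
    static Set<String> buildMotifLibrary(List<String> vhGenes, int window, int k) {
        Set<String> motifs = new HashSet<>();
        for (String gene : vhGenes) {
            String tail = gene.substring(Math.max(0, gene.length() - window));
            for (int i = 0; i + k <= tail.length(); i++) {
                motifs.add(tail.substring(i, i + k));
            }
        }
        return motifs;
    }

    // Return the first library motif found while sliding along the N1 region, or null.
    static String findFootprint(String n1, Set<String> motifs, int k) {
        for (int i = 0; i + k <= n1.length(); i++) {
            String candidate = n1.substring(i, i + k);
            if (motifs.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Toy sequences for illustration only; these are NOT real germline VH genes.
        List<String> vh = List.of("GCCGTGTATTACTGTGCGAGAGA", "GCCGTGTATTACTGTGCGAAAGA");
        String n1 = "CCGAGAGGT";
        // Try pentameric, then tetrameric, then trimeric motifs.
        for (int k = 5; k >= 3; k--) {
            Set<String> library = buildMotifLibrary(vh, 8, k);
            System.out.println(k + "-mer footprint: " + findFootprint(n1, library, k));
        }
    }
}
```

With these toy inputs, `main` prints `CGAGA`, `CGAG`, and `CGA` for k = 5, 4, and 3. Shorter motifs match more easily, which is why trimeric and tetrameric hits are counted as potential, not confirmed, VH replacement products.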
Like any sequence-based analysis program, the VH RFA program has its limitations. Although sequence motifs resembling the 3'-end sequences of VH genes can be identified in the N1 regions, such motifs can also be identified within the N2 regions at relatively lower frequencies. Theoretically, VH replacement can leave footprints only within the N1 region; VH replacement footprint-like motifs within the N2 regions can only be generated by random nucleotide addition. VH6-1 is the first VH germline gene 5' of the DH locus, so no VH gene lies downstream of it that it could have replaced. IgH genes using VH6-1 should therefore contain no VH replacement footprint-like motifs within their VH–DH junctions, yet the VH RFA program still identifies such motifs in 7.56% of these sequences. We can only attribute such motifs to random nucleotide addition.
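A rough estimate shows why short footprint-like motifs can arise by chance. It assumes, as a simplification, that non-templated nucleotides are independent and equally likely; real TdT additions are G/C-biased. Here $M_k$ (a symbol introduced for this estimate) denotes the number of distinct k-nt motifs in the footprint library, and $L$ the junction region length. The second expression treats overlapping windows as independent, so it is only an approximation.

```latex
p_k = \frac{M_k}{4^k},
\qquad
P(\text{at least one chance match in a region of length } L)
  \approx 1 - \left(1 - \frac{M_k}{4^k}\right)^{L-k+1}
```

Because $4^5 = 1024$ while $4^3 = 64$, a given trimeric motif is 16 times as likely as a given pentameric motif to appear at a particular position by chance. This is why a background measure, such as the VH6-1 frequency, is useful when interpreting shorter motifs.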
In summary, analyses of a large number of human IgH gene sequences from the NCBI database uncovered a significant contribution of VH replacement products to the human Ab repertoire, especially in IgH genes derived from autoimmune diseases or anti-viral responses. Future studies will focus on how VH replacement is regulated and how VH replacement products are positively or negatively selected under normal or disease conditions, because modulating the level of VH replacement may offer unique approaches to treating different human diseases.

ACKNOWLEDGMENTS
Miles D. Lange, Lin Huang, and Zhixin Zhang conceived and designed the study. Lin Huang developed the Java-based VH RFA software. Miles D. Lange and Lin Huang analyzed the raw data and generated figures and tables. Miles D. Lange, Lin Huang, and Zhixin Zhang validated the results. All authors contributed to the development of the project and final writing of the manuscript. This study was supported in part by NIH grants AI074948 (Zhixin Zhang), AI076475 (Zhixin Zhang), and AR059351 (Kaihong Su). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.