Abstract
Sequence analysis of immunoglobulin (Ig) heavy and light chain transcripts can refine categorization of B cell subpopulations and can shed light on the selective forces that act during immune responses or immune dysregulation, such as autoimmunity, allergy, and B cell malignancy. High-throughput sequencing yields Ig transcript collections of unprecedented size. The authoritative web-based IMGT/HighV-QUEST program is capable of analyzing large collections of transcripts and provides annotated output files to describe many key properties of Ig transcripts. However, additional processing of these flat files is required to create figures, or to facilitate analysis of additional features and comparisons between sequence sets. We present an easy-to-use Microsoft® Excel® based software, named Immunoglobulin Analysis Tool (IgAT), for the summary, interrogation, and further processing of IMGT/HighV-QUEST output files. IgAT generates descriptive statistics and high-quality figures for collections of murine or human Ig heavy or light chain transcripts ranging from 1 to 150,000 sequences. In addition to traditionally studied properties of Ig transcripts – such as the usage of germline gene segments, or the length and composition of the CDR-3 region – IgAT also uses published algorithms to calculate the probability of antigen selection based on somatic mutational patterns, the average hydrophobicity of the antigen-binding sites, and predictable structural properties of the CDR-H3 loop according to Shirai’s H3-rules. These refined analyses provide in-depth information about the selective forces acting upon Ig repertoires and allow the statistical and graphical comparison of two or more sequence sets. IgAT is easy to use on any computer running Excel® 2003 or higher. 
Thus, IgAT is a useful tool to gain insights into the selective forces and functional properties of small to extremely large collections of Ig transcripts, thereby assisting a researcher to mine a data set to its fullest.
Introduction
The fate of a B cell largely depends on the B cell receptor, or immunoglobulin (Ig), which it expresses on its surface (Rajewsky, ; Kurosaki et al., ). Thus, the analysis of Ig gene transcripts can give important insights into the selective forces that act upon B cells during cellular maturation or during physiological or pathological immune reactions (Schroeder and Cavacini, 2010). For example, repertoire studies of Ig transcripts have revealed that the length and composition of the Ig heavy chain third complementarity determining region (CDR-H3) is strictly regulated during ontogeny, and somatic mutations are rare during the perinatal period even in secondary antibody repertoires (Schroeder et al., 1987, 2001; Cuisinier et al., ; Brezinschek et al., ; Zemlin et al., 2001, 2007; Kolar et al., ; Souto-Carneiro et al., 2005; Schelonka et al., 2007; Richl et al., ; Prabakaran et al., ). It has also been shown that the composition of the antigen-binding site plays a key role during B cell maturation and during the recruitment into various B cell subsets (Schelonka et al., 2007; Arnaout et al., ) and during protective immune responses (Rajewsky, ; Frolich et al., ). Moreover, studies of Ig repertoires can give valuable insights into the immune dysregulation that underlies the development of autoimmunity (Dorner and Lipsky, ; Vrolix et al., 2010; Zuckerman et al., 2010; Kalinina et al., ) and allergies (Snow et al., 1998; Takhar et al., 2007; Kerzel et al., ).
The antigen-binding site of the antibody is endowed with an almost unlimited theoretical diversity due to the imprecise junction of Variable, Joining, and (in the case of the Ig heavy chain) Diversity gene segments (Tonegawa, 1983). The random exonucleolytic truncation of the rearranged gene segments and the insertion of non-encoded N-nucleotides and P-nucleotides, the shuffling of light and heavy chains, and the insertion of somatic mutations during the germinal center reaction further expand the potential diversity exponentially. Theoretically, these mechanisms allow the production of more than 10^15 different antigen-binding sites (Schroeder and Cavacini, 2010). Although seemingly limitless in theoretical potential, the human antibody response probably does not exploit more than 1% of its potential diversity (Boyd et al., ; Glanville et al., ; Arnaout et al., ). Thus, it seems unlikely that the expressed antibody repertoire would represent merely a random selection of the theoretical diversity.
In order to discover potential biases within repertoires that may have been shaped by selective forces, it is desirable to study large numbers of Ig gene transcripts. With the advent of next generation sequencing (NGS) technologies, such as Roche 454 pyrosequencing, the direct large-scale sampling of sequence collections of 10^4, 10^5, and even greater numbers is now obtainable within the span of a few days (Boyd et al., ; Reddy et al., ; Wu et al., 2010; Zuckerman et al., 2010; Jiang et al., ; Ippolito et al., ). Previously published semi-automated instruments cannot be used for such large collections or state-of-the-art characterizations due to significant quantitative and qualitative advances in Ig gene analysis (Shannon, 1997; Johnson and Wu, ; Zemlin et al., 2003). Thus, novel analysis tools are required which can handle extremely large sequence batches.
The online repository “International ImMunoGeneTics Information System®” (IMGT®; founder and director: Marie-Paule Lefranc, Montpellier, France; Brochet et al., ; Lefranc et al., ) offers IMGT/HighV-QUEST, a free online tool to assign Variable, Diversity, and Joining gene segments to each individual full-length Ig transcript in batches of up to 150,000 sequences. In addition, IMGT/HighV-QUEST provides numerous descriptors for each individual sequence, such as assignment of N- and P-nucleotides, amino acid translation, position of somatic mutations, isoelectric point, and many others (Giudicelli et al., ; Alamyar et al., ). The output files of these analyses contain descriptions of each individual sequence and can be downloaded as text files in comma separated values (CSV) format for documentation and further analysis.
Our aim was to create an easy-to-use software tool for the generation of informative statistics and publication-ready figures derived from the HighV-QUEST text-only output files. Moreover, we sought to include new and important analyses of higher order antibody features. For instance, although Shirai’s H3-rules have been formulated for the sequence-based prediction of CDR-H3 structural properties (Shirai et al., 1999), and whereas complex algorithms have been published to determine the probability by which a somatic mutation profile might arise non-randomly from antigen-driven selection (Chang and Casali, ; Lossos et al., ), there are at present no software tools available to the research community for high-throughput application of these rules and algorithms.
Here we present Immunoglobulin Analysis Tool (IgAT), a novel and user-friendly software tool for the extensive analysis and graphical presentation of very large collections of Ig transcripts which have been pre-analyzed by IMGT/HighV-QUEST. IgAT additionally calculates the probability of antigen-driven selection within Ig repertoires and predicts structural properties of the antigen-binding site. IgAT can be used to analyze up to 150,000 human or murine heavy or light chain transcripts in a single run of the application and automatically generates 25 Microsoft® PowerPoint® graphics files illustrating key characteristics of the Ig repertoire, such as VDJ gene utilization, amino acid use, CDR-H3 junctional diversity, and average hydrophobicity, as well as the quantitation of somatic mutation among Ig heavy chain transcripts, to name but a few. IgAT is available free of charge.
When applied to two or more sequence collections (e.g., samples from multiple individuals, different cell subsets, or identical cell subsets but under differing immunological conditions), IgAT readily yields the necessary data to allow statistical and graphical comparisons between various repertoires.
Methods
IgAT is a Microsoft® Excel® workbook containing the analysis functions as Visual Basic® for Applications (VBA) code. Each sheet is described in the results section. The workbook was created in Excel 2010 on Microsoft Windows® XP but should be compatible with Excel versions down to Excel 2003 with some limitations (Table 1). IgAT is not compatible with Excel for Mac®. The file can be found at: www.uni-marburg.de/neonat/igat
Table 1
| Excel version | Operating system | Max. memory | No. of sequences |
|---|---|---|---|
| Excel 2003 | Windows XP | 1 Gigabyte | ∼40,000 |
| Excel 2003 | Windows 7 (32/64-bit) | 1 Gigabyte | ∼40,000 |
| Excel 2007 | Windows XP | 2 Gigabyte | ∼60,000 |
| Excel 2010 (32-bit) | Windows 7 (32/64-bit) | 2 Gigabyte | ∼60,000 |
| Excel 2010 (64-bit) | Windows 7 (64-bit) | 8 Terabyte | 150,000 (max. no. of IMGT/HighV-QUEST) |
Estimate of the maximum size of sequence collections that can be processed.
The restrictions are caused by the limited amount of memory that Excel can address. Excel versions prior to 2007 cannot address more than 1 GB of memory. 32-bit versions of Excel 2007/2010 can use 2 GB of memory, while the 64-bit versions are virtually unrestricted.
Results
In the following, we present the features offered by IgAT, using as an example a previously published collection of 78,569 murine Ig heavy chain sequences that contained 18,403 functional sequences (Reddy et al., ). These sequences were obtained from CD138+ plasma-cell-enriched bone marrow mRNA of two BALB/c mice immunized with human complement serine protease (C1S; NCBI Entrez Gene ID: 716).
Begin with a text file of FASTA-formatted Ig DNA sequences as can be obtained from a Roche 454 experimental run or other techniques. When submitting the sequence batch to IMGT/HighV-QUEST, under the advanced parameters setting, “Nb of accepted D-GENE in JUNCTION” must be set to the default (1) as IgAT will only process IMGT output files that assign a maximum of one single D-gene to each V-DH-J junction. IMGT individual result files are not necessary for the analysis with IgAT.
Input
As input, IgAT takes the 11 CSV text output files standardly generated by IMGT/HighV-QUEST derived from its analysis of raw 454 sequence data uploaded by the researcher. IgAT imports the folder containing the IMGT/HighV-QUEST CSV text output files through the cell “C6” of the “input” worksheet. (Alternatively, the IgAT program may be copied and pasted into the folder, which already contains the IMGT files.) Optionally, sequences marked as “unproductive” by IMGT/HighV-QUEST can be deleted. Deleting unproductive sequences will improve performance but might discard functional transcripts as Roche 454 sequencing is prone to homopolymer errors due to technical reasons.
The species (human or mouse), the Ig chain (heavy, lambda, or kappa), the minimum number of non-mutated nucleotides that are required to identify a diversity (D) gene, and the option to calculate the Taq-error must be chosen before starting the analysis. The Ig isotype is needed to calculate the Taq-error (Figure 1).
Figure 1
To start the analysis simply press the button “analyze data.” If “convert formulas to text” is checked, most formulas will be replaced by their values, resulting in reduced file size and recalculation time. In this case, however, additional changes will not have any effect on the analysis output. Once the sequence analysis is complete, the graphs can be exported as Microsoft PowerPoint® files (.ppt) by pressing “save graphs as ppt.”
The workbook was created in Excel 2010 and tested in Excel 2003 and 2010. To determine if your Microsoft Office® software meets this requirement, press “check office version.” It might be compatible with other versions (not tested).
Summary
The numbers of total, non-functional, functional, and unique sequences, as well as the number of clonotypes, are listed in the “summary” worksheet (Figure 2). Deep sequencing technologies usually yield a significant proportion of incomplete or otherwise defective sequences. IgAT counts the sequences which were labeled “unproductive,” “no result,” or “unknown” by IMGT/HighV-QUEST.
Figure 2
Sequences are considered clonally related if they (i) use the same V and J genes, (ii) have an identical CDR-3 length, and (iii) a highly homologous CDR-3 region. The default definition of “highly homologous CDR-3 region” is ≤10% difference in nucleotide sequence. IgAT gives the user the flexibility to choose another percentage difference in nucleotide sequence, or a total number of nucleotide matches, or a percentage or total number difference in amino acid sequence when defining clonotypic parameters.
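The default clonotype definition can be illustrated with a short Python sketch. This is not IgAT's VBA code: the greedy, representative-based grouping, the function names, and the gene names in the usage note are assumptions made for illustration only. Sequences are first bucketed by V gene, J gene, and CDR-3 length, and a sequence joins an existing clonotype when its CDR-3 differs from that clonotype's first member by at most 10% of nucleotide positions.

```python
from collections import defaultdict


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clonotypes(records, max_diff=0.10):
    """Assign a clonotype id to each (v_gene, j_gene, cdr3_nt) record.

    Assumption: each clonotype is represented by its first member, and a
    sequence joins the first clonotype whose representative is within
    max_diff. Other grouping strategies (e.g., single linkage) are possible.
    """
    # (v_gene, j_gene, cdr3_length) -> list of (clonotype_id, representative)
    representatives = defaultdict(list)
    ids = []
    next_id = 0
    for v_gene, j_gene, cdr3 in records:
        key = (v_gene, j_gene, len(cdr3))
        for clonotype_id, rep in representatives[key]:
            if hamming_fraction(cdr3, rep) <= max_diff:
                ids.append(clonotype_id)
                break
        else:
            # No sufficiently similar representative: start a new clonotype.
            representatives[key].append((next_id, cdr3))
            ids.append(next_id)
            next_id += 1
    return ids
```

For example, two 30-nt CDR-3 sequences that share (hypothetical) V and J assignments and differ at three positions fall into the same clonotype, whereas a fourth mismatch, or a different V gene, starts a new one. Changing `max_diff`, or swapping in an amino acid comparison, corresponds to the alternative clonotype parameters described above.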
Data
The “Data” worksheet contains the imported data of the IMGT/HighV-QUEST output files. IgAT uses the taxonomy and numbering of the IMGT repository (Lefranc et al., ).
Sequence
In this worksheet, each nucleotide sequence occurs in an individual row and is split into framework regions (FR) 1–4 and complementarity determining regions 1–3. The sequences are ordered by functionality, which is defined by the existence of an open reading frame throughout the sequence, and by V gene segment utilization. Furthermore, the “Sequence” worksheet provides the length and amino acid translation for CDR-3, number of clonotypes, and identifies sequences with potential “VH-replacement footprints” (only human sequences) that can originate from VH replacement during receptor editing according to Zhang et al. (2003). In addition, sequences can be tagged with the sample ID. Based on sample IDs, the analysis can be confined to one or several samples or the transcripts can be divided into two groups for comparison.
VDJ
The “VDJ” worksheet contains absolute numbers, percentages, and graphs of the V-, DH-, and J-gene families and individual genes in the order of their localization in the germline (Figure 3).
Figure 3
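The kind of tabulation shown in the “VDJ” worksheet can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than IgAT's implementation: the helper names are hypothetical, the input is assumed to be a plain list of gene assignments with IMGT-style allele suffixes (e.g., `*01`), and the germline order is supplied by the caller.

```python
from collections import Counter


def strip_allele(name: str) -> str:
    """Drop the allele suffix, e.g. 'IGHJ4*01' -> 'IGHJ4'."""
    return name.split("*")[0].strip()


def usage_table(assignments, germline_order=None):
    """Return (gene, count, percent) rows for a list of gene assignments.

    If germline_order is given, rows follow that order and genes that were
    never observed are reported with a count of zero; otherwise rows are
    sorted alphabetically.
    """
    counts = Counter(strip_allele(a) for a in assignments if a)
    total = sum(counts.values())
    genes = germline_order if germline_order is not None else sorted(counts)
    rows = []
    for gene in genes:
        n = counts.get(gene, 0)
        percent = 100.0 * n / total if total else 0.0
        rows.append((gene, n, percent))
    return rows
```

Passing an explicit `germline_order` mirrors the worksheet's presentation of genes by chromosomal position and makes unused genes visible as zero rows.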
CDR-3_length
The “CDR-3_length” worksheet displays the nucleotide length distribution of CDR-3, N1-, and N2-nucleotides within the analyzed sequence collection (Figure 4). In addition, the average lengths of the components of CDR-3, namely V length, P-nucleotides 3′ of V, N1-nucleotides, P-nucleotides 5′ of D, D length, P-nucleotides 3′ of D, N2-nucleotides, P-nucleotides 3′ of J, and J length are displayed in a deconstruction graph. A separate graph displays the deconstruction of those sequences without an identifiable D-gene. As a default for the IgH chain, CDR-H3 is defined as amino acids 105–117, according to the IMGT unique numbering system. The descriptive statistics given in the “CDR-3_length” worksheet can be used for comparative statistics with other sequence collections.
Figure 4
Somat_mut
This worksheet displays the somatic mutation rate of each transcript (mutations per 1,000 nt), as well as the average mutational frequency (Figure 5A). In addition, the probability of antigen selection is analyzed by assessing the distribution of replacement and silent mutations between FRs and CDRs (only available for heavy chains). Using the method of Lossos et al. (), we determined the replacement frequency and the relative length of FR and CDR of each germline VH gene. The average probability that a random mutation would fall in the CDR was calculated to be 0.23 ± 0.012, and the sequence-inherent probability that a mutation in the CDR would be a replacement mutation was estimated to be 0.79 ± 0.01. Therefore, the chance for a random mutation to introduce a replacement mutation into the CDR was 0.18 (0.23 × 0.79 ≈ 0.18). The binomial distribution method of Chang and Casali () was used to calculate the 90 and 95% confidence limits for the ratio of replacement mutations in the CDR (RCDR) to the number of total mutations in the V region (MV) as described by Dahlke et al. (). These confidence intervals are shown as dark (90%) and light gray (95%) shaded areas in Figure 5B. A data point falling outside these confidence limits represents a sequence that has a high proportion of replacement mutations in the CDR. Therefore, an allocation above the upper or below the lower confidence limit is considered indicative of Ag-driven selection. It should be mentioned that refined methods for calculation of antigen selection have been published and are available to the public (Hershberg et al., ; Uduman et al., 2011). However, at present IgAT cannot accommodate this type of analysis, because sequence alignments in large sequence collections would require a different software environment.
Figure 5
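The binomial reasoning behind the shaded confidence areas can be sketched as follows. This is a minimal illustration, not the published procedure: it uses the probabilities quoted above (0.23 and 0.79), while the quantile-based central interval and all function names are assumptions; the exact construction used by Chang and Casali and by Dahlke et al. may differ in detail.

```python
from math import comb

P_CDR = 0.23                    # probability that a random mutation falls in a CDR
P_REPLACEMENT_GIVEN_CDR = 0.79  # probability that a CDR mutation is a replacement
P_R_CDR = P_CDR * P_REPLACEMENT_GIVEN_CDR  # about 0.18


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_quantile(q: float, n: int, p: float) -> int:
    """Smallest k such that P(X <= k) >= q."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n  # guards against floating-point shortfall when q is close to 1


def rcdr_ratio_limits(n_mutations: int, confidence: float = 0.95, p: float = P_R_CDR):
    """Lower and upper limits of RCDR/MV expected by chance (central interval)."""
    alpha = 1.0 - confidence
    low = binom_quantile(alpha / 2, n_mutations, p)
    high = binom_quantile(1 - alpha / 2, n_mutations, p)
    return low / n_mutations, high / n_mutations


def outside_limits(r_cdr: int, n_mutations: int, confidence: float = 0.95) -> bool:
    """True if the observed RCDR/MV ratio lies outside the chance limits."""
    low, high = rcdr_ratio_limits(n_mutations, confidence)
    ratio = r_cdr / n_mutations
    return ratio < low or ratio > high
```

Evaluating `rcdr_ratio_limits` over a range of MV values at 90% and 95% confidence yields curves analogous to the shaded areas of Figure 5B; a sequence for which `outside_limits` returns `True` would plot outside them.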
AA
This worksheet shows the amino acid distribution and frequency of the CDR-3 loop for sequences with the CDR-3 length entered in cell “G3,” together with the resulting amino acid variability plots (Shannon entropy, a measure of amino acid variability at a given position of aligned protein sequences, and Kabat–Wu plot, the number of different amino acids observed at a position divided by the frequency of the most common amino acid; Shannon, 1997; Johnson and Wu, ; Zemlin et al., 2003; Figure 6).
Figure 6
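Both variability measures can be computed per position from CDR-3 sequences of identical length. The following Python sketch follows the definitions given above; the function names are hypothetical, and the use of base-2 logarithms for the Shannon entropy is an assumption, since the worksheet's log base is not stated here.

```python
from collections import Counter
from math import log2


def position_columns(sequences):
    """Transpose equal-length amino acid sequences into per-position columns."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all sequences must have the same CDR-3 length")
    return [list(column) for column in zip(*sequences)]


def shannon_entropy(column) -> float:
    """Shannon entropy (in bits) of the amino acids observed at one position."""
    n = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        freq = count / n
        entropy -= freq * log2(freq)
    return entropy


def kabat_wu(column) -> float:
    """Number of different amino acids divided by the frequency of the most common one."""
    counts = Counter(column)
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq
```

For a position occupied by D, G, D, and S in four sequences, three different residues are seen and the most common one (D) has a frequency of 0.5, so the Kabat–Wu variability is 3/0.5 = 6 and the entropy is 1.5 bits; an invariant position scores 1 and 0, respectively.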
AA_frequency
This diagram shows the amino acid frequencies of the CDR-3 loop for all sequences (Figure 7). The frequency is given as percent of all amino acids encoded by CDR-3 from all unique sequences studied. As a default for the IgH chain, the CDR-H3 loop is defined as the amino acids 107–114, according to the IMGT unique numbering system, but the definition of the loop can be modified by the user by entering the limits into the worksheet “AA,” cells N5 and N6.
Figure 7

Amino acid frequencies of the CDR-H3 loop for all unique sequences (positions 107–114).
Kyte–doolittle
The normalized Kyte–Doolittle scale assigns one value to each amino acid. Negative numbers represent polar/hydrophilic amino acids and positive values represent hydrophobic amino acids (Kyte and Doolittle,
Figure 8

Distribution of average CDR-H3 loop hydrophobicities according to a normalized Kyte–Doolittle scale (positions 107–114; Eisenberg,
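The averaging step behind this worksheet can be sketched in Python. Two assumptions should be noted: the table below holds the raw Kyte–Doolittle hydropathy values rather than the normalized scale used by IgAT, whose values are not listed here, and the loop (positions 107–114) is approximated by trimming two residues from the start and three from the end of a CDR-H3 string spanning positions 105–117. The function names are hypothetical.

```python
# Raw Kyte-Doolittle hydropathy values (NOT the normalized scale used by IgAT).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def loop_from_cdr3(cdr3_aa: str) -> str:
    """Approximate the loop (107-114) from a CDR-H3 string spanning 105-117.

    Assumption: positions 105-106 are the first two residues and positions
    115-117 the last three, so they can be trimmed by slicing.
    """
    return cdr3_aa[2:-3]


def average_hydrophobicity(sequence: str, scale=KYTE_DOOLITTLE) -> float:
    """Mean hydropathy of a sequence; positive is hydrophobic, negative hydrophilic."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(scale[aa] for aa in sequence) / len(sequence)
```

Supplying a different dictionary through the `scale` argument would allow the same averaging with a normalized scale.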
IGHD
This worksheet displays the DH gene reading frame usage (Figure 9). For each DH segment there is one reading frame encoding predominantly hydrophilic residues (especially tyrosine and serine; RF1), followed by a hydrophobic reading frame (RF2), and lastly a third reading frame that often encodes a stop codon (RF3). Thus, the third reading frame can be used only if either somatic mutations or else nucleotide losses during VDJ recombination delete the germline stop codon.
Figure 9

Reading frame utilization given as percent of all unique sequences with identifiable DH gene segment. The DH reading frames are defined according to the nomenclature of Ichihara et al. (
Shirai
In this worksheet the predicted structural features of the CDR-H3 are displayed (Figure 10). The “H3-rules” by Shirai (Shirai et al., 1999; Kuroda et al.,
Figure 10

Predicted structural features of the CDR-3 according to the “H3-rules” by Shirai et al. (1999). (K−, kinked base; K+, extra kinked base; K−/+, kinked or extra kinked base; E, extended base; hp def K−, deformed hairpin in sequences with kinked base; hp def K+, deformed hairpin in sequences with extra kinked base; hp def K−/+, deformed hairpin in sequences with kinked and extra kinked base; H lad K−, intact hydrogen bond ladder in sequences with kinked base; H lad K+, intact hydrogen bond ladder in sequences with extra kinked base; H lad K−/+, intact hydrogen bond ladder in sequences with kinked and extra kinked base).
Taq-error
This worksheet calculates the Taq-error rate. To rule out a relevant bias of the somatic mutation frequency by Taq polymerase errors, IgAT calculates the Taq-error rate within the stretches of the Ig constant region when it is included in the PCR amplicons.
Discussion
Since the discovery of the Ig genes, as well as the fundamental mechanisms describing their combinatorial somatic rearrangement, numerous studies have been published with the goal of understanding the selective forces which might govern B cell and T cell development and the diversification of their lymphocyte receptor repertoires. Whereas B and T cells share a common mode of initial diversification (VDJ recombination), it is only B cells which include additional postrecombination diversification mechanisms such as VH replacement and somatic hypermutation. Furthermore, whereas the selective forces shaping the receptor repertoire of developing T cells have been well established (Morris and Allen,
Early pioneering efforts involved laborious cloning and classic Sanger DNA/cDNA sequencing which yielded sequence collections of modest size on the order of tens to a few hundreds. Novel antibody repertoire studies employ high-throughput deep sequencing technologies which can yield collections of unprecedented sizes on the order of thousands to millions of raw sequence reads (reviewed in Benichou et al.,
Conventional and Roche 454 deep sequencing of Ig heavy chain transcripts has been used to better understand the maturation of B cells, their selection into various maturational subsets (Wu et al., 2010), to determine the degree to which the repertoire might be genetically predetermined (Glanville et al.,
In this report we have used as an example a previously published collection of >18,000 Ig heavy chain (IgH) sequences from mice immunized with the human complement serine protease C1S (Reddy et al.,
Clonotypic diversity as a measure of restriction of the expressed repertoire versus a random repertoire
In theory, a diversity of more than 1 × 10^15 antibodies can be established from the human and murine Ig germline loci, respectively (Schroeder, 2006). However, several antigen-independent and antigen-dependent mechanisms restrict the expressed antibody repertoires to probably less than 1% of the theoretically available diversity. Current theory holds that during B cell development in the bone marrow, restrictions are required to avoid the production of harmful or unnecessary antibodies while focusing on potentially protective antibodies. Current data obtained from the deep sequencing of human and mouse IgH repertoires suggest that primary antibody repertoires, while highly diverse, are nonetheless constrained by genetic mechanisms imposed during antigen-independent B cell development (Arnaout et al.,
IgAT helps identify biases in V, DH, and J gene utilization that can indicate superantigen-driven selection or frequent VH gene replacement
IgAT summarizes the frequency of V and DH gene families and individual V, DH, and J genes. The VH and VL gene segments encode four of the six complementarity determining regions and can thus have great influence on the recognition of classical antigens or superantigens. One reason for contradictory results regarding V gene utilization is the observation that Southern blot probes or oligonucleotide primers may not have equal affinity to all VH gene segments, in particular when somatic mutations affect the primer binding site. To overcome this limitation, Vale et al. (2012) have suggested a novel technique for a less biased analysis of VH gene usage. A true predominance of one V gene family or V gene segment can arise from the positive selection of the repertoire for a particular classical antigen or by a superantigen (Zouali, 1995) and has also been described in Ig transcripts of B cell neoplasias (Sasso et al.,
Biases in amino acid frequencies and average hydrophobicity of CDR-H3 calculated by IgAT reveal restrictions with potential relevance for antigen recognition
In the example presented here, IgAT calculated a slightly hydrophilic average hydrophobicity according to a normalized Kyte–Doolittle Hydrophobicity scale for the CDR-H3 region, which is representative for a typical murine primary antibody repertoire (Zemlin et al., 2003). The hydrophobicity profile of the CDR-H3 region in mice has been shown to be crucial for conservation of global features of a normal antibody repertoire, for generation of normal B cell differentiation, and for the maintenance of normal adaptive immunity to model antigens and pathogens (Ippolito et al.,
Shifts in reading frame usage can be identified by IgAT and may indicate a selective bias regarding the hydrophobicity profile of the antigen-binding site. Moreover, the overall amino acid frequencies of CDR-H3 regions and the frequency of each amino acid per position in CDR-H3 sequences of identical length are presented in bar diagrams by IgAT to characterize a given collection of Ig transcripts and to compare collections that were generated under differing selective pressure.
IgAT analyzes the length of CDR-H3 and its components and calculates predictions for structural properties of CDR-H3
The CDR-H3 loop can assume an almost unlimited diversity of differing three dimensional shapes which are grouped into canonical structures (Morea et al.,
Besides elucidating the ontogeny of antibody repertoires, the deconstruction of CDR-H3 components provided by IgAT can also give insights into the selective mechanisms during antigen responses. For example, Dorner et al. (
Moreover, IgH receptor editing by the mechanism of VH replacement results in increased CDR-H3 length due to retention of a portion of the 3′ end of the original VH segment (Zhang et al., 2003). IgAT identifies these “VH footprints” which tend to accumulate within the VH-DH junction during VH replacement and which typically encode highly charged amino acids (R, E, and D) at the 5′ end of CDR-H3 (Zhang et al., 2003). VH replacement seems to occur more frequently in autoimmunity (Dorner et al.,
The nature and distribution of somatic mutations indicate antigen-driven selection
An enrichment of replacement mutations within the CDRs compared to the FRs is indicative of antigen selection (Berek et al.,
In conjunction with IMGT/HighV-QUEST, IgAT significantly accelerates the characterization of large collections of Ig transcripts
Fifteen years ago, a researcher needed ∼1 h to assign VH-, DH-, and JH-gene segments, N- and P-nucleotides, and somatic mutations to one single Ig heavy chain gene transcript (personal observation). Today, using the freely available IMGT/HighV-QUEST software and the immunoglobulin gene analysis tool, IgAT, which we present here, it is possible to perform much more detailed analyses on >10^5 sequences within hours and >10^6 sequences within one day. This comprises only a few minutes of work for the researcher while the remaining time is spent by automated data transfer and analyses. The sequence set used in this report consists of ∼18,000 functional sequences. Results from IMGT/HighV-QUEST were received after ∼2 h. The calculation time of IgAT depends on the hardware and software configuration of the computer. For example, the analysis takes merely 20 min on an Intel® Pentium® 4 (3 GHz) and 4 GB memory machine running Windows XP (32-bit) and Excel 2010 (32-bit) and 15 min on an AMD® Athlon® 4850e (2.5 GHz) and 4 GB memory machine running Windows 7 (64-bit) and Excel 2010 (32-bit).
In conclusion, IgAT can be used to summarize and further analyze large sequence collections that have been pre-analyzed with IMGT/HighV-QUEST. IgAT delivers publication-ready figures and descriptive statistics that can be used to compare multiple sequence collections. Thus, IgAT can be used to characterize selective forces that act upon Ig repertoires during B cell maturation, protective immune responses, and dysregulated immune responses, such as autoimmunity, allergies, and B cell neoplasias.
Statements
Acknowledgments
We thank Jason M. Link for helpful discussions. This work was funded by the German Research Foundation, SFB/TR22, TPA17 (to Michael Zemlin).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Ademokun, A., Wu, Y. C., Martin, V., Mitra, R., Sack, U., Baxendale, H., Kipling, D., Dunn-Walters, D. K. (2011). Vaccination-induced changes in human B-cell repertoire and pneumococcal IgM and IgA antibody at different ages. Aging Cell 10, 922–930.
Alamyar, E., Giudicelli, V., Li, S., Duroux, P., Lefranc, M. P. (2012). IMGT/HighV-QUEST: the IMGT® web portal for immunoglobulin (IG) or antibody and T cell receptor (TR) analysis from NGS high throughput and deep sequencing. Immunome Res. 8, 26.
Arnaout, R., Lee, W., Cahill, P., Honan, T., Sparrow, T., Weiand, M., Nusbaum, C., Rajewsky, K., Koralov, S. B. (2011). High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 6, e22365. doi: 10.1371/journal.pone.0022365
Benichou, G., Yamada, Y., Yun, S. H., Lin, C., Fray, M., Tocco, G. (2011). Immune recognition and rejection of allogeneic skin grafts. Immunotherapy 3, 757–770. doi: 10.2217/imt.11.2
Berek, C., Griffiths, G. M., Milstein, C. (1985). Molecular events during maturation of the immune response to oxazolone. Nature 316, 412–418. doi: 10.1038/316412a0
Boyd, S. D., Marshall, E. L., Merker, J. D., Maniar, J. M., Zhang, L. N., Sahaf, B., Jones, C. D., Simen, B. B., Hanczaruk, B., Nguyen, K. D., Nadeau, K. C., Egholm, M., Miklos, D. B., Zehnder, J. L., Fire, A. Z. (2009). Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 1, 12ra23. doi: 10.1126/scitranslmed.3000540
Brezinschek, H. P., Foster, S. J., Brezinschek, R. I., Dorner, T., Domiati-Saad, R., Lipsky, P. E. (1997). Analysis of the human VH gene repertoire. Differential effects of selection and somatic hypermutation on human peripheral CD5(+)/IgM+ and CD5(-)/IgM+ B cells. J. Clin. Invest. 99, 2488–2501. doi: 10.1172/JCI119433
Brochet, X., Lefranc, M. P., Giudicelli, V. (2008). IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 36, W503–W508. doi: 10.1093/nar/gkn316
Butler, J. E., Wertz, N., Weber, P., Lager, K. M. (2008). Porcine reproductive and respiratory syndrome virus subverts repertoire development by proliferation of germline-encoded B cells of all isotypes bearing hydrophobic heavy chain CDR3. J. Immunol. 180, 2347–2356.
Chang, B., Casali, P. (1994). The CDR1 sequences of a major proportion of human germline Ig VH genes are inherently susceptible to amino acid replacement. Immunol. Today 15, 367–373. doi: 10.1016/0167-5699(94)90175-9
Coker, H. A., Harries, H. E., Banfield, G. K., Carr, V. A., Durham, S. R., Chevretton, E., Hobby, P., Sutton, B. J., Gould, H. J. (2005). Biased use of VH5 IgE-positive B cells in the nasal mucosa in allergic rhinitis. J. Allergy Clin. Immunol. 116, 445–452. doi: 10.1016/j.jaci.2005.04.032
Collis, A. V., Brouwer, A. P., Martin, A. C. (2003). Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 325, 337–354. doi: 10.1016/S0022-2836(02)01222-6
Cuisinier, A. M., Gauthier, L., Boubli, L., Fougereau, M., Tonnelle, C. (1993). Mechanisms that generate human immunoglobulin diversity operate from the 8th week of gestation in fetal liver. Eur. J. Immunol. 23, 110–118. doi: 10.1002/eji.1830230118
Dahlke, I., Nott, D. J., Ruhno, J., Sewell, W. A., Collins, A. M. (2006). Antigen selection in the IgE response of allergic and nonallergic individuals. J. Allergy Clin. Immunol. 117, 1477–1483. doi: 10.1016/j.jaci.2005.12.1359
Dorner, T., Brezinschek, H. P., Foster, S. J., Brezinschek, R. I., Farner, N. L., Lipsky, P. E. (1998a). Delineation of selective influences shaping the mutated expressed human Ig heavy chain repertoire. J. Immunol. 160, 2831–2841.
Dorner, T., Foster, S. J., Farner, N. L., Lipsky, P. E. (1998b). Immunoglobulin kappa chain receptor editing in systemic lupus erythematosus. J. Clin. Invest. 102, 688–694. doi: 10.1172/JCI3113
Dorner, T., Lipsky, P. E. (2005). Molecular basis of immunoglobulin variable region gene usage in systemic autoimmunity. Clin. Exp. Med. 4, 159–169. doi: 10.1007/s10238-004-0051-2
Eisenberg, D. (1984). Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem. 53, 595–623. doi: 10.1146/annurev.bi.53.070184.003115
Feeney, A. J. (2011). Epigenetic regulation of antigen receptor gene rearrangement. Curr. Opin. Immunol. 23, 171–177. doi: 10.1016/j.coi.2010.12.008
Feeney, A. J., Atkinson, M. J., Cowan, M. J., Escuro, G., Lugo, G. (1996). A defective Vkappa A2 allele in Navajos which may play a role in increased susceptibility to Haemophilus influenzae type b disease. J. Clin. Invest. 97, 2277–2282. doi: 10.1172/JCI118669
Frolich, D., Giesecke, C., Mei, H. E., Reiter, K., Daridon, C., Lipsky, P. E., Dorner, T. (2010). Secondary immunization generates clonally related antigen-specific plasma cells and memory B cells. J. Immunol. 185, 3103–3110. doi: 10.4049/jimmunol.1000911
Giudicelli, V., Brochet, X., Lefranc, M. P. (2011). IMGT/V-QUEST: IMGT standardized analysis of the immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences. Cold Spring Harb. Protoc. 2011, 695–715. doi: 10.1101/pdb.prot5634
Glanville, J., Kuo, T. C., Von Budingen, H. C., Guey, L., Berka, J., Sundar, P. D., Huerta, G., Mehta, G. R., Oksenberg, J. R., Hauser, S. L., Cox, D. R., Rajpal, A., Pons, J. (2011). Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc. Natl. Acad. Sci. U.S.A. 108, 20066–20071.
Glanville, J., Zhai, W., Berka, J., Telman, D., Huerta, G., Mehta, G. R., Ni, I., Mei, L., Sundar, P. D., Day, G. M., Cox, D., Rajpal, A., Pons, J. (2009). Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc. Natl. Acad. Sci. U.S.A. 106, 20216–20221.
Gu, H., Kitamura, D., Rajewsky, K. (1991). B cell development regulated by gene rearrangement: arrest of maturation by membrane-bound D mu protein and selection of DH element reading frames. Cell 65, 47–54. doi: 10.1016/0092-8674(91)90406-O
Hershberg, U., Uduman, M., Shlomchik, M. J., Kleinstein, S. H. (2008). Improved methods for detecting selection by mutation analysis of Ig V region sequences. Int. Immunol. 20, 683–694. doi: 10.1093/intimm/dxn026
Ichihara, Y., Hayashida, H., Miyazawa, S., Kurosawa, Y. (1989). Only DFL16, DSP2, and DQ52 gene families exist in mouse immunoglobulin heavy chain diversity gene loci, of which DFL16 and DSP2 originate from the same primordial DH gene. Eur. J. Immunol. 19, 1849–1854. doi: 10.1002/eji.1830191014
Ippolito, G. C., Hon Hoi, K., Reddy, S. T., Carroll, S. M., Ge, X., Rogosch, T., Zemlin, M., Shultz, L. D., Ellington, A. D., Vandenberg, C. L., Georgiou, G. (2012). Antibody repertoires in humanized NOD-scid-IL2Rg-null mice and human B cells reveals human-like diversification and tolerance checkpoints in the mouse. PLoS ONE 7, e35497. doi: 10.1371/journal.pone.0035497
29
IppolitoG. C.SchelonkaR. L.ZemlinM.IvanovIi.KobayashiR.ZemlinC.GartlandG. L.NitschkeL.PelkonenJ.FujihashiK.RajewskyK.SchroederH. W.Jr. (2006). Forced usage of positively charged amino acids in immunoglobulin CDR-H3 impairs B cell development and antibody production. J. Exp. Med.203, 1567–1578.10.1084/jem.20052217
30
JiangN.WeinsteinJ. A.PenlandL.WhiteR. A.IIIFisherD. S.QuakeS. R. (2011). Determinism and stochasticity during maturation of the zebrafish antibody repertoire. Proc. Natl. Acad. Sci. U.S.A.108, 5348–5353.10.1073/pnas.1010814108
31
JohnsonG.WuT. T. (2000). Kabat database and its applications: 30 years after the first variability plot. Nucleic Acids Res.28, 214–218.10.1093/nar/28.1.214
32
KabatE. A.WuT. T. (1991). Identical V region amino acid sequences and segments of sequences in antibodies of different specificities. Relative contributions of VH and VL genes, minigenes, and complementarity-determining regions to binding of antibody-combining sites. J. Immunol.147, 1709–1719.
33
KalininaO.Doyle-CooperC. M.MiksanekJ.MengW.PrakE. L.WeigertM. G. (2011). Alternative mechanisms of receptor editing in autoreactive B cells. Proc. Natl. Acad. Sci. U.S.A.108, 7125–7130.10.1073/pnas.1019389108
34
KerzelS.RogoschT.StrueckerB.MaierR. F.ZemlinM. (2010). IgE transcripts in the circulation of allergic children reflect a classical antigen-driven B cell response and not a superantigen-like activation. J. Immunol.185, 2253–2260.10.4049/jimmunol.0902942
35
KolarG. R.YokotaT.RossiM. I.NathS. K.CapraJ. D. (2004). Human fetal, cord blood, and adult lymphocyte progenitors have similar potential for generating B cells with a diverse immunoglobulin repertoire. Blood104, 2981–2987.10.1182/blood-2003-11-3961
36
KrishnanM. R.JouN. T.MarionT. N. (1996). Correlation between the amino acid position of arginine in VH-CDR3 and specificity for native DNA among autoimmune antibodies. J. Immunol.157, 2430–2439.
37
KurodaD.ShiraiH.KoboriM.NakamuraH. (2008). Structural classification of CDR-H3 revisited: a lesson in antibody modeling. Proteins73, 608–620.10.1002/prot.22087
38
KurosakiT.ShinoharaH.BabaY. (2010). B cell signaling and fate decision. Annu. Rev. Immunol.28, 21–55.10.1146/annurev.immunol.021908.132541
39
KyteJ.DoolittleR. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol.157, 105–132.10.1016/0022-2836(82)90515-0
40
LefrancM. P.GiudicelliV.GinestouxC.Jabado-MichaloudJ.FolchG.BellahceneF.WuY.GemrotE.BrochetX.LaneJ.RegnierL.EhrenmannF.LefrancG.DurouxP. (2009). IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res.37, D1006–D1012.10.1093/nar/gkn838
41
LoganA. C.GaoH.WangC.SahafB.JonesC. D.MarshallE. L.BunoI.ArmstrongR.FireA. Z.WeinbergK. I.MindrinosM.ZehnderJ. L.BoydS. D.XiaoW.DavisR. W.MiklosD. B. (2011). High-throughput VDJ sequencing for quantification of minimal residual disease in chronic lymphocytic leukemia and immune reconstitution assessment. Proc. Natl. Acad. Sci. U.S.A.108, 21194–21199.
42
LossosI. S.TibshiraniR.NarasimhanB.LevyR. (2000). The inference of antigen selection on Ig genes. J. Immunol.165, 5122–5126.
43
MoreaV.TramontanoA.RusticiM.ChothiaC.LeskA. M. (1998). Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol.275, 269–294.10.1006/jmbi.1997.1442
44
MorrisG. P.AllenP. M. (2012). How the TCR balances sensitivity and specificity for the recognition of self and pathogens. Nat. Immunol.13, 121–128.10.1038/ni.2190
45
PadlanE. A. (1994). Anatomy of the antibody molecule. Mol. Immunol.31, 169–217.10.1016/0161-5890(94)90001-9
46
PrabakaranP.ChenW.SingarayanM. G.StewartC. C.StreakerE.FengY.DimitrovD. S. (2012). Expressed antibody repertoires in human cord blood cells: 454 sequencing and IMGT/HighV-QUEST analysis of germline gene usage, junctional diversity, and somatic mutations. Immunogenetics64, 337–350.10.1007/s00251-011-0595-8
47
RadicM. Z.ZoualiM. (1996). Receptor editing, immune diversification, and self-tolerance. Immunity5, 505–511.10.1016/S1074-7613(00)80266-6
48
RajewskyK. (1996). Clonal selection and learning in the antibody system. Nature381, 751–758.10.1038/381751a0
49
RamslandP. A.KaushikA.MarchalonisJ. J.EdmundsonA. B. (2001). Incorporation of long CDR3s into V domains: implications for the structural evolution of the antibody-combining site. Exp. Clin. Immunogenet.18, 176–198.10.1159/000049197
50
ReddyS. T.GeX.MiklosA. E.HughesR. A.KangS. H.HoiK. H.ChrysostomouC.Hunicke-SmithS. P.IversonB. L.TuckerP. W.EllingtonA. D.GeorgiouG. (2010). Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat. Biotechnol.28, 965–969.10.1038/nbt.1673
51
RichlP.SternU.LipskyP. E.GirschickH. J. (2008). The lambda gene immunoglobulin repertoire of human neonatal B cells. Mol. Immunol.45, 320–327.10.1016/j.molimm.2007.06.155
52
RogoschT.KerzelS.SikulaL.GentilK.LiebetruthM.SchlingmannK. P.MaierR. F.ZemlinM. (2010). Plasma cells and nonplasma B cells express differing IgE repertoires in allergic sensitization. J. Immunol.184, 4947–4954.10.4049/jimmunol.0900859
53
RosnerK.WinterD. B.TaroneR. E.SkovgaardG. L.BohrV. A.GearhartP. J. (2001). Third complementarity-determining region of mutated VH immunoglobulin genes contains shorter V, D, J, P, and N components than non-mutated genes. Immunology103, 179–187.10.1046/j.1365-2567.2001.01220.x
54
SassoE. H.SilvermanG. J.MannikM. (1989). Human IgM molecules that bind staphylococcal protein A contain VHIII H chains. J. Immunol.142, 2778–2783.
55
SchelonkaR. L.TannerJ.ZhuangY.GartlandG. L.ZemlinM.SchroederH. W.Jr. (2007). Categorical selection of the antibody repertoire in splenic B cells. Eur. J. Immunol.37, 1010–1021.10.1002/eji.200636569
56
SchroederH. W.Jr. (2006). Similarity and divergence in the development and expression of the mouse and human antibody repertoires. Dev. Comp. Immunol.30, 119–135.10.1016/j.dci.2005.06.006
57
SchroederH. W.Jr.CavaciniL. (2010). Structure and function of immunoglobulins. J. Allergy Clin. Immunol.125, S41–S52.10.1016/j.jaci.2009.09.046
58
SchroederH. W.Jr.HillsonJ. L.PerlmutterR. M. (1987). Early restriction of the human antibody repertoire. Science238, 791–793.10.1126/science.3118465
59
SchroederH. W.Jr.ZemlinM.KhassM.NguyenH. H.SchelonkaR. L. (2010). Genetic control of DH reading frame and its effect on B-cell development and antigen-specific antibody production. Crit. Rev. Immunol.30, 327–344.
60
SchroederH. W.Jr.ZhangL.PhilipsJ. B.III. (2001). Slow, programmed maturation of the immunoglobulin HCDR3 repertoire during the third trimester of fetal life. Blood98, 2745–2751.10.1182/blood.V98.9.2745
61
ShannonC. E. (1997). The mathematical theory of communication, 1963. MD Comput.14, 306–317.
62
ShiraiH.KideraA.NakamuraH. (1996). Structural classification of CDR-H3 in antibodies. FEBS Lett.399, 1–8.10.1016/S0014-5793(96)01252-5
63
ShiraiH.KideraA.NakamuraH. (1999). H3-rules: identification of CDR-H3 structures in antibodies. FEBS Lett.455, 188–197.10.1016/S0014-5793(99)00821-2
64
SnowR. E.ChapmanC. J.HolgateS. T.StevensonF. K. (1998). Clonally related IgE and IgG4 transcripts in blood lymphocytes of patients with asthma reveal differing patterns of somatic mutation. Eur. J. Immunol.28, 3354–3361.10.1002/(SICI)1521-4141(199810)28:10<3354::AID-IMMU3354>3.0.CO;2-Z
65
Souto-CarneiroM. M.SimsG. P.GirschikH.LeeJ.LipskyP. E. (2005). Developmental changes in the human heavy chain CDR3. J. Immunol.175, 7425–7436.
66
SteiningerC.WidhopfG. F.IIGhiaE. M.MorelloC. S.VanuraK.SandersR.SpectorD.GuineyD.JagerU.KippsT. J. (2012). Recombinant antibodies encoded by IGHV1-69 react with pUL32, a phosphoprotein of cytomegalovirus and B-cell superantigen. Blood119, 2293–2301.10.1182/blood-2011-08-374058
67
TakharP.CorriganC. J.SmurthwaiteL.O’ConnorB. J.DurhamS. R.LeeT. H.GouldH. J. (2007). Class switch recombination to IgE in the bronchial mucosa of atopic and nonatopic patients with asthma. J. Allergy Clin. Immunol.119, 213–218.10.1016/j.jaci.2006.09.045
68
TonegawaS. (1983). Somatic generation of antibody diversity. Nature302, 575–581.10.1038/302575a0
69
UdumanM.YaariG.HershbergU.SternJ. A.ShlomchikM. J.KleinsteinS. H. (2011). Detecting selection in immunoglobulin sequences. Nucleic Acids Res.39, W499–W504.10.1093/nar/gkr413
70
ValeA. M.FooteJ. B.GranatoA.ZhuangY.PereiraR. M.LopesU. G.BellioM.BurrowsP. D.SchroederH. W.Jr.NobregaA. (2012). A rapid and quantitative method for the evaluation of V gene usage, specificities and the clonal size of B cell repertoires. J. Immunol. Methods376, 143–149.10.1016/j.jim.2011.12.005
71
VrolixK.FraussenJ.MolenaarP. C.LosenM.SomersV.StinissenP.De BaetsM. H.Martinez-MartinezP. (2010). The auto-antigen repertoire in myasthenia gravis. Autoimmunity43, 380–400.10.3109/08916930903518073
72
WuX.ZhouT.ZhuJ.ZhangB.GeorgievI.WangC.ChenX.LongoN. S.LouderM.MckeeK.O’DellS.PerfettoS.SchmidtS. D.ShiW.WuL.YangY.YangZ. Y.YangZ.ZhangZ.BonsignoriM.CrumpJ. A.KapigaS. H.SamN. E.HaynesB. F.SimekM.BurtonD. R.KoffW. C.Doria-RoseN. A.ConnorsM.MullikinJ. C.NabelG. J.RoedererM.ShapiroL.KwongP. D.MascolaJ. R. (2011). Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science333, 1593–1602.10.1126/science.1204117
73
WuY. C.KiplingD.LeongH. S.MartinV.AdemokunA. A.Dunn-WaltersD. K. (2010). High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood116, 1070–1078.10.1182/blood-2009-11-256016
74
XuJ. L.DavisM. M. (2000). Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity13, 37–45.10.1016/S1074-7613(00)00006-6
75
ZemlinM.BauerK.HummelM.PfeifferS.DeversS.ZemlinC.SteinH.VersmoldH. T. (2001). The diversity of rearranged immunoglobulin heavy chain variable region genes in peripheral blood B cells of preterm infants is restricted by short third complementarity-determining regions but not by limited gene segment usage. Blood97, 1511–1513.10.1182/blood.V97.5.1511
76
ZemlinM.HoerschG.ZemlinC.Pohl-SchickingerA.HummelM.BerekC.MaierR. F.BauerK. (2007). The postnatal maturation of the immunoglobulin heavy chain IgG repertoire in human preterm neonates is slower than in term neonates. J. Immunol.178, 1180–1188.
77
ZemlinM.KlingerM.LinkJ.ZemlinC.BauerK.EnglerJ. A.SchroederH. W.Jr.KirkhamP. M. (2003). Expressed murine and human CDR-H3 intervals of equal length exhibit distinct repertoires that differ in their amino acid composition and predicted range of structures. J. Mol. Biol.334, 733–749.10.1016/j.jmb.2003.10.007
78
ZhangZ.ZemlinM.WangY. H.MunfusD.HuyeL. E.FindleyH. W.BridgesS. L.RothD. B.BurrowsP. D.CooperM. D. (2003). Contribution of VH gene replacement to the primary B cell repertoire. Immunity19, 21–31.10.1016/S1074-7613(03)00170-5
79
ZoualiM. (1995). B-cell superantigens: implications for selection of the human antibody repertoire. Immunol. Today16, 399–405.10.1016/0167-5699(95)80009-3
80
ZuckermanN. S.HazanovH.BarakM.EdelmanH.HessS.ShcolnikH.Dunn-WaltersD.MehrR. (2010). Somatic hypermutation and antigen-driven selection of B cells are altered in autoimmune diseases. J. Autoimmun.35, 325–335.10.1016/j.jaut.2010.07.004
Summary
Keywords
immunoglobulin heavy chain gene, immunoglobulin light chain gene, rearrangement, somatic mutation, sequence analysis software, antibody repertoire, high-throughput analysis, deep sequencing
Citation
Rogosch T, Kerzel S, Hoi KH, Zhang Z, Maier RF, Ippolito GC and Zemlin M (2012) Immunoglobulin Analysis Tool: A Novel Tool for the Analysis of Human and Mouse Heavy and Light Chain Transcripts. Front. Immun. 3:176. doi: 10.3389/fimmu.2012.00176
Received
27 April 2012
Accepted
10 June 2012
Published
28 June 2012
Volume
3 - 2012
Edited by
Harry W. Schroeder, University of Alabama at Birmingham, USA
Reviewed by
John D. Colgan, University of Iowa, USA; Deborah K. Dunn-Walters, King’s College London School of Medicine, UK
Copyright
© 2012 Rogosch, Kerzel, Hoi, Zhang, Maier, Ippolito and Zemlin.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
*Correspondence: Michael Zemlin, University Children’s Hospital, Baldingerstrasse, D-35033 Marburg, Germany. e-mail: zemlin@med.uni-marburg.de
This article was submitted to Frontiers in B Cell Biology, a specialty of Frontiers in Immunology.