Edited by: Bridget S. Wilson, University of New Mexico, United States
Reviewed by: Michael Zemlin, Saarland University Hospital, Germany; Paolo Casali, University of Texas Health Science Center San Antonio, United States
This article was submitted to B Cell Biology, a section of the journal Frontiers in Immunology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Specific antibody reactivities are routinely used as biomarkers, but the antibody repertoire reactivity (igome) profiles are still neglected. Here, we propose rationally designed peptide arrays as efficient probes for these system-level biomarkers. Most IgM antibodies are characterized by few somatic mutations, polyspecificity, and physiological autoreactivity with housekeeping function. Previously, probing this repertoire with a set of immunodominant self-proteins provided a coarse analysis of the respective repertoire profiles. In contrast, here, we describe the generation of a peptide mimotope library that reflects the common IgM repertoire of 10,000 healthy donors. In addition, an appropriately sized subset of this quasi-complete mimotope library was further designed as a potential diagnostic tool. A 7-mer random peptide phage display library was panned on pooled human IgM. Next-generation sequencing of the selected phage yielded 224,087 sequences, which grouped into 790 sequence clusters. A set of 594 mimotopes, representative of the most significant sequence clusters, was shown to probe symmetrically the space of IgM reactivities in patients' sera. This set of mimotopes can easily be scaled up by including a greater proportion of the mimotope library. The trade-off between the array size and the resolution can be explored while preserving the symmetric sampling of the mimotope sequence and reactivity spaces. BLAST search of the non-redundant protein database with the mimotope sequences yielded significantly more immunoglobulin J region hits than random peptides, indicating a considerable idiotypic connectivity of the targeted igome. As a proof of principle, predictors for randomly selected diagnoses were built from mimotope reactivity profiles. The number of potential reactivity profiles that can be extracted from this library is estimated at more than 10^70.
Thus, a quasi-complete IgM mimotope library and a scalable representative subset thereof are found to address very efficiently the dynamic diversity of the human public IgM repertoire, providing informationally dense and structurally interpretable IgM reactivity profiles.
The repertoire of human IgM contains a considerable proportion of moderately autoreactive antibodies, characterized by low intrinsic affinity/low specificity (
IgM antibodies appear early in the course of an infection. However, they fall relatively fast, even after restimulation, providing a dynamic signal. By interacting with structures of the self and carrying housekeeping tasks, this part of the antibody repertoire is coupled to changes in the internal environment. Consequently, IgM antibodies have gained interest as biomarkers of physiological or pathological processes (
The study of the IgM repertoire (igome) might be expected to give information about interactions that occur mostly in the blood and the tissues with fenestrated vessels, because, unlike IgG, IgM cannot easily cross the normal vascular wall. Yet IgM tissue deposits are a common finding in diverse inflammatory conditions (
Our goal was to demonstrate that an essential part of the human polyspecific IgM repertoire involved in homeostasis could be probed by a set of mimotopes, which could be rationally scaled to sizes appropriate for the diagnostic tasks. Essentially, our approach does not specifically target disease specific antibody reactivities but rather the natural antibody repertoire as a universal biosensor of changes of the internal environment. The existing approaches for immunosignature (
Human IgM was isolated from a sample of IgM enriched IVIg, IgM-Konzentrat (Biotest AG, Dreieich, Germany, generously provided by Prof. Srini Kaveri), whereas human monoclonal IgM paraprotein was isolated from an IgM myeloma patient's serum selected from the biobank at the Center of Excellence for Translational Research in Hematology at the National Hematology Hospital, Sofia (with the kind cooperation of Dr. Lidiya Gurcheva). In both cases, IgM was purified using affinity chromatography with polyclonal anti-μ antibody coupled to agarose (A9935, SIGMA-ALDRICH, USA). A 7-mer random peptide library (E8100S, Ph.D.-7, New England Biolabs, USA) was panned overnight at 4°C on pooled human IgM adsorbed on polystyrene plates at a concentration of 0.1 mg/ml, washed, eluted with glycine buffer at pH 2.7, and immediately brought to pH 7. The eluate was transferred to a plate coated with monoclonal IgM and incubated according to the same protocol, but this time, the phage solution was collected after adsorption and amplified once, following the procedure described by Matochko et al. (
Sera were obtained from randomly selected patients with glioblastoma multiforme (GBM), brain metastases of breast (MB) or lung (ML) cancers, and non-tumor-bearing patients (C) (herniated disc surgery, trauma, etc.) of the Neurosurgery Clinic of St. Ivan Rilski University Hospital, Sofia. The samples were acquired according to the rules of the ethics committee of the Medical University in Sofia, after its approval and obtaining informed consent. The sera were aliquoted and stored at −20°C. Before staining, the sera were thawed; incubated for 30 min at 37°C for dissolution of IgM complexes; diluted 1:100 with phosphate-buffered saline (PBS), pH 7.4, and 0.05% Tween 20 with 0.1% bovine serum albumin (BSA); further incubated for 30 min at 37°C; and filtered through 0.22-μm filters before use. The serum IgM reactivity was analyzed on different sets of peptides defined in microarray format.
The customized microarray chips were produced by PEPperPRINT™ (Heidelberg, Germany) by synthesis
The microarray images were acquired using a GenePix 4000 Microarray Scanner (Molecular Devices, USA). The densitometry was done using the GenePix® Pro v6.0 software. All further analysis was performed using publicly available packages of the R statistical environment for Windows (v3.4.1) (Bioconductor; Biostrings, limma, pepStat, sva, e1071, Rtsne, clvalid, entropy, RankProd, multcomp, etc.) as well as in-house developed R scripts (
Sections of protein sequences homologous to the studied peptides were identified using the blastp function and the non-redundant human protein database of NCBI (
We set out to define as complete as possible a library of mimotopes of the normal human broadly expressed IgM repertoire. To this end, we chose to pan a commercially available 7-mer random peptide phage display library (Ph.D.-7, New England Biolabs) of diversity 10^9. Thus, the size of the mimotopes would be in the range of the shortest linear B-cell epitopes in the IEDB database (
Schematic representation of the deep panning experiment.
The overall amino acids residues frequencies (AAF) in the mimotopes selected from the phage library showed a skewing in favor of G, W, A, R, T, H, M, P, and Q and against C, F, N, Y, I, L, and S (
Distribution of the amino acid residues in the mimotope library.
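The skew in residue usage can be expressed as a log ratio of residue frequencies between the selected mimotopes and a reference set. The following is a minimal sketch, assuming a reference set of unselected peptides is available; the function names and the toy sequences are hypothetical and are not taken from the authors' R scripts.

```python
import math
from collections import Counter


def residue_frequencies(peptides):
    """Relative frequency of each amino acid residue over all peptides."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log2_enrichment(selected, reference):
    """log2(selected frequency / reference frequency) per residue.

    Positive values mean the residue is over-represented after selection.
    Residues absent from either set are skipped in this sketch.
    """
    sel = residue_frequencies(selected)
    ref = residue_frequencies(reference)
    return {aa: math.log2(sel[aa] / ref[aa]) for aa in sel if aa in ref}


# Hypothetical toy input, not real library sequences.
toy_enrichment = log2_enrichment(["GGGA"], ["GGAA"])
```

In this toy input G is enriched and A is depleted; with real data the reference frequencies would come from the unselected library rather than from an invented list.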
To gain insight into the mimotope sequence space, the set of 224,087 selected mimotope sequences was subjected to clustering using the GibbsCluster-2.0 method (
Results from GibbsCluster of the mimotopes. Different predefined numbers of clusters were screened for the quality of clustering as measured by Kullback–Leibler divergence (KLD). The inset shows the amplified part around the peak KLD values.
The mimotope library of more than 200,000 sequences is a rich source of potential mimotope candidates for vaccines or diagnostics. Yet a probe containing an array of 10^5 peptides is impractical for routine diagnostic use. A way to scale down the mimotope probe array would be to include a representative sequence of each of the naturally existing 790 clusters. Only the sequence with the top score from each cluster was kept as a mimotope prototype for the cluster. These are expected to sample evenly (symmetrically) the mimotope sequence space, as ensured by the GibbsCluster algorithm.
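Picking one prototype per cluster can be illustrated with a simple position weight matrix (PWM). The sketch below is an assumption-laden stand-in: it builds a log-odds PWM against a uniform background with a pseudocount and keeps the best-scoring member, whereas the study relied on the scores produced by GibbsCluster itself. The function names and the toy cluster are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sequences, pseudocount=1.0):
    """Log-odds PWM versus a uniform background (an assumption of this sketch)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(sequences[0])):
        counts = Counter(seq[pos] for seq in sequences)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of the per-position log-odds values for one peptide."""
    return sum(column[aa] for column, aa in zip(pwm, seq))


def cluster_prototype(sequences):
    """Return the cluster member with the highest score on the cluster's own PWM."""
    pwm = build_pwm(sequences)
    return max(sequences, key=lambda seq: pwm_score(pwm, seq))


# Hypothetical 7-mer cluster, invented for illustration only.
toy_cluster = ["GWPAHTR", "GWPAHTQ", "GWPSHTR", "AWPAHTR", "GWPAMTR"]
prototype = cluster_prototype(toy_cluster)
```

In the toy cluster the consensus sequence is itself a member, so it is returned as the prototype; applying the same function to every cluster would give one representative per cluster.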
The mimotope sequence diversity in each cluster was significant. Thus, reducing the list of representative sequences so drastically might seem counterintuitive with respect to our goals. Nevertheless, this approach was chosen because, given the well-documented polyspecificity of most of the antibodies probed, such a symmetric sequence set was hypothesized to address a much wider range of IgM clones than the mere number of mimotopes would suggest.
The sequence clusters were found to vary with respect to the probability of random occurrence of such a group of sequences, which was used to rank them by significance (
The relevance of this mimotope library to the complete repertoire of broadly expressed IgM reactivities and the scope of their diversity could be established by comparing several different peptide libraries with different properties.
An alternative sequence library was constructed
Other libraries of peptides generated for further comparison were as follows: (1) uncertainly clustered sequences as reflected in their KLD scores as shown by the GibbsCluster algorithm (NG1 and NG2); (2) two groups of five high scoring clusters as lower diversity libraries (C5_1 and C5_2); (3) random 7-mer sequences predicted to belong to some of the five highest scoring clusters based on PWM profile scores (C5P); and (4) random 7-mer sequences (RND) (see
Libraries of 7-mer peptides studied.
SYM | A library that samples symmetrically the mimotope sequence space. Contains the sequence with the highest score for the respective position weight matrix from each significant cluster (significant clusters are those for which the number of sequences with more than median PWM score is greater than the expected number of occurrences of such score in random peptides— | 594 |
C5_1 | A group of 5 of the 288 clusters with best binomial | 600 |
C5_2 | A group of 5 of the 288 clusters with best binomial | 1,193 |
C5P | 150 random | 750 |
NG1 | The lowest scoring sequence (using KLD) from each significant cluster. These sequences are least certain to belong to any of the 790 clusters. | 594 |
NG2 | Among the set of the lowest scoring sequences (NG1) using GibbsCluster's own “Corrected” score—those with score <5 ( | 82 |
NGR | The max scores for each of a set of 2 × 10^6 random | 753 |
RND | 800 random peptides. | 800 |
Total | 5,366 |
IgM reactivity in sera from patients with GBM (
Statistics testing the libraries' capacity to probe the mimotope reactivity space.
Next, the capacity of the different libraries to sample symmetrically the space of 7-mer IgM mimotope reactivities in the IgM repertoire was tested. The mean nearest neighbor distance (MNND) was used for that purpose as a statistic indicating clustering of the data. Peptides that have similar reactivity profiles with different sera (thus carrying redundant information) would map to points in the reactivity space that lie close to each other. This clustering in some regions of the space would lead to a lower MNND. The library SYM ranked second only to NG2 (
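The MNND statistic can be sketched as follows. Each peptide is treated as a point whose coordinates are its reactivities with the individual sera; the Euclidean metric and the toy coordinates are assumptions of this sketch, not details confirmed by the text.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to the closest other point."""
    nearest = []
    for i, p in enumerate(points):
        nearest.append(min(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        ))
    return sum(nearest) / len(nearest)


# Hypothetical reactivity profiles (rows: peptides, columns: two sera).
evenly_spread = [[0, 0], [0, 2], [2, 0], [2, 2]]
clumped = [[0, 0], [0, 0.1], [2, 2], [2, 2.1]]

mnnd_spread = mean_nearest_neighbor_distance(evenly_spread)
mnnd_clumped = mean_nearest_neighbor_distance(clumped)
```

The clumped set yields the lower value, which is the behavior the statistic is used for here: redundant peptides pull the MNND down, whereas a library that samples the reactivity space evenly keeps it high.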
The correlations between the patient profiles of reactivity were also used as a measure of the capacity of the libraries to extract information from the IgM repertoire. We tested all pairwise correlations between the patient profiles with the peptides from a given library. After
Finally, all three criteria were summarized using a rank product test, which proved that reactivity with SYM stands out from all the other tested libraries as the best among them for probing the IgM repertoire (
Rank product test of three criteria for optimal mimotope library.
C5_1 | 4.380 | 0.650 |
C5P | 5.518 | 0.853 |
NG1 | 5.313 | 0.823 |
NG2 | 2.154 | 0.099 |
NGR | 5.241 | 0.812 |
C5_2 | 4.579 | 0.692 |
SYM | 1.260 | 0.007 |
RND | 4.820 | 0.739 |
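The first numeric column of the table is consistent with a rank product expressed as the geometric mean of the ranks a library obtained on the three criteria. A minimal sketch of that quantity is given below; the significance values in the second column come from the rank product test itself and are not reproduced here. The function name is hypothetical.

```python
import math


def rank_product(ranks):
    """Geometric mean of the ranks obtained across the criteria."""
    return math.prod(ranks) ** (1.0 / len(ranks))


# A library ranked second on one criterion and first on the other two.
example = rank_product([2, 1, 1])
```

For ranks of 2, 1, and 1 the result is the cube root of 2, about 1.26, which matches the value listed for SYM.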
T-distributed stochastic neighbor embedding (t-sne) was used to visualize the structure of the mimotope sequence space as represented by the general mimotope library produced by deep panning. To represent the sequences as vectors of real numbers, each amino acid residue was represented by five scores based on the z1–z5 scales published by Sandberg et al. (
Visualization of the 7-mer mimotope sequence space with the optimized library SYM marked in red. Although individual GibbsCluster defined clusters do not coincide with those shown by t-sne, the mapping of the optimized library apparently probes quite uniformly the mimotope sequence space. t-sne, T-distributed stochastic neighbor embedding.
An important aspect of the usage of mimotopes as igome probes is their interpretability. Using large mimotope libraries provides an opportunity to generalize the type of structures targeted by the antibody repertoire studied. The numerous tests may allow for signal to emerge despite the noise due to poor representativeness of conformational epitopes, polyspecificity, mimotope/epitope sequence length disparity, etc.
To explore the capacity of the library SYM to reveal general properties of the antigens targeted by the natural IgM repertoire, we used NCBI blastp program to find SYM homologous short sequences in the non-redundant (nr) database of human proteins (
Comparison between the number of homologous sequences found by BLAST search of the non-redundant human protein database classified as immunoglobulin J regions or non-immunoglobulin hits. The numbers and the size of the shaded bars correspond to the number of peptides having this type of homolog in the targeted database and to the proportion of the library that these peptides represent. A peptide was considered to be homologous to an immunoglobulin J region if the search returned at least one hit in a variable J region. The parameters were automatically adjusted for short sequences, and the results were further restricted to those with a minimum of six positive positions and a minimum of six identical positions. The alignment length was set equal to the number of positive positions, and no gaps were allowed. The proportions were compared using the chi-square test followed by pairwise multicomparison with false discovery rate correction (overall,
To test the diagnostic potential of the SYM library, we chose to look for reactivity profiles able to separate sera from patients with different brain tumors. Although somewhat questionable, our expectation to find IgM repertoire correlates of brain tumor diseases was justified by (1) reports by Merbl et al. (
For this assay, we used sera from a set of 34 patients with brain tumors. The main goal was a “proof of principle” test demonstrating the capacity of the assay to provide mimotope reactivity profiles suitable for building predictors for randomly selected pathology. The distribution of patients by diagnosis (GBM, ML, MB, and C) is shown in
Patients tested using the optimized library.
Non-tumor bearing (control) | C | 1 | 3 | 4 | 8 |
Glioblastoma multiforme | GBM | 2 | 4 | 9 (5) | 15 (11) |
Lung cancer (brain metastasis) | ML | 2 | 4 | 3 | 9 |
Breast cancer (brain metastasis) | MB | 0 | 0 | 2 (0) | 2 (0) |
Total | 34 |
A two-dimensional projection of the cases on the 582 positive reactivities by multidimensional scaling (MDS) showed no separation (data not shown). This is expected because the peptide library is not targeted to any particular pathology. Rather, it represents a universal tool for probing the IgM repertoire and mapping it to a highly multidimensional feature space. When all features are used, the information in the reactivity profiles is so rich that it makes practically each patient unique and generalization impossible. In addition, the “curse of dimensionality” makes differentiating cases in a 582-dimensional space hard. Therefore, a feature selection step would be necessary to construct a predictor for any diagnostic task.
A combination of filtering and wrapping feature selection techniques was applied next. The filtering method used was a selection of individual features with highly significant expression in at least one patient. The wrapping techniques were recursive feature elimination followed by a forward selection algorithm. The feature to remove (respectively to add) at each step was selected so as to improve maximally the separation of the patient data clusters of interest when mapped on the remaining features. This iteration was repeated until no further improvement of the separation was possible (see
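The wrapping step can be sketched as a greedy backward pass followed by a greedy forward pass. The separation criterion used below (distance between the two class centroids divided by the mean distance of the cases to their own centroid) is an assumption chosen for illustration; the criterion, function names, and toy data are hypothetical and do not reproduce the authors' R implementation.

```python
import math


def separation(rows, labels, features):
    """Hypothetical criterion: between-centroid distance / mean within-class spread."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append([row[f] for f in features])
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two classes")
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }
    first, second = centroids.values()
    within = sum(
        math.dist(member, centroids[label])
        for label, members in groups.items()
        for member in members
    ) / len(rows)
    return math.dist(first, second) / (within + 1e-12)


def backward_then_forward(rows, labels, features):
    selected, removed = list(features), []
    best = separation(rows, labels, selected)
    # Backward pass: drop the feature whose removal improves separation the most.
    while len(selected) > 1:
        score, drop = max(
            (separation(rows, labels, [f for f in selected if f != g]), g)
            for g in selected
        )
        if score <= best:
            break
        best = score
        selected.remove(drop)
        removed.append(drop)
    # Forward pass: re-add a dropped feature only if it improves separation.
    while removed:
        score, add = max(
            (separation(rows, labels, selected + [g]), g) for g in removed
        )
        if score <= best:
            break
        best = score
        selected.append(add)
        removed.remove(add)
    return selected, best


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
toy_rows = [
    [0.0, 5, 1], [0.2, 1, 4], [0.1, 3, 2],
    [1.0, 3, 2], [1.2, 5, 1], [1.1, 1, 4],
]
toy_labels = ["GBM", "GBM", "GBM", "other", "other", "other"]
selected, score = backward_then_forward(toy_rows, toy_labels, [0, 1, 2])
```

On the toy data the two noise features are eliminated and only the informative one is kept. With real reactivity data the outcome depends strongly on the chosen criterion, which is why the selection needs to be validated on cases not used for it.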
Matthews correlation coefficient (MCC) as a measure of the quality of SVM models using different optimized feature sets. The models were constructed using GBM specific feature sets derived by a combination of filtering and backward and forward feature selection steps. Finally, consensus feature sets were formed from at least
Interestingly, this two-stage feature selection strategy (bootstrapping RFEDS variability and pooling recurring features) helped improve the generalization considerably. Testing a model of 43-dimensional data with just a few cases is impractical. Therefore, the dimensionality of the IgM reactivity data was reduced from 43 to 2 using MDS. The SVM model, constructed on the basis of the two surrogate features obtained by MDS, successfully classified the GBM and non-GBM cases not only of the training but also of the testing set of sera (
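The Matthews correlation coefficient used to rate the SVM models can be computed directly from the counts of a binary confusion matrix. The sketch below uses the standard definition; the function name and the example counts are hypothetical and do not correspond to results of the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from true/false positive and negative counts of a binary classifier."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional value when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only.
perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
chance_level = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)
```

MCC ranges from -1 to 1 and remains informative when the classes are of unequal size, as they are in a small cohort with more GBM than control sera.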
Multidimensional scaling plot of cases in batch “R” based on the set of GBM-related features found in at least 15/28 of the “leave one out” patient groups. See
Thus, we were able to show that a rationally designed small library of 586 IgM mimotopes contains potentially a huge number of mimotope profiles that can differentiate randomly selected diagnoses after appropriate feature selection.
High-throughput omics screening methods have extracted profiles from different dynamic diversities (proteome, genome, glycome, secretome, etc.) and used them as biomarkers. The use of the antibody repertoire as a source of biomarkers has also been defined and approached in multiple ways. First came the technically minimalistic, but conceptually loaded, semiquantitative immunoblotting, developed 20 years ago (
The deep panning approach relies on next-generation sequencing (NGS) and thus requires balancing between sequence fidelity and diversity. Even with diversity affected by discarding sequences of one and two copies on the one hand, and overgrowth of phage clones on the other, our strategy still manages to find a general representation of the mimotope sequence space by identifying clusters of mimotopes. This relatively small set of sequence classes is hypothesized to be related to the modular organization of the repertoire defined previously (
The central role of prolines in the nAb mimotopes has been observed previously (
The mimotope library of diversity 10^5 derived by deep panning reflects the recurrent IgM specificities found in the human population. A library of random peptides with sequences selected to be least related to the observed 790 cluster profiles reacted very weakly with IgM from patients' sera. This fact suggests not only that the library of a little over 200,000 mimotopes represents the IgM mimotope space but also that the 790 cluster profile matrices are collectively a promising model of it. The good coverage of the IgM reactivity space by this mimotope library is most probably facilitated by the polyspecific binding of IgM and the small, flexible peptides.
Although the large mimotope library can be used as is in peptide arrays when applicable, its size is not very practical for routine diagnostics. The classification in 790 clusters was used to produce a smaller and more applicable library, SYM, for clinical use. It contains basically representative sequences from the most significant clusters. SYM represents more efficiently the mimotopes' main reactivity patterns found in the phage selection experiment when compared with seven other libraries chosen to represent key alternative concepts. The precision of that representation can be adjusted by expanding the small library if necessary. Including more mimotopes from the set of 224,087 can be done in a similar fashion, sampling further the existing sequence clusters. Another improvement may be to include a couple of related sequences to each of the mimotopes, for example, those immediately adjacent in the same cluster, for a statistically robust signal.
An interesting though not unexpected property of the public IgM igome is its idiotypic connectivity. Overlap with immunoglobulin variable domain J regions proved to be the prominent feature of the human protein sequence fragments homologous to the peptides in the SYM library. The actual epitopes of IgM should be mostly conformational. Nevertheless, both linear idiotypic epitopes (idiotopes) and fragments of them are probably represented in the CDR3 loops so as to produce a statistically detectable signal in the BLAST results. It has long been known that linear epitope models yield clear structural idiotypic representation in CDR3 loops (
SYM could be used as a tool for the study of the IgM repertoire, as a source of mimotopes for design of immunotherapeutics (
The optimal feature set we found for GBM diagnosis has 43 mimotopes. If the library provides on the order of 500 significant reactivities and the profiles typically comprise around 50 features, the theoretical capacity of this approach is >10^70 different subsets. This is an estimate of only the qualitative outcome, that is, the presence or absence of reactivity. Thus, the information provided by a typical IgM binding assay with the library is probably enough to describe any physiological or pathological state of clinical relevance reflected in the IgM repertoire. Of course, this is just an estimate of the resolution of the method. The number of naturally occurring profiles and their correlation with clinically relevant states will determine the actual capacity. Another important consideration is the significant probability of finding profiles that correlate with any state by chance. Therefore, extensive testing of the models to prove their ability to generalize is indispensable.
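The order of magnitude of this capacity can be checked by counting feature subsets with a binomial coefficient. The sketch below assumes that a profile is an unordered subset of a fixed size; the values 500 and 50 are the round numbers from the text, and 582 is the number of positive reactivities reported earlier. The function name is hypothetical.

```python
import math


def log10_subset_count(n_features, subset_size):
    """log10 of the number of ways to choose subset_size features out of n_features."""
    return math.log10(math.comb(n_features, subset_size))


# Round figures from the text, and the observed number of positive reactivities.
exponent_round = log10_subset_count(500, 50)
exponent_observed = log10_subset_count(582, 50)
```

The count is sensitive to the assumed numbers: for the inputs tried here it lies between roughly 10^69 and 10^73, so the figure quoted in the text is best read as an order-of-magnitude estimate.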
The novelty of our approach is based on the combination of several previously existing concepts.
First, early studies have argued that the physiologically autoreactive nAbs comprise a consistent, organized immunological compartment (
Second, germline variable regions are characterized by polyspecificity or cross-reactivity with protein and non-protein antigens (
Third, peptide arrays have been used for some time now as probes of the antibody repertoire (
The phage display-generated library provides a rich source of mimotopes that can be screened for different theranostic tools focused on particular targets. On an omics scale, the smaller optimized mimotope library proposed here probes the repertoire of broadly expressed IgM reactivities efficiently, mapping its dynamic diversity to a space of potentially over 10^70 distinct profiles. The major tasks ahead are (1) exploring the concept of reproducibility for the sequences of IgM mimotopes by further deep panning experiments and (2) designing studies aimed at efficiently extracting specific diagnostic profiles and building appropriate predictors, for example, for predicting immunotherapy responders or side effects and predicting the risk of malignancy in chronic inflammation as well as other conditions involving immune activity.
The datasets generated for this study can be found in the GITHUB (
The studies involving human participants were reviewed and approved by Ethics Committee at the Medical University Sofia. The patients/participants provided their written informed consent to participate in this study.
AP conceptualized the project, analyzed the results performing all the
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors wish to thank Prof. Radha Nagarajan, Prof. Ivanka Tsakovska, and Prof. Soren Hairabedyan for critically reading the manuscript.
The Supplementary Material for this article can be found online at: