Edited by: Thomas Herrmann, Julius Maximilian University of Würzburg, Germany
Reviewed by: Brian M. Baker, University of Notre Dame, United States; David A. Scheinberg, Memorial Sloan Kettering Cancer Center, United States
This article was submitted to T Cell Biology, a section of the journal Frontiers in Immunology
In Memoriam: This paper is dedicated to the memory of Prof. Vincenzo Cerundolo.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Adaptive immune recognition is mediated by specific interactions between heterodimeric T cell receptors (TCRs) and their cognate peptide-MHC (pMHC) ligands, and methods to accurately predict TCR:pMHC interactions would have profound clinical, therapeutic and pharmaceutical applications. Herein, we review recent developments in predicting cross-reactivity and antigen specificity of TCR recognition. We discuss current experimental and computational approaches to investigate cross-reactivity and antigen specificity of TCRs and highlight how integrating kinetic, biophysical and structural features may offer valuable insights in modeling immunogenicity. We further underscore the close inter-relationship of these two notions and the need to investigate each in the light of the other for a better understanding of T cell responsiveness and for effective clinical applications.
Specific molecular interactions between heterodimeric T cell receptors (TCRs) and their cognate peptide-MHC (pMHC) ligands contribute to the nature of the ensuing adaptive immune response. A better understanding of the TCR:pMHC interaction is required to harness adaptive T cell immunity effectively for vaccines and therapeutics. Unfortunately, the mechanisms underpinning cross-reactivity and antigen specificity of peptide-specific TCRs remain puzzling, leaving the community with an incomplete picture of T cell recognition.
Cross-reactivity is defined as the capacity of a TCR to recognize more than one peptide-MHC molecule. The idea was first postulated by Matzinger and Bevan (
Although it is known that T cells can recognize peptide and non-peptide antigens, it is now well-accepted that peptide-specific TCRs exhibit high levels of cross-reactivity. In fact, it has been proposed that a single TCR can recognize 10⁴–10⁷ different MHC-associated epitopes (
Recent biological and computational advances to screen antigenic peptides and profile TCR repertoires have greatly improved our understanding of the TCR:pMHC interaction. However, the picture is far from complete. As yet, it is not possible to (a) predict TCRs recognizing a given antigen, or (b) predict antigens recognized by a given TCR. Methods to accurately predict biological specificity or cross-reactivity would have profound clinical, therapeutic and pharmaceutical applications in designing cellular therapies for fighting cancer, autoimmune and infectious diseases.
New biological methodologies have enabled definition of cross-reactive peptides using high throughput screens against a series of TCR molecules and some can screen whole cells (
Understanding the underlying mechanisms of common antigen specificity of TCRs, on the other hand, has been the focus of key research over the past few years. A number of recent studies have demonstrated the plausibility of identifying shared motifs amongst tetramer-specific TCRs (
Although cross-reactivity and common antigen specificity have been investigated individually, the relationship between these two closely interconnected notions remains underexplored in the research community. Yet, to set a foundation for better understanding T cell responsiveness for effective clinical applications, these two pillars of adaptive immunity can and should be investigated together, each in the light of the other.
Here we review the recent advances in the understanding of both cross-reactivity and common specificity of T cell recognition, mainly from a computational perspective. We will discuss current experimental and computational approaches to investigate cross-reactivity, and highlight how integrating kinetic, biophysical and structural features may offer valuable insights in modeling immunogenicity against TCRs. We will then discuss the progress and limitations in assigning antigen-specific TCRs based on their shared features. Lastly, we will underscore the close inter-relationship of these two principles and how recent single cell technologies are poised to shed further light on this area.
TCR cross-reactivity, which was coined in the late twentieth and early twenty first century, has become recognized as a common feature of TCR recognition (
Prior exposure of degenerate T cells can induce polarized response to a pathogen or vaccination (
While naïve T cells expressing self-reactive TCRs survive due to the low avidity or low expression of peptides derived from self-proteins (
From a clinical perspective, recent immunotherapy trials have highlighted off-target toxicities triggered by cross-reactivity of high affinity TCRs, where adoptive T cell transfer trials with high-avidity DMF5 TCR targeting the HLA-A*02:01 MART-1 melanoma peptide showed a greater promise than DMF4 TCR for cancer treatment but also triggered autotoxicities (
The inability to detect potential toxicities through initial safety evaluation highlighted the need to develop technologies to assess the cross-recognition potential of each TCR engineered for clinical use. In recent years, technologies to extensively characterize the recognition pattern of TCR:pMHC have emerged [reviewed in (
Briefly, large combinatorial peptide libraries (
With the help of combinatorial peptide libraries and single amino acid analogs, the “hotspots” crucial for potential off-target cross-reactivity have been characterized (
While binding of recombinant TCR and pMHC molecules provides essential information, previous studies reported that high-affinity, yet non-stimulatory, interactions occur with high frequency in the human T cell repertoire (
Although recent approaches provide increased flexibility to investigate the degeneracy of TCRs, they remain limited in (i) the number of possible TCRs that can be tested against peptide libraries in a single experiment, (ii) the number of peptides compared to the actual number of ligands that might be encountered, (iii) the need to prepare a new peptide library for each analysis of pMHC specificity, (iv) the high number of false positive and false negative peptides resulting from screening, and (v) the frequent requirement to generate individual recombinant TCRs, T cell clones, or reporter cells expressing a TCR for screening. Some approaches in ongoing development do offer the potential to obtain high-throughput biological data using primary unmodified polyclonal T cells (
Moreover, current strategies for generating a single amino acid analog library rely on replacing one residue of a pre-established peptide target at a time. However, such an approach may overlook the possibility that double or triple amino acid substitutions, or even largely different peptides, trigger a TCR response (
Moreover,
For example, Kasprowicz et al. observed preferential directionality from Hepatitis C Virus (HCV) to Influenza A Virus (IAV) i.e., a T cell primed with an HCV-derived peptide was capable of recognizing an IAV-derived peptide but the opposite was not true (
Indeed, several groups have started to use modeling approaches to test various hypotheses on TCR:pMHC interaction propensities (
In addition to models predicting TCR:pMHC interactions, models to relate TCR:pMHC binding parameters and antigen doses to T cell response have also been proposed [reviewed in (
In 1998, Don Mason argued in favor of the necessity for cross-reactivity (
Several attempts to estimate the polyspecificity of TCRs have been performed. These include: (i) generation of mutant peptides with amino acid substitutions and testing the impact of substitution on T cell activation and/or cytotoxicity (
For instance, in a recent TCR fingerprinting study, Karapetyan et al. investigated which amino acids at each position are essential for 1G4 TCR binding, activation and killing by sequentially replacing every amino acid position outside of anchor positions 2 and 9 with 19 alternative amino acids. The peptides were analyzed using three
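To make the size of such a scan concrete, the following minimal Python sketch enumerates a single amino acid analog library for a 9-mer while leaving anchor positions 2 and 9 untouched. The function name and the example peptide (the NY-ESO-1 157-165 sequence, used here purely as an illustrative input) are assumptions made for this sketch and are not taken from the study; the sketch covers the library design only, not the assays.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitution_library(peptide, anchor_positions=(2, 9)):
    """Enumerate all single amino acid analogs of `peptide`.

    Positions are 1-based. Anchor positions are left unchanged.
    Returns tuples of (position, original residue, new residue, analog).
    """
    analogs = []
    for index, original in enumerate(peptide):
        position = index + 1
        if position in anchor_positions:
            continue  # keep MHC anchor residues fixed
        for residue in AMINO_ACIDS:
            if residue == original:
                continue  # skip the wild-type residue
            analog = peptide[:index] + residue + peptide[index + 1:]
            analogs.append((position, original, residue, analog))
    return analogs


# Illustrative input only (assumption): a 9-mer with anchors at P2 and P9.
library = single_substitution_library("SLLMWITQC")
print(len(library), "analogs; first:", library[0])
```

For a 9-mer this yields 7 × 19 = 133 analogs. Allowing two simultaneous substitutions at the same seven positions would already require 21 × 19² = 7,581 peptides, which is one reason multi-residue variants are rarely scanned exhaustively.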
Instead of scanning a single TCR, a few algorithms have been designed to predict immunogenicity of a peptide against a pool of TCRs by the use of sequences (
List of algorithms to predict immunogenicity.
Reference | Training data | Method | Key predictive features or findings |
Tung et al. ( | Trained on 9-mer HLA-A2 restricted peptides from MHCPEP, SYFPEITHI and IEDB, consisting of 558 immunogenic and 527 non-immunogenic peptides | Decision tree learning methods to identify informative physicochemical properties from 531 physicochemical properties retrieved from version 9.0 of the amino acid index (AAindex) database. Support vector machine with weighted string kernels for immunogenicity prediction (named POPISK) | Top AAindex contributors: (i) Retention coefficient in HPLC, pH2.1, (ii) Principal property value z2, (iii) Hydrophobicity scale from native proteins, (iv) Normalized composition of membrane proteins, and (v) pK-C. Positions 4, 6, 8, and 9 found critical for 9-mer peptides |
Calis et al. ( | Trained on 9-mers from MHC-I associated peptides. From IEDB and three immunogenicity studies in mice ( | Per non-anchor residue of the presented peptide, log enrichment score calculated as ratio between the fraction of a specific amino acid in immunogenic vs. non-immunogenic data, then score weighted by the importance of that position measured as Kullback-Leibler divergence. The weighted log enrichment scores of all (non-anchor) residues summed as immunogenicity score | Preference for residues with larger or aromatic side chains. Positions 4–6 critical for 9-mer peptides |
Trolle and Nielsen ( | Trained on 9-mer peptides covering 9 HLA alleles. From 295 T cell epitopes from SYFPEITHI and 1,216 T cell epitopes from IEDB, allele-balanced training data created by randomly selecting 50 epitopes from each of 9 HLA alleles except 2 alleles having 14 epitopes each, total 378 epitopes | Weighted sum of pMHC binding affinity [NetMHCcons ( | Performance gain obtained by summing pMHC binding affinity, pMHC stability predictions and T cell propensity, relative to individual predictions |
Chowell et al. ( | Trained on 9-mer H-2Db and HLA-A2 restricted peptides (separately for two ANN-Hydro models). From IEDB, 204 immunogenic and 232 non-immunogenic (self-peptides from MHC ligand elution experiment with no known immunogenicity) for H-2Db, and 372 immunogenic and 201 non-immunogenic peptides for HLA-A2 | Hydrophobicity-based artificial neural network (ANN-Hydro) based on numeric sequence of amino acid hydrophobicity | Strong bias toward hydrophobic amino acids at TCR contact residues (P4, P6, P7, and P8 for 9-mers) within immunogenic epitopes. Negative correlation between polarity of amino acids and immunogenicity |
Łuksza et al. ( | Trained on 2,552 MHC-I immunogenic peptides from IEDB. Neoantigens with mutations generated from non-hydrophobic, wild-type residues at positions 2 and 9 excluded (as prediction of MHC affinities for wild-type peptides with non-hydrophobic anchor residues led to non-informative amplitudes) | Recognition potential of a neoantigen = A × R, where amplitude (A) is relative probability that a neoantigen is presented on MHC-I whereas its wild-type counterpart is not, and R is probability that neoantigen will be recognized by TCR repertoire. R defined by a multistate thermodynamic model, treating sequence similarity as proxy for binding energies | High sequence similarity of a given neoantigen with epitopes in IEDB by gapless alignment with BLOSUM62 amino acid similarity matrix |
Bjerregaard et al. ( | A total of 1,948 peptide-HLA complexes analyzed from 13 publications, of which 53 were reported as immunogenic | HLA binding prediction by NetMHCpan-4.0. Similarity between each neo- and normal peptide using kernel similarity measure proposed by Shen et al. ( | High predicted binding score (HLA binding strength). Peptide sequence dissimilarity to self (wild-type counterpart of the neopeptide), especially for those with comparable HLA binding |
Pogorelyy et al. ( | Trained on 9-mer peptides. From ( | Principal component analysis and dimensionality reduction on 10-dimensional vectors of Kidera factor sums for each epitope. Fit multinomial Gaussian model using expectation maximization to estimate probability of being immunogenic | Distinct physicochemical properties in Kidera space |
Jurtz et al. ( | Trained on 8,920 TCRβ CDR3 sequences and 91 HLA-A2 cognate peptides obtained from IEDB. 379 TCR and 16 peptides from the MIRA assay in ( | Convolutional neural network (CNN) to predict whether a given TCR is able to recognize a specific peptide, with amino acid sequences of peptide and CDR3 region of TCRβ chain as input. The CNN scans the input and detects patterns to be integrated into the network (named NetTCR) | Conserved sequence patterns of peptide-TCR pairs encoded by BLOSUM50 matrix |
Smith et al. ( | Trained on 141 8–11-mer epitopes from MHC-I H2b and H2d haplotypes | Using amino acid features (tiny, small, aliphatic, aromatic, non-polar, polar, charged, basic and acidic), variables derived by presence/absence of each feature at each absolute and relative position, at site of SNV mutation, at beginning/middle/end residues, difference of each feature in mutated vs. reference antigen. Most predictive features fed into a gradient boosting algorithm and trained by 10,000-fold cross-validation | Peptide biochemical features: valine at position 1, valine at last position, small amino acids at the last position, basic amino acids of the reference at the mutated position, changes in the mutated position to a small amino acid, lysine at relative site 1, and presence of valine within the first 3 positions |
Ogishi and Yotsuyanagi ( | Trained on 8–11 mer MHC-I and 11–30 mer MHC-II peptides. From IEDB, LANL HIV and HCV database and TANTIGEN database, 6,957 HLA-I and 16,642 HLA-II immunogenic peptides. 191,326 TCR CDR3β sequences obtained from MiXCR | TCR-peptide contact potential profiling (CPP) by optimal alignment between CDR3β (randomly down-sampled to 10,000 sequences) and peptides and using pairwise contact potential scales from AAindex. Peptide sequence-based estimates of physicochemical properties (= peptide descriptors) using: | Physicochemical and CPP features: features from short (3- and 4-aa) and longest (8- and 11-aa for MHC-I and MHC-II, respectively) fragments, skewness- and kurtosis-derived features and AAindexes, including inverse of modified Miyazawa-Jernigan transfer energy, inverse of quasichemical energy in an average protein environment from interfacial regions of protein-protein complexes, and distance-dependent statistical potential within 10–12 Å |
Riley et al. ( | Trained on 9-mer HLA-A2 restricted peptides. 155 immunogenic from IEDB, 2,756 HeLa HLA-A2 binding self-peptides and 1,044 HLA-A2 non-binders | A feed-forward neural network with inputs describing structural and structure-based energetic features of 9-aa in peptide sequence and peptide-HLA complex. Structural and energy features are those comprising Talaris 2014 energy function or derived from Table S3 ( | Structural and energetic features: van der Waals interaction, hydrophobic solvation, Coulombic potentials, hydrogen bond energies, side chain rotamer energies, and solvent accessible surface areas (SASA) |
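As an illustration of the enrichment-based scoring summarized in the table for Calis et al., the following Python sketch sums per-residue log enrichment scores over non-anchor positions, each weighted by the Kullback-Leibler divergence between the immunogenic and non-immunogenic amino acid distributions at that position. It is a minimal sketch under stated assumptions and not the published implementation: the pseudocount smoothing, the pooling of residues across non-anchor positions, the function names and the toy peptides are all ours.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(residues, pseudocount=1.0):
    """Smoothed amino acid frequencies (pseudocount is an assumption)."""
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def fit(immunogenic, non_immunogenic, anchors=(2, 9)):
    """Return (log enrichment per amino acid, KL weight per position)."""
    length = len(immunogenic[0])
    positions = [p for p in range(1, length + 1) if p not in anchors]
    # Log enrichment: fraction in immunogenic vs. non-immunogenic data,
    # pooled over all non-anchor positions.
    pos = frequencies([pep[p - 1] for pep in immunogenic for p in positions])
    neg = frequencies([pep[p - 1] for pep in non_immunogenic for p in positions])
    log_enrichment = {aa: math.log(pos[aa] / neg[aa]) for aa in AMINO_ACIDS}
    # Position importance: KL divergence between the two distributions.
    weights = {}
    for p in positions:
        fp = frequencies([pep[p - 1] for pep in immunogenic])
        fn = frequencies([pep[p - 1] for pep in non_immunogenic])
        weights[p] = sum(fp[aa] * math.log(fp[aa] / fn[aa]) for aa in AMINO_ACIDS)
    return log_enrichment, weights


def immunogenicity_score(peptide, log_enrichment, weights):
    """Sum of position-weighted log enrichment over non-anchor residues."""
    return sum(w * log_enrichment[peptide[p - 1]] for p, w in weights.items())


# Toy 9-mers (hypothetical, for illustration only).
immunogenic = ["ALWFWFWFV", "ALFWFWFWV", "ALWWFFWWV"]
non_immunogenic = ["ALSGSGSGV", "ALGSGSGSV", "ALSSGGSSV"]
log_enrichment, weights = fit(immunogenic, non_immunogenic)
score_aromatic = immunogenicity_score("ALWFWWFFV", log_enrichment, weights)
score_small = immunogenicity_score("ALGSSGGSV", log_enrichment, weights)
print(score_aromatic, score_small)
```

With this toy data, a peptide rich in aromatic residues at non-anchor positions receives a positive score and one rich in small residues a negative score, while position 1, which is identical in both toy sets, carries zero weight.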
In order to predict antigens with high potential for cross-reactivity and off-target toxicity, Jaravine et al. developed Expitope 2.0 that allow analysis of tissue-specific gene expression pattern and prediction of potential side effect in normal tissue, with the ultimate aim of selecting a safer pool of vaccine targets for personalized immunotherapy (
Although there have been several attempts to predict immunogenicity, the dual nature of the peptide-specific TCR recognition interface, comprised of both peptide and MHC, makes predicting interaction between TCR and pMHC uniquely challenging. While much of T cell specificity is determined by the promiscuous peptides due to a relatively invariant interaction with MHC molecule (
Features associated with TCR:pMHC interaction. Description of sequence-based, structural, kinetic, and biophysical features previously found to be associated with pMHC recognition by TCR. The diagram shows the 1G4 TCR bound to NY-ESO-1/HLA-A*02:01 (PDB 2BNR), where TCRα, TCRβ, MHC, β2-microglobulin and peptide are colored in orange, red, blue, light blue, and yellow, respectively.
In addition to the discovery of hotspot residues through TCR sequence alignments (
A collective effort has identified biological and physical parameters that modulate TCR:pMHC engagement and T cell response [reviewed in (
Although we are not currently in the position to perform an
In another application, Haider et al. aimed to engineer an affinity enhanced A6 TCR targeting Tax peptide/HLA-A2 complex (
In a recent review, Spear et al. have highlighted the significance of considering the previously unappreciated complex relationship between kinetic, cellular and structural patterns that modulate antigen specificity and TCR cross-reactivity in designing TCRs (
The 3D crystal structures of T cell receptors and their cognate pMHCs have been resolved and deposited in the Protein Data Bank (PDB) (
Based on the cognate peptide, MHC and TCR structures in the aforementioned database, there have been a number of attempts to accurately predict peptide-MHC conformations, including docking algorithms (
The features retrieved from structural modeling were utilized to predict TCR:pMHC complex formation (
Recent structural studies have emphasized the importance of structural and physicochemical homology in T cell receptor cross-reactivity (
However, Riley et al. questioned the notion that the pool of ligands for a given TCR is built around core regions of restricted structural and chemical space, and showed that T cell receptors can also cross-react between ligands with few structural or physicochemical commonalities. They demonstrated that the DMF5 TCR can cross-react with divergent antigens through unanticipated rearrangements in the peptide and presenting MHC molecules, including binding-induced peptide register shifts. Although these dramatic rearrangements did not translate into molecular mimicry, this TCR was capable of cross-reacting with distinct classes of epitopes. Likewise, cross-reactivity has been observed between unrelated pathogens even with a low level of structural homology (
These findings suggest that while structural homology may inform cross-recognition potential of peptides having the same structural configuration, current methods are suboptimal in predicting polyspecificity across different classes. Moreover, amino acid mutations at positions distant from direct recognition sites may also have a substantial effect on TCR:pMHC interaction e.g., change in binding parameters and/or structural conformation, and can only be validated by experimentation (
Given the limitations in the current methods to reflect and predict TCR:pMHC recognition, here we describe a few considerations to make in building algorithms to predict immunogenicity or cross-recognition potential.
First, a key challenge in developing machine learning and statistical models to predict immunogenicity is the lack of true negative datasets for TCR-epitope interaction as well as cross-reactivity information. Several groups tackled this limitation by simulating a background or negative data (
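One simple way to simulate such background data, shown here only as a hedged sketch, is to pair each TCR with peptides it was not reported to recognize. This is an assumption about how negatives might be generated, not a description of any specific published protocol, and the CDR3 sequences, function name and parameters below are hypothetical placeholders. Because TCRs are cross-reactive, mismatched pairs are presumed rather than verified negatives.

```python
import random


def sample_mismatched_negatives(positive_pairs, n_negatives, seed=0):
    """Draw presumed-negative (TCR, peptide) pairs by mismatching partners.

    `positive_pairs` is a list of (cdr3, peptide) tuples with reported
    recognition. Any pair not in that list is treated as a negative, which
    is an assumption: unreported does not mean non-binding.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    known = set(positive_pairs)
    tcrs = sorted({tcr for tcr, _ in positive_pairs})
    peptides = sorted({peptide for _, peptide in positive_pairs})
    candidates = [(t, p) for t in tcrs for p in peptides if (t, p) not in known]
    rng.shuffle(candidates)
    return candidates[:n_negatives]


# Hypothetical CDR3 sequences paired with example viral 9-mers.
positives = [
    ("CASSIRSSYEQYF", "GILGFVFTL"),
    ("CASSLAPGATNEKLFF", "NLVPMVATV"),
    ("CSARDGTGNGYTF", "GLCTLVAML"),
]
negatives = sample_mismatched_negatives(positives, n_negatives=4)
for pair in negatives:
    print(pair)
```

The caveat in the docstring is the crux of the problem raised above: any such simulated set may contain unrecognized true binders, which caps the apparent accuracy of a model trained on it.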
Additionally, the existing datasets are in a binary format of being immunogenic or non-immunogenic, whereas it is evident that the T cell response is a continuum that ranges from a mild to a very strong response and varies in functional outcomes such as differential cytokine production. Quantitative T cell response measures associated with each epitope will open a new avenue for rigorous modeling.
Second, current distance measures are mainly context specific and do not capture the true immunogenic capacity of the input peptides. For example, Grouping Lymphocyte Interactions by Paratope Hotspots (GLIPH) and TCRdist, which aim to detect common antigen specificity groups, may not be effective in estimating the breadth and/or constituents of the cross-reactome. Cancer-specific immunogenic neoantigens that are used as cancer vaccine targets mostly differ from the wild type by only a single point mutation. Engineered affinity-enhanced TCRs have recently been shown to generate unpredicted cross-reactivity even by a single amino acid substitution (
Third, there is a considerable heterogeneity in the experimental methodologies employed in assessing T cell responses. Although standardizing T cell assays into a single readout is practically difficult, accuracy of predictive algorithms may be enhanced by reflecting the sensitivity and specificity of assays employed for characterizing each epitope.
Fourth, to date, exhaustive screens have been performed on the assumption of an invariant MHC interaction. However, previous studies suggested the ability of a TCR to recognize peptides bound by non-canonical HLA molecules (
Lastly, we need to keep in mind that while TCR:pMHC interactions exhibit a remarkable capacity of discrimination, they are often sloppy and cross-reactive. Nevertheless, as exemplified by thymic selection, weaker affinities play an essential role in underpinning the sensitive detection of a wide range of cognate antigens yet keeping it well-balanced from self-reactivity (
The amino acid sequence of a paired TCR defines its binding specificity. However, we still do not have a full understanding of the mechanisms underpinning the recognition of pMHC complexes by their cognate TCRs. In the last few years, there have been mathematical and computational efforts to find systematic ways to cluster TCRs based on their likely antigen specificity, a task known as defining common antigen specificity groups.
To identify TCRs specific to a given antigen, one needs to sort and sequence naïve and antigen-experienced T cell repertoires. Recent advances in both bulk and single cell sequencing technologies facilitate the generation of such datasets in a high-throughput manner. A dedicated set of algorithms and software tools allows computational biologists to further analyze and profile TCR repertoires (
Such complementary biological assays and computational platforms have enabled robust generation and analysis of millions of TCRs in a single experiment. Importantly, the curated sequences have been deposited in databases such as VDJdb (
The accumulation of so many antigen-specific TCR sequences, on one hand, urged the development of systematic methods to group TCR sequences according to, for example, their shared antigen specificity, and on the other hand, opened an opportunity to conduct in-depth characterization of antigen-specific TCR repertoires, find shared and conserved features and develop a distance measure that permits clustering and visualization of the TCR space (
Current workflow for predicting antigen specificity of TCRs. The tetramer-sorted antigen specific CDR3β or TCRβ are clustered by distance measure defined by either global sequence similarity, motif enrichment or sequence co-occurrence pattern. Then, specificity clusters are investigated for their descriptive features, such as enrichment of common V-genes, CDR3 length, clonal expansions, and motif significance, to be considered in making the prediction of antigen specificity. Based on the collection of identified features, previously uncharacterized CDR3βs or TCRβs are predicted for their antigen specificity. The example sequences have been retrieved from (
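The clustering step of this workflow can be sketched for the simplest case, global sequence similarity. In the minimal Python example below, two TCRβ sequences are linked when they use the same V gene, have CDR3s of equal length, and differ at no more than one position; clusters are the connected groups of linked sequences. The single-linkage grouping, the function names and the example sequences are assumptions for illustration and do not reproduce any particular published tool.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_global_similarity(tcrs, max_mismatches=1):
    """Group (v_gene, cdr3) tuples; returns clusters as lists of indices."""
    # Only sequences with the same V gene and CDR3 length are compared.
    buckets = defaultdict(list)
    for index, (v_gene, cdr3) in enumerate(tcrs):
        buckets[(v_gene, len(cdr3))].append(index)

    parent = list(range(len(tcrs)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in buckets.values():
        for offset, a in enumerate(members):
            for b in members[offset + 1:]:
                if hamming(tcrs[a][1], tcrs[b][1]) <= max_mismatches:
                    parent[find(a)] = find(b)  # link the two sequences

    clusters = defaultdict(list)
    for index in range(len(tcrs)):
        clusters[find(index)].append(index)
    return sorted(clusters.values())


# Illustrative sequences only (assumption, not data from the cited studies).
tcrs = [
    ("TRBV19", "CASSIRSSYEQYF"),
    ("TRBV19", "CASSIRSAYEQYF"),
    ("TRBV19", "CASSTRSAYEQYF"),
    ("TRBV7-9", "CASSIRSSYEQYF"),
    ("TRBV19", "CASSLAPGATNEKLFF"),
]
clusters = cluster_by_global_similarity(tcrs)
print(clusters)
```

Note that the first and third sequences differ at two positions yet end up in the same cluster because both neighbor the second one; whether such chaining is desirable is a design choice, and the identical CDR3 paired with a different V gene is kept apart.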
The above mentioned rationales have formed the foundation for several recent studies trying to predict specificity groups of TCRs based on their TCR or CDR sequences (
Algorithms to predict antigen specificity of TCR repertoire.
Reference | Input data | Sequence representation or distance measure | Clustering or classification method |
Thomas et al. ( | CDR3 sequences of CD4+ T cell repertoire before and after immunization | Replace each CDR3 by all possible n-mer peptides, then convert each n-mer peptide into numeric Atchley vectors | K-means clustering of Atchley vectors, count number of Atchley vectors assigned to each cluster, and generate a feature vector. Classify the feature vector using hierarchical clustering (unsupervised) or support vector machine (supervised) |
Dash et al. ( | pMHC-facing loop between CDR2 and CDR3 and trimmed CDR3 sequences from 4,635 paired TCRαβ sequences | Similarity-weighted mismatch distance between the potential pMHC-contacting loops of two TCRs, defined by BLOSUM62 (named TCRdist) | Sampling density nearby each TCR estimated by weighted average distance to the nearest-neighbor receptors in repertoire (a small nearest-neighbor distance, NN-distance). Each TCR repertoire clustered using “greedy” fixed-distance-threshold clustering algorithm. At each step, TCR with the largest number of neighbors within the distance threshold chosen as a cluster center and iterated for all TCRs |
Glanville et al. ( | CDR3 from 5,711 TCRβ sequences | Global similarity by CDR3 Hamming distance between two TCRs with same Vβ segment and same-length CDR3. A fold-change enrichment of local convergence motif by observed frequency of the motif over expected frequency in repeated random sampling from naïve distribution | Cluster TCRs that either share global similarity below the Hamming distance threshold (differing by <2 amino acids) or share a significant motif (>10-fold enriched and <0.001 probability of occurring in the naïve TCR pool) |
Cinelli et al. ( | CDR3 from CD4+ TCRβ sequences before and after immunization | CDR3β sequences deconstructed into k-mers, then motifs ranked according to one-dimensional Bayesian classifier score comparing their frequency in repertoires of two immunization classes | Top ranking motifs selected and used to create feature vectors to train a support vector machine for classifying into distinct clusters |
Priel et al. ( | ~360,000 TCRβ sequences from ( | Levenshtein distance between TCRβ and cluster representative | UClust algorithm ( |
DeWitt et al. ( | TCRβ sequences from 666 healthy individuals from ( | Co-occurrence of global TCRβ (for genetic background) and HLA-restricted TCRβ (for immune history and receptor specificity) by analysis of covariation and hypergeometric distribution to assess significance | DBSCAN algorithm ( |
Meysman et al. ( | Two independent datasets of 412 TCRβ from [( | Investigated length-based distance, GapAlign score, profile score, trimer score, dimer score, Levenshtein distance score, and VJ edit distance | DBSCAN algorithm ( |
Pogorelyy and Shugay ( | CDR3 from TCRβ sequences from ( | Hamming distance, allowing single substitution | TCR similarity networks by Hamming distance and identify enriched TCR network hubs by testing neighborhood size (degree) enrichment against VDJ rearrangement model using ALICE algorithm ( |
Thakkar and Bailey-Kellogg ( | CDR3 sequences, CDR3α and CDR3β analyzed separately | Local alignment using Smith-Waterman (SW) algorithm with BLOSUM45 | Hierarchical agglomerative clustering, with CDRdist (a nearest neighbor classifier to predict label of another CDR based on nearby labeled CDRs) as a comparison function. Clusters defined by CDRdist thresholds |
Zhang et al. ( | 82,000 CDR3 sequences from 9,700 tumor RNA-Seq samples from TCGA | Pairwise alignment score with BLOSUM62, normalized by the length of longer CDR3 sequence | From pairwise score matrix, apply a predefined cut-off value (default 3.5) to filter out low scoring comparisons. A depth-first search (DFS) on the matrix to identify all connected CDR3 clusters (named iSMART) |
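The greedy fixed-distance-threshold clustering described in the table for Dash et al. can be sketched as follows. To keep the example self-contained, the sketch swaps the TCRdist measure for a plain Levenshtein edit distance (the measure listed for Priel et al.), so it is an illustration of the clustering step only; the tie-breaking rule, the function names and the example CDR3 sequences are assumptions.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def greedy_threshold_clustering(sequences, threshold, distance=levenshtein):
    """Repeatedly pick the sequence with most unassigned neighbors as center.

    Returns a list of (center index, sorted member indices). Ties are
    broken by the lowest index, which is an arbitrary choice.
    """
    n = len(sequences)
    neighbors = {
        i: {j for j in range(n)
            if j != i and distance(sequences[i], sequences[j]) <= threshold}
        for i in range(n)
    }
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        center = max(sorted(unassigned),
                     key=lambda i: len(neighbors[i] & unassigned))
        members = {center} | (neighbors[center] & unassigned)
        clusters.append((center, sorted(members)))
        unassigned -= members
    return clusters


# Illustrative CDR3β sequences (hypothetical).
sequences = [
    "CASSLGQAYEQYF",
    "CASSLGQGYEQYF",
    "CASSLGQAYEQF",
    "CASSPDRGNTEAFF",
    "CASSPDRGNTEAF",
]
result = greedy_threshold_clustering(sequences, threshold=1)
print(result)
```

Unlike linking all sequences within the threshold of one another, this scheme assigns each sequence to a single center, so cluster membership depends on the order in which centers are chosen.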
Glossary.
Accessible surface area | Also known as solvent-accessible surface area (SASA); the surface area of a biomolecule that is accessible to a solvent. Measurement is usually described in units of square Ångstroms |
Adoptive T cell transfer | A type of immunotherapy in which T cells are given to a patient to improve immune functionality to fight diseases |
Amino acid index database (AAindex) | A database of amino acid indices and amino acid mutation matrices. An amino acid index is a set of 20 numerical values representing various physicochemical and biochemical properties of amino acids. An amino acid mutation matrix is generally a set of 20 × 20 numerical values representing similarity of amino acids |
Clonal expansion | A process in which a small number of precursor cells recognize a specific antigen, proliferate into expanded clones, differentiate and acquire various effector and memory phenotypes |
Combinatorial peptide library | A library typically comprised of millions to billions of random peptides covering possible combinations of amino acids in each position |
Degeneracy | Ability to recognize diverse ligands |
Electrostatic potential | The amount of work needed to move a unit of charge against an electric field |
Featured peptide | A peptide with solvent-exposed, prominent side chains or harmonious bulged conformations that typically corresponds to a diverse repertoire of TCRs |
Find Individual Motif Occurrence | A motif-based sequence analysis tool that scans a set of sequences for individual matches to each of the motifs provided by the users |
Flexible docking | A macromolecular docking where the internal geometry of the interacting partners can be changed when a complex is formed |
Heterologous immunity | An immunity that can develop to one pathogen after a host has had exposure to non-identical pathogens |
Immunodominant peptide | A peptide having a strong affinity for binding with HLA and for stimulating a T cell response |
Kidera factor | A set of orthogonal physicochemical properties that reflect 20 amino acids, which include helix/bend preference, side-chain size, extended structure preference, hydrophobicity, double-bend preference, partial specific volume, flat extended preference, occurrence in alpha region, pK-C and surrounding hydrophobicity |
Molecular mimicry | A phenomenon in which sequence similarities between foreign and self-peptides are sufficient to trigger cross-activation of autoreactive T cells by pathogen-derived peptides |
Peptide-MHC display system | A platform with engineered functional peptide-MHC complexes for high-throughput screening of immunogenic peptides against TCRs |
Polarization | A process to adopt different functionality in response to the signals from their microenvironment |
Positional specific scoring matrix | An amino acid scoring matrix in a 20 × 20 table such that the entry indexed by a pair of amino acids, e.g., position (X, Y), gives the score of alignment or substitution of amino acid X with amino acid Y |
Private TCR | A TCR unique to an individual |
Public TCR | A TCR shared among different individuals |
Rigid docking | A computational modeling of the quaternary structure of complexes formed by two or more interacting biological macromolecules, where the relative orientation of interacting partners was allowed to vary but the internal geometry of each of the partners was held fixed |
Rosetta terms | A set of 19 terms comprising Rosetta Energy Function 2015 (REF15), a model parametrized from small-molecule and X-ray crystal structure data, used to approximate the energy associated with each biomolecule conformation |
Tetramer-associated T cell receptor sequencing | A method to link TCR sequences to their cognate antigens in single cells at high throughput manner. Peptide-TCR binding is determined using a library of DNA-barcoded antigen tetramers |
ZAFFI score | Abbreviation for Zlab affinity enhancement; an algorithm to predict the effect of point mutations on the binding affinity of TCRs. The energy function was trained on a dataset of systematic point mutations at 10 positions on the ovomucoid turkey inhibitor (OMTKY) molecule in four enzyme-inhibitor complexes. The terms and weights of the function were optimized to fit the energies of the OMTKY point mutants and then tested on point mutations of T cell receptors. The terms and weights making up the score are: van der Waals attractive (0.24), van der Waals repulsive (0.017), Lazaridis-Karplus solvation (0.24), intra-residue clash (0.073) and atomic contact energy (0.32) |
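As an illustration of how a linear score of the kind described in the ZAFFI entry is assembled, the minimal Python sketch below combines five per-term energy changes using the weights listed in the glossary. The dictionary keys, the function name `weighted_score` and the example values are hypothetical, and the sketch assumes each term is supplied as the change for one point mutation (mutant minus wild type), computed beforehand by an external modeling package. It is not the published implementation.

```python
# Weights copied from the glossary entry; the key names are hypothetical labels.
ZAFFI_WEIGHTS = {
    "vdw_attractive": 0.24,
    "vdw_repulsive": 0.017,
    "lk_solvation": 0.24,
    "intra_residue_clash": 0.073,
    "atomic_contact_energy": 0.32,
}


def weighted_score(term_changes):
    """Return the weighted sum of per-term energy changes for one point mutation.

    `term_changes` maps each term name in ZAFFI_WEIGHTS to a number.
    The sign convention (mutant minus wild type) is an assumption of this sketch.
    """
    missing = set(ZAFFI_WEIGHTS) - set(term_changes)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weight * term_changes[name] for name, weight in ZAFFI_WEIGHTS.items())


# Invented per-term values, for illustration only.
example_mutation = {
    "vdw_attractive": -1.2,
    "vdw_repulsive": 0.4,
    "lk_solvation": 0.3,
    "intra_residue_clash": 0.0,
    "atomic_contact_energy": -0.5,
}
example_score = weighted_score(example_mutation)
```

Because the score is a plain weighted sum, the contribution of each term can be inspected separately, which is convenient when comparing candidate mutations.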
While TCRs are rarely cross-reactive across HLA haplotypes (
As a result of somatic recombination, TCR sequences produce three complementarity-determining region (CDR) loops, where CDR1 and CDR2 of α- and β-chains are conventionally believed to govern the interaction with an MHC molecule, and hypervariable CDR3α and CDR3β loops to guide specific engagement of TCRs with MHC-bound cognate peptides (
Based on the understanding of how CDR loops interact with pMHC, some progress has been made in predicting specificity groups of TCRs based on the similarity of short stretches of TCR amino acid sequences, known as motifs, mainly within the CDR3 region (
Although current algorithms have been applied in multiple biological contexts such as Alzheimer's disease (
A number of recent studies have suggested that integrating information across all six CDRs, instead of considering CDR3α or CDR3β independently, would likely yield higher performance (
In addition, translating CDR amino acid sequences into their physicochemical properties and using their inherent properties to cluster TCRs into specificity groups may bring another step forward. Ostmeyer et al. developed a statistical classifier of T cell receptor repertoire that distinguishes tumor tissue from patient-matched healthy tissue of the same organ (
Previous efforts to reduce the dimensionality of a large number of possibly collinear amino acid properties into a small number of orthogonal properties that maintain most of the information contained in the original set have characterized and summarized the physicochemical properties of amino acids into, e.g., 10 Kidera factors (
While bulk TCR sequencing revolutionized characterization of the TCR repertoire in different pathological settings, e.g., tumor immunology and autoimmunity (
Recent advances in single-cell approaches have opened the door to elucidating how particular α-β pairings contribute to antigen specificity. In particular, several groups have started to implement single-cell platforms for simultaneous identification of TCRαβ sequence and antigen specificity in a high-throughput manner across multiple pMHCs (
The potential benefits of identifying TCR αβ pairs coupled to antigen specificities include but are not limited to: (i) identifying unique CDR3 α/β signatures dictating epitope recognition for possible applications across the field of adaptive immunity e.g., efficient design of TCRs for vaccine development or targeted immunotherapy (
Importantly, the large number of paired TCRαβs coupled to antigen specificity can be fed into computational models to improve prediction accuracy. The exhaustive list of recognition patterns, combined with increasing structural information about TCR:pMHC interaction, will assist prediction of specific TCR:pMHC interaction based on TCR sequence (
Despite interest in mapping the TCR:pMHC interactions, a combinatorial approach exploring the mutation space of TCRs against corresponding peptide cross-reactome has not been exhaustively performed. Thus, it would be exceptionally challenging to account for the whole range of available TCRs and surveilling pMHCs.
Depicting the cross-recognition of TCRs and pMHCs in >10 (
Interplay between unique clusters of pMHCs and TCRs. In an ideal world with an accurate distance measure, pMHCs in the same cluster should share a common specificity toward TCRs and vice versa. Each node denotes a pMHC (circle) or TCR (polygon) entity and each edge denotes the distance to the closest pMHC or TCR, respectively.
Therefore, modeling this dynamic interplay may require the development of an accurate distance measure to group TCRs in a way that is informative of their antigen specificity and/or cross-reactivity. This will require assessment of all identified features, such as paired TCRαβ sequences, n-mer motifs, physicochemical properties as well as structural, physical and kinetic parameters, to derive a minimum set of features with maximum association to immunogenicity. These features will become a toolkit for developing TCR and pMHC distance measures to discriminate >10^6 TCRs and >10^18 peptides into designated clusters. Following the classification of clusters, the relationship between TCR and pMHC clusters can be explored further: it may segregate into a linear function, or it may yield an indistinct pattern in which even the repertoires for closely related epitopes have divergent landscapes with very limited overlap.
Here we discuss two fundamental principles of TCR:pMHC interaction: antigen specificity and TCR cross-reactivity. Modeling these principles using cellular, kinetic, and structural features will deepen our understanding of the organizational principles of TCR repertoires.
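To make the idea of a distance measure concrete, the following minimal Python sketch groups paired TCRs using a deliberately simple distance: the sum of length-normalized edit distances between CDR3α and CDR3β sequences, followed by single-linkage clustering at a fixed threshold. Everything here is an assumption made for illustration: the function names, the threshold of 0.3 and the CDR3-like strings are hypothetical, and a sequence-only edit distance stands in for the richer physicochemical, structural and kinetic features discussed above.

```python
from itertools import combinations


def levenshtein(s, t):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (a != b)))   # substitution or match
        prev = curr
    return prev[-1]


def tcr_distance(tcr_a, tcr_b):
    """Sum of length-normalized CDR3 edit distances over the alpha and beta chains."""
    total = 0.0
    for chain in ("cdr3a", "cdr3b"):
        x, y = tcr_a[chain], tcr_b[chain]
        total += levenshtein(x, y) / max(len(x), len(y))
    return total


def single_linkage_clusters(tcrs, threshold):
    """Link every pair at or below `threshold`; links are transitive (union-find)."""
    parent = list(range(len(tcrs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(tcrs)), 2):
        if tcr_distance(tcrs[i], tcrs[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, tcr in enumerate(tcrs):
        clusters.setdefault(find(i), []).append(tcr["name"])
    return sorted(sorted(names) for names in clusters.values())


# Illustrative CDR3-like strings chosen for this sketch; not data from the article.
example_tcrs = [
    {"name": "T1", "cdr3a": "CAVSGGSNYKLTF", "cdr3b": "CASSIRSSYEQYF"},
    {"name": "T2", "cdr3a": "CAVSGGSNYQLTF", "cdr3b": "CASSIRSAYEQYF"},
    {"name": "T3", "cdr3a": "CALDNQGGKLIF", "cdr3b": "CSARDRTGNGYTF"},
]
example_clusters = single_linkage_clusters(example_tcrs, threshold=0.3)
```

In this toy example T1 and T2 differ by one substitution per chain and fall into one cluster, while T3 remains separate. Any real measure would need to be validated against experimentally determined specificities, since sequence similarity alone does not guarantee shared antigen recognition.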
Recent technological advancements have opened doors for screening antigen-specific TCRs and cross-reactive peptides in a high-throughput manner. In particular, MHC multimer screening in combination with multimodal single cell technologies increased the breadth of T cell analysis by allowing integration of antigen specificity with immune repertoire, transcriptomic and proteomic profiling (
The present algorithms have not distinguished TCR repertoires by their functional subsets, such as CD4+ and CD8+ T cells with pro-inflammatory or regulatory functions, largely due to a lack of sufficient annotations. Given that each subset has distinct dynamics depending on the pathogenic condition, e.g., viral infection, cancer or autoimmunity, the use of subset-specific TCR repertoires may further improve the predictability of epitope immunogenicity (
Along with an increasing wealth of experimental and sequencing data, there have been advancements in
For instance, recent studies have focused on a rational computer-aided approach to TCR engineering as a more predictable and safer approach to TCR design (
Ultimately, building a complete map portraying the TCR:pMHC interface will provide opportunities to describe the response to dynamic interactions in the immune system. Examples include: (i) dynamic changes of antigen-specific TCR repertoire after adoptive transfer (
HK conceived and designed the study. CL conducted literature review. HK and CL wrote the manuscript with contributions from GN, MS, GO, and AS. HK and AS supervised the project. All authors contributed to the interpretation of the observations.
GO has served on advisory boards or holds consultancies or equity with Eli Lilly, Novartis, Janssen, Sanofi, Orbit Discovery and UCB Pharma, and has undertaken clinical trials with Atopix, Regeneron/Sanofi, Roche. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
HK would like to dedicate this study to Prof. Cerundolo who introduced him to the amazing world of T cell immunology. We wish to thank Omer Dushek, Agne Antanaviciute, Paul Buckley, Jeongmin Woo and Isaac Woodhouse for critical reading of the manuscript.