Using the T Cell Receptor as a Biomarker in Type 1 Diabetes

T cell receptors (TCRs) are unique markers that define antigen specificity for a given T cell. With the evolution of sequencing and computational analysis technologies, TCRs are now prime candidates for the development of next-generation non-cell-based T cell biomarkers, which provide a surrogate measure to assess the presence of antigen-specific T cells. Type 1 diabetes (T1D), the immune-mediated form of diabetes, is a prototypical organ-specific autoimmune disease in which T cells play a pivotal role in targeting pancreatic insulin-producing beta cells. While the disease is now predictable by measuring autoantibodies in the peripheral blood directed to beta cell proteins, there is an urgent need to develop T cell markers that recapitulate T cell activity in the pancreas and can serve as a measure of disease activity. This review focuses on the potential and challenges of developing TCR biomarkers for T1D. We summarize current knowledge about TCR repertoires and clonotypes specific for T1D and discuss challenges that are unique to autoimmune diabetes. Ultimately, the integration of large TCR datasets produced from individuals with and without T1D along with computational ‘big data’ analysis will facilitate the development of TCRs as potentially powerful biomarkers in the development of T1D.


INTRODUCTION
A T cell receptor (TCR) determines antigen specificity of T cells by interacting with a peptide-major histocompatibility complex (peptide-MHC), and signals received through the TCR along with the CD3 complex are the primary components that regulate function and fate of T cells. Individual T cells express unique TCRs, and therefore TCR sequences can be used as an identifier of T cells that are specific to particular antigens and involved in immune responses. In this review, we will focus on the potential use of TCR sequences as non-cell based T cell biomarkers for type 1 diabetes (T1D), a tissue-specific autoimmune disease targeting insulin-secreting pancreatic beta cells (1)(2)(3).
Several features of self-reactive T cells make it challenging to develop T cell biomarkers in diabetes (4). First, the frequency of autoreactive T cells is extremely low in the peripheral blood, estimated to be 1/10^5 to 1/10^6. Second, response to peptide-MHC by autoreactive T cells tends to be minimal compared to anti-cancer or anti-pathogen T cell responses (5,6). Third, healthy individuals with T1D-risk MHC molecules can have autoreactive T cells that are quantitatively and functionally similar to those in T1D patients (7). TCR sequencing allows for the analysis of TCR clonotypes from tens of millions of T cells using nucleotide samples rather than living cell biospecimens and may overcome many of these challenges when appropriately utilized. Advantages provided by TCR biomarkers include (1) living T cells are not required for assays; (2) intra- and inter-assay variations due to cell conditions and operator performance are minimized; and (3) extremely infrequent and low-responding T cells are detectable by recently emerging high-throughput sequencing technologies. Here, we will review current knowledge about TCR repertoires and clonotypes specific for T1D and address the knowledge gaps to develop TCR biomarkers that can stratify individuals throughout the stages of T1D development.

TRI-MOLECULAR COMPLEX CONSISTING OF TCR, PEPTIDE, AND MHC MOLECULES
TCRs expressed by classical T cells are composed of alpha and beta chains, both of which are formed by somatic recombination of the variable (V) and joining (J) segment genes (and diversity [D] for beta chains). In humans, 45 TRAV and 52 TRAJ genes have been identified as functional V and J segment genes for alpha chains (8,9). Likewise, there are 49 TRBV, 2 TRBD, and 13 TRBJ functional V, D, and J segment genes in the beta chain locus (10,11). During maturation in the thymus, individual T cells undergo rearrangement of segment genes, resulting in one V, one D (for beta), and one J segment gene assembled adjacent to each other. Since additional nucleotides are often inserted or deleted between the segments, billions of junction sequences with hundreds of different V, D, J combinations can potentially be assembled (12)(13)(14). Experimentally, each adult person is expected to have over 100 million TCR clonotypes uniquely expressed by hundreds of billions of individual T cells in the body (15)(16)(17)(18)(19). There are three regions, called complementarity determining regions (CDRs), that directly interact with peptide-MHC complexes and are therefore crucial for determining antigen specificity (20)(21)(22). Two of these regions, CDR1 and CDR2, are included in the V segment, and the CDR3 region is formed at the junction between V, D (for beta), and J segments. Amino acid residues in the CDR3 regions closely interact with the peptide; they are thus considered important for determining antigen specificity and are often used as the defining property of each TCR clonotype in TCR repertoire analysis.
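As a rough illustration of the germline arithmetic behind these numbers, the sketch below simply multiplies the functional segment counts quoted above. It counts segment combinations only and deliberately ignores junctional insertions and deletions, which account for most of the real diversity; the variable and function names are ours and are not taken from any cited tool.

```python
# Functional human segment counts as quoted in the text above.
ALPHA_SEGMENTS = {"TRAV": 45, "TRAJ": 52}
BETA_SEGMENTS = {"TRBV": 49, "TRBD": 2, "TRBJ": 13}


def segment_combinations(counts):
    """Multiply the number of choices for each segment type."""
    total = 1
    for n_genes in counts.values():
        total *= n_genes
    return total


alpha_combos = segment_combinations(ALPHA_SEGMENTS)  # 45 * 52
beta_combos = segment_combinations(BETA_SEGMENTS)    # 49 * 2 * 13
paired_combos = alpha_combos * beta_combos           # one alpha with one beta

print(alpha_combos, beta_combos, paired_combos)
```

Under these assumptions, germline segment choice alone gives only a few million paired combinations, which is why junctional editing is needed to explain the far larger repertoire sizes discussed in this review.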

MHC MOLECULES IN T1D
The major genetic determinant in susceptibility to most autoimmune diseases resides in the human MHC that contains the human leukocyte antigen (HLA) region. MHC molecules are heterodimers formed between alpha and beta chains that function to present peptides to TCRs on T cells. Class I molecules are on all nucleated cells and present antigens to CD8 T cells, while class II molecules are expressed by antigen presenting cells (e.g. B cells, dendritic cells, and macrophages) and present peptides to CD4 T cells. In T1D, specific HLA class I and II alleles are associated with increased risk (23,24). Several HLA class I and II alleles confer risk for T1D and are associated with other autoimmune disorders (Table 1) (25,26). DR is in close linkage disequilibrium with DQ such that the DR4-DQ8 and DR3-DQ2 haplotypes confer the greatest risk for T1D development. Both the alpha and beta chains of DQ molecules are polymorphic, and have the ability to form mixed molecules in cis and trans. As an example, the alpha chain of DQ2 can pair with the beta chain of DQ8 to form DQ8-trans (DQA1*05:01-DQB1*03:02) when both DQ2 and DQ8 are in the genotype. DQ8-trans has an odds ratio for T1D development of 35 (i.e. roughly 35-fold higher odds of developing diabetes than in those without these alleles), compared to odds ratios of ~11 and ~4 for DQ8 and DQ2, respectively (27,28). Interestingly, HLA-DQ6 (DQA1*01:02-DQB1*06:02) provides dominant protection against T1D development with an odds ratio of only 0.03 (29,30). The stark dichotomy of risk between DQ molecules highlights the important role of antigen presentation to TCRs in T1D.

DIVERSITY OF TCR REPERTOIRES
Adults have approximately 10^8 to 10^10 unique TCR clonotypes (15,17,18,31). With an assumption that the TCR repertoire size may represent a capacity for responding to diverse antigens, the TCR repertoire diversity in the blood has been examined to determine whether it is associated with immune conditions. For example, having diverse TCR repertoires is associated with desirable responses to immune therapies in cancer (32)(33)(34). In T1D, it has been reported that TCR repertoires in peripheral blood of T1D patients are less diverse compared with those without T1D (35). Thus, there may be trends of TCR repertoire sizes that are preferred by a certain immune condition. However, it should be noted that the diversity of TCR repertoires cannot specify a certain disease.
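To make the notion of repertoire diversity concrete, the following minimal sketch computes two summary statistics that are commonly applied to clonotype count tables: Shannon entropy and a clonality score defined as one minus normalized entropy. The studies cited above may have used different metrics; the function names and the toy counts are illustrative assumptions only.

```python
import math


def shannon_entropy(clonotype_counts):
    """Shannon entropy (natural log) of a {clonotype: read_count} table."""
    total = sum(clonotype_counts.values())
    if total <= 0:
        raise ValueError("repertoire is empty")
    entropy = 0.0
    for count in clonotype_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def clonality(clonotype_counts):
    """1 - normalized entropy: 0 for an even repertoire, near 1 when one clone dominates."""
    n_clonotypes = sum(1 for c in clonotype_counts.values() if c > 0)
    if n_clonotypes <= 1:
        return 1.0  # a single clonotype is treated as fully clonal
    return 1.0 - shannon_entropy(clonotype_counts) / math.log(n_clonotypes)


# Toy repertoires (made-up counts, not data from any study).
even_repertoire = {"clone_a": 5, "clone_b": 5, "clone_c": 5, "clone_d": 5}
skewed_repertoire = {"clone_a": 97, "clone_b": 1, "clone_c": 1, "clone_d": 1}
print(clonality(even_repertoire), clonality(skewed_repertoire))
```

A repertoire dominated by a few expanded clones scores close to 1, whereas an evenly distributed repertoire scores close to 0. As noted above, such a summary value says nothing about which disease, if any, is driving the expansion.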

USE OF TCR CLONOTYPES AS SURROGATES TO QUANTIFY ANTIGEN-SPECIFIC T CELLS
TCR clonotypes determine antigen specificity, and therefore they can be utilized as a surrogate marker to evaluate the presence and prevalence of antigen-specific T cells in the blood. Frequencies of these antigen-specific TCR clonotypes can be quantified by high-throughput sequencing, which is expected to be more specific to individual diseases compared to surveying the broad TCR repertoire. Furthermore, once a panel of antigen-specific TCR clonotypes is determined, a single TCR sequencing assay allows for evaluating specificity to many antigens rather than needing to test specificity to each individual antigen. TCR sequencing has been done from different tissues in many disease states (36), including autoimmune disorders (37) and cancer (38)(39)(40).
Remarkably, TCR sequencing has been shown to differentiate early-stage cancer patients from healthy individuals (41,42). This strategy requires a list of TCR clonotypes beforehand that can be searched in blood samples, and such TCR clonotypes used as surrogate biomarkers need to satisfy three factors: (1) publicity (i.e. commonality, being shared between individuals), (2) abundancy, and (3) disease specificity. Namely, T cells expressing the same or similar TCR clonotypes need to be commonly present in a number of people; frequency of such T cells in the blood of each person needs to be high enough for quantification; and presence or absence of such T cells needs to be associated with a disease state. In addition, the larger the number of TCR clonotypes in a given panel, the more specific and sensitive the assay will become. Thus, identifying disease-specific TCR candidates is essential to establish a robust TCR sequencing assay that can discriminate a subset of individuals having a specific stage or feature of T1D such as those who have potential to respond to an interventional therapy. There are several strategies to identify disease-specific TCR clonotypes. Since a significant portion of disease-specific TCRs are likely to recognize islet antigens, TCR clonotypes expressed by islet antigen-specific T cells are reasonable candidates for TCR biomarkers. Such T cell sources include peripheral blood T cells responding to islet antigen stimulation or enriched by staining with fluorescence-conjugated multimers consisting of an islet-derived peptide and a particular HLA molecule (43)(44)(45). Alternatively, TCR clonotypes identified in the target organ (i.e. pancreas or pancreatic islets) or draining lymph nodes may also be disease-specific. In any of these T cell sources, specificity (i.e. potential contamination of non-disease associated T cells) as well as sensitivity (i.e. missing a portion of antigen-specific T cells) needs to be carefully considered.
For example, T cell samples enriched by antigen stimulation may contain only a few clonotypes that readily proliferate in response to the stimulation or could be non-specific T cells that proliferate due to "bystander effect." Likewise, T cells in the pancreas and pancreatic lymph nodes may not necessarily be islet-reactive or disease-specific (46). On the other hand, T cell populations enriched by multimer staining may contain only those having high affinity to bind peptide-MHC complexes, and TCRs weakly binding to peptide-MHC may be missed. This possibility is likely important for autoreactive TCRs since T cell responsiveness to self-antigens tends to be low compared to pathogen T cell responses. Nevertheless, identifying TCR clonotypes from samples enriched with antigen-specific T cells is indispensable to identify disease-specific TCR candidates. These TCR clonotypes should then be assessed for frequency in peripheral blood of individuals with different stages of T1D to determine the ultimate association with disease status. The next subsections will summarize features of TCR clonotypes specific to islet-specific autoantigens as well as those potentially associated with T1D pathogenesis.
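The three requirements named in this section (publicity, abundancy, and disease specificity) can be expressed as a simple filter over per-donor clonotype frequency tables. The sketch below is purely illustrative: the thresholds, the data layout, and the strict rule that a candidate must be absent from every control sample are our assumptions, and a real analysis would use statistical testing rather than hard cut-offs.

```python
def candidate_clonotypes(samples, min_donors=2, min_frequency=1e-5):
    """Select clonotypes that look public, abundant, and disease-specific.

    samples: list of (has_t1d, {clonotype: frequency}) tuples, one per donor.
    Returns a sorted list of clonotypes that
      - reach min_frequency in at least min_donors T1D donors
        (abundancy and publicity), and
      - are not observed in any control donor (disease specificity).
    """
    t1d_hits = {}
    seen_in_controls = set()
    for has_t1d, frequencies in samples:
        for clonotype, frequency in frequencies.items():
            if has_t1d:
                if frequency >= min_frequency:
                    t1d_hits[clonotype] = t1d_hits.get(clonotype, 0) + 1
            else:
                seen_in_controls.add(clonotype)
    return sorted(
        clonotype
        for clonotype, n_donors in t1d_hits.items()
        if n_donors >= min_donors and clonotype not in seen_in_controls
    )


# Toy input with made-up clonotype labels and frequencies.
toy_samples = [
    (True, {"A": 1e-4, "B": 2e-5, "C": 1e-7}),
    (True, {"A": 5e-5, "B": 3e-5, "C": 2e-4}),
    (False, {"B": 1e-5, "D": 1e-3}),
]
print(candidate_clonotypes(toy_samples))
```

In this toy input, clonotype "B" is dropped because it also appears in a control donor, and "C" is dropped because it is abundant in only one T1D donor.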

Lessons From Islet-Specific TCRs in T1D Animal Models
Non-obese diabetic (NOD) mice spontaneously develop autoimmune diabetes and recapitulate many features of human T1D including a T1D-susceptible MHC allele (I-A^g7), homologous to HLA-DQ8, the development of insulin autoantibodies prior to diabetes onset, and insulitis. A number of T cell clones reacting with islet tissues have been isolated from pancreatic islets and spleens of NOD mice in the past few decades and further characterized for antigen specificity as well as TCR clonotypes (47). In the 1990s, Santamaria and colleagues discovered that a large portion of CD8 T cells infiltrating NOD islets share an identical Valpha segment (i.e. TRAV16) along with a specific junction motif (i.e. MRD or MRE) (48), and subsequently identified a peptide derived from islet-specific glucose-6-phosphatase catalytic subunit-related protein (IGRP) as an epitope targeted by these CD8 T cells (49). Likewise, CD4 T cell clones as well as T-hybridoma cells that are reactive to insulin B-chain peptides have been established from NOD islets by a number of investigators using different methods (50)(51)(52)(53)(54)(55)(56). The majority of these T cells express TCRs containing specific Valpha and Jalpha segment motifs, TRAV5D-4 or TRAV10 along with TRAJ53 or TRAJ42. When mice are forced to have only T cells expressing TCRs containing TRAV5D-4, approximately one percent of CD4 T cells become specific to an insulin B chain 9-23 peptide (57), and the mice are susceptible to developing anti-insulin autoimmunity (58,59).
Alanine scanning and crystal structure analyses identified several amino acid residues in the TRAV5D-4 and TRAV10 CDR1 and CDR2 regions that are crucial to interact with the insulin peptide-MHC complex (59,60). Also, among insulin B chain-specific CD4 T cells, those that specifically recognize an insulin B chain 12-20 peptide preferentially express TCR beta chains containing a negatively charged amino acid (i.e. aspartic acid [D] or glutamic acid [E]) in the junction region (56). This observation is consistent with the notion that the I-A^g7 T1D-susceptible MHC class II molecule, which has a positively charged patch in the surface area near the p9 pocket due to the lack of a negatively charged amino acid residue at the beta 57 position, engages TCRs having a negatively charged residue when p9 of the peptide is not negatively charged (position 20 of the insulin B chain is glycine). Thus, these studies provide a molecular explanation of how TCR motif selection occurs by interaction with a particular peptide-MHC complex.
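A motif of this kind is simple to screen for computationally. The sketch below flags CDR3 beta sequences that carry an aspartic acid (D) or glutamic acid (E); the optional trimming of flanking residues is our own simplification for approximating the "junction region", and the example sequences are invented for illustration rather than taken from the cited studies.

```python
ACIDIC_RESIDUES = frozenset("DE")  # aspartic acid and glutamic acid


def has_acidic_junction(cdr3, trim_start=0, trim_end=0):
    """Return True if the (optionally trimmed) CDR3 contains D or E.

    trim_start / trim_end drop that many residues from each end, as a crude
    way to ignore germline-encoded flanking positions (an assumption here).
    """
    core = cdr3.upper()[trim_start:len(cdr3) - trim_end]
    return any(residue in ACIDIC_RESIDUES for residue in core)


# Invented example sequences.
print(has_acidic_junction("CASSLDRGYTF"))    # contains D
print(has_acidic_junction("CASSLGGNTLYF"))   # no D or E
```

Whether trimming is appropriate, and by how many residues, depends on how the junction is defined in a given dataset; the defaults here apply no trimming.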
T1D-specific TCR repertoires in rat models have been extensively studied by the group of Mordes and Blankenhorn (61). Of note, diabetes-susceptible rat strains have a T1D risk MHC haplotype (RT1B/D^u), which lacks a negatively charged amino acid residue at the beta chain 57 position and is homologous to HLA-DQ8 (61). In addition to the HLA gene locus, Iddm14, which contains the TCR beta chain genes, is a T1D-susceptible locus (62). The group identified a TCR Vbeta allele, Tcrb-V13S1A1, that is shared among T1D-susceptible rat strains but not with T1D-resistant ones (63), and demonstrated that genetic elimination of this allele or depletion of T cells expressing TCRs containing Vbeta13a (product of the Tcrb-V13S1A1 gene) abrogates diabetes development in T1D-susceptible rats (64)(65)(66). A series of these studies elegantly linked the genetic risk with a functional mechanism in which a particular TCR motif facilitates T1D development with a specific MHC molecule.
In sum, these animal studies demonstrate the presence of preferred TCR motifs in both germline-encoded and rearranged regions to recognize particular epitope sequences, which can be reasonably explained by molecular interactions within the TCR-peptide-MHC complex. From the viewpoint of TCR biomarker development, TCR motifs shared by antigen-specific or disease-susceptible T cells can be utilized to enrich and classify TCR clonotypes that are distinctive of T1D.

TCR Repertoires in the Pancreas of Humans
Emerging sequencing technologies and increasing availability of human samples, in particular pancreas and peripheral immune tissues isolated from organ donors having T1D, facilitate identification of islet antigen-specific or T1D-associated T cells and TCR clonotypes (7,(67)(68)(69)(70)(71)(72)(73)(74). In the 1990s, two groups in Spain and Japan separately analyzed TCR repertoires in the pancreas and demonstrated clonal expansion of T cells with particular V gene segment usage in individual patients (75,76). Importantly, the same group in Spain demonstrated that a clonally expanding TCR in islet and pancreas samples was detected in the blood of the same individual, indicating that islet-residing TCR clonotypes are detectable in peripheral blood samples (77). More recently, Brusko and colleagues further corroborated this concept by studying a larger number of individuals using a next-generation sequencing technology that allows analysis of much higher numbers of T cells (72,78). This high-resolution analysis discovered that CD8 TCR clonotypes in the pancreas and draining lymph nodes are detected in peripheral blood more frequently than those expressed by CD4 T cells and provided important insights about the depth of TCR sequencing needed to achieve quantitative measurement.
Another important concept is to consider commonality of TCR repertoires in the pancreas across patients. We recently determined thousands of TCR clonotypes expressed by T cells in the islets of organ donors with and without T1D (73,74). Our analysis indicated clonal expansion in the pancreas of individual donors regardless of the disease, but also found that the frequency of TCR clonotypes shared between donors is limited. This low frequency of shared TCR clonotypes may be due to diverse HLA restrictions present in different individuals. Another reason could be the fact that T cells in the islets may not be necessarily islet-specific. Indeed, multiple studies analyzing islet T cell specificity found that over half of T cell clones and lines derived from the islets did not respond to preproinsulin and other known islet epitopes (46,70,71,73,74). However, it should be noted that collecting TCR clonotypes from a larger number of donors significantly increases the number of shared clonotypes and such large TCR repertoire information allows for identifying common motifs even when not sharing entire TCR sequences, which will be essential to precisely cluster TCRs recognizing the same epitope (see below regarding TCR clustering). Thus, continuing efforts to accumulate TCR sequence information from the target organ along with epitope identification is crucial to establish a sufficient list of TCR clonotypes that can be used for disease-associated TCR biomarkers.

Islet Antigen-Specific TCR Clonotypes in Humans
TCRs expressed by islet-reactive T cells may be another optimal source that can be used as clonotypes for T1D biomarkers, especially if they circulate in the peripheral blood. Such clonotypes could come from T cell clones, T cell lines, hybridomas, and transductant cells that have been confirmed to respond to islet antigens, cell subsets enriched by multimer staining, and those activated or proliferated by antigen stimulation. TCR clonotypes for which reactivity to epitopes has been confirmed at a single cell level would be the most reliable source. Here we summarize islet antigen-specific TCR clonotypes that were isolated from individuals having T1D (Table 2). To date, over a hundred TCR alpha and beta paired sequences specific to common islet epitopes have been reported by a number of investigators, and it is notable that the majority of these TCRs were identified in the past several years (7,45,70,73,74,. However, hundreds of disease-associated TCR clonotypes are far too few to cover T1D patients having heterogeneous antigen specificity. With rapidly evolving sequencing technologies, future efforts to identify islet epitope-specific TCR clonotypes are essential to develop TCR biomarkers for T1D. In addition to TCR clonotypes listed in Table 2, Bonifacio and colleagues reported hundreds of TCR sequences expressed by T cells that were stained with multimers composed of islet epitopes or that proliferated in response to islet antigens (44,45). While it is necessary to carefully validate true reactivity to antigens, this type of analysis is an excellent resource to gain T1D-associated TCR clonotypes. Computational tools to decrease the "noise" (i.e. eliminating non-specific binding TCR clonotypes) may help to enrich truly antigen-specific clonotypes (100,101).
Further, these candidate TCR clonotypes could be validated for disease specificity using larger cohorts analyzed with whole blood TCR sequencing, and then clonotypes that were detected only in individuals having various stages of T1D could be assessed for functional reactivity (Figures 1A-C).
Retro/lentiviral transduction systems, especially in a moderate to high throughput multiplex assay, will facilitate verifying reactivity to antigens (82,98,102,103).

Identification of Disease-Specific TCR Clonotypes Using Big Data
Big data analysis, which seeks to classify TCR repertoires in a specific condition using a large number of TCR samples, is an emerging strategy to identify disease-associated TCR clonotypes. A major advantage of this approach is the capability to identify disease-associated TCR clonotypes without knowing antigen specificity, thereby allowing one to include TCRs that are potentially disease-associated but not islet-specific and also those having low affinity to antigens. Indeed, specificities of large proportions of T cells in the islets are unknown (46,(70)(71)(72)(73)(74). Virus infections such as enterovirus and coxsackie B virus (CVB) are suggested to be involved in T1D development (104)(105)(106), and TCRs specific to these viruses could be identified by big data analysis by comparing TCR repertoires of individuals having or not having different stages of T1D. Although it has been demonstrated in infectious diseases that big data analyses can identify pathogen-specific TCR clonotypes, it has not yet been successful at identifying T1D-associated TCR clonotypes using PBMC samples from individuals with or without different stages of T1D. This could be explained by several possibilities: (1) the frequency of T1D-associated T cells may be lower than that of pathogen-specific T cells; (2) antigens involved in T1D pathogenesis, especially those at different stages of T1D, may be more heterogeneous than those in infectious diseases; (3) autoreactive TCRs could be more private (i.e. not common between patients) than those of conventional T cells; and (4) sample sizes studied to date have not been large enough. However, having large TCR data sets produced by next generation sequencing will enable machine learning algorithms to cluster and classify TCR clonotypes. Using these newly developed techniques, even infrequent disease-specific TCRs having less publicity (i.e. commonality) between people may be identified from relatively small numbers of samples. 
Indeed, some computational TCR classifying methods are now capable of identifying cancer patients responding to immune checkpoint inhibitors (40), and also early stages of cancer can be differentiated from healthy individuals using this type of technique (107,108). In the next section, we will discuss how to take advantage of the latest TCR clustering/classifying techniques for T1D TCR biomarkers.

Clustering and Classification of TCR Clonotypes
TCR clonotypes recognizing the same peptide-MHC complex often share similar motifs and features. For example, influenza-specific TCRs prefer to use TRAV38-1/TRAJ52/TRBV19/TRBJ1-2 (109)(110)(111), and melanoma (MART-1)-specific TCRs often contain an alpha chain with TRAV12-2 (112). Likewise, several features common for islet antigen-specific TCRs have been reported. We discovered that insulin B-chain-specific TCRs tend to use TRAV38-1/38-2 and other Valpha segments having similar motifs in the CDR1 and CDR2 regions (113). Also, it has been shown that a specific motif "SGGSNYKLTF" is contained in the CDR3 region of alpha chains specific to an IGRP peptide (45). More recently, crystal structure analysis of TCRs specific to a hybrid insulin peptide composed of proinsulin and islet amyloid polypeptide (IAPP) demonstrated that motifs in the TRBV5-1 segment commonly interact with amino acid residues in IAPP (92). Our work also indicates that T cell responses to hybrid insulin peptides precede clinical T1D onset (114), making these TCR clonotypes excellent candidates for biomarkers. Thus, autoreactive TCRs share commonalities and similarities, which provide clues to cluster TCRs and stratify those specific to a certain condition. A number of algorithms to cluster or classify TCR clonotypes have been developed. Each algorithm has advantages and disadvantages as reviewed by others (115,116), but with respect to TCR biomarker development for T1D, the algorithms can be divided into two groups: first, those that cluster TCRs by assessing similarities of TCR sequences with each other in datasets; and second, those that seek to classify TCRs by identifying similarity to known antigen-specific or disease-specific TCR clonotypes.
The former algorithms such as TCRdist (111), GLIPH/GLIPH2 (117,118), ClusTCR (119), and GIANA (108) do not need information about T1D-specific epitopes and TCR sequences beforehand, and thus can be used to predict disease-specific TCR clonotypes that are specifically detected in T1D patients but not in non-diabetic subjects. On the other hand, machine learning-based algorithms that assess similarities to known antigen-specific TCR datasets to predict epitopes, such as DeepTCR (101), DeepCAT (107), TCRmatch (120), and TCRAI (100) need prior information about disease-specific TCR sequences. These algorithms show excellent performance when classifying TCRs specific to the same epitopes that were used to develop the machine learning algorithm but not for those having different specificities. Therefore, large sets of disease-specific TCR sequence information for machine training are necessary to achieve high specificity and sensitivity. Typically these types of algorithms show better performance to detect antigen-specific TCR clonotypes than the clustering-based algorithms, thereby being useful to validate TCR clonotypes once epitopes or disease-specificity are determined.
Alternatively, they can also be used to 'clean up' (i.e. eliminate non-specific TCR clonotypes) TCR datasets that are obtained from multimer-stained T cells or those activated by antigen stimulation (Figure 1B).
In any case, it is essential to prepare TCR datasets from a large number of individuals with and without T1D at multiple time points to elicit the best performance by machine learning and clustering algorithms. Typically, diverse datasets drawn from many samples improve learning efficiency more than large datasets drawn from a limited number of samples (100). In addition, it is also important to prepare accurate TCR clonotype information to differentiate T1D patients from healthy subjects. There are now several TCR databases available, which accumulate and curate information about TCR sequences along with target peptide-MHC complexes, such as VDJbase (121,122), IEDB (123), VDJdb (124), iReceptor (125), and McPAS-TCR (126). While these are incredibly useful resources, the proportion of islet-specific clonotypes is still very small, accounting for only ~100 out of tens of thousands of clonotypes, the majority of which are specific to viruses and tumor antigens. Assuming that self-reactive TCR clonotypes are more heterogeneous and rarer compared to pathogen-specific ones, there is a need for higher numbers of clonotypes specific to T1D. Thus, identifying a large set of accurate disease-specific TCR clonotypes will be a key component to achieve successful big data analysis, which will ultimately lead us to establish TCR biomarkers in T1D (Figure 1).
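To illustrate the first group of algorithms discussed in this section, those that cluster TCRs by sequence similarity without prior epitope knowledge, the toy sketch below links CDR3 sequences of equal length that differ at no more than one position and reports the resulting connected groups. This is a deliberately simplified stand-in and is not the method implemented by TCRdist, GLIPH/GLIPH2, ClusTCR, or GIANA; the function names, the mismatch threshold, and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(seq_a, seq_b):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(seq_a) != len(seq_b):
        return None
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cluster_cdr3(sequences, max_mismatch=1):
    """Single-linkage clustering of CDR3 strings using a union-find structure."""
    unique = sorted(set(sequences))
    parent = {seq: seq for seq in unique}

    def find(seq):
        # Follow parent links to the cluster representative (with path halving).
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    for seq_a, seq_b in combinations(unique, 2):
        distance = hamming(seq_a, seq_b)
        if distance is not None and distance <= max_mismatch:
            parent[find(seq_a)] = find(seq_b)

    clusters = {}
    for seq in unique:
        clusters.setdefault(find(seq), []).append(seq)
    return sorted(clusters.values())


# Invented CDR3 sequences for demonstration only.
toy_cdr3s = ["CASSLGQYF", "CASSLGEYF", "CASSLAEYF", "CAWSVGNTF"]
for group in cluster_cdr3(toy_cdr3s):
    print(group)
```

The all-pairs comparison scales quadratically with the number of sequences, so this sketch is suitable only for small panels; the published tools use more scalable strategies and richer similarity measures that also account for V gene usage.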

PERSPECTIVE
It is still controversial whether T1D patients have distinct islet antigen-specific T cell subsets in the blood compared to healthy individuals. Even in the pancreas, non-diabetic organ donors have preproinsulin-specific T cells in the exocrine compartment, but such antigen-specific T cells accumulate into the islets over the course of T1D progression (127). In the islets, we recently demonstrated that only T1D donors have CD8 T cells highly reactive to preproinsulin (74). Mallone and colleagues also reported that pancreata of T1D donors have a higher number of zinc transporter-8-specific T cells than non-diabetic controls (7). Thus, multiple studies demonstrate that islets of T1D individuals have distinct T cell repertoires from those without diabetes. However, a number of studies indicate that healthy individuals have islet antigen-specific T cells in the blood (7,113,(128)(129)(130)(131), and depending on cell subsets examined, some studies including those looking into pathogenic T cells show that T1D patients have higher numbers of islet-specific T cells, whereas others do not detect differential islet-specific T cells in T1D patients. This controversy could be explained by either of two possibilities: (1) pathogenic T cells in the islets do not leak into the peripheral blood in detectable numbers (Figure 2A); or (2) pathogenic T cells in the islets do indeed circulate, but because there are already a number of islet-specific (but not harmful) T cells in the circulation, the total numbers of islet-specific T cells (i.e. pathogenic T cells leaked from the islets plus non-pathogenic T cells) do not differ enough between the blood of T1D patients and that of healthy individuals (Figure 2B).
Given evidence that a portion of T cell repertoires are shared between pancreas, pancreatic lymph nodes, and peripheral blood cells (72), and that TCR repertoires in the islets of T1D organ donors are clonally distinct from those of non-diabetic donors (74), if the latter hypothesis (Figure 2B) is correct, islet-derived TCR sequences will be a powerful marker to discriminate pathogenic from physiological T cells, and thereby capable of stratifying individuals with active insulitis prior to clinical T1D onset.
To develop practical TCR biomarkers in T1D, a number of obstacles need to be overcome, some of which may be unique to autoimmune diseases. These challenges can be considered from the view of (1) publicity, (2) abundancy, and (3) disease-specificity.

Publicity
It will be important to understand the frequencies of public vs private TCR clonotypes that are specific to the T1D disease state, and these likely fluctuate over time during T1D development. Given the genetic risk associated with HLA class II genes, heterogeneity provided by HLA diversity could be smaller than other diseases for TCR clonotypes expressed by CD4 T cells. However, autoreactive T cells, which often bind to peptide-MHC complexes with low affinity, may have a larger TCR repertoire than conventional anti-pathogen T cells, resulting in less commonality. Therefore, frequency of public T1D-specific TCR clonotypes may be low. Strategies that compare TCR repertoires in each individual such as pre and post treatment (40) do not need to consider publicity of clonotypes, and therefore may be more easily applicable to T1D immune intervention studies.

Abundancy
Theoretically, 10^15 to 10^16 diverse TCR clonotypes can be assembled (12)(13)(14); however, a practical TCR repertoire size is estimated to be about 10^8 to 10^10 per person (15,17,18). This indicates that the frequency of target clonotypes is extremely low. However, there is evidence that identical clonotypes are persistently detected from the same individuals over time (44,81,93,132). We believe quantitative resolution of TCRs will need to be increased. This could be achieved by enriching samples before sequencing (e.g. beads enrichment by antigen-specific multimers). Another very attractive approach is to target sequencing to TCRs containing a preferred V gene segment of interest, thus greatly enhancing the depth of sequencing by analyzing clonotypes that can be obtained for a specific V allele. Blood sample volume needed to quantitatively evaluate frequency of disease-associated TCR clonotypes is another important consideration, which will need to be addressed given that the T1D disease process does begin in young children.
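The sampling problem behind this concern can be made explicit with a back-of-the-envelope calculation. Assuming each sampled T cell independently carries the target clonotype with probability equal to its frequency, and assuming perfect capture and sequencing of every sampled cell (both simplifications of ours), the sketch below estimates the chance of observing a clonotype at least once and the number of T cells required for a given detection probability. The frequencies used are the 1/10^5 to 1/10^6 range quoted in the Introduction.

```python
import math


def detection_probability(frequency, cells_sampled):
    """P(at least one cell of the clonotype is sampled) = 1 - (1 - f)^N."""
    return 1.0 - (1.0 - frequency) ** cells_sampled


def cells_needed(frequency, target_probability=0.95):
    """Smallest N such that the detection probability reaches the target."""
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - frequency))


for frequency in (1e-5, 1e-6):
    n_cells = cells_needed(frequency)
    print(f"frequency {frequency:g}: about {n_cells:,} T cells for 95% detection")
```

Under these idealized assumptions, roughly 3 x 10^5 T cells are needed to see a 1/10^5 clonotype at least once with 95% probability, and about ten times more for a 1/10^6 clonotype. Quantifying a frequency, rather than merely detecting a clonotype once, requires considerably more cells, and translating cell numbers into blood volume depends on T cell counts and recovery efficiency, which are not modeled here.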

Disease-Specificity
Identification of disease-specific TCR clonotypes is essential for developing robust T1D TCR biomarkers. The more clonotypes are available, and the higher their specificity to the disease, the more sensitive and specific the resulting assays will be. Therefore, the key is how to select truly disease-specific TCR clonotypes. As illustrated in Figure 1, both accumulation of actual TCR datasets produced from individuals with and without T1D and computational big data analysis will facilitate the development of biomarkers. While the majority of TCR big data analysis currently uses only CDR3-beta sequences, it has been demonstrated that inclusion of entire sequence information such as V and J segments, in particular CDR1 and CDR2 sequences, increases accuracy of classifying TCR clonotypes (100,120). While the number of T1D-specific clonotypes that have been determined so far is low, advances in both TCR sequencing technologies and computational analysis strategies will dramatically impact this effort.

FIGURE 1 | Strategy to determine disease-specific TCR clonotypes. Red and gray circles represent true and false disease-specific TCR clonotypes, respectively. Green circles are true disease-specific clonotypes determined by clustering with known disease-specific TCR clonotypes. (A) TCRs detected in the islets, pancreata, and pancreatic lymph nodes, in particular those for which antigen specificity has been determined as well as those that are clustered with known disease-specific TCRs, can be the initial source for disease-specific TCR candidates. (B) TCRs detected from peripheral blood T cells enriched by antigen stimulation or peptide-MHC-conjugated multimers are also an initial source. Antigen-specific algorithms can enrich TCR clonotypes that are truly specific to antigens. (C) Candidate TCR clonotypes may be assessed for specificity to islet tissues, proteins, and peptides. (D) Using classifying algorithms, candidate TCR clonotypes are assessed for frequency in the blood of individuals with and without T1D to determine disease specificity. Simultaneously, clustering algorithms can select additional clonotypes that are clustered with known disease-specific TCR clonotypes. (E) TCR clonotypes selected by classifying and clustering algorithms are used for machine learning of antigen-specific algorithms to further determine true disease-specificity.
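One simple way to picture the classifying step in panel (D) of Figure 1 is a per-clonotype comparison of carrier counts between individuals with and without T1D. The sketch below uses a one-sided Fisher's exact test for this purpose. It is an illustrative stand-in, not the algorithm used in the cited studies, and the clonotype labels, cohort sizes, and `alpha` threshold are hypothetical. No multiple-testing correction is applied, although a real analysis over many clonotypes would need one.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in the first group.

    a: T1D carriers,      b: T1D non-carriers,
    c: control carriers,  d: control non-carriers.
    """
    n_t1d, n_ctrl = a + b, c + d
    carriers = a + c
    total = comb(n_t1d + n_ctrl, carriers)
    # Sum hypergeometric probabilities of tables at least as extreme as
    # the one observed (a or more carriers in the T1D group).
    tail = sum(
        comb(n_t1d, x) * comb(n_ctrl, carriers - x)
        for x in range(a, min(n_t1d, carriers) + 1)
    )
    return tail / total


def enriched_clonotypes(t1d, controls, alpha=0.05):
    """Return {clonotype: p-value} for clonotypes enriched in T1D.

    `t1d` and `controls` are lists holding one set of clonotypes per person.
    """
    hits = {}
    for clone in set().union(*t1d):
        a = sum(clone in rep for rep in t1d)
        c = sum(clone in rep for rep in controls)
        p = fisher_exact_greater(a, len(t1d) - a, c, len(controls) - c)
        if p <= alpha:
            hits[clone] = p
    return hits


# Toy cohorts with placeholder clonotype labels (not real sequences).
t1d_cohort = [{"clone_X", "clone_Y"}, {"clone_X", "clone_Y"},
              {"clone_X"}, {"clone_X"}]
control_cohort = [{"clone_Y"}, {"clone_Y"}, set(), set()]

hits = enriched_clonotypes(t1d_cohort, control_cohort)
for clone, p in sorted(hits.items()):
    print(f"{clone}: p = {p:.4f}")
```

In this toy example only the clonotype carried exclusively by the T1D cohort passes the threshold, whereas a clonotype found equally often in both cohorts does not. Clonotypes passing such a filter would still require the specificity assessment described in panel (C).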
In conclusion, the antigen receptor on disease-specific T cells holds promise as a non-cell based biomarker of not only the presence of T1D but disease activity as well. Efforts to define the TCR repertoire within the human pancreas of organ donors with and without T1D are underway, with a need to define the antigen specificity and HLA restriction of the identified clonotypes. Those clonotypes that are shared between individuals with T1D, are frequent, and circulate from the pancreas and pancreatic lymph nodes to the peripheral blood are prime candidates for deep sequencing and clustering of TCRs using developed computational analyses.

AUTHOR CONTRIBUTIONS
MN and AM wrote and edited the manuscript. All authors contributed to the article and approved the submitted version.

FIGURE | Proposed models of islet-specific T cell detection in the blood. While pancreatic islets contain a certain amount of T cells regardless of disease status (gray bars), only islets of individuals with T1D contain T cells that are highly reactive to islet antigens (red bar). Model (A): Leak of T cells from the islets to peripheral blood is limited. T cell repertoires specific to islet antigens are not different between individuals with and without T1D. Model (B): There are substantial amounts of islet-specific but not disease-specific T cells in the blood regardless of disease status (gray bars). T cells in the islets do circulate in the blood (red bar), but the total numbers of islet-specific T cells are not different between individuals with and without T1D. Enumerating only T cells derived from the islets can identify individuals having T1D. TCR clonotypes are a distinct property to identify islet-derived T cells. ***, significantly different; ns, not significant.