Systematic pattern analyses of Vδ2+ TCRs reveal that shared “public” Vδ2+ γδ T cell clones are a consequence of rearrangement bias and a higher expansion status

Background Vγ9Vδ2+ T cells are a major innate T cell subset in human peripheral blood. Their Vδ2+ VDJ-rearrangements are short and simple in the fetal thymus and gradually increase in diversity and CDR3 length along with development. So-called “public” versions of Vδ2+ TCRs are shared among individuals of all ages. However, it is unclear whether such frequently occurring “public” Vγ9Vδ2+ T cell clones are derived from the fetal thymus and whether they are fitter to proliferate and persist than infrequent “private” clones. Methods Shared “public” Vδ2+ TCRs were identified from Vδ2+ TCR-repertoires collected from 89 individuals, including newborns (cord blood), infants, and adults (peripheral blood). Distance matrices of Vδ2+ CDR3 were generated by TCRdist3 and then embedded into a UMAP for visualizing the heterogeneity of Vδ2+ TCRs. Results Vδ2+ CDR3 distance matrix embedded by UMAP revealed that the heterogeneity of Vδ2+ TCRs is primarily determined by the J-usage and CDR3aa length, while age or publicity-specific motifs were not found. The most prevalent public Vδ2+ TCRs showed germline-like rearrangement with low N-insertions. Age-related features were also identified. Public Vδ2+ TRDJ1 TCRs from cord blood showed higher N-insertions and longer CDR3 lengths. Synonymous codons resulting from VDJ rearrangement also contribute to the generation of public Vδ2+ TCRs. Each public TCR was always produced by multiple different transcripts, even with different D gene usage, and the publicity of Vδ2+ TCRs was positively associated with expansion status. Conclusion To conclude, the heterogeneity of Vδ2+ TCRs is mainly determined by TRDJ-usage and the length of CDR3aa sequences. Public Vδ2+ TCRs result from germline-like rearrangement and synonymous codons, associated with a higher expansion status.

Introduction
γδ T cells are unconventional T cells whose T cell receptors (TCRs) consist of rearranged γ (TRG gene) and δ (TRD gene) chains. Like αβ T cells, γδ T cells use the recombination of variable, diversity, and joining gene segments (V(D)J recombination) to generate the complementarity-determining region 3 (CDR3) of the TRG and TRD chains. The diversity of these CDR3 regions is further amplified by the insertion of palindromic sequences (P nucleotides) and additional non-templated nucleotides (N-insertions) introduced by terminal deoxynucleotidyl transferase (TdT) (1,2).
However, in contrast to conventional αβ T cells, which use numerous V segments almost randomly, human γδ T cells exclusively use Vδ1, Vδ2, and, to a lesser extent, Vδ3 segments to generate delta chains. Diversity is further restricted because Vδ2+ chains mostly pair with Vγ9-JP chains (2). The resulting Vγ9Vδ2+ T cells are regarded as innate γδ effectors that are quickly activated in anti-tumor responses, infection, and inflammation (3). Committed Vγ9Vδ2+ T effector cells are enriched in fetal thymus and blood and persist into adulthood (4-6). The Vγ9Vδ2+ TCRs uniformly recognize phosphoantigens such as microbial-derived (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP) and host-derived isopentenyl pyrophosphate (IPP) in a pMHC-unrestricted manner (7-10), leading to fast expansion and cytokine release of Vγ9Vδ2+ γδ T cells (3). The Vγ9Vδ2+ TCRs are described as "semi-invariant" TCRs because the Vγ9 chains always have a TRGV9-TRGJP rearrangement. Fetal-derived Vγ9JP chains often express the germline-encoded CDR3 sequence CALWEVQELGKKIKVF due to the lack of TdT in the fetal thymus (5,6). The Vδ2+ repertoire that evolves during human development, on the other hand, remains both highly diverse and individual (4,5). In the early stages of life, the TRDV2 gene segments preferentially rearrange with TRDJ3 and TRDJ2, and gradually switch to TRDJ1 after birth (11). Meanwhile, more N-insertions and longer CDR3 lengths are introduced into Vδ2+ TCRs after birth due to the increasing activity of TdT (6). Public Vδ2+ TCRs are frequent among Vδ2+ repertoires from both the fetus and cord blood (6, 12-14). Public Vδ2+ TCRs have a higher overall diversity than the public Vγ9-JP chains; they occupy a substantial portion of Vδ2+ repertoires from adult peripheral blood (4). However, the properties and ontogeny of public Vγ9Vδ2+ TCRs are not completely understood. It is also unclear whether public Vγ9Vδ2+ TCRs have any advantage over private TCRs in target recognition or amplification, or whether the thymus after birth still preserves the ability to produce public Vδ2+ TCRs.
TCR-sequencing data are high-dimensional. CDR3 sequences are typically composed of 10-30 diverse amino acids, and factors such as V(D)J recombination, frequency, and MHC restriction need to be considered in their analysis. Recently, different computational tools were developed to discover TCR clusters based on sequence patterns (15-17). For example, TCRdist3 is an open-source Python package that transforms TCR repertoires into biochemically informed distance metrics based on the similarity of the TCR amino acid sequences, especially in the CDR3 regions. The calculated distances enable clustering or meta-clonotype analysis of the TCR sequences (18,19). However, the MHC restriction of αβ TCRs and the lack of HLA genotyping data for most available datasets have impeded these tools from being applied to public TCR datasets on a larger scale. In contrast, the MHC-unrestricted nature of γδ TCRs makes it possible to apply TCRdist3 to γδ TCR repertoires across a large number of individuals.
To investigate the heterogeneity and ontogeny of public Vδ2+ TCRs, we determined the publicity of TCRs from Vδ2+ TCR repertoires of 89 individuals from cord blood (CB), infant peripheral blood, and adult peripheral blood. Distances between Vδ2+ CDR3 amino acid (CDR3aa) sequences were computed with TCRdist3, and the resulting distance matrix was visualized by Uniform Manifold Approximation and Projection (UMAP). We found that J-usage and length together defined the heterogeneity of Vδ2+ CDR3aa sequences. Both germline-encoded and age-dependent features were preserved among public Vδ2+ TCRs, indicating that they are produced in the fetal and adult thymus. Interestingly, we additionally revealed a higher expansion status of public Vδ2+ TCRs than private Vδ2+ TCRs.

Results
Public Vδ2+ clones prevail in all age groups
To investigate the occurrence of public Vδ2+ clones, we collected TCR repertoires containing 213,391 Vδ2+ CDR3aa sequences from 11 cord blood (CB), 55 infant peripheral blood, and 23 adult peripheral blood samples. Eighty-one samples were collected from our published studies (4,13,20,21), and eight additional samples (five CB and three adult) were included from an unpublished databank to further increase the sample size (Figure 1A and Table S1). The lengths of CDR3s ranged from 4 to 39 amino acids with a median of 18 amino acids (Figure S1A). The TRDJ3 segment dominated in CB samples and rapidly decreased after birth. Similarly, 15.9% of TRDV2 rearranged with TRDJ2 in CB, but this number decreased to around 2.1% in adults. In contrast, the TRDJ1 segment increased from a small frequency in CB to a large majority in adult samples. The proportions of the TRDJ4 segment were marginal in all three groups (Figure S1B). "Public" Vδ2+ TCR clones were defined by the proportion of individuals sharing the same CDR3aa sequence. Private CDR3 sequences were found in only one individual, whereas low public and high public TCRs were shared by fewer than 10% and at least 10% of individuals, respectively. In CB samples, 26.8% of TCR sequences were low public and 15.1% were high public. Interestingly, although the publicity of adult Vδ2+ TCRs significantly decreased, 14.5% low public and 4.6% high public TCRs were still found on average (Figure S1C).
Before applying the TCRdist3 tool to Vδ2+ TCR repertoires, data pre-processing and down-sampling were performed (Figure 1A). To reduce the noise caused by rare sequences, we only selected CDR3aa sequences with a length of 14 to 22 amino acids, and all TRDJ4 rearrangements were excluded (Figures S1A, 1B, C). This left 52,199 CDR3aa sequences after down-sampling. Data cleaning and down-sampling did not significantly affect the J-usage and publicity of the processed TCRs (Figures 1C, D).
Highly diverse Vδ2+ TCRs cluster according to CDR3aa length and TRDJ segment usage
The distance between every two TCRs was calculated from the CDR3aa sequences by TCRdist3, which generated a distance matrix (18,19). A UMAP was then generated for data embedding and visualization (Figures 1A, 2A). At first sight, Vδ2+ TCRs were clearly stratified on the UMAP by both J-usage and CDR3aa length (Figures S2A, B). The J-usage skewed from TRDJ3- and TRDJ2-dominant in the CB group to TRDJ1-dominant in the adult group (Figures 2B, S2C). Longer CDR3s, on the other hand, were more frequently found in the TRDJ1 and TRDJ2 adult groups. The CDR3aa length distribution did, however, remain similar between the age groups (Figures 2B, S2C, D). Infant-derived Vδ2+ TCRs showed intermediate features between CB and adult TCRs in terms of both J-usage and CDR3aa length (Figures 2B, S2C, D). Next, to test whether other factors contributed to the heterogeneity of Vδ2+ TCRs, we selected the TCRs with the most prevalent lengths for TRDJ1 (17 aa) and TRDJ3 (19 aa) for a more in-depth re-analysis. Publicity (Figure 2C) and age groups (Figure 2D) were not distinguishable on the re-analyzed UMAP. Moreover, after restricting to the same J-usage and length, the CDR3aa sequence logos showed almost identical motifs between the different publicity and age groups (Figures S2E, F). This suggests that the heterogeneity of Vδ2+ TCRs is primarily determined by a combination of TRDJ usage and CDR3aa length.
FIGURE 1 Experimental design and data pre-processing. (A) Illustration of the data analysis workflow. Our datasets were collected from 11 CB, 55 infants, and 23 adults; among them, datasets from five CB and three adults were unpublished data.
Public Vδ2+ repertoire preserves both germline and age-related characteristics
In previous studies by Ravens et al. and Papadopoulou et al., public Vδ2+ TCRs were described as germline-encoded CDR3s with no or few N-insertions and short CDR3 lengths (12, 13). In our dataset, publicity was also inversely associated with the number of N-insertions and the length of CDR3aa (Figures S3A, B). Interestingly, public Vδ2+ TCRs have previously shown age-dependent wave-like dynamics: they are enriched in fetal blood, decrease in cord blood, rise again in 5- to 10-week-old infants, and finally drop in adulthood (12, 13). This led us to ask whether public clones generated in different time windows would also show age-dependent features. Indeed, although public Vδ2+ TCRs were enriched in TCR clusters with shorter lengths, they preserved the same J-usage- and length-determined heterogeneity as private Vδ2+ TCRs (Figures 2E, F).
Next, to investigate how the features of public Vδ2+ TCRs changed during development, we took advantage of the whole dataset before down-sampling. Overlapping all unique public CDR3aa clones of the different age groups showed that only a minor portion of clones were shared between the CB and adult groups (CB&AD shared) (1,175 out of 4,641 in CB and 1,175 out of 5,262 in adult). In contrast, both CB and adult groups largely shared their public Vδ2+ repertoire with the infant group (4,428 out of 4,641 in CB and 4,258 out of 5,262 in adult) (Figure S3C). Combining this with the transitional features of infant TCRs in J-usage and length, we considered that age-related differences of public Vδ2+ TCRs mainly exist between the CB and adult groups (Figure 3A), while the transitional infant group shares commonalities with both sides. As TdT activity increases along with human development, we hypothesized that adult-derived TCRs would have more N-insertions than CB-derived ones. Indeed, the private Vδ2+ TCRs from the adult group had the most N-insertions and longest CDR3aa length, whereas the CB&AD shared Vδ2+ TCRs had the fewest N-insertions (Figures 3B, S3D). Adult-derived TRDJ2 and TRDJ3 public Vδ2+ TCRs had slightly more N-insertions than CB-derived public TCRs (Figure 3B). Intriguingly, for TRDJ1, we observed more TCRs with higher N-insertions in the CB public group than in the adult public group (Figure 3B): 25.0% of CB-derived public Vδ2+ TRDJ1 TCRs had more than 10 N-insertions, whereas for adult-derived and CB&AD shared public clones, the proportions were merely 5.91% and 2.91%, respectively (Figure 3C). Finally, although the CB-derived public Vδ2+ TRDJ1 TCRs had more residues in the highly variable region, the motifs of the three groups were similar, i.e. polar amino acids were mainly used (Figure 3D).

Synonymous codons in CDR3 nucleotide sequences result from different TRDD gene usages and N-insertions that contribute to the generation of public Vδ2+ CDR3
Since the generation of public Vδ2+ clones did not entirely result from simple germline rearrangements without N-insertions (Figures 3B, C), we explored in more detail how the public CDR3aa sequences were rearranged. The publicity of a CDR3aa sequence positively correlated with the number of its corresponding unique encoding transcripts (Figure 4A). The same CDR3aa sequence could be generated by an exceedingly high number of different CDR3 nucleotide (CDR3nt) sequences. For example, the public CDR3aa sequence 'CACDTLGDTDKLIF' (2) was detected in 76 different individuals and was encoded by 80 different transcripts (Figure 4A). Additionally, public Vδ2+ CDR3aa sequences were more likely to have variable TRDD-segment usage: 31.3% and 10.6% of 'high public' and 'low public' CDR3s, respectively, could be rearranged from more than one TRDD segment, whereas this was observed for only 0.16% of private CDR3s (Figure 4B). More surprisingly, public CDR3aa sequences could be generated from multiple CDR3nt sequences even within one individual. For example, in donor SA62, the public CDR3 "CACDTLGDTDKLIF" could be produced by eight different CDR3nt transcripts, rearranged either with TRDD3 and 0-1 N-insertions, TRDD2 with 2 N-insertions, or 9 N-insertions without a TRDD segment (Table 1). In each individual, 19.8% (median value, ranging from 4% to 62.6%) of high public Vδ2+ CDR3s were generated by at least five unique transcripts. In contrast, the proportion for private CDR3s was much lower at 1.28% (median value, ranging from 0.26% to 5.56%) (Figure 4C).
FIGURE 2 The heterogeneity of Vδ2 TCRs is determined by CDR3 lengths and TRDJ segments. (A) Each point stands for a Vδ2+ CDR3aa sequence. UMAP for 52,199 Vδ2+ CDR3 (same data as in Figure 1B).
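The relationship between a CDR3aa sequence and its encoding CDR3nt transcripts can be illustrated with a minimal sketch. The function names (`translate`, `transcripts_per_cdr3aa`) and the toy nucleotide sequences are hypothetical and were made up for illustration; they are not taken from the dataset or from the analysis code of this study. The sketch assumes in-frame CDR3nt sequences (length divisible by three) and the standard genetic code.

```python
from collections import defaultdict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cdr3nt):
    """Translate an in-frame CDR3 nucleotide sequence into amino acids."""
    return "".join(CODON_TABLE[cdr3nt[i:i + 3]] for i in range(0, len(cdr3nt), 3))


def transcripts_per_cdr3aa(records):
    """Collect, per CDR3aa, the distinct CDR3nt sequences and D segments.

    records: iterable of (cdr3nt, d_segment) tuples; d_segment may be None
    when no TRDD segment was annotated.
    """
    nt_by_aa = defaultdict(set)
    d_by_aa = defaultdict(set)
    for cdr3nt, d_segment in records:
        aa = translate(cdr3nt)
        nt_by_aa[aa].add(cdr3nt)
        d_by_aa[aa].add(d_segment)
    return nt_by_aa, d_by_aa


# Toy, made-up records (hypothetical; not real repertoire data).
toy_records = [
    ("TGTGCCTGTGACACC", "TRDD3"),  # translates to CACDT
    ("TGCGCTTGTGATACG", "TRDD2"),  # also CACDT, via synonymous codons
    ("TGTGCCTGTGACACC", "TRDD3"),  # exact duplicate of the first transcript
    ("TGTGCCTGTGACGTA", None),     # translates to CACDV
]
nt_by_aa, d_by_aa = transcripts_per_cdr3aa(toy_records)
for aa in sorted(nt_by_aa):
    print(aa, len(nt_by_aa[aa]), "unique transcripts")
```

Counting the size of each set per CDR3aa gives the "number of unique encoding transcripts" used in the text; counting the D-segment set gives the number of distinct TRDD usages.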
The publicity of Vδ2+ clones is positively associated with expansion status
To determine whether the publicity of Vδ2+ TCRs correlated with expansion ability, we assigned the top 25% most expanded TCRs in each sample as high frequency (high-freq) TCRs and labelled the remaining ones as low frequency (low-freq) TCRs (Figure S4A). The high-freq and low-freq TCRs were not distinguishable on the UMAP (Figure 5A). To understand which groups of Vδ2+ TCRs are more likely to be high-freq TCRs, we calculated an "expansion status score" based on the ratio of high-freq to low-freq TCRs (Methods section). For a group of TCRs in one individual, the expansion status score is calculated by dividing the number of high-freq TCRs in the group by the number of low-freq TCRs, followed by a log-transformation. Hence, the higher the expansion status score, the more high-freq TCRs in that group. An expansion status score > 0 means the group has more high-freq TCRs than low-freq ones. Interestingly, the median expansion status score of "high public" TCRs was 0.37, and that of the "low public" TCRs remained significantly higher than that of the private TCRs (-0.50 vs -1.28, median value) (Figure 5B). We further examined the expansion status score for TCRs with different J-usages, and similar results were observed (Figure S4B). Given that publicity is inversely associated with CDR3 length (Figure 4A), expansion status could also be associated with CDR3 length. However, CDR3aa length had only a minimal impact on expansion status, and the median expansion status scores of all lengths and TRDJs remained below 0 (Figures S4C, D).
FIGURE 4 (C) Box plot shows the ratio of CDR3aa sequences translated from 5 or more different nucleotide transcripts in each individual. Games-Howell Post-Hoc Test was used to test the mean difference between groups. Adjusted P-values are shown between groups.

Discussion
In this study, we applied TCRdist3 to systematically investigate the Vδ2+ TCR repertoire and revealed that Vδ2+ TCRs retain a high heterogeneity that is primarily determined by J-usage and CDR3aa length. Public Vδ2+ TCRs were as diverse as private TCRs. In previous studies, TCRs with high publicity or shared between cord blood (CB) and adult age groups were characterized by few or no N-insertions and shorter CDR3 length (6,13). Unexpectedly, our study also demonstrated that TRDJ1 rearrangements of public (but not private) γδ TCRs in CB displayed a relatively high number of N-insertions and longer CDR3 lengths. Moreover, compared to private Vδ2+ CDR3aa sequences, public Vδ2+ CDR3aa sequences were prone to be generated from multiple CDR3nt transcripts, even within one individual. Thus, germline-like rearrangement and synonymous codons used by CDR3nt sequences both contribute to the generation of public CDR3aa. Finally, public Vδ2+ TCRs displayed a higher expansion status than private Vδ2+ TCRs.
TCRdist3 and various other tools that investigate CDR3 motifs or amino acid properties can be used to cluster TCRs. This strategy has been particularly useful in linking αβ TCR sequences to antigen specificity based on similarity (18, 22-24). In contrast to highly rearranged αβ TCRs, which have the ability to recognize any possible antigen, most rearranged Vγ9Vδ2+ TCRs are generated from relatively fixed options and are thought to uniformly recognize phosphoantigens (2, 7). By applying the TCRdist3 method, complex TCR repertoire data can be condensed into a single UMAP to conveniently analyze the heterogeneity of γδ TCRs. Hence, it is useful for investigating shifts of the Vδ2+ TCR repertoire under different physiological and pathological conditions. For example, in our study, the shift from the CB-derived to the adult-derived repertoire was clearly highlighted. Moreover, it would be interesting to see whether TCRdist3 could be applied to the more adaptive Vδ1+ or Vδ3+ γδ TCRs to possibly determine their function and antigen specificity.
Vδ2+ TCRs derived after birth displayed more N-insertions and longer CDR3 length than those from CB, consistent with the increasing TdT activity. However, the public clones among CB-derived TRDJ1 Vδ2+ TCRs showed more N-insertions and longer CDR3s than their adult public counterparts. This property was not seen among public Vδ2+ TCRs with other J-usage, and it is difficult to fully explain as yet. One possibility is that it is associated with the intrathymic differentiation of Vγ9Vδ2+ T effectors. Mouse and human innate γδ T effectors are committed in waves within the fetal thymus and have been shown to acquire phenotypes that are closely related to certain TCR usages (3, 4, 6, 25). While the development of human γδ T cells is not fully elucidated, it could be hypothesized that a number of underappreciated Vγ9Vδ2+ T effectors develop later in the fetus, when TdT becomes much more active, and that these specialized effector cells do not remain in peripheral blood after birth. Relevant information can be obtained by comparison with specific γδ T cells from mice. Mouse Vγ6+ and Vγ4+ IL-17-producing γδ T cells are rare populations that reside in mucosal tissues such as the skin or lungs (26,27). These specialized cells exclusively develop in the fetal thymus at embryonic days E15 to E18, after which they home to specific tissues (28). Thus, there is only a narrow window in which these cells can easily be observed while they travel within the circulation. Therefore, the existence of previously unknown tissue-resident γδ T cell populations that are generated shortly after birth cannot be excluded.
FIGURE 5 Public clones have a greater expansion status compared to private clones. (A) UMAP of Vδ2+ TCR colored by high-freq/low-freq category. (B) Expansion status score for each publicity group. Games-Howell Post-Hoc Test was used to test the mean difference between groups. Adjusted P-values are shown between groups.
We demonstrated that the publicity of Vδ2+ TCRs is positively associated with a higher expansion status. This is in line with previous studies, which also found a higher abundance of highly public clones (12, 13). One of the most debated questions regarding public Vδ2+ TCRs is whether their generation and expansion are driven by interactions with the butyrophilin molecules BTN2A1 and BTN3A1. It is also unknown whether the recognition of specific antigens may additionally alter the expansion of these Vδ2+ TCRs. Although the CDR3 is essential for recognition, previous studies failed to find evidence that the CDR3 of Vγ9Vδ2+ TCRs specifically recognizes phosphoantigens (7-10). Until a complete structure of interacting Vγ9Vδ2+ TCR, phosphoantigen, and butyrophilins BTN2A1 and BTN3A1 is revealed, we cannot exclude the possibility that even the family of Vγ9Vδ2+ TCRs recognizes antigens in an "adaptive-like" way via the CDR3. However, based on the current understanding of Vγ9Vδ2+ T cells, it is unlikely that public or expanded Vδ2+ TCR clones result from antigen-specific clonal expansion. First, previous ex vivo experiments suggested that phosphoantigen stimulation induced polyclonal and unbiased expansion of Vγ9Vδ2+ T cells (6,20). Moreover, in our study, by calculating the distance between Vδ2+ CDR3s based on sequence patterns, we found no significant difference between public and private Vδ2+ CDR3 patterns or between high-freq and low-freq Vδ2+ CDR3s. These results suggest that the binding between Vδ2+ CDR3 and phosphoantigen-activated butyrophilins BTN2A1 and BTN3A1 does not favor specific CDR3 variants or motifs.
Why, then, do public Vδ2+ TCRs appear to have a survival advantage? Based on the rearrangement bias and developmental ontogeny, two speculations can be made. 1) There is a rearrangement bias: the publicity of a Vδ2+ CDR3aa sequence is positively associated with the number of corresponding CDR3nt sequences. Therefore, γδ T cells with a public Vδ2+ TCR may originate from multiple different TCR rearrangements, resulting in a higher copy number. 2) Most public Vδ2+ TCRs, especially those shared between many individuals, are rearranged early in life and persist into adulthood (4, 12-14). They may simply have more time to accumulate. A similar situation was observed in human αβ T cells, where T cells carrying public αβ TCRs were generated before birth and maintained high abundances for a long time throughout adulthood (29).
One major limitation of the current study is that only the Vδ2+ chains could be investigated, meaning that information on the paired Vγ9 chains was lost. It was also not possible to prove or disprove that public Vγ9Vδ2+ TCRs interact with antigens differently from private TCRs. However, recent advancements in single-cell TCR sequencing make it possible to sequence paired γδ TCRs and relate them to cellular phenotypes (4). More such data can be expected to become available soon. Another limitation is that undersampling could impair accuracy. As the library protocol only enabled a survey of up to tens of thousands of γδ T cells from a portion of a PBMC sample, it underrepresents the vast number of γδ T cells in the body. This undersampling may make it difficult to accurately identify moderately expanded clones. However, considering the relatively low diversity of Vδ2+ TCRs, undersampling may compromise some details, but the major findings are unlikely to be greatly affected.
The TCRdist3 method has proven to be a very useful tool for analyzing human αβ T cells, and the software also supports γδ TCR analysis (19). However, as mentioned above, αβ TCRs generated by V(D)J rearrangement have a much higher heterogeneity than Vδ2+ TCRs. Thus, detecting different patterns among αβ TCRs is considerably easier. In our case, TCRdist3 detected the heterogeneity of Vδ2+ TCRs generated by length and J-usage, but not by publicity or age. Our sequence pattern analysis also failed to find heterogeneity between public and private TCRs. Nevertheless, the possibility that more subtle but essential substitutions are hidden in public Vδ2+ TCRs cannot be excluded. Currently, methods for TCR clustering are all based on CDR3aa sequences, which is sufficient to study antigen specificity. However, the ontogeny of TCRs can be better determined if CDR3nt sequences are included to provide crucial information about VDJ rearrangement and N-insertions.
Our study established that TCR sequence analysis tools such as TCRdist3 are very useful for investigating the γδ TCR repertoire. Using TCRdist3 and downstream analysis, we demonstrated that public Vδ2+ TCRs are a heterogeneous population with both germline and age-related features and an expansion advantage over private TCRs. Given that expressing γδ TCRs on αβ T cells is a promising immunotherapy strategy against tumors (30, 31), those "more successful" public Vγ9Vδ2 TCRs might improve the performance of immunotherapy using Vγ9Vδ2+ T cell clones or engineered αβ T cells carrying Vγ9Vδ2 TCRs.

Materials and methods
Human sample isolation and preparation
Data from 8 healthy donors in this study were newly generated. Blood samples from adult donors (n = 3) and cord blood (CB) donors (n = 5) were collected at Hannover Medical School (Hannover, Germany) after written informed consent. This study was performed in accordance with the Declaration of Helsinki and approved by the institutional ethics review board at Hannover Medical School under study numbers 1303-2012 (CB individuals) and 7901-2018 (healthy adult individuals). PBMCs and CBMCs were purified from the blood samples by Ficoll-Paque density gradient media separation. These cells were then stored at -80°C in 90% fetal bovine serum and 10% DMSO freezing medium before use.
The amplified cDNA library with Illumina P5 and P7 adaptor was sequenced by Illumina Miseq using 500 cycles of paired-end sequencing.

Raw sequencing data alignment and annotation
Raw read alignment and annotation were performed with MiXCR software v.2.1.12 against the international immunogenetics information system (IMGT) reference (32). Unproductive TCRs were filtered out. Annotated TCRs were further counted and summarized by VDJtools (33).
Vend, Dstart, Dend, Jstart are the start/end position of V, D, J segments on CDR3nt sequence.
The publicity of TCRs was defined based on the CDR3aa sequence according to the percentage of the population sharing it. A "private" CDR3aa appears in only one individual; "high public" TCRs are shared among at least 10% of the population, i.e. among 9 or more individuals in our study; the remaining TCRs are defined as "low public", i.e. shared by at least 2 individuals but fewer than 10% of the population.
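The publicity definition can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify_publicity`, its parameters, and the toy donor identifiers are hypothetical and are not taken from the analysis code of this study.

```python
from collections import defaultdict


def classify_publicity(observations, n_individuals, high_fraction=0.10):
    """Label each CDR3aa as 'private', 'low public', or 'high public'.

    observations: iterable of (individual_id, cdr3aa) pairs.
    n_individuals: total number of individuals in the cohort (89 in this study).
    high_fraction: minimum fraction of individuals for 'high public'.
    """
    carriers = defaultdict(set)
    for individual, cdr3aa in observations:
        carriers[cdr3aa].add(individual)

    labels = {}
    for cdr3aa, individuals in carriers.items():
        n_shared = len(individuals)
        if n_shared == 1:
            labels[cdr3aa] = "private"
        elif n_shared >= high_fraction * n_individuals:
            labels[cdr3aa] = "high public"
        else:
            labels[cdr3aa] = "low public"
    return labels


# Toy cohort of 89 hypothetical donors: one sequence seen in a single donor,
# one seen in 8 donors, and one seen in 9 donors.
toy_observations = (
    [("donor1", "SEQ_A")]
    + [(f"donor{i}", "SEQ_B") for i in range(1, 9)]
    + [(f"donor{i}", "SEQ_C") for i in range(1, 10)]
)
print(classify_publicity(toy_observations, n_individuals=89))
```

With 89 individuals, the 10% threshold corresponds to 8.9, so a sequence shared by 8 individuals is labelled low public and one shared by 9 individuals is labelled high public, in agreement with the definition in the text.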

TCR distance calculation and UMAP embedding
Vδ2+ CDR3aa sequences with a length from 14 to 22 aa were preselected and down-sampled for TCR distance calculation. CDR3s rearranged with the TRDJ4 segment were excluded. For each age group, the CDR3s were randomly down-sampled to 17,398-17,401 sequences. TCR distances were computed according to the protocol of TCRdist3 (34). Briefly, CDR3aa sequence, V-usage, and J-usage were included as input for TCRdist3 in a Python 3.8 environment. CDR1, CDR2, and CDR2.5 sequences were reconstructed from the V-usage. After alignment, penalties were given to each mismatch between two TCRs according to the BLOSUM62 substitution matrix. Finally, the distance was calculated as the weighted sum of penalties across all CDRs. The TCR distance matrix was further embedded into latent spaces by UMAP.
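The pre-selection and down-sampling step can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline used in this study: the record layout (dictionaries with `cdr3aa`, `j_gene`, and `age_group` keys), the function name `filter_and_downsample`, and the fixed random seed are hypothetical. The distance calculation itself and the UMAP embedding are not reproduced here.

```python
import random


def filter_and_downsample(records, n_per_group, min_len=14, max_len=22,
                          excluded_j=("TRDJ4",), seed=0):
    """Filter CDR3s by length and J segment, then down-sample per age group.

    records: list of dicts with keys 'cdr3aa', 'j_gene', and 'age_group'
             (a hypothetical layout chosen for this illustration).
    n_per_group: maximum number of sequences kept for each age group.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_group = {}
    for rec in records:
        if not (min_len <= len(rec["cdr3aa"]) <= max_len):
            continue  # drop CDR3s outside the 14-22 aa window
        if rec["j_gene"] in excluded_j:
            continue  # drop TRDJ4 rearrangements
        by_group.setdefault(rec["age_group"], []).append(rec)

    sampled = []
    for group in sorted(by_group):
        pool = by_group[group]
        k = min(n_per_group, len(pool))
        sampled.extend(rng.sample(pool, k))  # sampling without replacement
    return sampled
```

Sampling each age group to roughly the same size keeps any one group from dominating the pairwise distance matrix and the resulting embedding.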

Calculation of expansion potential
In each individual, CDR3aa sequences were ranked by frequency from high to low. The top 25% of CDR3s were assigned as "high frequency" TCRs, and the rest were labelled as "low frequency" TCRs (Figure S4A). The expansion status score is calculated for a pre-defined group of TCRs within an individual (e.g. the high public TRDJ1 Vδ2+ TCRs in the donor CB2) as follows:
$$\text{Expansion status score} = \ln\left(\frac{n_{\text{highfreq}} + 1}{n_{\text{lowfreq}} + 1}\right)$$
where $n_{\text{highfreq}}$ and $n_{\text{lowfreq}}$ are the numbers of high-freq and low-freq CDR3aa sequences in the group.
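As an illustration, the expansion status score can be computed as in the following sketch. The function names and the toy frequencies are hypothetical. The sketch also assumes a simple rank-based cut for the top 25%, with ties kept in input order; the exact tie handling used in this study is not specified here.

```python
import math


def label_high_freq(freq_by_cdr3, top_fraction=0.25):
    """Return the set of CDR3aa sequences in the top fraction by frequency.

    freq_by_cdr3: dict mapping CDR3aa -> frequency within one individual.
    """
    ranked = sorted(freq_by_cdr3, key=freq_by_cdr3.get, reverse=True)
    n_high = int(len(ranked) * top_fraction)
    return set(ranked[:n_high])


def expansion_status_score(group, high_freq):
    """ln((n_highfreq + 1) / (n_lowfreq + 1)) for a group of CDR3aa sequences."""
    n_high = sum(1 for cdr3 in group if cdr3 in high_freq)
    n_low = sum(1 for cdr3 in group if cdr3 not in high_freq)
    return math.log((n_high + 1) / (n_low + 1))


# Toy individual with eight hypothetical clonotypes and made-up frequencies.
toy_freqs = {"s1": 0.30, "s2": 0.20, "s3": 0.15, "s4": 0.12,
             "s5": 0.10, "s6": 0.07, "s7": 0.04, "s8": 0.02}
high = label_high_freq(toy_freqs)                        # the two most frequent
print(expansion_status_score(["s1", "s2"], high))        # positive score
print(expansion_status_score(["s5", "s6", "s7"], high))  # negative score
```

The pseudocount of 1 in the numerator and denominator keeps the score defined when a group contains no high-freq or no low-freq sequences; an empty group scores 0.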

Statistics
Statistical analyses were performed under R v4.1.2. The statistical methods are described in the figure legends, in all cases, considering the sample size, variance and number of comparisons. Either a one-way ANOVA or a Tukey's HSD test after a one-way ANOVA or Games-Howell Post-Hoc Test was used and P-values were then calculated.

Data availability statement
The previously unpublished raw data presented in this study are deposited in the GEO repository, accession number GSE213280. All code and processed data are available from the GitHub repository https://github.com/isihh-uke/gdTCR_analysis.git.

Ethics statement
The studies involving human participants were reviewed and approved by the institutional ethics review board at Hannover Medical School. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

Author contributions
LD and LT conducted and interpreted bioinformatics analysis. AH and SR organized and performed TCR sequencing. LD, IP, and LT designed the study and wrote the manuscript. All authors contributed to the article and approved the submitted version.
FIGURE S4 Public clones have greater expansion status compared to private clones. (A) The frequency distribution for the CDR3aa sequences in different individuals, from left to right, shows three representative individuals. The low/high frequency label was defined within each individual using the corresponding 75th percentile number of the frequency as a threshold. (B) Expansion potential for each publicity group according to different J gene usage. Games-Howell Post-Hoc Test was used to test the mean difference between groups. Adjusted P-values are shown between groups. (C) Expansion potential for TRDJ1 sequences with different length. (D) Expansion potential for TRDJ3 sequences with different length.