ORIGINAL RESEARCH article

Front. Immunol., 08 January 2026

Sec. Systems Immunology

Volume 16 - 2025 | https://doi.org/10.3389/fimmu.2025.1707727

A fundamental relationship between TCR diversity, repertoire size and systemic clonal expansion: insights from 30,000 TCRβ repertoires

H. Jabran Zahid1*, Damon May2, Harlan Robins2, Julia Greissl1
  • 1Microsoft Research, Redmond, WA, United States
  • 2Adaptive Biotechnologies, Seattle, WA, United States

Introduction: T cell receptor (TCR) diversity is essential for immune defense, yet the mechanisms underlying its decline with age and its variation among individuals remain poorly understood. These patterns are typically attributed to passive processes such as thymic atrophy and cumulative immune exposures. However, this view does not account for the systematic and highly structured variation in TCR diversity observed across large populations.

Methods: We analyze TCRβ repertoires from approximately 30,000 adults using high throughput sequencing. We quantify repertoire size and the contribution of the most expanded clones and evaluate their ability to predict TCRβ diversity across age, sex and Cytomegalovirus exposure using machine learning and linear modeling approaches.

Results: We show that TCRβ diversity is almost entirely determined by two measurable repertoire features: repertoire size and the frequency of the 1,000 most abundant clones. Together, these features explain 96% of the variance in TCRβ diversity, capture its dependence on age and sex and define a robust relationship that persists under strong immune perturbations such as Cytomegalovirus infection. This relationship arises because the frequency of abundant clones, which represent less than one percent of TCRβ diversity, tracks a repertoire wide pattern of coordinated clonal expansion which we term intrinsic clonality.

Discussion: We propose that intrinsic clonality reflects a fundamental, previously unrecognized property of the immune system which challenges the view that TCR diversity declines primarily through passive erosion. Rather, TCR diversity emerges as a system level property mediated by repertoire size and intrinsic clonality, both of which are likely subject to homeostatic regulation. These findings offer a new conceptual framework for understanding TCR diversity within immune homeostasis which may guide therapies aimed at restoring immune function.

Introduction

The theory of clonal selection is a cornerstone of modern immunology, providing the foundation for understanding how TCR diversity is broadly shaped and maintained by the immune system (1). T cells play an essential role in immune defense by targeting antigens from infections and cancer, with their specificity determined by T cell receptors (TCRs) (2, 3). The ability to respond to a broad range of pathogens is enabled by the diversity of the TCR repertoire (4–13). A large pool of naive T cells with diverse, randomly rearranged TCRs is primarily generated during childhood and adolescence via V(D)J recombination and is maintained throughout adulthood predominantly through homeostatic proliferation (14, 15). Mechanisms of immune tolerance eliminate or regulate self-reactive T cells, thereby limiting responses to non-self antigens (16). The T cell repertoire is further shaped by selection processes, including clonal expansion upon antigen encounter and the subsequent preferential retention of activated T cells in the memory compartment. A key feature of immune homeostasis is the long-term balance between naive and memory T cell compartments, which enables rapid responses to previously encountered antigens while preserving the capacity to recognize new ones (17, 18).

Despite its essential role, the mechanisms governing TCR diversity remain poorly understood. Diversity declines by about a factor of two between the ages of 20 and 80 years and is systematically lower in males than females (19, 20). Moreover, variation between individuals exceeds that explained by age and sex alone (20). Changes in TCR diversity are commonly attributed to passive, cumulative processes such as thymic atrophy, stochastic cell loss and chronic immune activation (21–29). Under this view, diversity passively erodes over time independent of any intrinsic homeostatic regulatory mechanisms. However, these factors fail to explain the systematic, population-wide patterns observed in large datasets, nor do they identify specific mechanisms that mediate and constrain diversity.

Understanding what determines TCR diversity is critical because it directly impacts the immune system’s ability to recognize novel antigens. Reduced TCR diversity is linked to poor health outcomes including a greater risk of infectious disease and cancer (20, 21, 24, 26, 30, 31). This association with human health highlights the urgent need to understand TCR diversity within the broader context of immune homeostasis. Identifying factors that determine TCR diversity, which may themselves be intrinsically regulated, can provide new mechanistic insight into how TCR diversity is maintained and inform interventions aimed at restoring immune competence after its decline.

Here we analyze TCRβ repertoires from ∼30,000 individuals and show that diversity is almost entirely determined by two measurable features of the repertoire: the total number of T cells sequenced (repertoire size) and the frequency of the 1,000 most abundant clones. These two quantities independently correlate with TCRβ diversity and together account for 96% of its variation across individuals, including its systematic dependence on age and sex, as well as its response to Cytomegalovirus (CMV) exposure. The key finding of our analysis is that the predictive power of the frequency of abundant clones arises from the apparently coordinated nature of clonal expansion across the repertoire. We interpret this coordination as the manifestation of a fundamental, system level property that governs the amplitude of clonal expansion across T cells, which we term intrinsic clonality. Intrinsic clonality may be a previously unrecognized feature of the immune system subject to homeostatic control, helping to explain how TCR diversity is mediated within immune homeostasis with potentially important implications for translational research.

Results

We conduct a cross-sectional analysis of 30,430 TCRβ repertoires processed under standardized protocols in a CLIA-certified laboratory. Of the subjects in this cohort, 95% are aged 20–74 years (median 50), with 47% males and 53% females. The cohort was sequenced as part of the T-Detect COVID test, which was granted Emergency Use Authorization by the Food and Drug Administration, and is the same cohort analyzed by Zahid et al. (20). We measure the total number of T cells sequenced, S, and the number of unique clonotypes (i.e., richness), D, and refer to these quantities as the repertoire size and TCR diversity, respectively; we interpret them as relative measures of the true underlying size and diversity of the peripheral repertoire. Both quantities follow a log-normal distribution across individuals (all logarithmic values refer to base 10). We further define S1000 as the total number of T cells derived from the 1,000 most abundant clones and P1000 as the percentage of the repertoire they comprise (i.e., P1000 = 100 × S1000/S). To facilitate straightforward interpretation, we use P1000 when visualizing the data. For all modeling applications, S1000 is used to avoid covariance between P1000 and S, resulting in more robust and interpretable models.
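
The repertoire metrics defined above can be sketched in a few lines. This is a minimal illustration, assuming each repertoire is available as a list of per-clonotype T cell counts; the function name `repertoire_metrics`, the input format and the toy counts are hypothetical and are not taken from the study's pipeline.

```python
def repertoire_metrics(clone_counts, top_n=1000):
    """Compute S, D, S_top and P_top from per-clonotype T cell counts.

    clone_counts: one positive integer per unique clonotype (assumed input format).
    top_n: number of most abundant clones to sum (1,000 in the article).
    """
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("repertoire contains no T cells")
    size = sum(counts)                       # S: total T cells sequenced
    diversity = len(counts)                  # D: number of unique clonotypes (richness)
    top = sorted(counts, reverse=True)[:top_n]
    s_top = sum(top)                         # S1000 when top_n == 1000
    p_top = 100.0 * s_top / size             # P1000 = 100 * S1000 / S
    return {"S": size, "D": diversity, "S_top": s_top, "P_top": p_top}


# Toy repertoire (hypothetical): 3 expanded clones plus 7 singletons.
toy = [50, 30, 10] + [1] * 7
metrics = repertoire_metrics(toy, top_n=3)
print(metrics)
```

Because P1000 is computed by dividing by S, it covaries with S by construction, which is why the count S1000 rather than the percentage is the natural modeling feature.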

Repertoire size is influenced by both biological factors and technical variables such as input volume and measurement uncertainty. We account for measurement error in our analysis and note that repertoire size strongly correlates with the T cell fraction (i.e., the proportion of peripheral blood mononuclear cells that are T cells; Spearman ρ = 0.84). Analyses using either repertoire size or T cell fraction yield consistent results, indicating that our findings are robust to the chosen metric. This supports the conclusion that the relationships we observe reflect intrinsic immune properties rather than technical artifacts.

TCRβ diversity, repertoire size and clonal expansion

TCRβ diversity declines with age. This decline occurs ∼10 years earlier in males than in females, leading to pronounced differences emerging in middle age (Figure 1A). After accounting for measurement uncertainty, the peak-to-peak intrinsic biological variance in D for the central 90% of subjects increases from a factor of about 2 at age 20 to a factor of about 5 at age 80 (0.3 dex to 0.7 dex, where 1 dex corresponds to a factor of 10) (20). Notably, inter-individual variation in D exceeds the systematic effects of age, sex and CMV exposure, particularly among older individuals. Repertoire size (S) declines with age similarly to D (Figure 1B), while clonal expansion (P1000) increases from 10% to 30% between ages 20 and 80 and is consistently lower in females (Figure 1C). Furthermore, CMV-positive individuals exhibit slightly lower TCRβ diversity, substantially larger repertoire size and higher clonal expansion (Figures 1D–F). These findings demonstrate that repertoire size, TCRβ diversity and clonal expansion all depend on age, sex and CMV exposure status.

Figure 1. The dependence of T cell receptor diversity, repertoire size and clonal expansion on age, sex and CMV exposure status. (A) Log of TCRβ diversity (Log D) as a function of age stratified by sex. Blue and orange curves are the median diversity in decade wide age bins for males and females, respectively. Error bars are bootstrapped and blue and orange shaded regions indicate the distribution of the central 50% of the data. Gray shading represents the central 90% of the subjects. (B) Log of the total number of productive TCRβs sequenced (Log S) in each repertoire as a function of age stratified by sex. The binning procedure and shading definitions are the same as in (A). (C) P1000 as a function of age stratified by sex. The binning procedure and shading definitions are the same as in (A). (D–F) are the same as in (A–C), respectively, but data are stratified by CMV exposure status rather than sex.

TCRβ diversity is independently correlated with both repertoire size and clonal expansion. We calculate the median D as a function of age, binned by S and P1000, respectively (Figures 2A, B). At all ages, individuals with high D tend to have low P1000 and high S, and vice versa. In CMV-negative individuals, larger repertoire size is accompanied by a more balanced clonal distribution, with both factors contributing to higher TCRβ diversity (Figure 2C). Conversely, in CMV-positive subjects, repertoire size and clonal expansion are decoupled, such that large repertoires coexist with high levels of clonal expansion. Larger repertoire sizes compensate for increased clonality in CMV-positive individuals, thereby mitigating the impact of CMV exposure on TCRβ diversity (32).

Figure 2. Relationship between T cell receptor diversity, repertoire size and clonal expansion. (A) Log D as a function of age with colored curves indicating median Log D in bins of Log S. Gray shading represents the central 90% of subjects. TCRβ diversity increases with repertoire size. (B) Same as in (A) but with colored curves indicating median Log D as a function of age in bins of P1000. TCRβ diversity decreases with increasing clonal expansion. (C) Relationship between repertoire size and P1000 stratified by CMV exposure status. For CMV-positive subjects, there is a marginal anti-correlation between P1000 and S (Spearman ρ = −0.04, p = 7 × 10⁻⁵). For CMV-negative subjects, the anti-correlation is significantly stronger (Spearman ρ = −0.25, p = 1 × 10⁻²⁶⁴). (D) Log D as a function of Log S with colored curves indicating median Log D in bins of P1000. The solid and dashed curves are for males and females, respectively. (E) Same as (D) but solid and dashed curves are for CMV-negative and CMV-positive subjects, respectively. (F) Same as in (D) but solid and dashed curves are subjects that are younger and older than the median age of the cohort (50 years), respectively. After accounting for repertoire size and clonal expansion, TCRβ diversity is independent of sex, CMV exposure status and age.

At a fixed S, D systematically declines with increasing P1000, demonstrating that clonal expansion reduces diversity independent of total repertoire size (Figures 2D–F). Remarkably, after controlling for S and P1000, the residual dependence of D on age, sex or CMV exposure status is minimal, indicating that these biological factors influence TCR diversity through their effects on repertoire size and clonal expansion. In other words, repertoire size and clonal expansion strongly mediate the observed dependence of TCR diversity on age, sex and CMV exposure, which suggests that the relationship among D, S and P1000 is itself largely independent of these factors.

Variations in repertoire size and clonal expansion almost fully account for the observed variance in TCRβ diversity at any given age (Figures 2A, B). P1000 increases systematically with age (Figure 1C) but subjects with low P1000 and high D are present at all ages. Notably, the 1000 most abundant clones may occupy a sizable fraction of the repertoire but they represent a very small fraction of the diversity (< 1%). Moreover, the 1000 most abundant clones in any repertoire are dominated by CD8+ memory T cells (Supplementary Figure S1), but we find that TCRβ diversity (D) is strongly correlated with the average clonal expansion of all clones in the repertoire (Spearman ρ = 0.92), which includes CD4+ and naive T cells. This correlation remains significant even when the 1000 most abundant clones are excluded (Spearman ρ = 0.52), indicating that P1000 serves as a proxy for the repertoire wide property we refer to as intrinsic clonality.

Modeling TCRβ diversity

To better understand the factors shaping TCR diversity, we develop predictive models using biological and repertoire-derived features. We first use XGBoost, a gradient-boosted decision tree algorithm (33), to model TCRβ diversity as a function of age, sex and CMV exposure status.

All three features are predictive (Figure 3A) and the model broadly captures the systematic dependence of TCRβ diversity on these variables (Figure 3B). Next, we include repertoire size (S) and clonal expansion (S1000). Adding these features substantially improves model performance (Figures 3C, D): S and S1000 together explain approximately 96% of the intrinsic variance in TCRβ diversity (see Materials & Methods). Furthermore, including S and S1000 eliminates the predictive value of sex, age and CMV status, confirming that their effects on TCR diversity are mediated through repertoire size and intrinsic clonality.

Figure 3. Modeling TCRβ diversity using XGBoost. (A) Feature importance of sex, age and CMV status in predicting TCRβ diversity. (B) Solid lines show the median Log D as a function of age stratified by sex. The dashed lines show the model predictions generated via five-fold cross-validation. (C) Feature importance when including Log S and S1000 as features in the model. The negligible contribution of age, sex and CMV status demonstrates that repertoire size and S1000 account for the dependence of TCRβ diversity on these factors. (D) Same as (B) but for a model including repertoire size and S1000 as parameters. Repertoire size and S1000 robustly predict TCRβ diversity and are better predictors of its systematic dependence on age and sex than the model shown in (A, B).

The relationship between TCRβ diversity, repertoire size and intrinsic clonality is independent of age, sex and CMV exposure status. We fit a linear model to explicitly quantify this relationship:

$$\hat{D}(S, S_{1000}) = (0.825 \pm 0.001)\,S - (0.965 \pm 0.003)\,S_{1000} \tag{1}$$

Here $\hat{D}$ represents the predicted TCRβ diversity as a linear function of S and S1000. We note that all variables in the model are expressed in their original, non-log-transformed values. Equation 1 describes D as increasing with S and decreasing with S1000. Notably, the coefficient of S1000 is close to unity, indicating that TCRβ diversity decreases nearly one-to-one with increasing S1000. The model yields an R² of 0.96 (Figure 4), consistent with the ∼4% residual intrinsic scatter estimated from our XGBoost model, further supporting the robustness and completeness of the relationship. These results reinforce the idea that S1000 and P1000 serve as quantitative proxies for intrinsic clonality, a systemic repertoire property that mediates TCR diversity. Despite CMV-positive and CMV-negative individuals exhibiting distinct relationships between repertoire size and intrinsic clonality (Figure 2C), both groups follow a consistent relationship linking these variables to TCR diversity, underscoring the universal and fundamental role of this relationship in characterizing immune homeostasis.
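
To make the scale of Equation 1 concrete, the sketch below evaluates the fitted model for an illustrative repertoire. The two coefficients are those of Equation 1, with S1000 entering negatively as the text describes, and the repertoire size is the cohort median reported in Materials and methods. The P1000 values of 10% and 30% are the approximate levels quoted for ages 20 and 80 and are used purely as illustrative inputs; the function name is hypothetical.

```python
A_S = 0.825      # coefficient of S in Equation 1
B_S1000 = 0.965  # coefficient of S1000 in Equation 1 (enters with a minus sign)


def predict_diversity(size, s1000):
    """Predicted TCRβ diversity from Equation 1, in untransformed units."""
    return A_S * size - B_S1000 * s1000


median_size = 518_618  # median sequencing depth reported in Materials and methods

for p1000 in (10.0, 30.0):  # illustrative percentages, not fitted values
    s1000 = p1000 / 100.0 * median_size
    d_hat = predict_diversity(median_size, s1000)
    print(f"P1000 = {p1000:.0f}%  ->  predicted D ~ {d_hat:,.0f}")
```

At a fixed repertoire size, moving 20% of the repertoire into the 1,000 most abundant clones lowers the predicted diversity by roughly 100,000 clonotypes in this example, which is the near one-to-one trade-off implied by a coefficient close to unity.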

Figure 4. A linear model of TCRβ diversity. We model D as a linear function of S and S1000 and fit the data in five-fold cross-validation. We plot the predicted TCRβ diversity as a function of the measured TCRβ diversity. The red dashed line shows one-to-one correspondence. The linear model accurately describes the measured TCRβ diversity (R² = 0.96).

We note that the choice of the 1,000 most abundant clones is not inherently special. For instance, when using the 10 or 100 most abundant clones (S10 or S100, respectively) as features (Supplementary Figure S3), the model performance is only slightly degraded—likely due to greater statistical uncertainty in these measures. The fitted coefficients differ modestly from those obtained using S1000, reflecting the smaller dynamic range of these quantities, but the overall linear relationship remains intact. Importantly, even with S10 or S100, the model continues to account for the dependence of TCRβ diversity on age and sex, demonstrating that the form of the relationship is robust to the precise definition of our proxy for intrinsic clonality. This result reinforces that the relationship reflects the heavy-tailed structure of the repertoire rather than a privileged cutoff. Additionally, we find that low TCRβ diversity does not appear to be linked to any specific or specialized set of immune exposures and HLA genotypes are not predictive (see Supplementary Material). Taken together, these results strongly suggest that repertoire size and intrinsic clonality are the primary determinants of TCR diversity.

Discussion

TCR diversity is essential for immune competence, enabling the recognition and elimination of diverse threats. We show that repertoire size (S) and the abundance of the most expanded clones (S1000) explain nearly all the variation in TCRβ diversity (D) across individuals, including its systematic dependence on age, sex and CMV exposure. We emphasize that this relationship is not a mathematical artifact or tautology. Although S1000 is derived from a subset of highly expanded clones that represent <1% of TCRβ diversity, it robustly predicts D, a global property of the repertoire. This unexpected result indicates that a small fraction of clones encodes information about the entire distribution. We interpret this coordinated pattern of clonal expansion as the measurable manifestation of a previously unrecognized property of the immune system we term intrinsic clonality. Our proxy of intrinsic clonality does not depend on selecting the 1,000 most abundant clones. Using as few as 10 or 100 most abundant clones yields consistent results, indicating that intrinsic clonality captures a key biological property of the immune system. Thus, our findings reveal a simple but powerful organizing principle: overall TCR diversity is an emergent property of the immune system, arising from a fundamental relationship between repertoire size, intrinsic clonality and TCR diversity itself.

CMV exposure underscores the generality and resilience of the relationship between D, S and S1000, demonstrating that it holds even under strong immune perturbations. Unlike acute infections such as SARS-CoV-2 or other chronic herpes viruses like Epstein-Barr Virus, which have significantly smaller and more transient effects on the repertoire, CMV strongly perturbs homeostasis by increasing both repertoire size and intrinsic clonality. The chronic nature of CMV alone does not fully explain its outsized impact on repertoire structure and the biological reasons for its influence remain incompletely understood (34). Nevertheless, the consistency of the relationship between D, S, and S1000 across CMV-exposed and -unexposed individuals underscores its fundamental and robust nature.

We find no evidence that shared immune exposures or unmodeled host factors explain the relationship between D, S, and S1000. HLA genotype does not predict D and both the most abundant clones and their co-occurrence patterns vary widely across individuals, consistent with their origin from disparate immune exposures (Supplementary Figure S4). Additionally, a targeted search for TCRβs associated with low-diversity repertoires identified no strong candidates, further suggesting that unrecognized shared exposures are not the primary driver of diversity loss (see Supplementary Methods). However, these findings are not central to our conclusions. Rather, the key result is that S and S1000 together explain 96% of the variation in TCRβ diversity, leaving little room for additional contributors. While other variables may correlate with these quantities, repertoire size and clonality are fundamental properties of the T cell repertoire. The predictive power of S1000 reflects a coordinated pattern of clonal expansion across compartments and is robust to the specific number of clones included in the calculation (Supplementary Figure S3). Notably, S1000 is dominated by memory CD8+ T cells (Supplementary Figure S1), while D is shaped primarily by naive CD4+ T cells (35). The near one-to-one inverse relationship between S1000 and D (Equation 1) supports systemic coordination, possibly explaining the observed stability of clonal hierarchy over time (36). Together, these findings point to intrinsic, homeostatic regulation of the T cell repertoire rather than extrinsic factors.

Although the precise mechanisms remain uncertain, we propose that an immunometabolic regulatory network comprising two interdependent homeostatic processes may plausibly underlie our observations. The first regulates repertoire size through T cell competition for soluble IL-7 and IL-15 and access to stromal niches, which together determine how many total T cells can be sustained (17, 37–40), effectively setting a molecular homeostatic point for T cell carrying capacity. The second governs the overall amplitude of clonal expansion through an integrated network of cytokine, metabolic and costimulatory signals that integrate through the mTOR pathway (41, 42). mTOR links the cytokine milieu to cellular metabolic state, effectively coordinating the strength of clonal expansion across the repertoire, giving rise to the system-wide property we term intrinsic clonality. These two homeostatic processes are coupled through cytokine availability, which naturally accounts for the negative correlation between repertoire size and intrinsic clonality observed in CMV-negative subjects. Increased mTOR activity promotes greater clonal expansion but also likely enhances metabolic and cytokine dependence of each cell (43, 44), intensifying competition for limited resources and resulting in a smaller overall repertoire. In contrast, the same constraints do not apply to CMV-specific T cells, which are dominated by late-differentiated CD45RA+ CCR7− (TEMRA) cells. TEMRAs are largely maintained through IL-15 trans-presentation and do not rely on the soluble cytokine–mediated homeostasis that governs the rest of the repertoire (34, 45). This lack of dependence on cytokine-mediated feedback explains why in CMV-positive individuals the expansion of CMV-specific T cells is balanced by an increase in repertoire size that preserves TCR diversity and why repertoire size and intrinsic clonality become uncoupled.
While our findings robustly identify repertoire size and intrinsic clonality as key regulatory parameters, the specific biological mechanisms underlying intrinsic clonality remain speculative and may involve multiple, potentially overlapping pathways beyond mTOR.

There is growing evidence that many aspects of immune repertoire organization are genetically controlled, consistent with heritable regulation of the cytokine and metabolic pathways that govern T cell homeostasis. Cytokine levels and T cell counts vary systematically with age, sex and genetic background (46–53), indicating genetically encoded set points that shape T cell homeostasis. Genetic variation beyond HLA influences cytokine signaling and broader immune traits (47, 54–57), and twin studies demonstrate heritability in responses to homeostatic cytokines such as IL-7 and IL-2 (58) as well as in global immune parameters (46). Variation in genes encoding components and regulators of the mTOR pathway also modulates pathway activity and immune function, indicating that the metabolic arm of this regulatory network is likewise under genetic control (47, 58, 59). Together, these findings suggest that the mechanisms of immune homeostasis are at least partly genetically encoded, while environmental factors likely act through these intrinsic pathways (60–63).

Our analysis captures population-level trends in a cross-sectional manner. Small studies suggest TCR diversity is stable over short periods but declines with age (64, 65). Because our study focuses on adults, it primarily captures homeostatic regulation of established repertoires. In children, thymic production and developmental selection dominate repertoire dynamics. Applying this framework to pediatric cohorts, such as those described by Mitchell et al. (66), could help identify how homeostasis emerges. While these studies are consistent with our findings, their small sizes underscore the need for large-scale, longitudinal studies to help establish intrinsic clonality as a repertoire-wide feature. Investigating links between our findings and immunosenescence and inflammaging (67, 68) may offer further insights, as the mechanisms we propose may help explain key aspects of these age-associated phenomena. Future work could assess whether inter-individual variation in mTOR pathway activity predicts intrinsic clonality and repertoire organization at the population level. These efforts could be integrated with studies leveraging high-throughput proteomics in large cohorts to elucidate how systemic cytokine levels influence TCR diversity and to identify molecular mediators of immune homeostasis. Continued integration of immune repertoire data with genomic profiling may help clarify how genetic variation modulates repertoire structure through the regulatory mechanisms we describe. Our findings provide a conceptual framework for investigating how intrinsic and extrinsic forces jointly regulate immune homeostasis, with implications for aging, disease susceptibility and therapeutic intervention.

T cells are essential for maintaining human health and their dysregulation contributes to a wide range of diseases, motivating therapeutic efforts to restore immune function (69, 70). A consistent feature of immune dysfunction is the loss of TCR diversity which is linked to poor clinical outcomes (20, 21, 24, 26, 30, 31, 71). Our findings suggest that TCR diversity is not directly regulated but instead emerges from clonal dynamics governed by repertoire size and intrinsic clonality, two properties that are likely directly regulated within immune homeostasis. By identifying these core determinants, our work provides guidance for therapeutic efforts aimed at preserving TCR diversity and restoring immune balance.

Materials and methods

Sequencing of human samples

Details of the sequencing data and IRB information are provided in Zahid et al. (20); here we highlight the most salient information. The CDR3 of TCRβ chains of T cells is sequenced with a multiplexed PCR, typically using 18 µg of genomic DNA (72–75). The median sequencing depth is 518,618 TCRβs with 95% of subjects having a sequencing depth between 222,082 and 853,647. The median TCRβ diversity is 319,802 and 95% of subjects have values between 120,576 and 597,890. Ages range between 20 and 74 years for 95% of subjects, with a median age of 50 years. Sex is self-identified with males comprising 47.2% and females comprising 52.5% of subjects.

T cell based CMV diagnostic

We use a sensitive and specific T cell based diagnostic on the T-Detect COVID cohort to identify subjects exposed to CMV. We use a previously described method (76–78), which statistically identifies disease-associated TCRβs based on serologically labeled cases and controls. A total of 2,181 labeled samples were used to build a CMV classifier with an area under the receiver operating characteristic curve (AUROC) of 0.96, measured on the same holdout set used in Emerson et al. (76). The performance of the T cell based test is comparable to serology and is limited by the accuracy of the serological labels. This diagnostic test allows us to identify subjects who are exposed to CMV using only their sequenced repertoire. Zahid et al. (20) demonstrate that CMV exposure primarily impacts TCR repertoire size.

Fitting TCRβ diversity

We first fit TCRβ diversity using the XGBRegressor routine implemented in version 2.1.6 of the XGBoost library (33). We select XGBoost because of its ability to capture non-linear relationships, its strong out-of-the-box performance and its flexibility in handling categorical variables like sex and CMV exposure status. We adopt the default hyperparameters of the algorithm and use its default squared error loss function. We derive predictions of TCRβ diversity using a five-fold cross-validation scheme implemented in the cross_val_predict routine from the scikit-learn package (79), version 1.2.0. In each fold, we fit the model to 80% of the data and predict on the remaining 20%. This process is repeated across five distinct, randomly generated 80/20 splits of the data, ensuring that every data point is predicted without being used for model fitting. We determine feature importance by fitting all the data simultaneously.
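
The cross-validated prediction scheme can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the simulated cohort, the 0/1 coding of sex and CMV status, the random seeds and the shuffled `KFold` splitter are all assumptions. Only the use of `XGBRegressor` with default hyperparameters and of `cross_val_predict` with five folds is taken from the text. As a convenience that is not part of the original method, the sketch falls back to scikit-learn's `GradientBoostingRegressor` if XGBoost is not installed.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict

try:
    from xgboost import XGBRegressor as Regressor
except ImportError:  # fallback for illustration only; not the method in the text
    from sklearn.ensemble import GradientBoostingRegressor as Regressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the cohort (hypothetical values, for illustration only).
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)   # 0 = female, 1 = male (arbitrary coding)
cmv = rng.integers(0, 2, n)   # 0 = CMV-negative, 1 = CMV-positive
log_d = 5.7 - 0.005 * (age - 20) - 0.05 * sex - 0.03 * cmv + rng.normal(0, 0.05, n)

X = np.column_stack([age, sex, cmv])

model = Regressor()           # default hyperparameters, squared-error loss
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each subject is predicted by a model fitted on the other four folds.
log_d_pred = cross_val_predict(model, X, log_d, cv=cv)

# Feature importance is taken from a single fit to all the data.
model.fit(X, log_d)
print(dict(zip(["age", "sex", "cmv"], model.feature_importances_)))
```

Keeping prediction (out-of-fold) separate from feature importance (full fit) mirrors the two uses of the model described above: the first gives an honest estimate of predictive accuracy, the second uses all available data to rank features.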

We next fit TCRβ diversity using the linear model described in Equation 1. We optimize the two parameters using the optimize.curve_fit routine in version 1.15.2 of the SciPy package (80). The reported coefficients are the best-fit values from a fit to the full dataset, and the reported uncertainties are 1σ error estimates from bootstrap resampling. Model evaluation is performed using five-fold cross-validation, and Figure 4 shows predicted versus observed TCRβ diversity values under this validation framework.
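A minimal sketch of bootstrap error estimation for a two-parameter linear fit is given below. Equation 1 is not reproduced here, so the functional form y = a·x1 + b·x2, the synthetic data, the number of resamples and the function names are all assumptions made for illustration. NumPy least squares is used in place of SciPy's curve_fit to keep the example dependency-light.

```python
import numpy as np

def fit_two_params(x1, x2, y):
    """Least-squares estimate of (a, b) in the assumed model y = a*x1 + b*x2."""
    A = np.column_stack([x1, x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def bootstrap_errors(x1, x2, y, n_boot=500, seed=0):
    """1-sigma parameter uncertainties from resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)        # one bootstrap resample
        draws[i] = fit_two_params(x1[idx], x2[idx], y[idx])
    return draws.std(axis=0, ddof=1)

# Synthetic, hypothetical data with known coefficients (0.95, -0.8).
rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(5.7, 0.15, n)
x2 = rng.uniform(0.05, 0.4, n)
y = 0.95 * x1 - 0.8 * x2 + rng.normal(0.0, 0.03, n)

best_fit = fit_two_params(x1, x2, y)            # fit to the full dataset
sigma = bootstrap_errors(x1, x2, y)             # bootstrapped 1-sigma errors
```

The best-fit values come from the single fit to all subjects, while the spread of the refitted parameters across resamples gives the uncertainty on each coefficient.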

Estimating residual intrinsic scatter

To quantify the residual intrinsic (biological) scatter in the XGBoost model’s prediction of TCRβ diversity, we estimate and subtract the contribution of measurement error as:

$$\sigma_i = \sqrt{\sigma_r^2 - \sigma_m^2}. \tag{2}$$

Here $\sigma_i$ is the intrinsic biological scatter, $\sigma_r$ is the model uncertainty and $\sigma_m$ is the measurement uncertainty. The rationale is that for a perfect model the residual intrinsic scatter would be $\sigma_i = 0$, meaning the model's uncertainty would be entirely limited by the measurement error.

Given that measurement errors in repertoire size and TCRβ diversity are correlated, we estimate the minimum achievable error (MAE) based on variability in D/S, the ratio of TCR diversity to repertoire size. Using repeat independent measurements from the same subjects, we calculate the MAE as the standard deviation of differences in Log D/S, yielding 0.027 dex (see Supplementary Material; Supplementary Figure S2). We fit TCRβ diversity using only S and S1000 as features and find the standard deviation of the fit residuals to be 0.031 dex. Subtracting the MAE from model uncertainty in quadrature yields a residual scatter of 0.015 dex, indicating that approximately 4% of the intrinsic variability in TCRβ diversity remains unexplained by S and S1000.
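The quadrature subtraction can be checked directly with the two values quoted above. The function name `intrinsic_scatter` is hypothetical, and the guard against a model scatter smaller than the measurement error is an added assumption rather than part of the described procedure.

```python
import math

def intrinsic_scatter(sigma_r, sigma_m):
    """Residual intrinsic scatter after removing measurement error in quadrature."""
    if sigma_r < sigma_m:
        # Assumption: treat a model scatter below the measurement floor as an error.
        raise ValueError("model scatter is smaller than the measurement error")
    return math.sqrt(sigma_r**2 - sigma_m**2)

# Values quoted in the text, in dex.
sigma_r = 0.031   # standard deviation of the fit residuals
sigma_m = 0.027   # minimum achievable error from repeat measurements
sigma_i = intrinsic_scatter(sigma_r, sigma_m)
print(f"{sigma_i:.3f} dex")
```

The result rounds to the 0.015 dex reported above.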

Data availability statement

Data tables with TCR repertoire metrics are available at https://doi.org/10.5281/zenodo.14976210 and https://doi.org/10.5281/zenodo.13993996.

Ethics statement

The studies involving humans were approved by WIRB Copernicus Group Institutional Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

HZ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. DM: Writing – original draft, Writing – review & editing. HR: Funding acquisition, Project administration, Resources, Writing – original draft, Writing – review & editing. JG: Investigation, Project administration, Resources, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for the research and/or publication of this article. The work was funded by Microsoft Corporation and Adaptive Biotechnologies. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Acknowledgments

We thank the reviewers whose comments improved the manuscript. We also thank Ruth Taniguchi for discussion and feedback.

Conflict of interest

HZ and JG are employed by the company Microsoft. DM and HR are employed by Adaptive Biotechnologies.

Generative AI statement

The author(s) declared that Generative AI was used in the creation of this manuscript. The author(s) declare that ChatGPT was used for editing.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2025.1707727/full#supplementary-material

Supplementary Figure 1 | Distribution of clone frequencies for various compartments. (A) and (B) show clonal frequencies versus clone ranks derived from the sorted repertoires of 45 subjects split by CMV-negative and CMV-positive subjects, respectively. T cells are sorted on memory/naive and CD4/CD8 markers prior to sequencing. Details of the sorting procedure are provided in Zahid et al. (81). The 1000 most abundant clones are dominated by memory CD8+ T cells.

Supplementary Figure 2 | Measurement uncertainty and XGBoost model residuals. (A) Distribution of the difference in the quantity Log D/S determined from 396 repeat samples. To avoid overestimating measurement uncertainties due to a small number of outliers, we estimate the standard deviation of the distribution by fitting a Gaussian. We take the scatter among repeat samples of the same subjects as an estimate of the measurement uncertainty and adopt its standard deviation as the minimum achievable error for any model that is not overfitting the data. (B) Distribution of residuals of the XGBoost model fit of TCR diversity using S and S1000 as features. The model error is only slightly larger than the measurement uncertainty, indicating that the model accounts for nearly all the intrinsic biological scatter in the data.

Supplementary Figure 3 | Modeling TCR diversity with S100 and S10 instead of S1000. (A) and (B) are the same as Figures 2A, B, respectively, but for a model fit using the fraction of the repertoire comprised of the 100 (rather than 1000) most abundant clones, i.e., S100. (C) Distribution of residuals of the model fit with S and S100. (D), (E) and (F) are the same as (A), (B) and (C), respectively, for a model using S10 instead of S100. Model performance is only slightly degraded when using S100 or S10 instead of S1000.

Supplementary Figure 4 | Distribution of the number of TCRβs in the 1000 most abundant clones that are HLA-associated and their cluster membership. (A) Histogram, across all subjects, of the number of TCRβs among the 1000 most abundant clones that are also HLA-associated public TCRβs. (B) Histogram, across all subjects, of the number of distinct HLA-associated TCR clusters represented by the TCRβs in (A). Each cluster is interpreted as mapping to a distinct immune exposure.

Footnotes

  1. ^ Clinical Laboratory Improvement Amendments of 1988
  2. ^ https://www.fda.gov/media/146481/download
  3. ^ Here dex refers to scatter measured on a log scale such that a value of x dex indicates a relative difference of a factor of 10^x.

References

1. Burnet FM. A modification of jerne’s theory of antibody production using the concept of clonal selection. CA: A Cancer J Clin. (1976) 26:119–21. doi: 10.3322/canjclin.26.2.119

2. Hedrick SM, Cohen DI, Nielsen EA, and Davis MM. Isolation of cDNA clones encoding T cell-specific membrane-associated proteins. Nature. (1984) 308:149–53. doi: 10.1038/308149a0

3. Yanagi Y, Yoshikai Y, Leggett K, Clark SP, Aleksander I, and Mak TW. A human T cell-specific cDNA clone encodes a protein having extensive homology to immunoglobulin chains. Nature. (1984) 308:145–9. doi: 10.1038/308145a0

4. Davis MM, Boniface JJ, Reich Z, Lyons D, Hampl J, Arden B, et al. Ligand recognition by (alpha)(beta) T cell receptors. Annu Rev Immunol. (1998) 16:523. doi: 10.1146/annurev.immunol.16.1.523

5. Nikolich-Žugich J, Slifka MK, and Messaoudi I. The many important facets of T-cell repertoire diversity. Nat Rev Immunol. (2004) 4:123–32. doi: 10.1038/nri1292

6. Foster AD, Sivarapatna A, and Gress RE. The aging immune system and its relationship with cancer. Aging Health. (2011) 7:707–18. doi: 10.2217/ahe.11.56

7. Martin MP and Carrington M. Immunogenetics of HIV disease. Immunol Rev. (2013) 254:245–64. doi: 10.1111/imr.12071

8. Corthay A. Does the immune system naturally protect against cancer? Front Immunol. (2014) 5:197.

9. Montgomery RA, Tatapudi VS, Leffell MS, and Zachary AA. HLA in transplantation. Nat Rev Nephrol. (2018) 14:558–70. doi: 10.1038/s41581-018-0039-x

10. Kovacs AA, Kono N, Wang CH, Wang D, Frederick T, Operskalski E, et al. Association of HLA genotype with T-cell activation in human immunodeficiency virus (HIV) and HIV/hepatitis C virus–coinfected women. J Infect Dis. (2020) 221:1156–66. doi: 10.1093/infdis/jiz589

11. Francis JM, Leistritz-Edwards D, Dunn A, Tarr C, Lehman J, Dempsey C, et al. Allelic variation in class I HLA determines CD8+ T cell repertoire shape and cross-reactive memory responses to SARS-CoV-2. Sci Immunol. 7:eabk3070.

12. Granadier D, Iovino L, Kinsella S, and Dudakov JA. Dynamics of thymus function and T cell receptor repertoire breadth in health and disease. Semin immunopathology. (2021) 43:119–34. doi: 10.1007/s00281-021-00840-5

13. Olafsdottir TA, Bjarnadottir K, Norddahl GL, Halldorsson GH, Melsted P, Gunnarsdottir K, et al. HLA alleles, disease severity, and age associate with T-cell responses following infection with SARS-CoV-2. Commun Biol. (2022) 5:914. doi: 10.1038/s42003-022-03893-w

14. Tonegawa S. Somatic generation of antibody diversity. Nature. (1983) 302:575–81. doi: 10.1038/302575a0

15. Jameson SC. Maintaining the norm: T-cell homeostasis. Nat Rev Immunol. (2002) 2:547–56. doi: 10.1038/nri853

16. Janeway C, Travers P, Walport M, Shlomchik M, et al. Immunobiology: the immune system in health and disease. Vol. 2. New York: Garland Pub (2001).

17. Surh CD and Sprent J. Homeostasis of naive and memory T cells. Immunity. (2008) 29:848–62. doi: 10.1016/j.immuni.2008.11.002

18. Kumar BV, Connors TJ, and Farber DL. Human T cell development, localization, and function throughout life. Immunity. (2018) 48:202–13. doi: 10.1016/j.immuni.2018.01.007

19. Britanova OV, Putintseva EV, Shugay M, Merzlyak EM, Turchaninova MA, Staroverov DB, et al. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J Immunol. (2014) 192:2689–98. doi: 10.4049/jimmunol.1302064

20. Zahid HJ, Taniguchi R, Noceda MG, Robbins H, and Greissl J. T cell receptor diversity, cancer and sex: insights from 30,000 TCR β Repertoires. bioRxiv. (2024), 2024–10.

21. Khan N, Shariff N, Cobbold M, Bruton R, Ainsworth JA, Sinclair AJ, et al. Cytomegalovirus seropositivity drives the CD8 T cell repertoire toward greater clonality in healthy elderly individuals. J Immunol. (2002) 169:1984–92. doi: 10.4049/jimmunol.169.4.1984

22. Naylor K, Li G, Vallejo AN, Lee WW, Koetz K, Bryl E, et al. The influence of age on T cell generation and TCR diversity. J Immunol. (2005) 174:7446–52. doi: 10.4049/jimmunol.174.11.7446

23. Goronzy JJ and Weyand CM. T cell development and receptor diversity during aging. Curr Opin Immunol. (2005) 17:468–75. doi: 10.1016/j.coi.2005.07.020

24. Messaoudi I, LeMaoult J, Guevara-Patino JA, Metzner BM, and Nikolich-Žugich J. Age-related CD8 T cell clonal expansions constrict CD8 T cell repertoire and have the potential to impair immune defense. J Exp Med. (2004) 200:1347–58. doi: 10.1084/jem.20040437

25. Qi Q, Liu Y, Cheng Y, Glanville J, Zhang D, Lee JY, et al. Diversity and clonal selection in the human T-cell repertoire. Proc Natl Acad Sci. (2014) 111:13139–44. doi: 10.1073/pnas.1409155111

26. Palmer S, Albergante L, Blackburn CC, and Newman T. Thymic involution and rising disease incidence with age. Proc Natl Acad Sci. (2018) 115:1883–8. doi: 10.1073/pnas.1714478115

27. Krishna C, Chowell D, Gönen M, Elhanati Y, and Chan TA. Genetic and environmental determinants of human TCR repertoire diversity. Immun Ageing. (2020) 17:1–7. doi: 10.1186/s12979-020-00195-9

28. Cardinale A, De Luca CD, Locatelli F, and Velardi E. Thymic function and T-cell receptor repertoire diversity: implications for patient response to checkpoint blockade immunotherapy. Front Immunol. (2021) 12:752042. doi: 10.3389/fimmu.2021.752042

29. Brown AJ, White J, Shaw L, Gross J, Slabodkin A, Kushner E, et al. MHC heterozygosity limits T cell receptor variability in CD4 T cells. Sci Immunol. (2024) 9:eado5295. doi: 10.1126/sciimmunol.ado5295

30. Turner SJ, La Gruta NL, Kedzierska K, Thomas PG, and Doherty PC. Functional implications of T cell receptor diversity. Curr Opin Immunol. (2009) 21:286–90. doi: 10.1016/j.coi.2009.05.004

31. Gleason L, Porcu P, and Nikbakht N. Reduced overall T-cell receptor diversity as an indicator of aggressive cutaneous T-cell lymphoma. Blood. (2022) 140:3539–40. doi: 10.1182/blood-2022-170357

32. Lindau P, Porcu P, and Nikbakht N. Cytomegalovirus exposure in the elderly does not reduce CD8 T cell repertoire diversity. J Immunol. (2019) 202:476–83. doi: 10.4049/jimmunol.1800217

33. Chen T and Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (2016). pp. 785–94.

34. Klenerman P and Oxenius A. T cell responses to cytomegalovirus. Nat Rev Immunol. (2016) 16:367–77. doi: 10.1038/nri.2016.38

35. Li HM, et al. TCRβ repertoire of CD4+ and CD8+ T cells is distinct in richness, distribution, and CDR3 amino acid composition. J Leucocyte Biol. (2016) 99:505–13. doi: 10.1189/jlb.6A0215-071RR

36. Gaimann MU, Nguyen M, Desponds J, and Mayer A. Early life imprints the hierarchy of T cell clone sizes. Elife. (2020) 9:e61639. doi: 10.7554/eLife.61639

37. Tan JT, et al. IL-7 is critical for homeostatic proliferation and survival of naive T cells. Proc Natl Acad Sci. (2001) 98:8732–7. doi: 10.1073/pnas.161126098

38. Moses CT, Thorstenson KM, Jameson SC, and Khoruts A. Competition for self ligands restrains homeostatic proliferation of naive CD4 T cells. Proc Natl Acad Sci. (2003) 100:1185–90. doi: 10.1073/pnas.0334572100

39. Becker TC, et al. Interleukin 15 is required for proliferative renewal of virus-specific memory CD8 T cells. J Exp Med. (2002) 195:1541–8. doi: 10.1084/jem.20020369

40. Boyman O, Purton JF, Surh CD, and Sprent J. Cytokines and T-cell homeostasis. Curr Opin Immunol. (2007) 19:320–6. doi: 10.1016/j.coi.2007.04.015

41. Thomson AW, Turnquist HR, and Raimondi G. Immunoregulatory functions of mTOR inhibition. Nat Rev Immunol. (2009) 9:324–37. doi: 10.1038/nri2546

42. Powell JD, Pollizzi KN, Heikamp EB, and Horton MR. Regulation of immune responses by mTOR. Annu Rev Immunol. (2012) 30:39–68. doi: 10.1146/annurev-immunol-020711-075024

43. Delgoffe GM, et al. The mTOR kinase differentially regulates effector and regulatory T cell lineage commitment. Immunity. (2009) 30:832–44. doi: 10.1016/j.immuni.2009.04.014

44. Yang K, Neale G, Green DR, He W, and Chi H. The tumor suppressor Tsc1 enforces quiescence of naive T cells to promote immune homeostasis and function. Nat Immunol. (2011) 12:888–97. doi: 10.1038/ni.2068

45. van den Berg SP, Pardieck IN, Lanfermeijer J, Sauce D, Klenerman P, Baarle D van, et al. The hallmarks of CMV-specific CD8 T-cell differentiation. Med Microbiol Immunol. (2019) 208:365–73. doi: 10.1007/s00430-019-00608-7

46. Roederer M, Quaye L, Mangino M, Beddall MH, Mahnke Y, Chattopadhyay P, et al. The genetic architecture of the human immune system: a bioresource for autoimmunity and disease pathogenesis. Cell. (2015) 161:387–403. doi: 10.1016/j.cell.2015.02.046

47. Aguirre-Gamboa R, Joosten I, Urbano PC, Molen RG van der, Rijssen E van, Cranenbroek B van, et al. Differential effects of environmental and genetic factors on T and B cell immune traits. Cell Rep. (2016) 17:2474–87. doi: 10.1016/j.celrep.2016.10.053

48. Li Y, Oosting M, Smeekens SP, Jaeger M, Aguirre-Gamboa R, Le KT, et al. A functional genomics approach to understand variation in cytokine production in humans. Cell. (2016) 167:1099–110. doi: 10.1016/j.cell.2016.10.017

49. De Craen A, Posthuma D, Remarque E, Van Den Biggelaar A, Westendorp R, and Boomsma D. Heritability estimates of innate immunity: an extended twin study. Genes Immun. (2005) 6:167–70. doi: 10.1038/sj.gene.6364162

50. Bakker OB, Aguirre-Gamboa R, Sanna S, Oosting M, Smeekens SP, Jaeger M, et al. Integration of multi-omics data and deep phenotyping enables prediction of cytokine responses. Nat Immunol. (2018) 19:776–86. doi: 10.1038/s41590-018-0121-3

51. Goetzl EJ, Huang MC, Kon J, Patel K, Schwartz JB, Fast K, et al. Gender specificity of altered human immune cytokine profiles in aging. FASEB J. (2010) 24:3580. doi: 10.1096/fj.10-160911

52. Ter Horst R, Jaeger M, Smeekens SP, Oosting M, Swertz MA, Li Y, et al. Host and environmental factors influencing individual human cytokine responses. Cell. (2016) 167:1111–24. doi: 10.1016/j.cell.2016.10.018

53. Bernardi S, Toffoli B, Tonon F, Francica M, Campagnolo E, Ferretti T, et al. Sex differences in proatherogenic cytokine levels. Int J Mol Sci. (2020) 21:3861. doi: 10.3390/ijms21113861

54. Piasecka B, Duffy D, Urrutia A, Quach H, Patin E, Posseme C, et al. Distinctive roles of age, sex, and genetics in shaping transcriptional variation of human immune responses to microbial challenges. Proc Natl Acad Sci. (2018) 115:E488–97. doi: 10.1073/pnas.1714765115

55. Orrù V, et al. Complex genetic signatures in immune cells underlie autoimmunity and inform therapy. Nat Genet. (2020) 52:1036–45. doi: 10.1038/s41588-020-0684-4

56. Liston A, Humblet-Baron S, Duffy D, and Goris A. Human immune diversity: from evolution to modernity. Nat Immunol. (2021) 22:1479–89. doi: 10.1038/s41590-021-01058-1

57. Poisner H, Faucon A, Cox N, and Bick AG. Genetic determinants and phenotypic consequences of blood T-cell proportions in 207,000 diverse individuals. Nat Commun. (2024) 15:6732. doi: 10.1038/s41467-024-51095-1

58. Brodin P, et al. Variation in the human immune system is largely driven by non-heritable influences. Cell. (2015) 160:37–47. doi: 10.1016/j.cell.2014.12.020

59. Saxton RA and Sabatini DM. mTOR signaling in growth, metabolism, and disease. Cell. (2017) 168:960–76. doi: 10.1016/j.cell.2017.02.004

60. Klein SL and Flanagan KL. Sex differences in immune responses. Nat Rev Immunol. (2016) 16:626–38. doi: 10.1038/nri.2016.90

61. Westergaard D, Moseley P, Sørup FKH, Baldi P, and Brunak S. Population-wide analysis of differences in disease progression patterns in men and women. Nat Commun. (2019) 10:666. doi: 10.1038/s41467-019-08475-9

62. Patwardhan V, Gil GF, Arrieta A, Cagney J, DeGraw E, Herbert ME, et al. Differences across the lifespan between females and males in the top 20 causes of disease burden globally: a systematic analysis of the Global Burden of Disease Study 2021. Lancet Public Health. (2024) 9:e282–94. doi: 10.1016/S2468-2667(24)00053-7

63. Stankiewicz LN, Salim K, Flaschner EA, Wang YX, Edgar JM, Durland LJ, et al. Sex-biased human thymic architecture guides T cell development through spatially defined niches. Dev Cell. (2025) 60:152–69. doi: 10.1016/j.devcel.2024.09.011

64. Yoshida K, Cologne JB, Cordova K, Misumi M, Yamaoka M, Kyoizumi S, et al. Aging-related changes in human T-cell repertoire over 20 years delineated by deep sequencing of peripheral T-cell receptors. Exp Gerontology. (2017) 96:29–37. doi: 10.1016/j.exger.2017.05.015

65. Chu ND, Bi HS, Emerson RO, Sherwood AM, Birnbaum ME, Robins HS, et al. Longitudinal immunosequencing in healthy people reveals persistent T cell receptors rich in highly public receptors. BMC Immunol. (2019) 20:1–12. doi: 10.1186/s12865-019-0300-5

66. Mitchell AM, Baschal EE, McDaniel KA, Simmons KM, Pyle L, Waugh K, et al. Temporal development of T cell receptor repertoires during childhood in health and disease. JCI Insight. (2022) 7:e161885. doi: 10.1172/jci.insight.161885

67. Xia S, Zhang X, Zheng S, Khanabdali R, Kalionis B, Wu J, et al. An update on inflamm-aging: mechanisms, prevention, and treatment. J Immunol Res. (2016) 2016:8426874. doi: 10.1155/2016/8426874

68. Fulop T, Larbi A, Dupuis G, Le Page A, Frost EH, Cohen AA, et al. Immunosenescence and inflamm-aging as two sides of the same coin: friends or foes? Front Immunol. (2018) 8:1960.

69. Iriguchi S, et al. A clinically applicable and scalable method to regenerate T-cells from iPSCs for off-the-shelf T-cell immunotherapy. Nat Commun. (2021) 12:430. doi: 10.1038/s41467-020-20658-3

70. Stankiewicz LN, Rossi FM, and Zandstra PW. Rebuilding and rebooting immunity with stem cells. Cell Stem Cell. (2024) 31:597–616. doi: 10.1016/j.stem.2024.03.012

71. Wang GC, Dash P, McCullers JA, Doherty PC, and Thomas PG. T cell receptor αβ diversity inversely correlates with pathogen-specific antibody levels in human cytomegalovirus infection. Sci Trans Med. (2012) 4:128ra42–128ra42. doi: 10.1126/scitranslmed.3003647

72. Robins HS, Campregher PV, Srivastava SK, Wacher A, Turtle CJ, Kahsai O, et al. Comprehensive assessment of T-cell receptor β-chain diversity in αβ T cells. Blood J Am Soc Hematol. (2009) 114:4099–107. doi: 10.1182/blood-2009-04-217604

73. Robins H, Desmarais C, Matthis J, Livingston R, Andriesen J, Reijonen H, et al. Ultra-sensitive detection of rare T cell clones. J Immunol Methods. (2012) 375:14–9. doi: 10.1016/j.jim.2011.09.001

74. Carlson CS, Emerson RO, Sherwood AM, Desmarais C, Chung MW, Parsons JM, et al. Using synthetic templates to design an unbiased multiplex PCR assay. Nat Commun. (2013) 4:1–9. doi: 10.1038/ncomms3680

75. Dalai SC, Dines JN, Snyder TM, Gittelman RM, Eerkes T, Vaney P, et al. Clinical validation of a novel T-cell receptor sequencing assay for identification of recent or prior severe acute respiratory syndrome coronavirus 2 infection. Clin Infect Dis. (2022) 75:2079–87. doi: 10.1093/cid/ciac353

76. Emerson RO, DeWitt WS, Vignali M, Gravley J, Hu JK, Osborne EJ, et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat Genet. (2017) 49:659–65. doi: 10.1038/ng.3822

77. Greissl J, Pesesky M, Dalai SC, Rebman AW, Soloski MJ, Horn EJ, et al. Immunosequencing of the T-cell receptor repertoire reveals signatures specific for diagnosis and characterization of early Lyme disease. medRxiv. (2021). doi: 10.1101/2021.07.30.21261353

78. Elyanow R, Snyder TM, Dalai SC, Gittelman RM, Boonyaratanakornkit J, Wald A, et al. T cell receptor sequencing identifies prior SARS-CoV-2 infection and correlates with neutralizing antibodies and disease severity. JCI Insight. (2022) 7. doi: 10.1172/jci.insight.150070

79. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in python. J Mach Learn Res. (2011) 12:2825–30.

80. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in python. Nat Methods. (2020) 17:261–72. doi: 10.1038/s41592-019-0686-2

81. Zahid HJ, Taniguchi R, Ebert P, Chow IT, Gooley C, Lv J, et al. Large-scale statistical mapping of T-cell receptor β sequences to Human Leukocyte Antigens. BioRxiv. (2025), 2024–4. Available online at: https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2025.1603730.

Keywords: intrinsic clonality, T cell receptor diversity, immune repertoires, systems immunology, immune homeostasis

Citation: Zahid HJ, May D, Robins H and Greissl J (2026) A fundamental relationship between TCR diversity, repertoire size and systemic clonal expansion: insights from 30,000 TCRβ repertoires. Front. Immunol. 16:1707727. doi: 10.3389/fimmu.2025.1707727

Received: 17 September 2025; Accepted: 21 November 2025; Revised: 21 November 2025;
Published: 08 January 2026.

Edited by:

Peter S Linsley, Benaroya Research Institute, United States

Reviewed by:

Isha Monga, Weill Cornell Medicine, United States
Aaron Michels, University of Colorado, United States

Copyright © 2026 Zahid, May, Robins and Greissl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: H. Jabran Zahid, hzahid@microsoft.com
