Persistent T Cell Repertoire Perturbation and T Cell Activation in HIV After Long Term Treatment

Objective: In people living with HIV (PLHIV), we sought to test the hypothesis that long term anti-retroviral therapy restores the normal T cell repertoire, and to investigate the functional relationship of residual repertoire abnormalities to persistent immune system dysregulation.
Methods: We conducted a case-control study in PLHIV and HIV-negative volunteers, of circulating T cell receptor repertoires and whole blood transcriptomes by RNA sequencing, complemented by metadata from routinely collected health care records.
Results: T cell receptor sequencing revealed persistent abnormalities in the clonal T cell repertoire of PLHIV, characterized by reduced repertoire diversity and oligoclonal T cell expansion correlated with elevated CD8 T cell counts. We found no evidence that these expansions were driven by cytomegalovirus or another common antigen. Increased frequency of long CDR3 sequences and reduced frequency of public sequences among the expanded clones implicated abnormal thymic selection as a contributing factor. These abnormalities in the repertoire correlated with systems level evidence of persistent T cell activation in genome-wide blood transcriptomes.
Conclusions: The diversity of T cell receptor repertoires in PLHIV on long term anti-retroviral therapy remains significantly depleted, and skewed by idiosyncratic clones, partly attributable to altered thymic output and associated with T cell mediated chronic immune activation. Further investigation of thymic function and the antigenic drivers of T cell clonal selection in PLHIV is critical to efforts to fully re-establish normal immune function.


INTRODUCTION
Effective anti-retroviral therapy (ART) suppresses circulating human immunodeficiency virus (HIV) to undetectable levels, and has transformed the health of people living with HIV (PLHIV) by abating progression to AIDS and allowing near-normal life expectancy (1)(2)(3). However, PLHIV still experience greater morbidity due to chronic cardiovascular and respiratory disease, cancer, and infection (4)(5)(6). Some of this is attributed to exposures such as tobacco smoking (7,8) and ART-related toxicity (9), but predictors of clinical outcome in this context also include circulating markers of inflammation such as pro-inflammatory cytokines (IL6), acute phase proteins (CRP, fibrinogen) and markers of leukocyte activation (soluble CD14 and CD163), suggesting persistent immune dysregulation in PLHIV despite effective ART (10,11). Enrichment of pro-inflammatory states has also been reflected in transcriptional studies of blood monocytes (12), myeloid dendritic cells (13) and CD4 T cells (14) from ART-treated PLHIV. The immunological pathways that underpin these observations are not fully understood. They are partly attributed to low-level HIV replication in specific anatomical niches (15)(16)(17)(18), and to ongoing translocation of microbial products (19,20) as a result of reduced gastrointestinal barrier integrity, particularly related to a reduction in mucosal-associated invariant T cells (21,22) and Th17 cells (23,24). Importantly, HIV-associated changes in circulating T cell numbers may also persist despite ART. These include incomplete reversal of CD4 T cell depletion (25) and relatively elevated CD8 T cell counts (26,27), leading to a low CD4:CD8 ratio as a persistent immunological hallmark of PLHIV (28)(29)(30)(31).
We and others have previously established that untreated HIV infection leads to a profound reduction in the diversity of T cell clones and oligoclonal T cell expansion reflected by T cell receptor (TCR) sequence analysis, resulting in skewed and highly idiosyncratic repertoires (32)(33)(34)(35)(36)(37). Three months of effective ART, which reduced HIV viral load and increased circulating CD4 T cells, did not reverse the overall changes in the T cell repertoire (37). The effect of long-term ART on the T cell clonal repertoire is not known. In the present study, we sought to extend our understanding of chronic immunological dysfunction in PLHIV on long-term ART. We used high-throughput TCR sequencing to identify persistent abnormalities in the clonal T cell repertoire, and genome-wide whole blood transcriptional profiling to evaluate whether changes in the repertoire are associated with changes in expression of gene sets indicative of altered immune function.

Ethical Approvals
This study was approved by the London Hampstead Research Ethics Committee (14/LO/1409) and registered with the ISRCTN registry (http://www.isrctn.com/ISRCTN38386321). All participants provided written informed consent.

Study Population
Samples were collected as part of a prospective cohort study of PLHIV attending routine ambulatory HIV care at the Royal Free London NHS Foundation Trust, and of HIV-negative healthy controls (HC) recruited from Sexual Health and General Practice clinics. At recruitment, blood samples were collected in Tempus tubes for cryo-storage prior to RNA extraction. Clinical and laboratory data including lymphocyte subset counts and HIV viral load measurements were obtained from participants' hospital records. In addition, each participant completed the health-related quality of life EuroQol 5D (EQ5D) questionnaire, where a lower score represents a worse outcome (38), and the St George's Respiratory Questionnaire (SGRQ), a participant reported measure of respiratory health status, where a higher score represents a worse outcome (39). To minimise confounding by differences in ethnicity or gender, only samples from white European male participants were utilised for the present study.

T Cell Receptor Repertoire Sequencing and Data Processing
Total RNA was extracted from blood and subjected to next generation sequencing of alpha and beta chains of the TCR repertoire on the Illumina NextSeq platform, using an established quantitative TCR sequencing pipeline that integrates experimental library preparation and computational analysis with Decombinator (version 3.1) (40,41). All software is freely available at https://github.com/JamieHeather/Decombinator. For each TCR, Decombinator specifies the V and J gene used, the number of V and J gene deletions (relative to the germline sequence), and the nucleotide insert sequence between the end of the deleted V and J genes. Unless otherwise stated, analyses were performed with TCRs defined by the amino acid sequence of the encoded complementarity determining region 3 (CDR3), running from the last conserved cysteine in the V gene to the conserved phenylalanine in the FGXG motif in the J gene.

Quantifying the T Cell Receptor Repertoire
The T cell receptor repertoire of an individual is uniquely defined by the number of different TCR sequences which it contains, and their relative abundance. Many different algorithms have been developed which seek to capture this information in a single metric, so that different repertoires can be compared. In this study we use three metrics commonly used in studies of the repertoire. Repertoire richness is simply the total number of distinct TCRs present in the repertoire, irrespective of their abundance. The Gini coefficient captures repertoire inequality, ranging from zero (all TCRs are present at equal abundance) to one (repertoire dominated by a single TCR). The Shannon entropy, like richness, is a measure of repertoire diversity. In contrast to richness, it gives weight to sequences on the basis of their abundance. TCR-rich samples with a more even repertoire distribution yield a higher Shannon index value. Gini and Shannon indices were calculated as previously described (37,42). All three metrics are influenced by sample size, since in practice we estimate the true TCR repertoire on the basis of a small sample of the whole repertoire. To account for different sequencing depth between samples, these indices were therefore also calculated by randomly sub-sampling individual repertoires 100 times to the same number of total sequences, and calculating the average metric for each individual. The customized Python script is available at https://github.com/innate2adaptive/Decombinator/blob/master/SupplementaryScripts/RandomlySample.py.
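The three metrics and the sub-sampling step described above can be illustrated with a minimal Python 3 sketch. It is not the RandomlySample.py script referenced above. The function names, the use of log base 2 for the Shannon entropy, and sampling without replacement are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def richness(counts):
    """Number of distinct TCRs with non-zero abundance."""
    return sum(1 for c in counts if c > 0)


def gini(counts):
    """Gini coefficient of clone sizes (0 = perfectly even repertoire)."""
    xs = sorted(c for c in counts if c > 0)
    n = len(xs)
    if n == 0:
        return 0.0
    total = sum(xs)
    # Rank-weighted formula on ascending clone sizes.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def shannon(counts):
    """Shannon entropy in bits (log base 2 is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def subsampled_metric(clone_counts, depth, metric, n_iter=100, seed=0):
    """Average a metric over n_iter random sub-samples of fixed depth.

    clone_counts maps each TCR to its abundance; sampling is done
    without replacement from the expanded pool of sequences (assumption).
    """
    pool = [tcr for tcr, c in clone_counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences available")
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        sample = Counter(rng.sample(pool, depth))
        values.append(metric(list(sample.values())))
    return sum(values) / n_iter
```

With this formula a repertoire containing a single TCR returns a Gini coefficient of zero, so values approach one only as one clone dominates many rare ones. Fixing the random seed makes the averaged value reproducible between runs.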

CMV Status Prediction and Quantitation of HIV, CMV and EBV-Reactive T Cell Receptor Sequences
CMV status of participants was predicted based on the presence or absence of CMV-targeting CDR3 beta sequences that are used in the context of the correct V and J genes, following a previously described statistical classification approach (43). Alpha and beta CDR3s among the 100 most expanded sequences were annotated as HIV, CMV or EBV-reactive if they were listed as sequences known to target these viruses in the context of the correct chain on the VDJdb database (https://vdjdb.cdr3.net/; accessed 26/11/2019) (44).
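The annotation of the most expanded sequences can be sketched as a simple lookup. This is a minimal illustration, assuming the database export has already been reduced to a dictionary keyed by (chain, CDR3). The function name, the chain labels and the dictionary layout are hypothetical and do not reflect the VDJdb file format.

```python
def annotate_top_clones(clone_counts, reference, chain, top_n=100):
    """Annotate the top_n most abundant CDR3s of one chain.

    clone_counts: dict mapping CDR3 amino acid sequence -> abundance.
    reference: dict mapping (chain, CDR3) -> virus label (hypothetical layout).
    Returns a dict mapping CDR3 -> virus label, or None if not listed.
    """
    # Rank by descending abundance; ties are broken alphabetically.
    ranked = sorted(clone_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {cdr3: reference.get((chain, cdr3)) for cdr3, _ in ranked[:top_n]}
```

Keying the reference on both chain and sequence ensures that a CDR3 is only annotated when it is listed for the matching chain.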

Intra-Individual Similarity of T Cell Receptor Sequences
To measure similarity between pairs of CDR3s within individual repertoires, the Levenshtein distance (the minimum number of single-character edits required to turn one string into another) was calculated using the R package stringdist.
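For readers unfamiliar with the metric, the Levenshtein distance can be computed with a short dynamic programming routine. The sketch below is a plain Python illustration of the definition given above, not the stringdist implementation used in this study.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]
```

Two CDR3 sequences that differ by a single amino acid substitution therefore have a distance of one, and identical sequences have a distance of zero.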

Inter-Individual Sharing of Identical T Cell Receptor Sequences
Pairwise assessments of the overlap of identical CDR3 sequences between individuals were made with the Jaccard index, which is the number of TCRs shared between two repertoires divided by the total number of distinct TCRs found in either repertoire, counting each unique TCR only once (i.e. independent of abundance). Sharing of identical CDR3 sequences among PLHIV or HC was further determined by counting in how many individuals a given CDR3 appeared. To correct for the larger size of the group of PLHIV (n=26 versus n=12 HC), this analysis was performed in 100 random samples of 12 out of the 26 PLHIV samples available.
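The overlap and sharing calculations can be sketched as follows in Python 3. This is a minimal illustration under stated assumptions, not the analysis code used here: the function names are hypothetical, each repertoire is represented as a list of CDR3 strings, and the group-size correction is summarized as the mean number of shared sequences over random subsets.

```python
import random
from collections import Counter


def jaccard(rep_a, rep_b):
    """Shared distinct CDR3s divided by all distinct CDR3s in either repertoire."""
    a, b = set(rep_a), set(rep_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sharing_counts(repertoires):
    """Map each CDR3 to the number of individuals in which it appears."""
    counts = Counter()
    for rep in repertoires:
        counts.update(set(rep))  # count each CDR3 once per individual
    return counts


def mean_public_count(repertoires, group_size, min_individuals=2,
                      n_iter=100, seed=0):
    """Mean number of CDR3s found in at least min_individuals individuals,
    averaged over n_iter random subsets of group_size repertoires."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iter):
        subset = rng.sample(repertoires, group_size)
        counts = sharing_counts(subset)
        totals.append(sum(1 for n in counts.values() if n >= min_individuals))
    return sum(totals) / n_iter
```

Drawing subsets of the same size as the smaller group means that the number of shared sequences is compared between groups with an equal number of possible pairings.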

Blood RNA Sequencing and Data Processing
Genome wide mRNA sequencing of the same samples subjected to TCR repertoire analysis was performed as previously described (45), resulting in a median of 26 million (range 21-31 million) 41 bp paired-end reads per sample. RNAseq data were mapped to the reference transcriptome (Ensembl Human GRCh38 release 95) using Kallisto (46). The transcript-level output counts and transcripts per million (TPM) values were summed on gene level and annotated with Ensembl gene ID, gene name, and gene biotype using the R/Bioconductor packages tximport and BioMart (47,48). Downstream analyses were restricted to gene biotypes with selected BioMart annotations (protein coding, IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, TR_C_gene, TR_D_gene, TR_J_gene,

Molecular Degree of Perturbation
The modified molecular degree of perturbation (MDP) was derived using the mdp R/Bioconductor package (50). This provides a single measure of the quantitative difference between a given transcriptome and a standard reference representative of a healthy state. For each participant's transcriptome, Z scores of individual gene expression values were calculated by subtracting the mean and dividing by the standard deviation of gene expression values among the 12 healthy controls used as the standard reference, and the MDP was represented by the median of the Z scores >2.
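The calculation described above can be illustrated with a minimal Python sketch. It does not use or reproduce the mdp package. The use of absolute Z scores, the sample standard deviation, the exclusion of genes with zero variance among controls, and a score of zero when no gene exceeds the threshold are all assumptions made for illustration.

```python
import statistics


def mdp_score(sample, controls, threshold=2.0):
    """Median of per-gene Z scores exceeding the threshold.

    sample: dict mapping gene name -> expression value.
    controls: list of dicts with the same genes (the healthy reference).
    """
    z_scores = []
    for gene, value in sample.items():
        reference = [control[gene] for control in controls]
        mean = statistics.mean(reference)
        sd = statistics.stdev(reference)
        if sd == 0:
            continue  # assumption: skip genes with no variance in controls
        z = abs(value - mean) / sd  # assumption: absolute deviation
        if z > threshold:
            z_scores.append(z)
    return statistics.median(z_scores) if z_scores else 0.0
```

Because the reference mean and standard deviation come from the controls alone, each control transcriptome is scored against a reference to which it contributed, whereas each PLHIV transcriptome is scored against an independent reference.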

Ingenuity and Reactome Pathway Analysis
Ingenuity pathway analysis (Qiagen) was used to identify the interactome of differentially expressed genes, and to probe interacting genes further for predicted upstream regulators. The ten most significant upstream regulators with activation Z-scores >2 were visualized as a network in Gephi v0.9.2. Reactome pathway enrichment of differentially expressed, interacting genes was analyzed with the XGR R package (51). For visualization, 20 pathway groups were identified by hierarchical clustering of Jaccard indices to quantify similarity between the gene compositions of each pathway. For each group, the pathway with the largest total number of genes was then selected to provide a representative annotation.

Transcriptional Modules
The HIV module was derived from blood microarray data of an independent set of healthy controls and HIV patients before and after three months of anti-retroviral therapy. Transcriptional profiling by Agilent microarrays (SurePrint G3 Human Gene Expression v3 8×60K or Human Gene Expression v2 4×44K platform) and subsequent data processing were undertaken as previously described (52). Probes that were represented on both microarray platforms were retained for analysis. Probe annotations were downloaded from Agilent's eArray web portal, and duplicate gene names removed to retain the gene name with highest average expression across all samples. Differential expression among the resulting 14,706 gene names was analyzed using Mann-Whitney tests in MultiExperiment Viewer v4.9 (http://www.tm4.org/mev.html) with FDR <0.05. Genes with a median expression value ≥2-fold in the untreated HIV patient group compared to healthy controls were included in the HIV module (Supplementary File 1). The derivation and validation of the macrophage type 1 IFN module has previously been published (53). The T helper cell type 1 IFN module was derived from published transcriptomes of CD4 T cells cultured in the presence of IFNα and stimulated with anti-CD3 and anti-CD28 (dataset GSE54627) (54). Genes over-expressed more than 1.5-fold compared to Th1, Th2, or Th17 polarized CD4 T cells by paired t-test with p <0.05 were included in the T cell type 1 IFN module (Supplementary File 2). To quantify CD4 T cell frequency we used the 'Cluster0127: High in CD4 T cells' module by Mabbott et al. (55), and to quantify CD8 T cell frequency we used the 'CD8' module by Watkins et al. (56).
In a comparison with 15 other CD4 T cell modules and 7 other CD8 T cell modules, both these modules achieved the highest sensitivity and specificity for their target cell type, when applied to transcriptomic data from other immune cells across multiple datasets, as quantified by the modular discrimination index (MDI) score (57). Gene module scores were calculated as mean expression of the constituent gene names in each module.
Where duplicate gene names were present in the RNAseq data, the highest log2 TPM value was used for each sample.
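The module score calculation for a single sample can be sketched as below. This is a minimal illustration: the function names are hypothetical, and ignoring module genes that are absent from the expression data is an assumption rather than a documented step. The gene names in the usage example are placeholders and are not the contents of any module used in this study.

```python
def collapse_duplicates(rows):
    """Keep the highest log2 TPM value for each gene name in one sample.

    rows: iterable of (gene_name, log2_tpm) pairs, possibly with repeated names.
    """
    best = {}
    for gene, value in rows:
        if gene not in best or value > best[gene]:
            best[gene] = value
    return best


def module_score(expression, module_genes):
    """Mean expression of the module genes present in the sample."""
    values = [expression[g] for g in module_genes if g in expression]
    if not values:
        raise ValueError("no module genes found in the expression data")
    return sum(values) / len(values)


# Usage with placeholder gene names and values.
sample = collapse_duplicates([("GENE_A", 3.0), ("GENE_A", 5.0), ("GENE_B", 1.0)])
score = module_score(sample, ["GENE_A", "GENE_B", "GENE_C"])
```

In this example the duplicate entry for GENE_A is resolved to 5.0 before averaging, and GENE_C is ignored because it is not present in the sample.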

Statistics
Analyses were performed in R (version 3.6.0) or Python (version 2.7.15) as described above. Statistical differences were assessed in GraphPad Prism (version 8.3.1) using the tests stated in the text and Figure legends, and considered significant for p <0.05.

T Cell Receptor Repertoires Remain Disturbed in People Living With HIV Despite Effective Anti-Retroviral Therapy
We used a cross-sectional study design to test the hypothesis that the T cell clonal repertoire returns to normal after long-term ART by comparison of 26 PLHIV on long-term ART and 12 HIV-negative controls among white European men, with comparable age range and health-related questionnaire scores (Table 1). PLHIV were on ART for a median of 8.5 years.
In TCR sequencing data derived from whole blood RNA, we interpreted unique alpha or beta chain sequences as surrogates for individual T cell clones. The number of total TCR sequences recovered for both alpha and beta chains was similar in PLHIV and controls (Figure 1A). However, three metrics which have been widely used to capture the TCR repertoire profile in a single number (reflecting both the number of distinct TCRs and their relative abundance) were significantly different between PLHIV and controls. First, repertoire richness (simply the total number of distinct alpha and beta chains in a sample independent of their relative abundance) was significantly reduced in PLHIV (Figure 1B). Second, the frequency distribution of alpha and beta TCR sequences was more skewed (showed greater inequality) in PLHIV, represented by a higher Gini index (Figure 1C). Third, the repertoire diversity captured by the Shannon entropy, which reflects the number of distinct TCRs but gives different weight to sequences on the basis of their abundance, was significantly lower in PLHIV (Figure 1D). These differences did not arise from any systematic differences in read depth, because they were still observed after randomly sub-sampling individual repertoires to the same number of total sequences (Supplementary Figure 1). None of these measures correlated with duration of therapy or pre-ART CD4 counts, but all were significantly associated with concurrent CD8 counts and the CD4:CD8 ratio (Table 2).
We tested the hypothesis that CD8 T cells were expanded in our long-term ART-treated PLHIV compared to controls, as previously reported (26,27). Blood lymphocyte subset counts were not available in our HIV-negative control subjects for comparison to those of PLHIV. Instead, we used expression of validated transcriptional signatures for CD4- and CD8-positive T cells to compare their frequency in the different groups. We found comparable expression of the transcriptional CD4 T cell signature, but higher expression of the transcriptional CD8 T cell signature among PLHIV (Supplementary Figure 2A). Accordingly, the ratio of CD4:CD8 transcripts was lower in PLHIV compared to controls (Supplementary Figure 2B). To support the validity of this transcriptional analysis, we showed that CD4 and CD8 T cell counts correlated with their respective transcriptional signatures among PLHIV where flow cytometric quantitation of lymphocyte subsets was available (Supplementary Figure 2C). Likewise, the flow cytometric quantitation of the CD4:CD8 ratio correlated with the ratio of CD4:CD8 transcripts (Supplementary Figure 2C). Taken together, these data suggest that the persistent decrease in TCR repertoire diversity in PLHIV on long-term ART may be driven by oligoclonal expansion of CD8 T cells.

Oligoclonal T Cell Expansion in People Living With HIV Is Unrelated to HIV, CMV or EBV
We confirmed oligoclonal expansion of the TCR repertoire in long-term ART-treated PLHIV more directly by focusing on the    (Figure 2A). Similarly, the frequency distribution of the 100 most abundant sequences was significantly shifted to larger clone sizes in PLHIV, with the mean abundance being two-fold greater compared to the controls (Figures 2B, C). The mean abundance of the 100 most common sequences was positively correlated with concurrent CD8 counts in PLHIV (  Figure 3A). Oligoclonal CD8 T cell expansion is also a hallmark of CMV infection (58), with an even larger proportion of the memory CD8 T cell pool recognizing CMV in HIV-infected individuals compared to HIV-uninfected controls (59,60). CMV serology was not available for all PLHIV and control subjects for whom we had TCR sequence data. Instead, we used the TCR sequencing data to predict CMV serostatus as previously described (43). In this analysis, 87% of all the participants in our cohort were predicted to be CMV-positive with similar proportions in PLHIV and controls (Table 1). Based on the known CMV serology status in a subset of our PLHIV group (n=13), the overall accuracy of this classification approach was 85% (10/11 CMV-positive and 1/2 CMV-negative individuals being correctly identified). In view of the high prevalence of CMV positivity among our cohort, we evaluated the frequency of CMV-reactive CDR3 sequences from VDJdb as described above for HIV-reactive sequences. Among PLHIV, 161/2311 (7%) alpha CDR3 sequences and 54/2401 (2.2%) beta CDR3 sequences were CMV-reactive. This compared to 94/1067 (8.8%) alpha CDR3 sequences and 20/1165 (1.7%) beta CDR3 sequences among controls. Neither alpha nor beta chain CMV-associated sequences were enriched in PLHIV compared to controls (Figure 3B). Similarly, extension of this analysis to another prevalent herpes virus, EBV, failed to show any virus-specific CDR3 sequence enrichment among PLHIV (Figure 3C).
Taken together, these data reject the hypothesis that oligoclonal T cell expansion in PLHIV is due to HIV, CMV or EBV-associated responses.

Reduced Intra-Individual Similarity of CDR3 Sequences in People Living With HIV
To test whether specific or related antigens other than HIV, CMV or EBV may drive oligoclonal expansion of T cell clones in PLHIV compared to controls, we measured intra-individual CDR3 amino acid sequence similarity on the premise that TCRs with more similar CDR3s are more likely to recognize related antigens (61). We measured similarity between CDR3 pairs within an individual, using the Levenshtein distance (the minimum number of single-character edits required to turn one string into another) (37). To reduce computational time, we restricted this analysis to the 2500 most abundant CDR3 sequences. For each individual we constructed a similarity network connecting CDR3 sequences (nodes) that differed by a Levenshtein distance of one (Figure 4A). In this analysis, the convergence of TCR clones into fewer but larger clusters would suggest that the repertoire may be targeting the same or related antigens. In fact, the number of clusters was equivalent in PLHIV and controls, and the cluster sizes were smaller in PLHIV among alpha chain sequences (Figures 4B, C). In a sensitivity analysis, we found similar results when the analysis was restricted to CDR3 sequences that were present at least three times in an individual, and more likely to be derived from memory T cells (40) (Supplementary Figures 3B, C). These data suggest that expanded clones in long-term ART-treated PLHIV do not arise as a result of reactivity to a common target, but may represent expansion of a divergent population of T cells directed at a heterogeneous set of antigens in each individual.
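The similarity network and its clusters can be sketched as follows. This is a minimal illustration, assuming that a cluster is a connected component of at least two distinct CDR3 sequences linked by a Levenshtein distance of one. The function names and the exclusion of unconnected sequences are assumptions, and the all-against-all comparison is only practical for a few thousand sequences.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def cluster_sizes(cdr3s):
    """Sizes of connected components (two or more nodes), largest first."""
    seqs = sorted(set(cdr3s))  # each distinct CDR3 is one node
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # A distance of one is impossible if lengths differ by more than one.
            if abs(len(seqs[i]) - len(seqs[j])) > 1:
                continue
            if levenshtein(seqs[i], seqs[j]) == 1:
                parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(len(seqs)))
    return sorted((s for s in sizes.values() if s > 1), reverse=True)
```

The number of clusters is then the length of the returned list, and the cluster size distribution is the list itself, which allows both quantities to be compared between individuals.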

People Living With HIV Have More T Cell Receptors With Unusually Long CDR3 Sequences
To further investigate the mechanisms that contribute to the greater dissimilarity of CDR3 sequences within individual PLHIV, we compared the frequency distributions of Levenshtein distances across PLHIV and controls. The integrated data from PLHIV showed a population of highly dissimilar CDR3 sequences with Levenshtein distances >20 that were not present in controls (Figure 5A). CDR3s with a clone size of ≥3 are most likely to represent memory T cells, whereas those that are present only once are most likely to represent naïve T cells (40). The highly dissimilar CDR3s in PLHIV were only evident in the least abundant sequences, suggesting that they were a feature of naïve T cells (Figure 5A). At the individual level, this observation was not evident in every case, but more pronounced in a subset of PLHIV (Figure 5B). The average CDR3 length of TCRs is 13-15 amino acids (62). Therefore, we hypothesized that such large Levenshtein distances of >20 reflect the presence of TCRs with unusually long CDR3 sequences in PLHIV. We evaluated this hypothesis both on the level of nucleotide and amino acid sequences. PLHIV had marginally more alpha and beta TCRs with long (>40) nucleotide inserts at the VJ gene junction, while the number of nucleotide deletions from the V and J genes (relative to their germline sequences) was comparable to controls (Supplementary Figure 4). However, some PLHIV showed more alpha and beta TCRs with very long CDR3 sequences (>20 amino acids) (Figures 5C, D). Consistent with the analysis of Levenshtein distances, these findings were more evident in the least abundant clones likely to represent naïve T cells. Such sequences are thought to be restricted during normal thymic selection (63). Therefore, their presence in peripheral blood of PLHIV may reflect thymic dysfunction.

Reduced Inter-Individual Sharing of Identical CDR3 Sequences in People Living With HIV
In our previous study of untreated and short-term treated HIV disease, we had found a reduction in the number of 'public' CDR3 sequences (public being defined as sequences shared by more than one individual) (37). Interestingly, public sequences have also recently been reported to be dependent on normal thymic selection (64). In view of the evidence of potential thymic dysfunction among PLHIV described above, we tested the hypothesis that long-term ART-treated PLHIV show a persistent reduction in public TCRs. Using the Jaccard index to calculate the overlap of identical CDR3s between pairs of PLHIV, we found that inter-individual sharing of both alpha and beta chain sequences remained significantly decreased in PLHIV compared to controls (Figure 6A). Similarly, the number of CDR3 sequences that were highly public (found in at least 50% of individuals) was substantially reduced among the group of PLHIV (Figure 6B).

Persistent T Cell Activation in People Living With HIV on Long-Term Anti-Retroviral Therapy
In order to evaluate the functional significance of the persistent changes in the T cell repertoire among PLHIV, we looked for evidence of associated immune dysfunction at systems level, by whole blood genome-wide transcriptional profiling of the same samples. PLHIV had significantly greater perturbation of blood transcriptional profiles compared to control subjects (Figure 7A), as measured by the modified molecular degree of perturbation (MDP) (50), representing the extent to which each individual transcriptome deviated from the mean of control subjects as a standard reference. Next, we assessed the number of differentially expressed genes (DEG) between the two groups (FDR <0.05, Figure 7B). Of 353 DEG, 281 genes were higher in PLHIV. We hypothesized that these would reflect enrichment of specific immunological pathways which represented differences in functional immunological profiles at steady state. To evaluate these pathways, we first identified a subset of 149 genes that are predicted to interact directly or indirectly, thereby reflecting biological systems. This interactome showed enrichment of immune response pathways (Supplementary Figure 5, Supplementary File 3). Upstream regulator analysis revealed the interactome to represent molecules primarily involved in T cell activation and interferon (IFN)γ signaling (Figure 7C, Supplementary File 3). Together, these data suggest persistently elevated T cell activation in PLHIV on long-term ART. Importantly, the mean expression of the genes representing these systems correlated with the reduction in diversity of the TCR repertoires (Figure 7D).

Anti-Retroviral Therapy Attenuates or Resolves Gene Signatures Associated With Untreated HIV Infection
Finally, we evaluated the extent to which the blood transcriptional findings in long-term ART-treated PLHIV were distinct from those that are associated with untreated HIV. We took advantage of independent microarray transcriptomic data from our previous longitudinal case-control study of TCR repertoires in HIV-infected individuals before and after three months ART compared to HIV-negative controls (37). In this dataset, we observed a large number of DEG that were enriched in untreated HIV patients compared to HIV-negative controls (4542 of 9292 DEG). In addition to a statistical cut-off (FDR <0.05), we applied a fold change filter of ≥2 to derive a signature of 434 transcripts that were most increased in untreated HIV (Supplementary Figure 6A, Supplementary File 1). Consistent with previous reports on transcriptional changes in untreated HIV (65)(66)(67), this signature was enriched for immune- and cell cycle-associated pathways, and dominated by type 1 IFN signaling (Supplementary Figures 6B, C, Supplementary File 1).
There was significant but partial reversal of the expression of this untreated HIV-associated signature after three months ART (Figure 8A). In RNAseq data from our cohort of PLHIV on long-term ART, the expression of this transcriptional signature overlapped with that of HIV-negative controls, albeit with a distribution that remained statistically higher (Figure 8A).
In view of the prediction from our bioinformatic analysis that the untreated HIV signature was primarily driven by type 1 IFN signaling, we extended our analysis to evaluate the expression of two separate type 1 IFN-inducible transcriptional signatures independently derived from macrophages (53) and T cells (Supplementary File 2), respectively. The expression of both these signatures was significantly enriched in untreated PLHIV, partially normalized in response to three months of ART, and comparable to HIV-negative controls in PLHIV on long-term ART (Figures 8B, C). Taken together, our analysis of blood transcriptomic data suggests that long-term ART in PLHIV leads to resolution of elevated type 1 IFN activity associated with untreated infection, but increased levels of T cell activation-associated IFNγ activity.

DISCUSSION
We present the first paired TCR repertoire sequencing and global transcriptomic profiling of blood samples from PLHIV. Similar to our previous analysis of untreated or short-term ART-treated HIV (37), PLHIV on long-term ART still showed significantly less TCR repertoire diversity compared to HIV-negative controls, underpinned by oligoclonal expansion of T cell clones. Based on evidence for a persistently elevated CD8 T cell fraction in the peripheral blood of long-term ART-treated PLHIV, we hypothesise that CD8 T cells are responsible for the persistent changes in the TCR repertoire. We hypothesise that increased clonal expansions are unlikely to be driven by HIV antigens themselves, since HIV levels remain undetectable in PLHIV. They also cannot be explained by differences in CMV reactivity that has otherwise been reported to drive large proportions of CD8 T cell responses (58)(59)(60). Identifying the targets of the expanded T cells will be important in trying to understand the underlying pathology associated with PLHIV. In addition, two lines of evidence suggest that persistent abnormalities of the TCR repertoire in long-term ART-treated PLHIV may reflect dysfunctional thymic output. First, we found increased frequency of long CDR3 sequences among the least abundant clones, most likely to represent the naïve T cell fraction, that are typically deleted in normal thymic selection (63). Secondly, we found reduced frequency of public TCR sequences that are thought to depend on normal thymic selection (64).
We complemented our repertoire analysis with evidence of T cell dysfunction in PLHIV by bioinformatic analysis of genome-wide transcriptomes. We found that the striking perturbation of the blood transcriptome in untreated HIV, reflecting an exaggerated type 1 IFN response, was largely resolved in long-term ART-treated PLHIV. Nonetheless, the blood transcriptome remained consistently abnormal in these patients with changes in sets of genes linked to increased T cell activation which were significantly correlated to changes to the TCR repertoire. The present study is limited to white European adult males. Future extension to more diverse demographic groups is required to confirm its generalizability. In addition, we have not established the activation phenotype or the antigen specificity of the expanded T cells we observe in PLHIV. Single cell sequencing analysis will be needed to address this question, and to validate our application of unique alpha and beta sequences as surrogates for T cell clones. The possibility that they may react to microbial antigens arising from translocation of gastrointestinal products, or perhaps abnormally presented self-antigens, offers plausible alternative hypotheses to test. We were also not able to determine the relative contribution of oligoclonal expansion of CD8 T cells and abnormal recovery of the CD4 T cell repertoire.
Our data are consistent with a model in which thymic dysfunction may lead to repopulation of the T cell repertoire with clones that have greater propensity for functional dysregulation, for example as a result of autoreactivity or abnormal MHC restriction, and that manifest in the peripheral blood transcriptome with evidence of increased T cell activation. Such a model represents an important paradigm shift in our understanding of the mechanisms of chronic immune activation among PLHIV. In view of the relationship between chronic immune activation and adverse clinical outcomes in long-term ART-treated PLHIV, answers to these questions may inform novel therapeutic approaches to restore normal immune function in HIV infection and further reduce chronic morbidity in this population.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the London Hampstead Research Ethics Committee. The patients/participants provided their written informed consent to participate in this study.