Tracking TCRβ Sequence Clonotype Expansions during Antiviral Therapy Using High-Throughput Sequencing of the Hypervariable Region

To maintain a persistent infection viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilize either antigen stimulation (e.g., ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. This limits the ability to profile T cell responses during chronic infection due to suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study, we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR)3 to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing antiviral therapy, we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high junction region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395HSKKKCDEL1403 and HLA-A*0101-restricted 1435ATDALMTGY1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of future experiments to accurately address the question of whether T cell responses contribute to SVR upon antiviral therapy. This pipeline represents a novel technique to analyze T cell dynamics in situations where conventional antigen-dependent methods are limited due to suppression of T cell functions and highly diverse antigenic sequences.

To maintain a persistent infection viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilize either antigen stimulation (e.g., ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. This limits the ability to profile T cell responses during chronic infection due to suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study, we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR)3 to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing antiviral therapy, we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high junction region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395 HSKKKCDEL 1403 and HLA-A*0101-restricted 1435 ATDALMTGY 1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of future experiments to accurately address the question of whether T cell responses contribute to SVR upon antiviral therapy. This pipeline represents a novel technique to analyze T cell dynamics in situations where conventional antigen-dependent methods are limited due to suppression of T cell functions and highly diverse antigenic sequences. The importance of T cell populations as mediators of protective immunity is well documented for a range of viral infections (1-3).
To maintain a persistent infection viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert these protective T cell responses. These include escape from immune pressure, exhaustion of immune cells, and suppression of immune pathways. Host genetic studies have identified strong associations between specific class I and II human leukocyte antigens (HLA), which present viral peptides to T cells, and spontaneous viral clearance (4,5). However, the role of T cell responses in individuals with chronic infection is less clear. The reversion of viral escape variants to consensus sequences upon immune suppression highlights the fact that HCV-specific T cells actively exert immune pressure during persistent infection (6). Furthermore, studies have indicated that HCV-specific T cell proliferation and/ or interferon (IFN)γ production can predict sustained virological response (SVR) upon treatment, although the evidence for this is conflicting (7)(8)(9)(10)(11)(12).
Defining the role of HCV-specific T cells during both chronic infection and antiviral therapy is challenging due to limitations inherent in the experimental assays used to identify HCV-specific T cells. Conventional methods of detecting antigen-specific T cells utilize two basic techniques: (1) antigen stimulation (e.g., ELISpot, proliferation assays, and flow cytometry) and (2) antigen-loaded tetramer staining. Antigen stimulation assays rely on a functional readout of T cell activity, however, chronic HCV infection and exogenous IFNα used during antiviral therapy are both associated with suppressed T cell function. This causes difficulty when assessing T cell responses in these groups of patients. Both antigen stimulation and tetramer staining also rely on prior knowledge of the specific antigenic peptides against which immune responses are directed. For a hugely diverse pathogen such as HCV, this means that even within a single-infected individual, it is unfeasible to profile immune responses targeting all antigenic sequence variants.
During viral infection, antigens from viral proteins, presented by host HLA molecules, are recognized by naive T cells expressing antigen-specific T cell receptors (TCR). This recognition results in T cell activation, which together with co-stimulation signals, leads to the clonal expansion of viral-specific T cells that express cytokines and cytotoxic mediators to facilitate pathogen clearance. Following clearance of the invading pathogen, this clonal population of effector T cells (Teff) contracts and a proportion of these cells develop into memory T cell populations, which can persist for a number of years and can rapidly respond upon subsequent re-challenge with the same antigen (13). Throughout this process the antigen-specificity of any particular T cell clone is defined by the unique TCR expressed by the parent naive T cell (as illustrated in Figure 1A). While this is a simplified model of T cell responses, it forms the basis of our theoretical knowledge of how T cells respond to a specific antigen and suggests that it should be possible to utilize sequential TCRβ sequence clonotype size as a proxy for clonal expansion and contraction of antigen-specific T cells during infection in an antigen-independent manner ( Figure 1B).
The TCR expressed by conventional T cells is composed of a heterodimer of a single alpha and beta chain (14). This allows recognition of cognate antigen presented by HLA molecules, activating the T cell and inducing the development of an adaptive immune response. During the development and maturation of naive T cells, the TCR gene locus undergoes RAG-mediated recombination of the germline V and J gene segments (in the case of alpha and gamma receptor chains), or V, D, and J gene segments (in the case of beta and delta receptor chains) to produce a functional rearranged TCR genome locus (14,15). This rearrangement process results in a novel V-J or V-D-J junction region, and gives rise to an estimated theoretical TCRαβ repertoire diversity of 10 18 in humans, which then undergo thymic selection (16). In the case of the TCRβ chain, the rearrangement involves an estimated 47 possible V regions, two possible D regions, and 13 possible J regions (15). This hypervariable junction region, formed via gene segment rearrangement, is known as the complementarity determining region (CDR)3 and is the most diverse region defining the peptide-binding properties of the TCR.
Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of CDR3 recombination events, enabling analysis of T cell populations with unprecedented detail. In the present study, we have assessed the feasibility of HTS of the TCRβ CDR3 region to track temporal changes in T cell populations in HCV-infected individuals undergoing antiviral therapy, a setting where conventional T cell assays are limited due to antigenic variation and T cell immune suppression. We describe a pipeline to identify expanding and contracting T cell clones from sequential blood samples and highlight the potential and pitfalls of HTS-based techniques to study T cell responses during infection.

MaTerials anD MeThODs clinical Data
Clinical samples were used with informed consent, conforming to the ethical guidelines of the 1975 Declaration of Helsinki and study protocols were approved by the West of Scotland Research Ethics Committee. Blood samples were obtained from chronically HCV-infected patients who were beginning antiviral therapy as part of a larger clinical cohort (17). Blood samples were taken pre-treatment, and at weeks 1, 2, 4, and 12 following the initiation of PEGylated IFNα and ribavirin therapy. Six patients were included in the study: three patients infected with HCV genotype 1 and three patients infected with HCV genotype 3 (as detailed in Table 1). All patients were initially treated with PEGylated IFNα and ribavirin, with the HCV genotype 1-infected individuals also receiving a protease inhibitor from week 4 of therapy.
Four patients showed a rapid virologic response (RVR) with viral loads ≤50 IU/mL within 4 weeks, of whom two achieved this reduction within 2 weeks. Two patients (both infected with HCV genotype 1) showed little decline in viral load in response to PEGylated IFNα and ribavirin but cleared the virus by week 12 following the initiation of a protease inhibitor after week 4. All patients successfully cleared the virus following the completion of their treatment and all patients were classified as SVR at 6 months following the end of treatment.
amplification of the Tcr cDr3 region from Peripheral Blood Mononuclear cells DNA-free total RNA was extracted from peripheral blood mononuclear cells (PBMCs) and reverse transcribed using the SuperScript ® VILO™ cDNA Synthesis Kit (Invitrogen, Life Technologies, Paisley, UK). RNA quality was assessed on an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA), prior to reverse transcription. To amplify the TCR CDR3 region, a previously published set of primers was utilized consisting of 45 forward primers, each specific to a functional T cell receptor beta variable (TRBV) gene segment, and 13 reverse primers, each specific to a T cell receptor beta joining (TRBJ) gene segment (18

Tcr sequence analysis
Raw sequencing reads were trimmed, to remove poor quality sequence and adaptor sequences, using fastq-mcf 1 and Trim Galore 2 . The paired-end reads were then joined using fastq-join 3 .
The remaining filtered and joined sequence reads were analyzed using the pipeline summarized in Figure 2. Briefly, reads were assigned a unique identifier and the TRBV and TRBJ primer sequences were analyzed using custom perl scripts (SeqRenamer. pl and PrimerTrim.pl). The PrimerTrim.pl script trims the primer sequences from the reads, identifies reads lacking primer sequences, and outputs TCR Vβ and Jβ usage. Sequences lacking a TRBV or TRBJ primer pair or containing two TRBV or TRBJ primer sequences were removed using custom perl scripts (RemoveSamePrimer.pl and FastqFilter.pl), and a gene usage text file was generated from the PrimerTrim.pl output (using ParseStats.pl). The .fastq sequencing files were converted to .fasta files and identical reads were grouped into TCRβ sequence clonotypes using cd-hit-est (19). For the cd-hit-est step the global sequence identity threshold was set to 100%, the word size was set at 10, and sequences were aligned in both directions. The output files were parsed using a perl script (sort_cdhit.pl) to extract the number of reads for each TCRβ sequence clonotype, filter on the basis of read frequency and output a representative sequence for each TCRβ sequence clonotype. To explore TCRβ sequence clonotypes between different samples, the output files from the sort_cdhit.pl script from two or more samples were concatenated and analyzed using cd-hit-est a second time. The output files were parsed using a perl script (Count_clstr.pl) to generate a comparison table detailing the TCRβ sequence clonotype number and size for each sample. Representative sequences from non-redundant TCRβ sequence clonotypes, which were represented by more than 0.01% of the total reads (equivalent to a frequency of 0.0001), were analyzed using IMGT/HighV-QUEST (20) to identify sequences with productive CDR3 regions. This cut-off was selected in order to reduce the detection of TCRβ sequence clonotypes that arose due to sequencing error, the frequency of which was estimated from the known primer sequences to range between 0.001 and 0.003 per nucleotide across the sequencing datasets (data not shown). This also had the effect of biasing the TCRβ sequence clonotypes toward expanded Teff and memory T cell populations as opposed to low frequency naive T cell populations. Output files detailing productive sequence frequency for each sample were generated using a perl script (ParseIMGT.pl) and final data normalization and visualization was performed in R. All perl scripts used are publically available at https://github.com/josephhughes/ TCRclust.

resUlTs
The Tcr repertoire and TrBV/TrBJ gene Usage are relatively stable within individuals over the First 12 Weeks of antiviral Therapy The results of the TCR sequencing and data processing are summarized in Table 2. Over 400,000 raw paired-end reads were obtained for each sample, which resulted in greater than 300,000 TCR sequences per sample (range 346,336-677,297) after filtering. Following the removal of singleton sequences, which likely represent sequencing/PCR errors, between 35,000 and 115,000 TCRβ sequence clonotypes were detected ( Table 2). The number of TCRβ sequence clonotypes varied markedly between patients but was relatively consistent between sampling time points, with patients 2 and 4 both having a low total number of TCRβ sequence clonotypes across all time points sampled ( Table 2). These two patients were the oldest in the study (at 51 and 65 years of age), which potentially contributed to the reduced TCRβ repertoire diversity observed. Both individuals also failed to achieve RVR, although both subsequently achieved SVR ( Table 1).
In addition to the intra-individual consistency in the total number of TCRβ sequence clonotypes, there was a high degree of intra-individual similarity in terms of gene segment usage. The frequency of TRBV and TRBJ gene segment usage within sequence datasets, from each individual patient and time point, was used to calculate a distance matrix, using Manhattan distance measures, to provide a metric for the similarity in gene segment usage between datasets, as visualized for TRBV in Figure 3A and for TRBJ in Figure 3B. The TRBV and TRBJ gene segment usage was most similar (as indicated by small distance values, visualized as dark red colors) between samples collected at different time points from the same individual. In contrast, the TRBV and TRBJ gene segment usage was less similar (visualized as a shift toward dark blue colors) between samples from different individuals. The average distance value between intra-individual datasets from different time points for TRBV gene segment usage was 0.093 versus 0.221 for the inter-individual datasets (p < 0.0001; Mann-Whitney test). The average distance value between intraindividual datasets from different time points for TRBJ gene segment usage was 0.059 versus 0.146 for the inter-individual datasets (p < 0.0001; Mann-Whitney test). In all six patients the stability of the TCR repertoire, both in terms of numbers of TCRβ sequence clonotypes and of TRBV/TRBJ gene segment usage, is maintained despite a significant decrease in circulating lymphocytes across the course of therapy (data not shown), as previously identified during IFNα-based treatments (21).

Mapping sequential Tcrβ sequence Frequencies identifies changing Tcrβ sequence clonotypes
To explore changing T cell populations at a level of detail beyond TRBV and TRBJ gene segment usage, a pipeline was established to compare productive TCRβ sequence clonotypes within individuals across different time points (Figure 2). In order to avoid detection of TCRβ sequence clonotypes arising due to sequencing error, and to bias our dataset toward expanded Teff and memory T cell populations, we focused on clonotypes with a frequency of >0.0001. While the majority of TCRβ sequence clonotypes had stable read frequencies across the 12 weeks of antiviral therapy, a minority of TCRβ sequence clonotypes showed dramatic changes in read frequency (Figure 4). In certain cases, this represented a fold change in read frequency of hundreds for individual TCRβ sequence clonotypes which either expanded or contracted between sequential time points. These dramatic changes within specific TCRβ sequence clonotypes are not apparent at the level of TCRβ gene segment usage (Figure 3) or the total number of TCRβ sequence clonotypes detected ( Table 2).
Comparing the four patients with RVR upon PEGylated IFNα and ribavirin therapy versus the two patients (patients 2 and 4) who only showed a reduction in viral load upon initiation of protease inhibitor treatment at week 4, a trend toward increased TCRβ sequence clonotype size was noted (Figure 4). The significance of this observation in regard to RVR is unclear. Patients 2 and 4 were the oldest and had lower total numbers of TCRβ sequence clonotypes across all time points, which contributed to an increase in the size of individual clonotypes when expressed as a % of total reads. This limitation in the data highlights one of the difficulties in directly comparing TCRβ repertoires between patients and necessitates larger, age-matched patient groups in future studies. The temporal nature of the TCRβ sequence data allows it to be compared to clinical changes over the course of antiviral therapy, e.g., viral load, as shown in Figure 4. This facilitates the identification of TCRβ sequence clonotypes that show frequency changes concurrent with a decline in HCV viral load. For patients 3, 5, and 6, dramatic declines in HCV viral load are associated with the large expansions of particular TCRβ sequence clonotypes (Figure 4). In these three patients, the maximum TCRβ sequence clonotype fold increase between day 0 and day 14 was 114×, 83×, and 220×, respectively. The association with declines in viral load suggests that these TCRβ sequence clonotypes may be relevant for viral control in these patients and distinguishes these clonotypes from others, which show even larger expansions, e.g., the TCRβ sequence clonotype in patient 4 that expands 742× between day 0 and day 14 (Figure 4). To explore the possible antigen-specificity of expanding TCRβ sequence clonotypes identified in Figure 4, we interrogated sequence databases containing information on experimentally validated antigen-specific TCRβ sequences. Using BLAST searches (blastp) of non-redundant human protein sequences, we identified 12 TCRβ sequence clonotypes with ≥80% identity to published HCV-specific TCRβ CDR3 sequences ( Table 3). These HCV-specific TCRβ CDR3 sequences are specific for either the HLA-B*0801-restricted 1395 HSKKKCDEL 1403 epitope or the HLA-A*0101-restricted, 1435 ATDALMTGY 1443 epitope (22). Three of the 12 TCRβ sequence clonotypes with ≥80% identity to HCV-specific TCRβ CDR3 sequences had matching TRBV and TRBJ gene segment usage ( Table 3) and highly similar junction regions, differing by two (clonotype 3765) or three amino acids (clonotypes 10041, and 11938). This provides additional evidence that the expanding TCRβ sequence clonotypes detected in the present study are HCV-specific although in the absence of functional validation this is impossible to definitively prove.

DiscUssiOn
High-throughput sequencing technologies are revolutionizing our understanding of T cell biology. The ability to detect and sequence thousands of TCR sequences simultaneously has provided detailed understanding of the clonal T cell population structure in both health and disease. In this context, techniques allowing the integration of TCR sequence data from sequential time points provide the possibility to track clonal T cell populations over time and identify responding T cell populations in the absence of prior knowledge of antigen-specificity. This is particularly relevant in human diseases where the specific antigen driving T cell activation is unknown (e.g., autoimmunity) or the nature of the infectious pathogen results in high inter-and intrahost antigenic variation (e.g., hepatitis C virus infection). It is also valuable in settings (such as HCV treatment) where T cell numbers and function are suppressed. In the present study, we have developed a pipeline for the integration of sequential TCR HTS data, allowing analysis of multiple sequential datasets each containing over half a million sequence reads.
Numerous TCRβ sequence clonotypes expanded and contracted over the first 12 weeks of therapy, despite a lack of gross changes in TRBV and TRBJ gene usage over the course of HCV antiviral therapy, and the overall stability of the TCRβ repertoire within individuals. The clonotype-specific nature of this T cell response may contribute to the conflicting data regarding the importance of HCV-specific T cell responses during HCV therapy (7)(8)(9)(10)(11)(12). Diverse T cell repertoires can predict good disease outcomes in mice infected with herpes simplex virus-1 (23); however, it is unclear if this also applies in human disease. A more diverse T cell response potentially allows for the selection of high avidity T cell clones and limits pathogen immune escape (24). In HCV infection, T cell responses are critical in controlling infection, as evidenced by class I and II HLA associations (25)(26)(27)(28), CD4 + and CD8 + depletion experiments in the chimpanzee model (29)(30)(31) and functional assays during early HCV infection (32). Studies during acute HCV infection have shown by ELISpot and flow cytometry that the T cell response during spontaneous clearance is broad, targets multiple antigens and must be sustained for HCV to be eliminated during early infection. However, only a single study has sequenced TCR repertoires from functionally validated HCV-specific T cells, and this only investigated two specific antigenic peptides (22). This small number of validated HCV-specific TCRβ CDR3 sequences limits the inferences that can be made using a sequence similarity approach as employed in the present study. In addition, this similarity approach does not prove that the identified TCRβ sequence clonotypes are HCV-specific, due to the influence that TCRα pairing and HLArestriction have on antigen recognition. This limitation in the present study highlights the need to integrate HTS techniques with the isolation of antigen-specific T cell populations to enable validation of antigen-specific TCR sequences.
Conventional studies of T cell responses during treatment are hampered by low numbers of T cells and reduced cytokine production, issues avoided when using an HTS approach. Evidence of a restoration of HCV-specific T cell populations over the course of HCV therapy has recently been obtained in patients treated with IFN-free regimes, where individuals who achieved SVR had an increased proliferation of HCV-specific CD8 + T cells within 4 weeks of starting therapy (33). Due to a reliance on HCV-specific tetramers, this study was limited to the analysis of T cells specific for one of four HCV peptides, highlighting the limitations imposed by current methodologies. Utilizing HTS of TCR sequences in a larger number of patients could help to answer fundamental questions on the role of the immune response in viral control during and immediately after treatment. It may also be of use in acute infection, where many current studies require live cells to be available from early in infection -a stage which is usually asymptomatic. HTS of TCR sequences has the potential to be carried out on stored blood samples without the need for PBMC separation. In conjunction with functional validation experiments, this would allow for a comprehensive analysis of T cell responses in a larger number of infected individuals.
The present proof of concept study highlights the large inter-individual variation detected in TCRβ sequence clonotype structures, which hampers comparisons between individuals in this sort of observational study design. In particular, patients 2 and 4, the two oldest patients in the present study, showed a lower total number of TCRβ sequence clonotypes, and a greater proportion of TCRβ sequence clonotypes with read frequencies >0.0001. This altered repertoire, with evidence of TCRβ sequence clonotype expansion, is similar to the T cell repertoire changes observed in aging individuals with human cytomegalovirus (HCMV) infection (34,35). Future studies should utilize large cohorts of age-and HLA-matched individuals with defined treatment outcomes to account for inter-individual variation and accurately address the question of whether T cell responses contribute to SVR upon antiviral therapy. The pipeline developed in the analysis presented here provides a platform for the design of these experiments.
To avoid sampling or sequencing biases that may have hindered the identification of T cell expansion or contraction, we included the use of pooled PCR reactions and a high read frequency cut-off for the analysis of TCRβ sequence clonotypes. The use of unique molecular identifiers has been recently introduced to limit the effects of PCR bias in HTS experiments and this technique offers significant advantages to the analysis of future HTS studies of the TCR locus (36,37). Sampling bias -arising from the fact that we can only sample a fraction of an individual's total T cell pool -is more difficult to address. This issue is of particular relevance for low frequency T cell clones and maintaining a high read frequency cut-off can help to avoid sampling bias. Sampling bias is well recognized in the field of population ecology where a number of techniques have been developed to control for these biases, and the application of these methods to HTS should be expanded in the future (37,38).
In summary, the analysis of T cell responses through HTS techniques has the potential to greatly expand our understanding of the role of T cells in health and disease. The development of sequential analysis of TCRβ sequence clonotypes can provide an antigen-independent view of T cell dynamics in complex tissue environments, complementing existing T cell phenotyping assays, and providing insights into the role of immunological diversity in protection from infectious disease.
aUThOr cOnTriBUTiOns MR contributed to study concept and design, acquisition and analysis of data, and wrote the manuscript. JH developed the bioinformatics pipeline and contributed to data analysis and manuscript writing. GW contributed technical and experimental support. RS, SB, and PM contributed to patient recruitment and administrative support. AP contributed to study supervision and funding. ET and JM contributed to study design, supervision, funding, and manuscript review.