Proteogenomic identification of Hepatitis B virus (HBV) genotype-specific HLA-I restricted peptides from HBV-positive patient liver tissues

Srivastava, Mayank; Copin, Richard; Choy, Augustine; Zhou, Anbo; Olsen, Olav; Wolf, Sarah; Shah, Darshit; Rye-Weller, Anna; Chen, Lisa; Chan, Newton; Coppola, Angel; Lanza, Kathryn; Negron, Nicole; Ni, Min; Atwal, Gurinder S.; Kyratsous, Christos A.; Olson, William; Salzler, Robert

doi:10.3389/fimmu.2022.1032716

ORIGINAL RESEARCH article

Front. Immunol., 13 December 2022

Sec. Cancer Immunity and Immunotherapy

Volume 13 - 2022 | https://doi.org/10.3389/fimmu.2022.1032716

This article is part of the Research TopicImmune Regulation and Intervention in Virus-related Tumor DevelopmentView all 8 articles

Proteogenomic identification of Hepatitis B virus (HBV) genotype-specific HLA-I restricted peptides from HBV-positive patient liver tissues

Mayank Srivastava^*

Richard Copin

Augustine Choy

Anbo Zhou

Olav Olsen

Sarah Wolf

Darshit Shah

Anna Rye-Weller

Lisa Chen

Newton Chan

Angel Coppola

Kathryn Lanza

Nicole Negron

Min Ni

Gurinder S. Atwal

Christos A. Kyratsous

William Olson

Robert Salzler^*

Regeneron Pharmaceuticals Inc., Tarrytown, NY, United States

The presentation of virus-derived peptides by HLA class I molecules on the surface of an infected cell and the recognition of these HLA-peptide complexes by, and subsequent activation of, CD8⁺ cytotoxic T cells provides an important mechanism for immune protection against viruses. Recent advances in proteogenomics have allowed researchers to discover a growing number of unique HLA-restricted viral peptides, resulting in a rapidly expanding repertoire of targets for immunotherapeutics (i.e. bispecific antibodies, engineered T-cell receptors (TCRs), chimeric antigen receptor T-cells (CAR-Ts)) to infected tissues. However, genomic variability between viral strains, such as Hepatitis-B virus (HBV), in combination with differences in patient HLA alleles, make it difficult to develop therapeutics against these targets. To address this challenge, we developed a novel proteogenomics approach for generating patient-specific databases that enable the identification of viral peptides based on the viral transcriptomes sequenced from individual patient liver samples. We also utilized DNA sequencing of patient samples to identify HLA genotypes and assist in target selection. Liver samples from 48 HBV infected patients, primarily from Asia, were examined to reconstruct patient-specific HBV genomes, identify regions within the human chromosomes targeted by HBV integrations and obtain a comprehensive view of HBV peptide epitopes using our HLA class-I (HLA-I) immunopeptidomics discovery platform. Two previously reported HLA associated HBV-derived peptides, HLA-A02 binder FLLTRILTI (S_194-202) from the large surface antigen and HLA-A11 binder STLPETTVVRR (C_141-151) from the capsid protein were validated by our discovery platform, but both were detected at very low frequencies. In addition, we identified and validated, using heavy peptide analogues, novel strain-specific HBV-HLA associated peptides, such as GSLPQEHIVQK (P_606-616) and variants. Overall, our novel approach can guide the development of bispecific antibody, TCR-T, or CAR-T based therapeutics for the treatment of HBV-related HCC and inform vaccine development.

Introduction

Chronic HBV infection affects ~250 million people worldwide and can result in serious complications including acute and chronic hepatitis, cirrhosis, and hepatocellular carcinoma (HCC) (1). Vaccination against HBV is effective at preventing infection, but there is currently no cure for patients chronically infected with the virus. Prolonged exposure of the liver to high viral loads results in persistent inflammation, which can stimulate fibrogenic and cirrhotic processes that can eventually lead to HCC. Additionally, the integration of HBV DNA fragments into the host genome can introduce genetic damage and chromosomal instability that can further contribute to the development of HCC (2). Globally, it is estimated that chronic HBV infections account for ~50% of primary liver cancers (3).

The persistence of HBV infection, or its control, is determined by the host immune response, which, following HBV evasion of the innate immune system, is primarily mediated by the adaptive immune system. Cytotoxic CD8+ T cells protect a host against intracellular pathogens and tumors (4). As such, cytotoxic CD8+ T cells play a pivotal role in defending against HBV infection, and are primarily responsible for clearing virus in acute infections (5), as well as preventing development of HCC. In cases of chronic HBV infection, HBV-specific CD8+ T cells are frequently undetectable, which is thought to result from the chronic exposure of the liver to high antigenic loads and tolerogenic hepatocytes (6).

Because of the critical role of CD8+ T cells in defending against viral infections, the cytotoxic T cell activity against infected cells can be utilized as an effective therapeutic approach. Cytotoxic T-cells mediate adaptive immunity by recognizing HLA-peptide complexes presented on the surface of infected cells via interactions with the T cell receptors (TCRs) expressed on their surface. These HLA-peptide complexes are utilized as targets for developing different immunotherapeutics including bispecific antibodies and CAR-Ts, but a lack of broadly expressed epitopes has made it difficult to implement these therapeutic strategies. Early attempts at identifying HBV-specific T cells typically utilize various T cell stimulation/proliferation assays (7, 8). However, the presence of these T cells does not necessarily mean that the peptide/HLA that they recognize is present HBV/HCC tumor cells. The most direct way to identify targetable HBV-HLA epitopes in HBV-HCC is to perform immunopeptidomics on these cells.

The emerging field of proteogenomics has been successfully applied to identify unique HLA-associated mutant peptides in cancer, characterizing HLA-presented epitopes that are expressed specifically by various tumors types (9–12). Previous work on identifying HBV-HLA epitopes in HCC has been performed, identifying several HBV peptides (13, 14). The greatest challenge to identifying these epitopes from HBV-HCC tissue is that HBV covalently closed circular DNA (cccDNA), the template for viral replication, is frequently absent from tumor cells. Instead, tumor cells contain fragments of HBV DNA that have been inserted into the host genome (15) that can be transcribed and, ultimately, result in the expression of viral antigens on the cell surface. This poses a challenge for identifying HBV epitopes as the exact regions of the HBV genome that are integrated can differ from patient to patient (16). Further complicating this effort is the existence of thousands of HBV strains distributed across 10 different HBV genotypes (genotype A to J) (17, 18), which differ from each other by at least 8% at the nucleotide level. These challenges emphasize the need for a new proteogenomic approach for comprehensive identification of HBV epitopes from patients. Here, we developed a novel proteogenomic approach to detect HLA-associated HBV peptides from HBV infected cells. We focused our efforts on characterizing the repertoire of HLA-I-HBV peptides that are expressed in the livers of infected patients, as this approach would likely provide the most accurate representation of HBV epitopes in the population. To achieve this, we performed bulk RNA sequencing of HBV-infected liver samples from Asian populations (19) to generate patient-specific HBV genome and construct associated-transciptome databases that were used to inform HLA-I-HBV peptide discovery from the same liver tissue. HLA-I associated peptides were purified from homogenized liver samples using immunoenrichment of HLA-I complexes and identified by subsequent mass spectrometric analysis (20). Our analysis identified previously reported HBV antigens derived from the large surface antigen and polymerase proteins as well as a novel HLA-A11 associated polymerase peptide. The polymerase peptide and its associated sequence variants collectively were found in a majority of our HLA-A11+ patients. We demonstrate that our proteogenomic discovery platform provides a new avenue for identifying patient specific HBV genome and associated-peptides as targets for anti-HBV related-HCC immunotherapy.

Experimental section

HBV-positive HCC tissue procurement

All patient tissues were purchased from Proteogenex or BioIVT in fresh frozen form. The patient details including age, sex and diagnostic tests are provided in Supplementary Table S1. Tissues were cryopreserved until sample preparation.

RNA sequencing

Total RNA was extracted from human liver tissue using MagMAX kit (ThermoFisher). Strand-specific RNA-seq libraries were prepared from 1 µg RNA using KAPA stranded mRNA-Seq Kit (KAPA Biosystems). Twelve-cycle PCR was performed to amplify libraries. The amplified libraries were size-selected at 400~600 bp using PippinHT (Sage Science). Sequencing was performed on Illumina HiSeq^®2500 (Illumina) by multiplexed paired-read run with 2X100 cycles.

Mapping of patient liver RNA sequences to HBV reference genome

Bulk RNA-seq reads were aligned to reference genome ayw (NC_003977.2) using minimap2 (v2.17) (21). Alignments were then sorted by coordinates and quality controlled by samtools flagstats (v1.9) (22) and bedtools genomeCoverageBed (v2.17.0) (23) inspection. Duplicates were then marked and removed by the Picard toolkit (v2.18.2) (https://github.com/broadinstitute/picard).

Workflow to reconstruct HBV genomes from patient liver RNAseq data

Paired-end Illumina RNA reads from each sample were de novo assembled into large contigs using megahit (24) (options: –min-count 3 –k-min 27 –k-max 127 –prune-level 2) and mapped using BLAST to HBV reference genome, ayw (NC_003977.2) to select HBV specific sequences. BLAST parameters for sequence comparisons included outfmt ‘7 std sgi stitle’; minimum E value = 0.001; cost to open a gap = 5; cost to extend a gap = 2; length of best perfect match = 11; reward for a nucleotide match = 2; reward for a nucleotide mis-match = -3. Contigs without BLAST matches were discarded as well as BLAST results with E values greater than 0.001, percentage identity below 79% or alignment length of less than 50 nucleotides. Overlapping contigs were merged using custom scripts and final sequence that covered the entire or partial length of the reference sequence with the highest identity was selected. The final sequence was mapped against the reference genome to extract and translate all coding sequences. Non-redundant protein sequences were then added to customized databases for peptide identification by mass-spectrometry.

Workflow to identify integration sites from patient liver RNAseq data

Contigs matching the HBV reference genome (ayw) were extracted and mapped against human reference genome (GRCh38) using BLAST with parameters described above. Contigs matching both HBV and human (hybrid contigs) were collected for annotation. The HBV and human integration sites of the hybrid contigs were annotated against the genome features in HBV ayw Genbank file and human GRCh38 GTF file.

HLA genotyping for patient samples

Sample Preparation and Sequencing: DNA sample quantity was determined by fluorescence (Life Technologies) and quality was assessed by running 25 ng of sample on a 1% pre-cast agarose gel (Life Technologies). Samples with high molecular weight gDNA with a majority of the DNA fragments greater than 20 kb and a concentration of no less than 10 ng/ul passed the quality assessment. The DNA samples were normalized to 10 ng/ul and 50 ng was used to amplify the full-length HLA amplicons. The targets were amplified in three pools of variant tolerant primers optimized for similar binding temperatures and PCR conditions with LA Taq DNA Polymerase (Takara Bio). The resulting amplicon pools were combined equimolarly as determined by automated capillary electrophoresis (Agilent) and fluorescence (Life Technologies). DNA libraries were prepared for Illumina-based sequencing with a custom NEB kit (New England Biolabs). The amplicons were enzymatically fragmented to a mean insert size of 250 bp and universal adapters were ligated onto the DNA fragments. Unique 10 base pair barcode sequences were added to the DNA fragments during PCR with NEBNext Ultra II Q5 Master Mix (New England Biolabs, M0544L) to facilitate highly multiplexed sequencing. The samples were pooled and sequenced using 150 base pair paired-end sequencing on an Illumina Nextseq 500.

Data Analysis: Upon completion of sequencing, raw data from each Illumina Nextseq run was gathered in local buffer storage and uploaded to a local high-performance computing platform for automated analysis. The FASTQ-formatted reads were converted from the BCL files and assigned to samples identified by specific barcodes using the bcl2fastq conversion software (Illumina Inc., San Diego, CA). All the reads in sample-specific FASTQ files were subjected to HLA typing analysis using an updated version of PHLAT program (25) with the reference sequences consisting of GRCh38 genomic sequences and HLA type reference sequences in the IPD-IMGT/HLA database v3.30.0 (26).

Tissue lysis, and HLA affinity enrichment

The tissues were pulverized using SPEX SamplePrep Freezer/Mill Dual-Chamber Cryogenix Grinder (Cole-Parmer) in liquid nitrogen, and lysed in ice-cold lysis buffer (1% NP-40, 150 mM NaCl, 50 mM Tris-HCl pH 8.0 and 10 mM EDTA pH 8.0) supplemented with HALT protease and phosphatase inhibitors (Thermo Scientific, cat. 78442) on ice using sonication.

Anti-HLA Class I (W6/32) was conjugated to NHS-sepharose beads by overnight incubation in coupling buffer (0.2 M NaHCO₃ + 0.5 M NaCl pH 8.3) in 4°C. Reaction was quenched with 0.1 M Tris-HCl pH 8.5, and beads were washed with the Tris-HCl solution and 0.1 M acetate buffer.

The pre-cleared tissue lysate was passed through a column packed with 1 ml of HLA Class-I beads bed under gravity. The column was subsequently washed with Seppro Dilution Buffer (Sigma, cat. S4199) and 20 mM Tris-HCl pH 8, and HLA-peptide complexes were eluted with 0.1 M glycine pH 2.7.

Sample preparation for mass spectrometry

The glycine elute was loaded onto the C₁₈ Sep-Pak (Waters, cat. WAT054960), followed by selective elution of peptides by 30% ACN/0.1% TFA. The peptides were further cleaned up by ziptip (C₁₈ resin), and analyzed by nano-LC-MS/MS.

LC-MS/MS

HLA peptides as above were loaded onto a nanoViper Acclaim PepMap100 C₁₈ trap column (75 μm i.d. × 2 cm, 3 μm, 100 Å, Thermo) and were separated using a nanoViper Acclaim PepMap RSLC C₁₈ column (75 μm i.d. × 25 cm, 2 μm, 100 Å, Thermo) heated to 40°C and retrofitted with a New Objective SilicaTip (7 cm) with a distal conductive coating at the inlet end of the emitter. The gradient was delivered by an EASY-nLC 1200 HPLC system (Thermo) at 300 nL/min. The following 120-minute elution gradient with mobile phase A (Water/0.1% formic acid) and B (80% Acetonitrile/0.1% formic acid) was used: 3% B at 3 min, linear to 35% B at 100 min, and linear to 45% B at 123 min. The peptides eluted from the column were ionized via Flex ion source at 1.9 kV and analyzed by the Thermo Fusion Lumos Tribrid mass spectrometer (Thermo) using Xcalibur 4.1.31.9 (Thermo). Data acquisition was performed in data-dependent mode, where survey scans were carried out in the high field Orbitrap analyzer (range of m/z 300-1500 at a resolution of 60,000) with the automatic gain control target of 4.0E5 and maximal ion fill time of 100 ms. The MS/MS analyses were performed by 1.2 m/z precursor ion isolation with the quadrupole, applying normalized HCD (higher-energy collisional dissociation) collision energy of 32%, and analysis of fragment ions in the Orbitrap at a resolution of 15,000. Dynamic exclusion window was set to 6 seconds, monoisotopic precursor selection (MIPS) to peptide, maximum injection time to 100 ms, and charge states unknown. +1-+4 charge states were included and the advanced peak determination was toggled on. For FAIMS-enabled experiments, the settings were identical except the FAIMS device was placed between the nanoelectrospray source and the mass spectrometer. FAIMS separations were performed with the following settings: inner and outer electrode temperature set to 100°C (except where noted), FAIMS carrier gas flow of 5.0 L/min, asymmetric waveform with DV −5000 V, entrance plate voltage 250 V, and CV settling time of 25 ms. The FAIMS carrier gas is N₂, and the ion separation gap is 1.5 mm. The noted CVs were applied to the FAIMS electrodes. For external stepping or single CV experiments, the selected CV was applied to all scans throughout the analysis. For internal CV stepping experiments, each of the selected CVs was applied to sequential survey scans and MS/MS cycles (1 s); the MS/MS CV was always paired with the appropriate CV from the corresponding survey scan.

Identification of HBV peptides from HLA-I immunopeptidomics of patient liver samples

All mass spectrometry raw files were searched against a consolidated database of human UniProtKB (homo sapiens) and HBV protein sequences, obtained from RNAseq of patient liver samples as described above, with PEAKS DB search engine (27). PEAKSX+ (PEAKS Studio 10.5, Bioinformatics Solutions Inc.) was used for De novo-assisted database search with precursor mass tolerance of 8 ppm, and fragment ion tolerance of 0.02 Da. Enzyme selectivity was set to none, and methionine oxidation as the only variable modification with two maximum allowed modifications per peptide. The search was performed with a 5% false-discovery rate (FDR) at peptide level, and peptides were further filtered based on -logP score of 20 (corresponding to 1% FDR).

Peptide validation using synthetic heavy peptide analogues

The HLA-eluted peptides were mixed with synthetic peptides (50 fmol) containing heavy Leucine (¹³C(6)¹⁵N(1), +7.0172) and analyzed by LC-MS/MS. The identity of endogenous peptides was confirmed by comparing the elution and fragementation profile with the corresponding heavy peptides. Furthermore, the abundance of endogenous peptides was estimated based on the intensity of heavy and endogenous peptides, using Avogadro’s number. For this calculation, we assumed a 50% sample loss during the sample preparation, and 100 million cells as the sample amount.

\frac{\frac{E n d o g e n o u s i n t e n s i t y}{H e a v y I n t e n s i t y} \times (6.023 \times 10^{23} c o p i e s / m o l) \times (1 \times 10^{- 15} m o l)}{(1 \times 10^{8} c e l l s) \times 50 %}

Results

Generation of RNAseq databases from HBV infected patient samples

Comparative genomic analyses are often guided by a limited number of ‘reference’ genomes. As of August 2022, the National Center for Biotechnology and Information (NCBI) database contains a repository of more than 14,300 HBV genome sequences classified into at least ten distinct phylogenetic lineages (A to J https://www.ncbi.nlm.nih.gov/nuccore). While each lineage is represented by a single HBV genome, these reference sequences do not capture the full breadth of genomic diversity present in nature. This lack of coverage can negatively impact identification of new viral peptides as the sequence of patient isolates can greatly diverge from the reference genome sequence used for the comparative analysis.

To circumvent this problem, we employed a novel RNAseq-based approach to generate a database of patient-specific HBV genome sequences directly reconstituted from the transcribed virus RNA sequences amplified from patient liver samples (Figure 1A). Total human RNA reads are first converted into large contigs using a de-novo assembly approach, and contigs with loose homology to HBV reference genomes are marked as HBV-specific sequences. HBV contigs are then grouped into two categories based on the percentage of the contig sequence covering the HBV reference. Fully-mapped contigs to HBV reference are re-arranged in a reference free fashion to assemble patient-specific HBV genomic sequences. Contigs with hybrid matches to the HBV reference are mapped against the human genome and the hybrid contigs matching both HBV and human sequences are selected. The patient-specific viral sequence is then used to extract and translate all HBV coding sequences (Figure 1B). Using this approach, we identified a total of 80 unique coding sequences (large S= 14, middle S= 14, small S= 16, pre-capsid= 6, capsid= 12, X= 16 and polymerase= 2) retrieved from virus isolates that are representative of four phylogenetic lineages (A, B, C, D) (Figure 1C). In accordance with the reported prevalence of HBV genotypes in Asian populations, HBV genotypes B and C were most commonly identified, having been found in 24 of the 48 samples (12 each). HBV genotypes A and D were detected in only one patient sample each (Table 1). RNA sequencing data for 18 of the samples had low HBV coverage, suggesting either an absence of HBV virus in these samples, possibly due to viral latency or clearance, or low HBV expression.

FIGURE 1

Figure 1 Approach to identify HBV specific signatures using bulk RNA sequencing of human liver samples. (A) De-novo sequencing strategy to reconstruct patient specific HBV genomes and identify HBV-integration sites within human chromosomes. RNA reads from patient samples are assembled into large contigs. Contigs fully matching HBV reference are used to reconstruct genomes from patient-infecting HBV strains. Specific coding sequences from each reconstructed genome are saved into a large patient-specific HBV database. In parallel, hybrid contigs matching both human and HBV references are filtered to identify site of integration within the human genomes using a homology sequence alignment approach. (B) Coverage of total RNA reads from bulk RNA sequencing of all human liver samples to HBV reference genome. (C) Phylogenetic relationship between reconstructed HBV genomes and reference HBV sequences representative of the eight HBV lineages (A–H). The tree is generated using a Neighbor-joining phylogenetic approach rooted on the mid-point. (D) Location of HBV integrated genome breakpoints across all samples. The RNAseq data revealed junctions where HBV sequences were integrated into human sequences. We noted the HBV sequence at these junctions and determined their location in the HBV genome, designated as “breakpoints”. The frequency of the genomic locations of these breakpoints across all studied samples are displayed here in 100 nucleotide intervals.

TABLE 1

Table 1 The Patient sample ID, diagnosis from vendor, and HBV serological test results for all the patient samples analyzed, along with number of HBV unique peptides detected by mass spectrometry and HBV genotype identified from RNAseq results.

Read mapping analysis showed that the depth of coverage of virus reads was high and consistent in the genomic region encoding the S and X proteins, while coverage for the rest of the genome contained significant gaps (Figure 1B). Additionally, these reads often terminate in the middle of an HBV gene. These gaps and partial reads suggest that the reads could be coming from RNA transcribed from partially integrated HBV DNA sequences instead of cccDNA. Of note, the N-terminal region of the genome-spanning polymerase gene was poorly covered, whereas the C-terminal portion of the polymerase gene was well detected. For the majority of HBV+ samples, we also identified hybrid contigs combining both HBV and human sequences. These sequences are expected to result from integration of HBV DNA into the coding sequence of a human gene. The genomic location of the HBV sequence in these junctions is referred to as “breakpoints”. We mapped the HBV breakpoints found in RNAseq reads with HBV/human junctions and found that the majority of HBV breakpoints occurred between nucleotide positions 1601-1900 in the HBV genome (Figure 1D) while no clear pattern was observed for the host genome (Supplementary Table S2). This data is consistent with a previous report by Yang et al (28), suggesting that at least some of the detected HBV sequences are coming from integrated DNA and not cccDNA. The reconstitution of HBV genome and the HBV sequences from patient liver samples were utilized to discover HBV peptides presented on the surface of HCC tumors.

Characterization of the immunopeptidome of HBV-infected patient samples

HLA Class-I complexes were immunopurified from homogenized hepatocellular carcinoma (HCC) liver tissue samples from 48 HBV seropositive patients (Proteogenex and BioIVT, Table 1, Supplementary Table S1) using an anti-HLA Class-I antibody (W6/32). Western blotting of liver lysates, flow throughs, and enrichment elutions demonstrated a near complete capture of HLA-I by W6/32 linked sepharose beads (Supplementary Figure S1). Following elution of the complexes, HLA-I bound peptides were identified using an Orbitrap Fusion Lumos mass spectrometer coupled to a nanoLC system. The raw mass spectrometry (MS) data files were searched against a combined database of human UniProt and HBV sequences derived from the RNA HBV reads from each patient sample. On average, approximately 8,500 total peptides were detected from liver samples from variably aged patients (Supplementary Figure S2). The distribution of peptide lengths isolated from the majority of liver tissue samples was consistent with those expected for Class I HLA enrichment, with peptides nine residues in length (9-mers) being the most abundant, followed by peptides ten (10-mers) and eleven (11-mers) residues in length (Figure 2A). Peptides with nine to twelve residues constituted over 50% of the total peptides isolated (Supplementary Figure S2).

FIGURE 2

Figure 2 (A) Peptide length distributions by mass spectrometry for all HBV-positive patient liver tissues by serology. The raw files were searched with PEAKSX with 5% FDR at peptide level. A vast majority of 9-mers (peptides with 9 amino acid residues) were detected from the HLA class-I enrichment from a high percent of patient liver samples. (B) Plot showing the %Binders to HLA-A, HLA-B and HLA-C alleles from identified 9-mer peptides by mass spectrometry. The HLA alleles were identified from DNA sequencing of each patient sample. A vast majority of detected nonamer peptides by mass spectrometry were predicted binders to the patient HLA alleles predicted by NetMHC pan 4.0. The combined percent binders to the three HLA alleles was over 100% for some of the samples due to promiscuity in peptide binding to HLA as predicted by NetMHC (Supplementary Table S3).

High percentage of identified peptides are predicted to bind HLA using NetMHC pan 4.0 analysis

The polymorphic nature of HLA-I has a profound effect on the epitopes presented on the tissue surface. Hence, DNA sequencing of liver tissue samples was performed to determine the HLA genotype (Table 2), which was used to predict the binding of detected peptides by NetMHC pan 4.0 (29). We found that a very high percentage (50 – 100% for most of the samples) of the detected 9-mer peptides were predicted binders of the identified HLA-I in the corresponding patient samples (Figure 2B; Supplementary Table S3). The percent of nonamers that bound to the patient HLAs exceeded 100% for some samples, owing to the promiscuity in peptide binding predicted by NetMHC Pan 4.0.

TABLE 2

Table 2 HLA class I alleles identified for all HBV-infected patient samples from DNA sequencing.

We next performed a motif analysis of detected peptides to identify overrepresented residues at different peptide positions using Seq2Logo (30) (Supplementary Figure S3). As expected, anchor residues at position 2 and 9, which specify HLA-I binding, were significantly overrepresented in all the tissue samples, with residue identities corresponding to the HLA-I genotype present. For example, hydrophobic anchor residues (i.e. Leu and Val) characteristic of HLA-A02 binders were enriched at positions 2 and 9 of peptides isolated from HLA-A02+ patient samples. Together, the high percentage of predicted HLA binders and the overrepresentation of characteristic anchor residues in isolated peptides provide a high degree of confidence in the HLA-I enrichment and peptide identifications.

Identification of unique HLA-associated HBV peptides using HBV genotype-specific databases

We reasoned that the number of HLA-I-presented HBV peptides would depend on the degree of HBV infection, proteasomal processing of antigen, and tissue sample quality. Indeed, the number of HBV specific peptide sequences identified from infected livers varied between samples and generally showed low sequence coverage of HBV proteins (Table 1; Figure 1B). In total, we identified 54 unique HBV peptides with lengths of nine to twelve residues using a PEAKS ion cut off score of 20 (Table 3). These peptides were detected from patient samples with diverse HLA genotype (Figure 3A). The most frequent HLA allele in the patient samples analyzed was HLA-A11 followed by HLA-B15 and HLA-A02. A majority of the viral peptides detected were NetMHC pan 4.0 predicted binders of the HLA-I allele present in the corresponding patient sample (Figure 3B; Supplementary Table S4). The large surface antigen protein was the source of a few strong HLA-A02 predicted binders (NetMHC pan 4.0) that were identified: GLSPTVWLSV (S_348-357), GMLPVCPLL (S_276-284), FLLTRILTI (S_194-202) and GMLPVCPLI (S_276-284), as well as a weak binder NILSPFMPLL (S_370-379) and were all found at a relatively low frequencies (Table 3). In contrast, some HLA-A11 associated peptides: STLPETTVVRR (C_141-151) from capsid protein and SAISSTFSK from polymerase were found at relatively higher frequencies (25-30%) (Figure 3C). Notably, we detected novel HLA-A11 restricted HBV-genotype specific polymerase peptides GSLPQEHIVQK (P_606-616) and GSLPQEHIIQK (P_606-616) from genotype B, and GTLPQEHIVHK (P_606-616) and GTLPQEHIVQK (P_606-616) from genotype C. The former peptide variant is predicted to be a weak HLA-A11 binder, while the latter three peptides are predicted to be strong binders of HLA-A11 by NetMHC pan 4.0 (Figure 3B). More importantly, this polymerase peptide and its variants (GSLPQEHIVQK, GSLPQEHIIQK, GTLPQEHIVHK and GTLPQEHIVQK) have a combined frequency of ~74% (14 out of 19 samples) in HLA-A11 tissue samples (Figure 3C, Supplementary Table S5). Notably, no HBV peptides were detected from any of the 18 tissues from which RNA sequencing had no or low HBV coverage, further validating our proteogenomic approach and highlighting a good correlation between the two complementary methods.

TABLE 3

Table 3 HBV peptides and proteins detected by search against patient-specific databases.

FIGURE 3

Figure 3 (A) The four most frequent HLA alleles from HLA-A, HLA-B and HLA-C among the patient samples analyzed, along with the number of samples that were HBV positive by Mass Spectrometry (HBV peptides detected). As expected, a high percent patient samples were HLA-02 and HLA-A11 samples due to Asian population being the source of the samples procured. (B) HBV peptide sequences identified from immunopeptidomics of patient liver samples, and their predicted binding to patient HLA alleles predicted by NetMHCpan 4.0. Strong binders are ranked below 0.5 and weak binders between 0.5 – 2, according to NetMHC prediction. For the HBV peptides with predicted binding to more than one alleles, the affinity rank for the strongest binding is shown. Refer to Supplementary Table S4 for the complete list of HBV peptides along with their binding affinities to the patient HLA alleles predicted by NetMHCpan 4.0. (C). Target HBV peptides and their frequencies of being detected in HBV+ samples. Refer to Supplementary Table S5 for the patient samples in which these peptides were detected.

Having discovered novel HBV Polymerase variant peptides, which are absent in Uniprot and other public databases, for the first time utilizing RNAseq of patient samples, we next aimed to validate the peptide identities using synthetic peptide analogues. HBV peptides predicted to bind HLA-A02 and HLA-A11 by NetMHC, including FLLTRILTI (S_194-202), STLPETTVVRR (C_141-151), GSLPQEHIVQK (P_606-616) and GTLPQEHIVHK (P_606-616), were validated by retention time alignment and matching the fragmentation patterns of peptides with their synthetic heavy versions (Figure 4 and Supplementary Table S6). The synthetic peptides used for this analysis differ from the HBV peptides by a heavy Leucine (¹³C(6)¹⁵N(1), +7.0172) and were mixed with HLA-eluted peptides prior to LC-MS/MS analysis. All peptides tested eluted at the same retention time and in the same MS1 scan as their heavy analogues in the LC-MS/MS run (Supplementary Figure S4) and also exhibited a similar fragmentation profile. The extracted ion chromatogram was also utilized to estimate the expression level or the copy number of some of the target peptides, including the polymerase peptide variants (copy number of ~100 – 500/cell) from the intensity of corresponding heavy peptide analogue parent ions. Even at relatively low surface presentation, we could differentiate between the polymerase peptide P_606-616 variants based on the fragment ions (Figure 4). These HLA-I restricted HBV peptides provide novel targets for developing a potential immunotherapy.

FIGURE 4

Figure 4 Validation of HBV peptides by heavy analogues (Supplementary Table S6). (A, C) Validation of two novel Polymerase variants. (B, D) Validation of two previously reported peptides by heavy peptide analogues. Synthetic heavy peptides with sequences analogous to target HBV peptides and a modified heavy leucine (Thermo), 50 fmol, were spiked-in into the HLA-eluted peptides and separated on the nano-LC connected online to the Mass Spectrometer. The MS raw files were searched against the combined database of human Uniprot and HBV with heavy leucine (+7.02) as a variable modification using PEAKS X. The fragmentation pattern (b- and y-ions) of HBV peptides and their heavy analogues along with their co-elution and identification in the same MS1 scan were used for peptide validation.

Discussion

T cell based therapeutics are a promising treatment for HBV-HCC. As such, the identification of appropriate peptide targets for these T cell therapeutics is critically important. Here we report the identification of HBV peptides from HBV+ HCC patient samples. We first reconstructed the specific HBV genome present in each patient sample using RNAseq. Then using these patient-specific sequences, we were able to analyze the LC-MS/MS data from each patient sample using an appropriate reference sequence that would account for any patient specific mutations. Some of the identified peptides have been previously reported (i.e. STLPETTVVRR (C_141-151)) (31), while others are novel. We detected 45 previously unreported peptides out of 54 total identified HLA-I restricted HBV peptides (Supplementary Figure S5) (32). Of particular interest, we were able to identify a novel peptide GSLPQEHIVQK (POL_606-616) and its variants from HBV polymerase. This peptide is a predicted HLA-A11 binder and is the most frequent HBV epitope found, appearing (together with its variants) in 74% of HLA-A11+ patient samples.

The reconstruction of the HBV genomes present in each sample also provides valuable insight to what regions of the HBV genome are integrated into hepatocytes during HCC development. We discovered that the region of the genome encompassing the small surface antigen protein, X protein, and the latter half of the polymerase protein was frequently detected. Conversely, the region encompassing the capsid protein and the first half of the polymerase protein was either not detected or only partially detected. This is in agreement with previous reports of integration of HBV genome fragments rather than the entire genome. Understanding what is frequently integrated is important to choosing a peptide target for T cell based therapeutics as a peptide that falls in a region of the HBV genome that is not commonly integrated into HCC cells would not be ideal.

Together, the combined use of the RNAseq data and LC/MS-MS data reported here demonstrate a sound approach to finding and choosing appropriate peptide targets for T cell based therapeutics. The ideal peptide target would be highly abundant on the surface of cells, found commonly across patients, conserved across multiple HBV genotypes, and fall in a region of the HBV genome that is commonly integrated in HCC. Using the methods reported here, we identified a peptide that meets some of these criteria. The POL_606-616 peptide was frequently found in the HLA-A11 patient samples and does fall in the HBV genome region that is commonly integrated. Its conservation across HBV genotypes is not ideal as there is usually 2-3 amino acid changes across the different variants detected. However, it may be possible to design T cell based therapeutics around the regions of this peptide that are conserved. As we use this approach to examine more patient samples representing different HLA alleles, we may be able to find other peptides that fit more of these criteria.

Data availability statement

The mass spectrometry immunopeptidomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (33) partner repository with the dataset identifier PXD037270.

Author contributions

MS, RC, AuC, OO, GA, CK, WO, and RS conceptualized the study. MS performed HLA immunoaffinity enrichment from patient samples, prepared samples for Mass Spectrometry, and analyzed data. RC conceptualized the strategy to identify HBV reads from bulk RNA sequencing of patient liver samples, and reconstruct HBV genomes. RC, AZ, and AuC performed read mapping analysis and also identification of integration sites. SW, NC, AnC, KL, NN, and MN performed DNA and RNA sequencing and HLA genotyping. OO, GA, CK, WO, and RS analyzed data and supervised the project. MS, RC, and AuC drafted the manuscript with contribution from OO and editing by WO and RS. All authors contributed to the article and approved the submitted version.

Conflict of interest

Regeneron authors own options and/or stock of the company. The novel proteogenomics method and unique HBV peptides detected in the study have been described in a pending provisional patent application. GA, CK, and WO are officers of Regeneron.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This study received funding from Regeneron Pharmaceuticals, Inc. Regeneron Pharmaceuticals was involved in the study design, collection, analysis, interpretation of data, and the writing of this article.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2022.1032716/full#supplementary-material

References

1. Tsai KN, Kuo CF, Ou JJ. Mechanisms of hepatitis b virus persistence. Trends Microbiol (2018) 26(1):33–42. doi: 10.1016/j.tim.2017.07.006

PubMed Abstract | CrossRef Full Text | Google Scholar

2. Bousali M, Papatheodoridis G, Paraskevis D, Karamitros T. Hepatitis b virus DNA integration, chronic infections and hepatocellular carcinoma. Microorganisms (2021) 9(8):1787. doi: 10.3390/microorganisms9081787

PubMed Abstract | CrossRef Full Text | Google Scholar

3. Liu L. Clinical features of hepatocellular carcinoma with hepatitis b virus among patients on nucleos(t) ide analog therapy. Infect Agent Cancer (2020) 15:8. doi: 10.1186/s13027-020-0277-y

PubMed Abstract | CrossRef Full Text | Google Scholar

4. Meng F, Zhen S, Song B. HBV-specific CD4+ cytotoxic T cells in hepatocellular carcinoma are less cytolytic toward tumor cells and suppress CD8+ T cell-mediated antitumor immunity. Apmis (2017) 125(8):743–51. doi: 10.1111/apm.12704

PubMed Abstract | CrossRef Full Text | Google Scholar

5. Thimme R, Wieland S, Steiger C, Ghrayeb J, Reimann KA, Purcell RH, et al. CD8(+) T cells mediate viral clearance and disease pathogenesis during acute hepatitis b virus infection. J Virol (2003) 77(1):68–76. doi: 10.1128/JVI.77.1.68-76.2003

PubMed Abstract | CrossRef Full Text | Google Scholar

6. Ferrari C. HBV and the immune response. Liver Int (2015) 35 Suppl 1:121–8. doi: 10.1111/liv.12749

PubMed Abstract | CrossRef Full Text | Google Scholar

7. Chen A, Wang L, Zhang J, Zou L, Jia Z, Zhou W, et al. H-2 kd-restricted hepatitis b virus-derived epitope whose specific CD8+ T lymphocytes can produce gamma interferon without cytotoxicity. J Virol (2005) 79(9):5568–76. doi: 10.1128/JVI.79.9.5568-5576.2005

PubMed Abstract | CrossRef Full Text | Google Scholar

8. Brinck-Jensen NS, Vorup-Jensen T, Leutscher PD, Erikstrup C, Petersen E. Immunogenicity of twenty peptides representing epitopes of the hepatitis b core and surface antigens by IFN-gamma response in chronic and resolved HBV. BMC Immunol (2015) 16:65. doi: 10.1186/s12865-015-0127-7

PubMed Abstract | CrossRef Full Text | Google Scholar

9. Bassani-Sternberg M, Bräunlein E, Klar R, Engleitner T, Sinitcyn P, Audehm S, et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat Commun (2016) 7:13404. doi: 10.1038/ncomms13404

PubMed Abstract | CrossRef Full Text | Google Scholar

10. Chong C, Müller M, Pak H, Harnett D, Huber F, Grun D, et al. Integrated proteogenomic deep sequencing and analytics accurately identify non-canonical peptides in tumor immunopeptidomes. Nat Commun (2020) 11(1):1293. doi: 10.1038/s41467-020-14968-9

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Qi YA, Maity TK, Cultraro CM, Misra V, Zhang X, Ade C, et al. Proteogenomic analysis unveils the HLA class I-presented immunopeptidome in melanoma and EGFR-mutant lung adenocarcinoma. Mol Cell Proteomics (2021) 20:100136. doi: 10.1016/j.mcpro.2021.100136

PubMed Abstract | CrossRef Full Text | Google Scholar

12. Cleyle J, Hardy MP, Minati R, Courcelles M, Durette C, Lanoix J, et al. Immunopeptidomic analyses of colorectal cancers with and without microsatellite instability. Mol Cell Proteomics (2022) 21(5):100228. doi: 10.1016/j.mcpro.2022.100228

PubMed Abstract | CrossRef Full Text | Google Scholar

13. de Beijer MTA, Bezstarosti K, Luijten R, Doff WAS, Boor PPC, Pieterman RFA, et al. Immunopeptidome of hepatocytes isolated from patients with HBV infection and hepatocellular carcinoma. JHEP Rep (2022) 4(11):100576. doi: 10.1016/j.jhepr.2022.100576

PubMed Abstract | CrossRef Full Text | Google Scholar

14. Dong LQ, Peng LH, Ma LJ, Liu DB, Zhang S, Luo SZ, et al. Heterogeneous immunogenomic features and distinct escape mechanisms in multifocal hepatocellular carcinoma. J Hepatol (2020) 72(5):896–908. doi: 10.1016/j.jhep.2019.12.014

PubMed Abstract | CrossRef Full Text | Google Scholar

15. Tantiwetrueangdet A, Panvichian R, Sornmayura P, Sueangoen N, Leelaudomlipi S. Reduced HBV cccDNA and HBsAg in HBV-associated hepatocellular carcinoma tissues. Med Oncol (2018) 35(10):127. doi: 10.1007/s12032-018-1191-7

PubMed Abstract | CrossRef Full Text | Google Scholar

16. Tan AT, Yang N, Lee Krishnamoorthy T, Oei V, Chua A, Zhao X, et al. Use of expression profiles of HBV-DNA integrated into genomes of hepatocellular carcinoma cells to select T cells for immunotherapy. Gastroenterology (2019) 156(6):1862–76.e9. doi: 10.1053/j.gastro.2019.01.251

PubMed Abstract | CrossRef Full Text | Google Scholar

17. Kao J-H, Chen D-S. HBV genotypes: epidemiology and implications regarding natural history. Curr Hepatitis Rep (2006) 5(1):5–13. doi: 10.1007/s11901-006-0017-9

CrossRef Full Text | Google Scholar

18. Sunbul M. Hepatitis b virus genotypes: global distribution and clinical importance. World J Gastroenterol (2014) 20(18):5427–34. doi: 10.3748/wjg.v20.i18.5427

PubMed Abstract | CrossRef Full Text | Google Scholar

19. Schweitzer A, Horn J, Mikolajczyk RT, Krause G, Ott JJ. Estimations of worldwide prevalence of chronic hepatitis b virus infection: a systematic review of data published between 1965 and 2013. Lancet (2015) 386(10003):1546–55. doi: 10.1016/S0140-6736(15)61412-X

PubMed Abstract | CrossRef Full Text | Google Scholar

20. Purcell AW, Ramarathinam SH, Ternette N. Mass spectrometry-based identification of MHC-bound peptides for immunopeptidomics. Nat Protoc (2019) 14(6):1687–707. doi: 10.1038/s41596-019-0133-y

PubMed Abstract | CrossRef Full Text | Google Scholar

21. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (2018) 34(18):3094–100. doi: 10.1093/bioinformatics/bty191

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 25(16):2078–9. doi: 10.1093/bioinformatics/btp352

PubMed Abstract | CrossRef Full Text | Google Scholar

23. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (2010) 26(6):841–2. doi: 10.1093/bioinformatics/btq033

PubMed Abstract | CrossRef Full Text | Google Scholar

24. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph. Bioinformatics (2015) 31(10):1674–6. doi: 10.1093/bioinformatics/btv033

PubMed Abstract | CrossRef Full Text | Google Scholar

25. Bai Yu, Ni M, Cooper B, Wei Yi, Fury W. Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads. BMC Genomics (2014) 15(1):325. doi: 10.1186/1471-2164-15-325

PubMed Abstract | CrossRef Full Text | Google Scholar

26. Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SGE. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res (2015) 43:D423–31. doi: 10.1093/nar/gku1161

PubMed Abstract | CrossRef Full Text | Google Scholar

27. Ma B, Zhang K, Hendrie C, Liang C, Li M, Doherty-Kirby A, et al. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Communication Mass Spectrometry (2003) 17(20):2337–42. doi: 10.1002/rcm.1196

CrossRef Full Text | Google Scholar

28. Yang L, Ye S, Zhao X, Ji L, Zhang Y, Zhou P, et al. Molecular characterization of HBV DNA integration in patients with hepatitis and hepatocellular carcinoma. J Cancer (2018) 9(18):3225–35. doi: 10.7150/jca.26052

PubMed Abstract | CrossRef Full Text | Google Scholar

29. Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol (2017) 199(9):3360–8. doi: 10.4049/jimmunol.1700893

PubMed Abstract | CrossRef Full Text | Google Scholar

30. Thomsen MC, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res (2012) 40(Web Server issue):W281–7. doi: 10.1093/nar/gks469

PubMed Abstract | CrossRef Full Text | Google Scholar

31. Missale G, Redeker A, Person J, Fowler P, Guilhot S, Schlicht HJ, et al. HLA-A31- and HLA-Aw68-restricted cytotoxic T cell responses to a single hepatitis b virus nucleocapsid epitope during acute viral hepatitis. J Exp Med (1993) 177(3):751–62. doi: 10.1084/jem.177.3.751

PubMed Abstract | CrossRef Full Text | Google Scholar

32. Wu Y, Ding Y, Shen C. A systematic review of T cell epitopes defined from the proteome of hepatitis b virus. Vaccines (Basel) (2022) 10(2):257. doi: 10.3390/vaccines10020257

PubMed Abstract | CrossRef Full Text | Google Scholar

33. Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res (2022) 50(D1):D543–d552. doi: 10.1093/nar/gkab1038

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: proteogenomics, Hepatitis B, hepatocellular carcinoma, immunopeptidomics, human leukocyte antigen (HLA)

Citation: Srivastava M, Copin R, Choy A, Zhou A, Olsen O, Wolf S, Shah D, Rye-Weller A, Chen L, Chan N, Coppola A, Lanza K, Negron N, Ni M, Atwal GS, Kyratsous CA, Olson W and Salzler R (2022) Proteogenomic identification of Hepatitis B virus (HBV) genotype-specific HLA-I restricted peptides from HBV-positive patient liver tissues. Front. Immunol. 13:1032716. doi: 10.3389/fimmu.2022.1032716

Received: 31 August 2022; Accepted: 22 November 2022;
Published: 13 December 2022.

Edited by:

Ning Wei, Albert Einstein College of Medicine, United States

Reviewed by:

Anthony Wayne Purcell, Monash University, Australia
Nicola Ternette, University of Oxford, United Kingdom

Copyright © 2022 Srivastava, Copin, Choy, Zhou, Olsen, Wolf, Shah, Rye-Weller, Chen, Chan, Coppola, Lanza, Negron, Ni, Atwal, Kyratsous, Olson and Salzler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mayank Srivastava, bWF5YW5rLnNyaXZhc3RhdmFAcmVnZW5lcm9uLmNvbQ==; Robert Salzler, cm9iZXJ0LnNhbHpsZXJAcmVnZW5lcm9uLmNvbQ==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.