Heterozygote advantage at HLA class I and II loci and reduced risk of colorectal cancer

Objective: Reduced diversity at human leukocyte antigen (HLA) loci may adversely affect the host's ability to recognize tumor neoantigens and subsequently increase disease burden. We hypothesized that increased heterozygosity at HLA loci is associated with a reduced risk of developing colorectal cancer (CRC).
Methods: We imputed HLA class I and II four-digit alleles using genotype data from a population-based study of 5,406 cases and 4,635 controls from the Molecular Epidemiology of Colorectal Cancer Study (MECC). Heterozygosity at each HLA locus and the number of heterozygous genotypes at HLA class I (A, B, and C) and HLA class II loci (DQB1, DRB1, and DPB1) were quantified. Logistic regression analysis was used to estimate the risk of CRC associated with HLA heterozygosity. Individuals with homozygous genotypes for all loci served as the reference category, and the analyses were adjusted for sex, age, genotyping platform, and ancestry. Further, we investigated associations between HLA diversity and tumor-associated T cell repertoire features, as measured by tumor infiltrating lymphocytes (TILs; N=2,839) and immunosequencing (N=2,357).
Results: Individuals with heterozygous genotypes at all three class I genes had reduced odds of CRC (OR: 0.74; 95% CI: 0.56-0.97, p = 0.031). A similar association was observed for class II loci, with an OR of 0.75 (95% CI: 0.60-0.95, p = 0.016). For class I and class II combined, individuals with all heterozygous genotypes had significantly lower odds of developing CRC (OR: 0.66, 95% CI: 0.49-0.87, p = 0.004) than those with zero or one heterozygous genotype. HLA class I and/or II diversity was associated with higher T cell receptor (TCR) abundance and lower TCR clonality, but results were not statistically significant.
Conclusion: Our findings support a heterozygote advantage for the HLA class I and II loci, indicating an important role for HLA genetic variability in the etiology of CRC.


Introduction
Human leukocyte antigen (HLA) class I and II loci play an important role in adaptive and innate immunity. HLA class I (HLA-I) presents foreign antigens to cytotoxic T cells, and HLA class II (HLA-II) stimulates antibody production in response to specific antigens. Individuals who carry heterozygous genotypes at the HLA genes are able to display a greater variety of antigenic peptides than those with homozygous genotypes, resulting in an immune response to a broader range of antigens (1). The heterozygote advantage, proposed by Doherty and Zinkernagel (2), suggests that individuals with greater diversity (i.e., more heterozygous genotypes) at HLA genes have better fitness because they present a broader set of tumor antigens for T cell recognition. This phenomenon has been observed in infectious diseases and cancers associated with viral infections, such as progression of AIDS (3), hepatitis B virus (HBV)-associated hepatocellular carcinoma (4-6), inflammatory bowel diseases (7), and non-Hodgkin's lymphoma (NHL) (8). For example, patients who carry heterozygous HLA class I alleles appear to have slower progression from HIV to AIDS, while subjects who are heterozygous at HLA class II loci have a greater ability to clear HBV and hepatitis C virus (HCV) infections.
Colorectal cancer (CRC) is a complex disease, involving a series of genetic events, immune responses, and exogenous factors. Disease progression and responses to immunotherapies may vary by stage at diagnosis, microsatellite instability status, tumor mutation burden (TMB), and other tumor factors (9). HLA, especially HLA-I, is frequently lost in CRC tumors, resulting in tumor immune escape from cytotoxic T cells during cancer development and progression (10). Although T cell-mediated immunotherapies have emerged as a promising regimen for several cancers including CRC, only a subset of patients respond to treatment. A recent study by Chowell et al. showed that germline diversity of HLA class I alleles is associated with better response to checkpoint blockade immunotherapy in patients with melanoma and non-small cell lung cancer (11,12). These findings provide further evidence in support of the HLA heterozygote advantage hypothesis, where HLA-heterozygous individuals present a broader immunopeptidome for recognition by cytotoxic T cells.
Given the growing evidence of the importance of HLA diversity in tumor development, progression, and immunotherapy response, we hypothesized that increased diversity in germline HLA class I and/or II loci is associated with reduced risk of developing CRC. Heterozygous HLA genotypes may facilitate the presentation of a broader set of tumor antigens in a greater range of contexts for T cell recognition, thereby leading to early tumor elimination and reduction of CRC risk (Figure 1). HLA genotype, although not modifiable, may be useful as a biomarker to be incorporated into risk-stratified screening guidelines. To test this hypothesis, we conducted the largest population-based study to date to examine the association between diversity in HLA class I and class II loci, measured by heterozygosity, and the risk of developing CRC.

Study population
The Molecular Epidemiology of Colorectal Cancer Study (MECC) is a population-based study of incident CRC cases and healthy controls recruited in northern Israel from 1998 through 2017. Cases include those with invasive colorectal adenocarcinoma. Controls are participants without prior history of CRC selected from the same source population as cases and individually matched on age, gender, Jewish ethnicity, and usual clinic location. A detailed description of the study population has been published elsewhere (13). Baseline demographic and clinical characteristics of the MECC subjects contributing to this study are described in Table 1.

Genotyping and quality control
Germline DNA samples from 10,041 MECC subjects (5,406 CRC cases and 4,635 controls) were genotyped using three genotyping platforms. A total of 485 cases and 498 controls were genotyped in two batches using Illumina HumanOmni2.5 chips, measuring approximately 2.3 million SNPs (14). Batch 1 (384 cases and 143 controls) was genotyped at Case Western Reserve University, and batch 2 (101 cases and 355 controls) was genotyped at the University of Michigan. A further 1,155 cases and 1,117 controls were genotyped using a custom Affymetrix Axiom genome-wide platform measuring 1.2 million SNPs (15). Another 3,768 cases and 3,028 controls were genotyped using a custom Illumina OncoArray chip measuring 495K SNPs (genome-wide backbone and known cancer susceptibility loci) (16). All genotype data were cleaned by platform using common quality control (QC) metrics at the individual and SNP levels, as described previously (14-16). After QC, a total of 10,041 subjects including 5,406 cases and 4,635 controls were included.
Principal component analysis (PCA) was performed using PLINK 1.9 on directly genotyped SNPs shared across the four genotyping panels: Illumina Omni2.5, Affymetrix Axiom, Illumina Custom OncoArray, and Illumina Infinium OncoArray-500K. After linkage disequilibrium (LD) pruning (r² > 0.2), removing SNPs with minor allele frequency (MAF) < 0.01, and removing SNPs with PC1 and PC2 loading > 4.0, 55,852 autosomal SNPs were retained for PCA. Due to possible residual population substructure, the first 5 principal components for global ancestry were included in association analyses.

HLA genotype imputation
Of the directly genotyped 55,852 SNPs, we performed imputation for the HLA region using 7,727 SNPs on chromosome 6. HLA class I (HLA-A, HLA-B, and HLA-C) and class II (HLA-DQA1, HLA-DQB1, HLA-DRB1, HLA-DPA1, and HLA-DPB1) loci were imputed using SNP2HLA and a reference panel of 5,225 unrelated individuals from the Type 1 Diabetes Genetics Consortium (17). In summary, 278 classical HLA alleles (2- and 4-digit resolution) were successfully imputed with information score r² > 0.3 and were available for analysis. Due to the strong linkage disequilibrium between class II A1 and B1 loci, we present results only for each of the B1 loci (HLA-DQB1, HLA-DRB1, and HLA-DPB1).
Heterozygosity and homozygosity at each HLA locus and the number of heterozygous genotypes at class I loci (A, B, C) and class II loci (DQB1, DRB1, DPB1) were quantified using the imputed 4-digit resolution alleles. For each HLA locus, individuals were coded as homozygous (for any allele) or heterozygous, as determined from the imputed alleles at 4-digit resolution. To examine the joint effect of class I and class II loci, we summed the number of heterozygous genotypes across all six loci and categorized subjects into three groups: 0 to 1, 2 to 5, or 6 heterozygous genotypes. Subjects with 0 or 1 heterozygous genotype across all loci were used as the reference group for subsequent analyses.
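The coding scheme described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genotype layout (a dictionary mapping each locus to a pair of imputed 4-digit alleles), the function names, and the example alleles are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Loci analyzed in the paper; the data layout below is hypothetical.
CLASS_I = ("A", "B", "C")
CLASS_II = ("DQB1", "DRB1", "DPB1")

Genotype = Dict[str, Tuple[str, str]]  # locus -> pair of 4-digit alleles


def is_heterozygous(alleles: Tuple[str, str]) -> bool:
    """A locus is heterozygous when its two 4-digit alleles differ."""
    first, second = alleles
    return first != second


def count_heterozygous(genotype: Genotype, loci: Tuple[str, ...]) -> int:
    """Number of heterozygous genotypes among the given loci."""
    return sum(is_heterozygous(genotype[locus]) for locus in loci)


def combined_group(genotype: Genotype) -> str:
    """Three-level category for class I and II combined (0-1 is the reference)."""
    n_het = count_heterozygous(genotype, CLASS_I + CLASS_II)
    if n_het <= 1:
        return "0-1"
    if n_het <= 5:
        return "2-5"
    return "6"


# Hypothetical subject: heterozygous at A, C, DQB1, and DPB1.
example: Genotype = {
    "A": ("01:01", "02:01"),
    "B": ("07:02", "07:02"),
    "C": ("07:01", "07:02"),
    "DQB1": ("03:01", "06:02"),
    "DRB1": ("15:01", "15:01"),
    "DPB1": ("04:01", "04:02"),
}
print(count_heterozygous(example, CLASS_I), combined_group(example))
```

In a regression setting, the per-class counts and the three-level group would enter the model as categorical predictors with the lowest category as reference.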

HLA diversity and T cell features
To investigate the associations between germline HLA diversity and T cell features in tumors, we examined 2,839 of the 5,406 CRC cases who underwent pathology review for quantification of tumor infiltrating lymphocytes (TILs). Tumors were classified into two groups (TILs per high power field (hpf) ≥2 or <2) (18). Another subset of 2,357 patients from the 5,406 MECC cases with sufficient DNA macrodissected from bulk colorectal tumor tissues (2,335 formalin-fixed paraffin-embedded and 22 frozen tissues) underwent survey-level T cell receptor (TCR) immunosequencing using immunoSEQ (Adaptive Biotechnologies, Seattle, WA; assay versions V2 and V4). ImmunoSEQ utilizes a multiplex PCR system to amplify hypervariable complementarity determining region 3β (CDR3β) sequences of the TRB gene (T cell receptor beta locus; https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:12155). The number of unique rearrangements identified from our tissue samples ranged from 4 to 6,209. Detailed methods were reported elsewhere (19). In summary, TCR abundance (i.e., fraction_productive_of_cells) was estimated as the normalized number of productive TRB reads divided by the estimated total number of cells. We log-transformed TCR abundance because the raw data were right-skewed. Productive Simpson clonality (i.e., TCR clonality) was calculated as the square root of Simpson's diversity index for all productive rearrangements for each sample. This metric scored TCR clonality from high to low, with high clonality indicating few unique clones and low clonality indicating a diverse T cell repertoire. Because two versions of the immunoSEQ assay, V2 (N=1,024) and V4 (N=1,333), were used, we calculated a z-transformation of TCR clonality for each sample based on the distribution of samples run on the same assay version.
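As a rough illustration of the clonality metric and the per-assay standardization described above, the following Python sketch computes the square root of Simpson's index from clone counts and then z-scores samples within each assay version. The function names and input layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Sequence


def simpson_clonality(clone_counts: Sequence[int]) -> float:
    """Square root of Simpson's index, sum(p_i^2), over productive rearrangements."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("at least one productive read is required")
    return math.sqrt(sum((count / total) ** 2 for count in clone_counts))


def zscore_within_version(
    clonality: Dict[str, float], version: Dict[str, str]
) -> Dict[str, float]:
    """Standardize each sample against samples run on the same assay version.

    Assumes at least two samples with differing values per version
    (sample standard deviation is used; this choice is an assumption).
    """
    values_by_version = defaultdict(list)
    for sample, value in clonality.items():
        values_by_version[version[sample]].append(value)
    stats = {v: (mean(vals), stdev(vals)) for v, vals in values_by_version.items()}
    return {
        sample: (value - stats[version[sample]][0]) / stats[version[sample]][1]
        for sample, value in clonality.items()
    }


# Four equally abundant clones give a low clonality; one clone gives the maximum.
print(simpson_clonality([5, 5, 5, 5]), simpson_clonality([20]))
```

A repertoire dominated by a single clone approaches a clonality of 1, while an even repertoire of many clones approaches 0, which matches the interpretation given in the text.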

Statistical analysis
Unconditional logistic regression was used to estimate the association between HLA locus heterozygosity and CRC. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. Individuals with homozygous genotypes for each locus served as the reference category, and analyses were adjusted for sex, age, genotyping platform, and global ancestry (PC1-PC5). P values for trend tests were calculated by modeling the number of heterozygotes as an ordinal variable in the logistic regression, with individuals with homozygous genotypes at all loci as the reference group. To further evaluate the importance of microsatellite instability (MSI) in relation to HLA diversity, we stratified CRC cases by the MSI status of tumors (MSI-High (MSI-H) or microsatellite stable (MSS)) and compared each stratum with all controls to evaluate the associations between HLA diversity and CRC with or without this tumor molecular feature. Linear regression was used to evaluate the associations between HLA heterozygosity and each quantitative immunosequencing variable (TCR clonality and abundance). Logistic regression was conducted to examine the association between HLA diversity and pathology-based TILs. All regression models were adjusted for the factors listed above. All statistical analyses were performed using SAS 9.4 (SAS Institute).
All tests of statistical significance were two-sided.
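The adjusted models in this study were fit in SAS. As a simplified, unadjusted illustration of what an odds ratio and its Wald 95% confidence interval represent, the Python sketch below computes both from a 2x2 table. The counts are invented for illustration and are not study data, and the function name is hypothetical; a covariate-adjusted logistic regression would generally give different estimates.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    exposed_cases: int,
    exposed_controls: int,
    reference_cases: int,
    reference_controls: int,
    z_crit: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (exposed_cases * reference_controls) / (
        exposed_controls * reference_cases
    )
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases
        + 1 / exposed_controls
        + 1 / reference_cases
        + 1 / reference_controls
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" = fully heterozygous, "reference" = fully homozygous.
or_, lo, hi = odds_ratio_wald(300, 400, 100, 100)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that includes 1, as in this invented example, corresponds to a two-sided p value above 0.05.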

Results
Table 1 shows the distribution of demographic and clinical characteristics in the 5,406 CRC cases and 4,635 controls from the MECC study. On average, cases were 68 years old, which was similar to the mean age of 71 in the controls. Our study population comprised 52% males and 48% females, and around 58% were of Ashkenazi Jewish descent. A detailed description of the study population has been published previously (13).
There were no significant associations between each of the HLA class I loci and CRC risk. Approximately 90% of the subjects were heterozygous at each of HLA-A, HLA-B, and HLA-C. However, in the joint analysis of all three HLA class I loci (HLA-A, -B, and -C), there was a 26% reduction in the odds of developing CRC for subjects with heterozygous genotypes at all 3 loci when compared to those with all homozygous genotypes (OR: 0.74, 95% CI: 0.56-0.97, p=0.0314; Table 2). No statistically significant linear trend was identified between the number of heterozygous HLA class I genotypes and CRC risk (p trend = 0.9168).
Similarly, there were no significant associations between each of the HLA class II loci and CRC risk. Roughly 91%, 85%, and 74% of the subjects were heterozygous for HLA-DRB1, HLA-DQB1, and HLA-DPB1, respectively. However, joint analyses suggested a 25% decrease in the odds of developing CRC for subjects with 3 heterozygous genotypes at HLA-II loci as compared to those with all homozygous genotypes at class II loci (OR: 0.75, 95% CI: 0.60-0.95, p=0.0155, p trend = 0.2328; Table 2).
In joint analyses for class I and II loci, individuals with 2 to 5 heterozygous genotypes at HLA class I or II loci had significantly decreased odds of developing CRC (OR: 0.61, 95% CI: 0.46-0.82, p=0.0008) when compared to those with zero or one heterozygous genotype. Moreover, individuals with all heterozygous genotypes at class I and II loci had significantly lower odds of developing CRC (OR: 0.66, 95% CI: 0.49-0.87, p=0.0038; Table 2) when compared to those with zero or one heterozygous genotype. We did not observe a significant linear dose-response relationship between the number of heterozygous genotypes and CRC (p trend = 0.9077).
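The reported percentage reductions and p values can be related to the odds ratios and confidence intervals with simple arithmetic. The Python sketch below, with hypothetical function names, converts an OR to a percent reduction in odds and back-calculates an approximate two-sided Wald p value from a 95% CI. Because the published estimates are rounded, the result should only approximate the reported p value.

```python
import math
from typing import Tuple


def percent_reduction(odds_ratio: float) -> float:
    """Percent reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


def p_value_from_ci(
    odds_ratio: float, lower: float, upper: float, z_crit: float = 1.96
) -> Tuple[float, float]:
    """Approximate Wald z statistic and two-sided p value from an OR and its 95% CI."""
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Class I result from the text: OR 0.74, 95% CI 0.56-0.97.
z, p = p_value_from_ci(0.74, 0.56, 0.97)
print(f"reduction = {percent_reduction(0.74):.0f}%, z = {z:.2f}, p = {p:.3f}")
```

For the class I estimate, this back-calculation lands near the reported p value of 0.0314, which is a useful sanity check on the internal consistency of an OR, its interval, and its p value.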

Stratified analyses by MSI
When restricting to 2,857 cases with microsatellite stable tumors, we observed a 34% decrease in the odds of developing CRC for individuals with heterozygous genotypes at all 3 HLA class I loci (OR: 0.66, 95% CI: 0.49-0.90, p=0.0094; Table 3). Subjects with 3 heterozygous genotypes at HLA class II loci had reduced odds of CRC when compared to those with all homozygous HLA class II genotypes (OR: 0.67, 95% CI: 0.51-0.86, p=0.0022, p trend = 0.0305). In the joint analysis for HLA class I and II loci, there was a 42% decrease in the odds of developing CRC among individuals with six heterozygous genotypes when compared to those with zero or one heterozygous genotype (OR: 0.58, 95% CI: 0.42-0.80, p=0.0009, p trend = 0.5767).
In analyses of a smaller subset of 621 cases with MSI tumors, the strengths and directions of associations between zygosity of HLA class I and II loci and CRC risk were similar to those in the overall analysis with all cases. However, the results did not reach statistical significance, potentially due to the smaller sample size (Table 3).

HLA heterozygosity and T cell features
To further examine whether germline HLA diversity is associated with tumor T cell landscapes, we conducted analyses limited to a subset of 2,357 CRC patients who underwent immunosequencing and 2,839 CRC patients with pathology-based TIL scoring. We found that patients with more heterozygous HLA class I and/or II genotypes generally displayed lower clonality (higher diversity) and higher abundance in their tumor T cell repertoires (Supplementary Tables 1 and 2). Similarly, patients with more heterozygous genotypes at class I and II loci were likely to manifest higher TILs in their tumors (Supplementary Table 3). Having a heterozygous genotype at DRB1 or DQB1 was significantly associated with higher TILs (OR: 1.74 and 1.44, p=0.003 and 0.0024, respectively). The presence of either 2 or 3 heterozygous genotypes at class II loci was marginally associated with higher TILs (p=0.0650 and 0.0789, respectively; p trend = 0.0161). Results also suggested that CRC patients with all 6 heterozygous genotypes at class I and II loci were likely to have higher TILs, although this association did not reach statistical significance (p=0.1006).

Discussion
Using the largest dataset of imputed HLA genotypes in CRC to date, we demonstrated that germline genetic diversity at HLA class I and II loci is associated with reduced CRC risk. Individuals with more diverse genotypes (more heterozygotes) in HLA class I and/or II loci bear a reduced risk of developing CRC when compared to those with less diverse genotypes (homozygotes at all HLA class I and/or II loci). This increase in HLA diversity remained statistically significantly associated with reduced risk of CRC even when we restricted to MSS CRC cases.
It has been suggested that pathogen-mediated selection is the driving force maintaining diversity at the HLA loci. Heterozygote advantage, one of the proposed mechanisms of pathogen-mediated selection, was originally described in infectious and autoimmune diseases, such as AIDS (3,20), HBV and HCV infections (21), inflammatory bowel disease (7), and psoriatic arthritis (22). Individuals who are heterozygous at HLA loci are able to respond to a broader range of pathogen peptides than those who are homozygous, therefore sustaining efficient immune responses against a larger variety of pathogens (1).
A diverse germline HLA genotype can also affect tumor surveillance and shape the cancer genome. Marty et al. showed that individuals' HLA class I and class II genotypes together impact the oncogenic mutational landscape in a complementary manner (23, 24). Using The Cancer Genome Atlas (TCGA), they developed a predictive tool based on HLA class I genotypes to evaluate the [...] Together, these results demonstrated that HLA class I and II genotypes are both involved in the immunoediting process in carcinogenesis by establishing the patterns of immune escape from both CD8+ and CD4+ T-cell responses in a complementary way. Similarly, rather than identifying specific HLA allelic associations in CRC, we showed that the combined total number of heterozygous genotypes at HLA class I and/or II is associated with decreased CRC risk, suggesting a potential complementary effect of HLA class I and class II diversity. Of note, because we did not observe a dose-response relationship between the number of heterozygous genotypes in HLA loci and CRC, it is likely that the association between HLA diversity and CRC risk follows a nonlinear trend.
Heterozygote advantage for HLA class I and/or II in relation to cancer was first demonstrated in non-Hodgkin's lymphoma (NHL), with distinct associations between class I and/or II homozygosity in different NHL subtypes (8). While the risks of diffuse large B-cell lymphoma and marginal zone lymphoma were increased with homozygosity at the class I HLA-B and HLA-C loci and the class II HLA-DRB1 locus, follicular lymphoma risk was associated with an increasing number of homozygous HLA class II loci. The authors suggested a potential role for HLA zygosity in NHL etiology, and distinct immune pathways for the different NHL subtypes. Unlike NHL or other virus-associated diseases, we did not observe zygosity of any specific HLA alleles to be associated with risk of developing CRC. It is possible that the specific HLA allelic association simply was not observed in this population, or that non-exclusive HLA class I and II loci play an important role in CRC etiology. We did not observe individual 4-digit alleles to be statistically significantly associated with odds of developing CRC in our study population after multiple testing correction, potentially attributable to low power (data not shown). However, we demonstrated that HLA diversity in HLA class I and II loci combined is associated with reduced CRC risk. Our results, in line with the previous findings from Marty et al., imply that complementary functions of heterozygous HLA class I and/or II alleles in the antigen-presenting machinery are more relevant for CRC than individual 4-digit alleles in either class.
A growing body of research has focused on germline HLA zygosity in cancer etiology. A recent study examined 17,405 cancer patients diagnosed with non-virus-related solid tumors and 11,448 controls and found no evidence of an association between the number of homozygous genotypes at HLA class I, class II, or class I and II loci and overall cancer risk (25). Colon cancer was one of the twelve non-virus-associated tumors evaluated. However, the study included only 82 colon cancer patients and reported no significant association specific to this cancer type. Another pan-cancer analysis of HLA diversity and risk of 25 cancers using the UK Biobank showed that the diversity of HLA class II is associated with a lower risk of lung cancer, head and neck cancer, and non-Hodgkin lymphoma (26). The authors also identified protective effects of HLA diversity in pathological subtypes with higher mutation burden, such as lung squamous cell carcinoma and diffuse large B cell lymphoma. However, there was no significant association observed between class I or class II diversity and risk of colon and rectal cancer among 3,232 colon and 2,035 rectal cancer patients. Our study is the first and largest population-based study with imputed HLA genotypes to investigate the role of HLA diversity in the risk of developing CRC. With the large sample size, we were able to examine HLA diversity and CRC risk stratified by the MSI status of tumors, given that tumor escape mechanisms differ by MSI status. Individuals who have 6 heterozygous genotypes at HLA class I and [...]
In our analysis of HLA heterozygosity and tumor T cell features in a subset of CRC patients, we found that having more heterozygous genotypes at germline HLA class I and/or class II loci may be associated with a more diverse T cell receptor repertoire, higher TCR abundance, and more tumor infiltrating lymphocytes in tumors. Although the results were not statistically significant, possibly due to smaller sample sizes or a relatively weak effect, the directions of associations are in line with our hypothesis that germline HLA diversity affects tumor immune surveillance and, consequently, reduces CRC risk (Figure 1). Nonetheless, studies with larger sample sizes are warranted to provide more support for the relationship between HLA diversity and T cell landscapes in CRC.
Having a diverse HLA system may also be relevant to response to immune checkpoint inhibitor (ICI) treatments. Several recent studies have focused on HLA class I diversity and the response to ICIs. HLA class I diversity measured by HLA evolutionary divergence (HED) has been associated with better response to immunotherapy in advanced-stage melanoma or non-small cell lung cancer (11). Other studies in gastrointestinal (including CRC), non-small cell lung, and kidney cancer have observed similar results, where cancer patients with higher HLA class I divergence are more responsive to ICI treatments (27-29). However, a recent meta-analysis of more than 1,000 patients who underwent ICI treatment across seven tumor types did not find HLA evolutionary divergence to be a predictor of ICI response (30). Very few CRC patients in our study received ICI because they were recruited by 2017; therefore, we were unable to examine the association between HLA diversity and response to immunotherapy. More research is needed to understand the implications of HLA diversity for ICI response in CRC. Further, expanded datasets are needed to examine the treatment-independent prognostic relevance of HLA diversity with adequate power.
Our study is the largest population-based study to examine HLA diversity and CRC risk to date. It revealed for the first time that heterozygosity at HLA class I and/or II loci confers decreased risk for CRC. These findings highlight the potential role of germline HLA diversity in CRC susceptibility. Our study had a few key limitations. First, misclassification of HLA alleles from imputation is possible. However, other studies using four-digit HLA genotype data imputed with SNP2HLA have shown that the concordance rate between imputed and directly genotyped HLA data is greater than 95% in Caucasian populations using T1DGC as a reference population (17). Second, because our study subjects were mostly of European ancestry, these results cannot necessarily be generalized to other ancestral groups. Studies in other racial or ethnic groups are warranted to investigate the role of HLA diversity in the biologic mechanisms for CRC in different populations. Third, TCR clonality metrics can be skewed or more difficult to interpret when there is a lower number of rearrangements; however, productive Simpson clonality is nonetheless a valuable starting point for examining the relationship between HLA diversity and tumor-associated T cell responses. Finally, larger datasets will be needed to analyze the zygosity of specific HLA alleles associated with CRC risk.
In summary, our findings support a heterozygote advantage for HLA class I and II loci in reducing CRC susceptibility. This underscores an important role for germline HLA genetic variability in the etiology of CRC, potentially operating through a mechanism of increased diversity of tumor neoantigens that can be displayed to the adaptive immune system.

FIGURE 1
Diversity of HLA genotypes and its potential role in the development of colorectal cancer.

TABLE 1
Descriptive table for the Molecular Epidemiology of Colorectal Cancer (MECC) study population.

TABLE 3
Association of heterozygosity at HLA Class I and Class II loci and susceptibility to colorectal cancer stratified by microsatellite instability status of CRC cases.
+ T cells, are also important for immunoediting in early cancer development (24). Having a stronger selective pressure on driver mutations in tumors, HLA class II plays an important role in early cancer development.

reduced odds of developing CRC regardless of MSI status. Slightly stronger associations between HLA class I and/or II zygosity and CRC risk were observed in participants with MSS tumors than in the overall case set, indicating that our initial observations were not driven by MSI-H tumors. Findings similar to those in the MSS group were observed in the MSI group. However, the associations did not reach statistical significance; this may be due to the small sample size (MSI-H tumors are 11.27% of the full case dataset). Additional studies with larger numbers of patients with MSI tumors are needed to replicate the results.