The rare DRB1*04:08-DQ8 haplotype is the main HLA class II genetic driver and discriminative factor of Early-onset Type 1 diabetes in the Portuguese population

Introduction Early-onset Type 1 diabetes (EOT1D) is considered a disease subtype with distinctive immunological and clinical features. While both Human Leukocyte Antigen (HLA) and non-HLA variants contribute to age at T1D diagnosis, detailed analyses of EOT1D-specific genetic determinants are still lacking. This study examined the involvement of the HLA class II locus in the genetic control of EOT1D. Methods We conducted genetic association and regularized logistic regression analyses to evaluate genotypic, haplotypic and allelic variants in the DRB1, DQA1 and DQB1 genes in children with EOT1D (diagnosed at ≤5 years of age; n=97), individuals with later-onset disease (LaOT1D; diagnosed at 8-30 years of age; n=96) and nondiabetic control subjects (n=169) from the Portuguese population. Results Allelic association analysis comparing unrelated EOT1D and LaOT1D patients with controls revealed that the rare DRB1*04:08 allele is a distinctive EOT1D susceptibility factor (corrected p-value=7.0×10⁻⁷). Conversely, the classical T1D risk allele DRB1*04:05 was absent in EOT1D children but was associated with LaOT1D (corrected p-value=1.4×10⁻²). Consistently, HLA class II haplotype analysis showed that the rare DRB1*04:08-DQ8 haplotype is specifically associated with EOT1D (corrected p-value=1.4×10⁻⁵) and represents the major HLA class II genetic driver and discriminative factor in the development of early-onset disease. Discussion This study showed that EOT1D has a distinctive spectrum of HLA class II susceptibility loci, comprising risk factors shared with LaOT1D as well as discriminative genetic configurations. These findings warrant replication in larger multicentric studies encompassing other ethnicities, and may inform targeted screening strategies, the follow-up of young children at high genetic risk of T1D, and personalized therapeutic approaches.


Introduction
Type 1 Diabetes (T1D) is a multifactorial disease with a strong genetic component that results from the immune-mediated destruction of insulin-producing pancreatic beta cells, leading to lifelong insulin dependency. T1D predominantly manifests in childhood, with a peak incidence near or at adolescence (10 to 14 years of age) (1,2).
T1D susceptibility conferred by distinct HLA alleles has been proposed to result from variation in amino acid residues at specific positions within the peptide-binding groove, which may affect the set of antigenic peptides presented to T cells. Accordingly, the presence of a non-aspartate residue at position 57 of DQB1 and of arginine at position 52 of DQA1 was found to confer strong susceptibility to T1D (7,8). More recently, it was established that two positions in DRB1 (13 and 71), together with position 57 of DQB1, capture more than 90% of the phenotypic variance attributed to the HLA locus (9), implicating the P4 pocket of DRB1 and the P9 pocket of DQB1 in T1D risk. In addition to HLA, more than 57 loci located outside this region contribute to T1D risk, with relatively modest effects (10). Interestingly, several of these non-HLA loci have been suggested to influence immune-cell or pancreatic beta-cell function (11).
Recent observations support the notion that T1D is a heterogeneous clinical entity composed of distinct disease subtypes, distinguished by different pancreatic immunophenotypes and by increased clinical severity in patients diagnosed at a younger age (12)(13)(14)(15). Accordingly, children diagnosed with T1D before the age of 7 usually present with a more aggressive form of insulitis, characterized by both T- and B-cell infiltrates, whereas children diagnosed at 13 years of age or later tend to show milder insulitis with predominant T-cell infiltration (12). Moreover, children diagnosed before the age of 5 exhibit reduced residual beta-cell mass at diagnosis and a sharp decrease in insulin production shortly after disease onset, which is associated with severe metabolic decompensation (13)(14)(15). This early-onset disease subtype poses a significant clinical challenge, as these patients are at higher risk of long-term diabetic complications than those with disease onset at older ages (16).
In search of HLA variants that discriminate susceptibility to early-onset T1D (EOT1D), we analyzed HLA class II polymorphisms in a collection of Portuguese children diagnosed at ≤5 years of age, in comparison with individuals diagnosed between 8 and 30 years of age (later-onset T1D; LaOT1D). Our results revealed that the genetic susceptibility to EOT1D conferred by the HLA class II locus comprises risk factors shared with LaOT1D as well as private genetic configurations that mainly involve DRB1 gene variants. Molecular and mechanistic studies are warranted to uncover the role of these genetic factors in determining age at T1D onset; such studies may provide novel tools to improve risk prediction, earlier diagnosis and targeted preventive interventions in individuals at higher risk of developing EOT1D.

Ethics
Ethical approval for this study was obtained from the Ethics Committee of Hospital de Dona Estefânia (HDE; Comissão de Ética para a Saúde, Centro Hospitalar de Lisboa Central, #318/2016) and the Ethics Committee of Associação Protetora dos Diabéticos de Portugal (APDP). All procedures in this study were in accordance with national and European regulations, including the Declaration of Helsinki.

Statistical analysis
Allelic, haplotypic and genotypic association tests, as well as odds ratio (OR) calculations, were performed with the PLINK software package (v1.07) through the BCGene user interface. In this analysis, 8 genotypes; 19 DRB1, 9 DQA1 and 12 DQB1 alleles; and 18 haplotypes were independently evaluated. P-values were computed using Fisher's exact test and corrected for multiple comparisons with the Holm-Bonferroni method.
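The association statistics described above can be illustrated with a minimal, standard-library Python sketch. It computes the odds ratio and a two-sided Fisher's exact p-value for a single 2×2 table, then applies the Holm-Bonferroni step-down correction across several tests. This is not PLINK's implementation; the function names and the example counts are hypothetical. For allelic tests, each subject contributes two alleles (e.g. 2 × 97 = 194 alleles in the EOT1D group).

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are counts with and without the
    variant (subjects for genotype tests, chromosomes for allelic tests).
    Returns infinity when b * c == 0.
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # Hypergeometric probability of every table sharing the observed margins
    probs = [
        comb(row1, x) * comb(n - row1, col1 - x) / total
        for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    ]
    p_obs = comb(row1, a) * comb(n - row1, col1 - a) / total
    # Sum over tables no more probable than the observed one
    # (a tiny relative tolerance keeps tables tied with the observed one)
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))


def holm_bonferroni(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the p-value at 0-based rank r is multiplied by (m - r); monotonicity enforced
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical counts: [[cases with, cases without], [controls with, controls without]]
print(odds_ratio(3, 1, 1, 3))              # 9.0
print(fisher_exact_two_sided(3, 1, 1, 3))  # ~0.4857
print(holm_bonferroni([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

Holm's procedure is uniformly more powerful than the plain Bonferroni correction while still controlling the family-wise error rate, which is why it suits a panel of several dozen allele, genotype and haplotype tests.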
To estimate the predictive power of HLA class II haplotypes, regularized logistic regression was performed using the scikit-learn Python package (v1.1) (30). The model was evaluated by cross-validation, with the dataset partitioned into five blocks: four blocks (80% of the data) were used to train the model, and the remaining block (20%) was held out to evaluate model performance. The robustness of the fit was assessed by repeating this process.
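The cross-validation scheme above can be sketched without external dependencies. The code below is a minimal illustration, not the scikit-learn pipeline used in the study: it fits an L2-penalized logistic regression by batch gradient descent (the choice of an L2 penalty, the learning rate, the number of epochs and the function names are all assumptions), splits the samples into five stratified blocks, trains on four blocks (80%) and scores the held-out block (20%) by ROC AUC, returning the mean and standard deviation across folds.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l2_logistic(X, y, lam=1.0, lr=0.5, epochs=300):
    """L2-penalized logistic regression fitted by batch gradient descent.

    X: list of feature vectors (e.g. haplotype dosages 0/1/2); y: 0/1 labels.
    Minimizes the mean of [summed log-loss + (lam / 2) * ||w||^2]; the
    intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, gw)]
    return w, b


def auc(scores, labels):
    """ROC AUC: probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0, **fit_kwargs):
    """Stratified k-fold CV: train on k-1 blocks (80% when k=5), test on the held-out block."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            folds[rank % k].append(i)  # deal each class round-robin into the k blocks
    aucs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        w, b = fit_l2_logistic([X[i] for i in train], [y[i] for i in train], **fit_kwargs)
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

With a hypothetical design matrix X of haplotype dosages and y coding one phenotype class (1) against all other subjects (0), `cross_validated_auc(X, y)` returns a (mean, SD) pair analogous to the AUC ± SD values reported in the Results. In scikit-learn the penalty strength is instead controlled by the inverse parameter C of `LogisticRegression`.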

Contribution of HLA class II genotype classes to disease susceptibility in EOT1D and LaOT1D
We used a two-way case-control study design to analyze HLA class II locus variants in 97 unrelated Portuguese subjects with Early-onset T1D (EOT1D, age at diagnosis 0-5 years) and 96 unrelated Portuguese subjects with disease onset after 7 years of age (LaOT1D, age at diagnosis 8-30 years), using as controls 169 non-diabetic individuals, representative of the Portuguese population (Table 1).
The clinical manifestation of T1D in preschool children is more severe than in those who develop the disease later in childhood or in adulthood (15,31,32). Accordingly, we found that EOT1D patients had lower fasting C-peptide at diagnosis (Supplementary Table 1), denoting diminished insulin secretion capacity, presumably due to extensive autoimmune destruction of pancreatic beta cells. Moreover, despite presenting with a lower percentage of glycated hemoglobin at diagnosis, EOT1D patients showed worse metabolic control under insulin treatment, as indicated by the lower proportion of individuals with glycated hemoglobin below 7.5% one year after diagnosis (EOT1D, 19.4% versus LaOT1D, 44.4%; Supplementary Table 1). Consistent with prior studies (15,24,32), EOT1D patients also displayed heightened humoral reactivity against insulin at diagnosis compared with LaOT1D subjects (anti-insulin antibodies present at diagnosis in 74.1% versus 36.8%, respectively; Supplementary Table 1). Collectively, these data indicate that the EOT1D cohort displayed the typical T1D clinical phenotype observed in preschool children and differed from the LaOT1D cohort.
In our cohorts, we observed that 91/97 EOT1D (93.8% …). We found that the disease risk conferred by DR3 and DR4 susceptibility genotypes was not significantly different between the EOT1D and LaOT1D cohorts (Figure 1A and Supplementary Table 2). Likewise, HLA class II genotypes containing neither DR3 nor DR4 (X/X) had comparable protective effects in the two disease groups (Figure 1A and Supplementary Table 2). As expected, association analysis detected a significantly increased frequency of DR3 and DR4 genotypes in EOT1D and LaOT1D patients compared with non-diabetic controls (Figure 1B and Supplementary Table 2), and confirmed that the frequency of risk and protective genotype classes did not differ significantly between EOT1D and LaOT1D subjects (Supplementary Figure 1). Moreover, the distribution of genotype classes did not differ significantly between EOT1D and LaOT1D, whereas it did when the three groups were compared together (Figure 1C). Nonetheless, in line with previous studies (15, 23-

Private and shared HLA class II alleles associated with EOT1D and LaOT1D
To examine the contributions of the different HLA class II genes to disease susceptibility, we analyzed the genetic risk conferred by DRB1, DQA1 and DQB1 alleles with a frequency ≥2.5% in the EOT1D and LaOT1D cohorts or in control subjects. Most risk alleles had comparable genetic effects in the EOT1D and LaOT1D cohorts (Figure 2A and Supplementary Table 3). Interestingly, the T1D risk allele DRB1*04:05 was absent in EOT1D subjects, suggesting that it may not contribute significantly to the development of EOT1D (Figure 2A and Supplementary Table 3). We also noted that the protective alleles DQA1*05:05 and DQB1*02:02 were under-represented in EOT1D compared with LaOT1D subjects, suggesting that the disease protection conferred by these alleles is particularly relevant in EOT1D (Supplementary Figure 2 and Supplementary Table 4).
Strikingly, the DRB1*04:08 allele was significantly associated with EOT1D but not with LaOT1D (Figure 2B and Supplementary Table 3). Consistent with earlier findings in Portuguese (35,36) and Spanish cohorts (37), the DRB1*04:08 allele was found at a remarkably low frequency in nondiabetic controls (0.3%; Supplementary Table 3). Conversely, the classical T1D risk allele DRB1*04:05 was associated with LaOT1D but absent from the EOT1D cohort. Although the absence of DRB1*04:05 in EOT1D patients might appear surprising, it agrees with findings from a previous study of Caucasian children diagnosed with T1D before the age of five (13). We therefore directly compared allelic frequencies in EOT1D and LaOT1D subjects. This analysis corroborated that the DRB1*04:05 risk allele, as well as the DQA1*05:05 and DQB1*02:02 protective alleles, have a significantly higher frequency in LaOT1D than in EOT1D (Figure 2C and Supplementary Figure 2C, Supplementary Tables 3, 4). However, no difference in the frequency of … 2C). Notably, the rare allele DRB1*04:08, while not associated with LaOT1D, was present in 20 of the 97 EOT1D subjects and conferred the highest HLA class II allelic risk for EOT1D (Figure 2 and Supplementary Table 3). Because T1D susceptibility conferred by distinct HLA alleles may arise from variation in amino acid residues at specific positions within the peptide-binding groove, we performed an amino acid sequence alignment of the DRB1*04 alleles analyzed in this study (Supplementary Figure 3). The only difference between the LaOT1D-specific DRB1*04:05 and the EOT1D-associated DRB1*04:08 risk alleles was at position 57, where DRB1*04:05 carries serine (S) and DRB1*04:08 carries aspartic acid (D). Although D and S are both polar amino acids, D is negatively charged whereas S is uncharged. This difference in charge within pocket 9 may influence both the nature and the affinity of the diabetogenic peptides bound by the MHC molecules encoded by these two alleles.
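The alignment comparison described above amounts to listing the positions at which two aligned sequences differ and annotating the side-chain charge of each residue. The sketch below does this for two short fragments numbered from position 53. The fragments are hypothetical placeholders, not the real DRB1*04:05 and DRB1*04:08 sequences (which should be taken from a curated source such as the IPD-IMGT/HLA database), and the charge table is a simplification at physiological pH that treats histidine as uncharged.

```python
# Simplified side-chain charge at physiological pH (histidine treated as neutral).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def residue_differences(seq_a, seq_b, start=1):
    """Return (position, residue_a, residue_b) for each mismatch between two
    pre-aligned, equal-length protein sequences numbered from `start`."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(start + i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def charge(residue):
    """Net side-chain charge (-1, 0 or +1) under the simplified table above."""
    return SIDE_CHAIN_CHARGE.get(residue, 0)


# Hypothetical fragments (positions 53-59) that differ only at position 57
fragment_a = "EYWNSQK"
fragment_b = "EYWNDQK"
for pos, a, b in residue_differences(fragment_a, fragment_b, start=53):
    print(pos, a, charge(a), b, charge(b))  # 57 S 0 D -1
```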
Previous studies also established that DQ susceptibility alleles encode an arginine (R) residue at position 52 of DQA1 and lack D at position 57 of DQB1 (7,8). Consistent with these findings, we found that R at DQA1 position 52 conferred significant risk, whereas its substitution by histidine or serine was protective in EOT1D, or in both EOT1D and LaOT1D, respectively (Supplementary Table 5). Moreover, while D at position 57 of DQB1 was highly protective, alanine (A) at this position was associated with significant risk in both cohorts (Supplementary Table 6). The representation of susceptibility and protective alleles defined by these amino acid residues did not differ between EOT1D and LaOT1D (Supplementary Figure 4). Together, these findings

EOT1D distinctive HLA class II haplotypes
Haplotype reconstruction identified 29, 50 and 55 distinct class II haplotypes in EOT1D, LaOT1D and control subjects, respectively. This analysis revealed 3 protective and 3 risk haplotypes significantly associated with EOT1D and LaOT1D (Figures 3A, B, Supplementary Figure 5 and Supplementary Tables 7, 8).
We also observed that the high-risk DRB1*03:01-DQ2 and DRB1*04:01-DQ8 haplotypes were not differentially represented in the EOT1D and LaOT1D cohorts (Figure 3C and Supplementary Table 7), suggesting that they are susceptibility factors shared by both disease subtypes. Notably, the DRB1*04:08 risk allele occurred predominantly in the context of the DQ8 haplotype in EOT1D (17/20 subjects), indicating that the DRB1*04:08-DQ8 haplotype is distinctively associated with this disease subtype (Figure 3 and Supplementary Table …). It has been reported that the amino acid residues at positions 13 and 71 of DRB1 and position 57 of DQB1 together capture more than 90% of the phenotypic variance controlled by the HLA locus (9). We found that the A-H-K and A-S-K haplotypes defined by these positions (residues listed in the order DQB1 57, DRB1 13, DRB1 71) conferred risk in both cohorts (Supplementary Table 9).
Additionally, while the D-S-R haplotype was protective in both cohorts, D-S-E was protective in EOT1D whereas D-G-R and D-R-A conferred protection in LaOT1D only (Supplementary Table 9).Moreover, no differential representation was observed in risk and protective haplotypes defined by these amino acid residues in EOT1D when compared to LaOT1D (Supplementary Figure 6).
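The amino acid haplotype labels above can be made explicit with a small lookup. This sketch assumes that each label lists the residues in the order DQB1 57, DRB1 13, DRB1 71, and it encodes only the associations summarized in the text (Supplementary Table 9). Labels or cohorts not listed return "no association reported", which does not imply that they were tested and found neutral. The function and variable names are hypothetical.

```python
# Associations summarized in the text (Supplementary Table 9).
# Assumed label order: DQB1 position 57, DRB1 position 13, DRB1 position 71.
REPORTED_EFFECTS = {
    "A-H-K": {"EOT1D": "risk", "LaOT1D": "risk"},
    "A-S-K": {"EOT1D": "risk", "LaOT1D": "risk"},
    "D-S-R": {"EOT1D": "protective", "LaOT1D": "protective"},
    "D-S-E": {"EOT1D": "protective"},
    "D-G-R": {"LaOT1D": "protective"},
    "D-R-A": {"LaOT1D": "protective"},
}


def aa_haplotype(dqb1_57, drb1_13, drb1_71):
    """Build a label such as 'A-H-K' from single-letter residues."""
    return f"{dqb1_57}-{drb1_13}-{drb1_71}"


def reported_effect(label, cohort):
    """Look up the reported effect of an amino acid haplotype in 'EOT1D' or 'LaOT1D'."""
    return REPORTED_EFFECTS.get(label, {}).get(cohort, "no association reported")


label = aa_haplotype("D", "S", "E")
print(label, reported_effect(label, "EOT1D"), reported_effect(label, "LaOT1D"))
# D-S-E protective no association reported
```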
In summary, these data revealed that, in addition to the classical HLA class II susceptibility haplotypes shared between EOT1D and LaOT1D (DRB1*03:01-DQ2 and DRB1*04:01-DQ8), a specific DR4 allelic variant, DRB1*04:08, mostly occurring in the context of the DQ8 haplotype, is distinctively associated with EOT1D and confers the highest risk among the HLA class II haplotypes represented in this cohort.

To estimate the predictive power of the DR3 and DR4 risk haplotypes for EOT1D and LaOT1D classification, we performed regularized logistic regression with cross-validation, using all three groups simultaneously (each group versus the other two). This analysis revealed that the combined effect of the DR3-DQ2 and DR4-DQ8 haplotypes discriminated well between the three groups of subjects (area under the curve, AUC: healthy controls (HC), 0.875 ± 0.035; LaOT1D, 0.725 ± 0.058; EOT1D, 0.829 ± 0.039), with the DRB1*04:08-DQ8 haplotype accounting for the largest genetic difference between EOT1D and the other two groups (LaOT1D and controls). Similarly, the DRB1*04:05-DQ8 haplotype was the main contributor distinguishing LaOT1D from the other two groups (Figure 4A). Furthermore, binary logistic regression results (each case group versus controls) agreed with the case-control findings described above. Accordingly, the DRB1*03:01-DQ2 and DRB1*04:01-DQ8 haplotypes were the main risk factors in LaOT1D (Figure 4B), with ORs of 8.43 and 8.63, respectively (Supplementary Table 10), whereas the DRB1*04:08-DQ8 haplotype was the predictor with the highest impact on EOT1D development (OR 13.62; Figure 4C and Supplementary Table 10).
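The one-versus-rest analysis can be written as a set of binary logistic models, one per phenotype class c. The formulation below is a sketch that assumes an L2 penalty in scikit-learn's parameterization (penalty strength set through the inverse parameter C); the text states only that the regression was regularized. Here x_{ij} is the dosage (number of copies) of haplotype j carried by subject i, y_{ic} = 1 if subject i belongs to class c and 0 otherwise, and p_{ic} is the fitted probability P(y_{ic} = 1 | x_i).

```latex
\operatorname{logit} p_{ic} = \beta_{0c} + \sum_{j \ge 1} \beta_{jc}\, x_{ij},
\qquad c \in \{\mathrm{HC}, \mathrm{LaOT1D}, \mathrm{EOT1D}\}

\hat{\boldsymbol{\beta}}_c = \arg\min_{\boldsymbol{\beta}}
  \Big\{ -\sum_{i} \big[ y_{ic} \log p_{ic} + (1 - y_{ic}) \log (1 - p_{ic}) \big]
  + \frac{1}{2C} \sum_{j \ge 1} \beta_{jc}^{2} \Big\}

\mathrm{OR}_{jc} = e^{\beta_{jc}}
```

In an unpenalized model, an OR of 13.62 would correspond to a coefficient of ln 13.62 ≈ 2.61. Because the penalty shrinks coefficients toward zero, the regularized coefficients plotted in Figure 4 need not equal the logarithms of the ORs listed in Supplementary Table 10.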
We next evaluated whether these haplotypes influenced age at disease onset in all T1D subjects enrolled in this study (n=193). Neither the protective haplotypes nor the DR3/DR4 genotype had a significant impact on age at disease presentation (Supplementary Figures 7, 8). Importantly, T1D patients carrying one copy of the DRB1*04:05-DQ8 risk haplotype were on average 13.7 years of age at T1D onset, whereas individuals with other haplotypes were on average 8.6 years of age. Conversely, subjects with one copy of the DRB1*04:08-DQ8 haplotype were on average 3.1 years of age at disease onset, whereas individuals carrying other haplotypes were on average 9.6 years of age (Figure 5). Thus, the haplotypes discriminating EOT1D from LaOT1D significantly influenced age at T1D presentation: the DRB1*04:08-DQ8 haplotype decreased age at onset by 6.5 years on average, whereas the DRB1*04:05-DQ8 haplotype increased it by 5.1 years.
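The two effect sizes quoted above are consistent with simple differences between the mean ages at onset of carriers and non-carriers of each haplotype (negative values denote earlier onset in carriers):

```latex
\Delta_{\text{DRB1*04:08-DQ8}} = 3.1 - 9.6 = -6.5 \text{ years},
\qquad
\Delta_{\text{DRB1*04:05-DQ8}} = 13.7 - 8.6 = +5.1 \text{ years}
```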
In sum, while our EOT1D and LaOT1D cohorts share the expected DQ risk alleles and haplotypes, including DR3-DQ2 and DRB1*04:01-DQ8 (4-6, 35, 37), we revealed the differential impact of two DR4 risk haplotypes, DRB1*04:05-DQ8 and DRB1*04:08-DQ8, and identified the latter as the main single HLA class II discriminative factor and genetic driver in the development of EOT1D in Portuguese subjects.

Discussion
In this study, we addressed whether patients with EOT1D harbor a distinctive HLA class II genetic spectrum. Compared with patients with later-onset disease and non-diabetic controls, we found that a rare HLA class II haplotype (DRB1*04:08-DQ8) was primarily present in EOT1D and accounted for the main HLA class II … Our study design used an enrollment strategy that targeted subjects who developed T1D at an early age (under 6 years), thereby enriching for the genetic susceptibility factors underlying this condition. This approach contrasts with most studies that evaluated the genetic susceptibility conferred by HLA class II alleles, haplotypes and genotypes in T1D cohorts and families (4-6, 35, 37), in which EOT1D subjects are often a minority. It is thus not surprising that rare alleles and haplotypes, such as DRB1*04:08 and the DRB1*04:08-DQ8 haplotype, may have gone unnoticed in these studies.
Our choice of a healthy control cohort deviates from the gold-standard case-control design, as these individuals are not age-matched with the patient populations (EOT1D and LaOT1D). However, this decision aligns with the study's focus on uncovering HLA class II genetic determinants of EOT1D. By selecting older healthy controls (IQR 35-51 years) representative of the Portuguese population, we aimed to minimize the likelihood that controls would develop T1D after enrollment, making them a suitable reference group for identifying HLA class II alleles and haplotypes associated with T1D susceptibility. One potential limitation of this control selection strategy arises from the known accumulation of somatic mutations with age. Such mutations can act as confounding factors in genetic association studies, particularly when de novo variants confer a significant clonal advantage. Somatic mutations in HLA genes have been identified in patients with solid and hematological cancers (38,39), in patients with hematological cancers receiving allogeneic or haploidentical stem cell transplantation (39, 40), and in patients with aplastic anemia (41). These mutations provide a selective advantage to the mutated clones, enabling them to evade immune surveillance. Somatic mutations also accumulate with age in healthy individuals (a phenomenon termed age-related clonal hematopoiesis), but these primarily target genes with malignancy implications, such as DNMT3A, TET2 and ASXL1, and increase their carriers' predisposition to hematological malignancies and cardiovascular disease (42)(43)(44)(45). These mutations are, moreover, individually very rare. Consequently, while we cannot rule out the presence of somatic mutations in at least some of our healthy controls, we consider it improbable that they would specifically target HLA class II genes at a frequency high enough to significantly affect our association analysis and the conclusions derived from it.
Our findings have implications for predicting the age at disease onset. Considering the increasing incidence of EOT1D over recent decades and a T1D prevalence of 0.4% in the general population (46), and assuming that EOT1D currently accounts for 25% of all T1D cases, Bayes' theorem predicts that an individual carrying the DRB1*04:08-DQ8 haplotype has a 3.15% probability of developing EOT1D. By contrast, this individual's probability of developing T1D at an older age (LaOT1D) is roughly 6 times lower (0.54%). A carrier of the DRB1*04:05-DQ8 haplotype has a 1.18% probability of developing LaOT1D and is unlikely to develop EOT1D.
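The Bayes' theorem calculation above can be sketched as P(disease | carrier) = P(carrier | disease) × P(disease) / P(carrier). The prior P(EOT1D) = 0.004 × 0.25 = 0.001 follows from the stated prevalence and EOT1D share; treating all remaining cases as LaOT1D (prior 0.003) is our assumption, as is approximating the population carrier frequency by its frequency in controls. The carrier frequencies in the example are hypothetical placeholders, not the values used in the study, so the output does not reproduce the percentages quoted above.

```python
T1D_PREVALENCE = 0.004  # T1D prevalence in the general population (from the text)
EOT1D_SHARE = 0.25      # share of T1D cases that are early-onset (assumption stated in the text)

P_EOT1D = T1D_PREVALENCE * EOT1D_SHARE         # 0.001
P_LAOT1D = T1D_PREVALENCE * (1 - EOT1D_SHARE)  # 0.003 (assumes all other cases are LaOT1D)


def posterior_risk(freq_in_cases, freq_in_population, prior):
    """Bayes' theorem: P(disease | carrier) = P(carrier | disease) * P(disease) / P(carrier)."""
    if not 0 < freq_in_population <= 1:
        raise ValueError("carrier frequency in the population must be in (0, 1]")
    return freq_in_cases * prior / freq_in_population


# Hypothetical inputs: 10% of EOT1D cases, 2% of LaOT1D cases and 1% of the
# general population carry a given haplotype.
risk_eot1d = posterior_risk(0.10, 0.01, P_EOT1D)
risk_laot1d = posterior_risk(0.02, 0.01, P_LAOT1D)
print(f"EOT1D: {risk_eot1d:.2%}, LaOT1D: {risk_laot1d:.2%}")
```

Because both posteriors share the same denominator P(carrier), their ratio depends only on the carrier frequencies in each case group and on the two priors, which is how the roughly 6-fold difference between EOT1D and LaOT1D risk for DRB1*04:08-DQ8 carriers arises.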
The DRB1*04:08 allele and the DRB1*04:08-DQ8 haplotype were not found to be associated with T1D in previous analyses of Portuguese and Spanish T1D cohorts, nor in large Caucasian cohorts (4-6, 35, 37). To the best of our knowledge, this allele has only been found significantly overrepresented in T1D patients compared with controls in eastern Baltic individuals, in combination with DQB1*03:04 (47). However, the DRB1*04:08 allele has previously been associated with other autoimmune conditions and related clinical phenotypes, namely anti-citrullinated protein antibody-positive and childhood rheumatoid arthritis (RA), clinical severity of RA, and anti-drug antibody development in multiple sclerosis patients treated with interferon beta (48-50). These observations raise the possibility that the DRB1*04:08 allele influences the development, age at onset and clinical severity of autoimmune phenotypes other than T1D.
Previous studies demonstrated that T1D patients diagnosed at a young age present with a more restricted range of DR and DQ haplotypes (24,34). Accordingly, HLA class II haplotypic diversity was significantly lower in EOT1D (29 haplotypes identified) than in LaOT1D (50 haplotypes identified; p-value=8.0×10⁻…, Fisher's exact test). This reduced haplotypic heterogeneity is likely driven by the smaller number of DRB1 alleles in EOT1D (19 versus 27 alleles in EOT1D and LaOT1D, respectively), as allelic heterogeneity in DQA1 (7 versus 9 alleles) and DQB1 (13 versus 12 alleles) is rather similar in the two cohorts. It is conceivable that the limited spectrum of DRB1 susceptibility alleles is a distinctive feature of EOT1D, with a probable impact on CD4 T-cell repertoire selection in the thymus, including that of regulatory T cells, as well as on the activation of these cells in the periphery. In this context, it will be particularly relevant to evaluate the binding affinity of distinct diabetogenic peptides to DRB1*04:05-DQ8 compared with DRB1*04:08-DQ8, as the presence of a non-D (S) or D residue at position 57 within pocket 9 is known to determine which peptide anchor residue is accommodated in this pocket (acidic versus small aliphatic) (51, 52).

Conclusion
Despite the limited size of the cohorts analyzed here, our data suggest that EOT1D is a clinical entity with genetic determinants that only partly overlap with those of LaOT1D. The distinctive HLA class II differences we revealed may be relevant for implementing EOT1D screening strategies and personalized therapeutic approaches, such as peptide-based immunotherapy. Multicentric studies with broader ethnic coverage would help to replicate these findings and to identify additional private EOT1D genetic factors, which may be useful as predictors of age at T1D onset and disease severity.

FIGURE 1 Genetic risk conferred by classical T1D-associated HLA class II genotypes in early-onset T1D (EOT1D; n=97) and later-onset T1D patients (LaOT1D; n=96). In (A), the mean log-odds ratio (OR) ± 95% CI of individual HLA class II genotypes in cases versus controls (n=169) is shown. EOT1D versus control comparisons are represented in red and LaOT1D versus control comparisons in blue. The dashed line represents log-OR=0. In (B), genotypic association tests in cases versus controls, represented as -log p-values after Holm-Bonferroni correction, are depicted. The dashed line marks the significance threshold (corrected p-value=0.05). In (C), the proportions of subjects with the indicated DR3 and DR4 genotype classes in EOT1D and LaOT1D patients and in controls are shown (controls versus LaOT1D versus EOT1D, p-value<1×10⁻³; LaOT1D versus EOT1D, p-value=5.63×10⁻…; chi-square test).

FIGURE 2 Genetic risk conferred by HLA class II alleles in EOT1D and LaOT1D patients. In (A), the mean log-odds ratio (OR) ± 95% CI of individual HLA class II alleles in cases versus controls is displayed. Comparisons of EOT1D versus controls are highlighted in red, and LaOT1D versus controls in blue. The dashed line represents log-OR=0. In (B), allelic association tests in cases versus controls, presented as -log p-values after Holm-Bonferroni correction, are depicted. The dashed line marks the significance threshold (corrected p-value=0.05). In (C), allelic association analysis in EOT1D versus LaOT1D patients is shown (purple). The dashed line marks the significance threshold (p-value=0.05).

FIGURE 3 Genetic risk conferred by HLA class II susceptibility haplotypes in EOT1D and LaOT1D patients. In (A), the mean log-odds ratio (OR) ± 95% CI of individual HLA class II haplotypes in cases versus controls is illustrated. Comparisons of EOT1D versus controls are highlighted in red, and LaOT1D versus controls in blue. The dashed line represents log-OR=0. In (B), haplotypic association tests in cases versus controls are presented as -log p-values after Holm-Bonferroni correction. The dashed line marks the significance threshold (corrected p-value=0.05). In (C), haplotypic association analysis in EOT1D versus LaOT1D patients is shown (purple). The dashed line marks the significance threshold (p-value=0.05).

FIGURE 4 Phenotype-predictive power of HLA class II DR3 and DR4 haplotypes in EOT1D and LaOT1D. Coefficients of the regularized logistic regression are shown with error bars for each phenotype class. AUC was derived from ROC analysis of one versus the rest in (A), or of each case group versus controls in (B, C), after fitting a binary logistic regression model.

TABLE 1
Demographic and clinical characteristics of patients and non-diabetic controls.