Analysis of HLA Variants and Graves’ Disease and Its Comorbidities Using a High Resolution Imputation System to Examine Electronic Medical Health Records

Hyperthyroidism is a prevalent endocrine disorder, and genetics plays a major role in the development of thyroid-associated diseases. In particular, HLA alleles have been shown to confer the highest genetic susceptibility to Graves’ disease (GD). However, no studies have yet reported the contribution of HLA to the development of GD and its subsequent complications. Thus, in the present study, which to the best of our knowledge is the first of its kind, a powerful imputation method, HIBAG, was used to predict HLA subtypes in subjects with available genome-wide SNP array data from the China Medical University Hospital (CMUH). Disease status was extracted from the CMUH electronic medical records; a total of 2,998 subjects with GD were identified as cases, and 29,083 subjects without any diagnosis of thyroid disorders were randomly selected as controls. A total of 12 HLA class I genotypes (including HLA-A*02:07-*11:01; HLA-B*40:01-*46:01 and *46:01-*46:01; and HLA-C*01:02-*01:02, *01:02-*03:04, and *01:02-*07:02) and 17 HLA class II genotypes (including HLA-DPA1*02:02-*02:02; HLA-DPB1*02:01-*05:01, *02:02-*05:01, and *04:01-*05:01; HLA-DQA1*03:02; and HLA-DRB1*09:01-*15:01 and *09:01-*09:01) were found to be associated with GD in the Taiwanese population. Moreover, the HLA subtypes HLA-A*11:01, HLA-B*46:01, HLA-DPA1*01:03, and HLA-DPB1*05:01 were found to be associated with heart disease, stroke, diabetes, and hypertension among subjects with GD. Our data suggest that several HLA alleles are markedly associated with GD and its comorbidities, including heart disease, hypertension, and diabetes.


INTRODUCTION
Hyperthyroidism is a prevalent endocrine disorder characterized by inappropriately high synthesis and secretion of the thyroid hormones triiodothyronine (T3) and thyroxine (T4) (1). Graves' disease (GD), the most common cause of hyperthyroidism, is an organ-specific autoimmune disorder caused by thyroid-stimulating immunoglobulins (2). These autoantibodies mimic the action of thyroid-stimulating hormone (TSH) and stimulate thyroid function, elevating serum free T4 and T3 levels and thereby suppressing TSH levels. Furthermore, thyroid disorders are associated with abnormally elevated serum lipid levels (3) and with a range of clinical consequences, including an increased risk of metabolic disorders, cardiovascular mortality, and thyroid cancer (4).
Family and twin studies have indicated that genetics plays a major role in the development of thyroid diseases. Many susceptibility loci, including genes associated with autoimmunity (human leukocyte antigen [HLA], protein tyrosine phosphatase, nonreceptor type 22 [PTPN22], cytotoxic T-lymphocyte associated protein 4 [CTLA4], and interleukin 2 receptor subunit alpha [IL2RA]) and thyroid-specific genes (thyroid-stimulating hormone receptor [TSHR] and forkhead box E1 [FOXE1]), have been associated with various thyroid diseases (5).
In particular, HLA alleles have been shown to confer the highest susceptibility to GD (6)(7)(8). HLA is the most prominent candidate genetic factor for several autoimmune diseases because the major histocompatibility complex region is highly polymorphic and contains many immune response genes. The HLA locus is located on chromosome 6p21 and encodes (1) class I genes, such as HLA-A, HLA-B, and HLA-C, and (2) class II genes, such as HLA-DP, HLA-DQ, and HLA-DR (9). According to the IPD-IMGT/HLA Database, 8,464 HLA-B alleles have been identified (10). However, HLA-genotyping methods, such as polymerase chain reaction using sequence-specific oligonucleotides or primers, or next-generation sequencing, remain impractical for large-scale association studies. Imputing HLA types from genome-wide SNP array data offers a practical alternative for such large cohorts.
To the best of our knowledge, no study has yet examined the contribution of HLA to the development of GD and its subsequent complications. Hence, the aim of this study was to evaluate HLA variants in patients with GD who had different complications.

Study Population
The China Medical University Hospital (CMUH) Precision Medicine Project was initiated in 2018, and more than 170,000 subjects have been enrolled to date (5). The recruitment and sample-collection procedures were approved by the ethics committees of CMUH (CMUH107-REC3-058 and CMUH110-REC3-005). Each subject signed an approved informed consent form and provided blood samples for genome-wide genotyping. All clinical information, including disease diagnoses, medical and surgical procedures, prescriptions, laboratory measurements, physiological measurements, hospitalization, and catastrophic illness status, was collected from the electronic medical records (EMRs) of the CMUH, which contain the records of patients who sought care at the CMUH between 1992 and 2019 (6,7). This part of the study was approved by the ethics committee of the CMUH (CMUH110-REC1-095). A total of 2,998 subjects with a diagnosis of GD, at least one TSH value and one free T4 or total T4 value, and genotyping information were identified as subjects with GD (case group). In addition, 29,083 subjects without any diagnosis of thyroid disorders, not using any anti-thyroid or thyroxine medications, and without any abnormal TSH, T4, or free T4 values were randomly selected from the EMRs as the control group. Other comorbidities were identified from the EMRs using ICD-9-CM codes.
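The case and control definitions above can be expressed as simple filters over per-subject EMR summaries. The following Python sketch is illustrative only: the `Patient` fields and the `select_study_groups` helper are hypothetical names (the CMUH EMR schema is not described here), and requiring genotype data for controls is our assumption, since any association analysis needs it.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    # Hypothetical, simplified per-subject EMR summary; field names are illustrative.
    patient_id: str
    has_gd_diagnosis: bool          # GD diagnosis recorded in the EMR
    has_thyroid_diagnosis: bool     # any thyroid-disorder diagnosis (GD included)
    n_tsh: int                      # number of TSH measurements
    n_t4_or_free_t4: int            # number of total T4 or free T4 measurements
    any_abnormal_thyroid_lab: bool  # any abnormal TSH, T4, or free T4 value
    on_thyroid_medication: bool     # anti-thyroid drugs or thyroxine
    genotyped: bool                 # genome-wide SNP array data available


def is_case(p: Patient) -> bool:
    """GD diagnosis, >= 1 TSH value, >= 1 free or total T4 value, and genotype data."""
    return p.has_gd_diagnosis and p.n_tsh >= 1 and p.n_t4_or_free_t4 >= 1 and p.genotyped


def is_eligible_control(p: Patient) -> bool:
    """No thyroid diagnosis, no thyroid medication, no abnormal thyroid labs.

    Requiring genotype data here is an assumption of this sketch.
    """
    return (not p.has_thyroid_diagnosis
            and not p.on_thyroid_medication
            and not p.any_abnormal_thyroid_lab
            and p.genotyped)


def select_study_groups(patients: List[Patient], n_controls: int,
                        seed: int = 0) -> Tuple[List[Patient], List[Patient]]:
    """Return all cases plus a random sample of eligible controls."""
    cases = [p for p in patients if is_case(p)]
    pool = [p for p in patients if is_eligible_control(p)]
    rng = random.Random(seed)  # fixed seed makes the random selection reproducible
    controls = rng.sample(pool, min(n_controls, len(pool)))
    return cases, controls
```

Keeping each criterion in its own predicate makes the inclusion rules easy to audit against the written definitions.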

HLA Genotyping
Human genomic DNA was extracted from the blood samples of all participants using standardized protocols. The Axiom-Taiwan Precision Medicine (TPM) SNP array, designed for the high-throughput Affymetrix Axiom genotyping platform, was developed to capture the maximum amount of genetic information from samples of the Taiwanese population. SNP genotyping was performed using the Axiom-TPM array. A total of 653,291 SNPs across the whole human genome were included in the Axiom-TPM Array Plate (Affymetrix, Inc., Santa Clara, CA, USA).

HLA Imputation and Prediction
The HIBAG R package was used to generate Taiwanese population-specific parameter estimates for HLA prediction. HIBAG builds an ensemble of individual classifiers, each consisting of HLA and SNP haplotype probabilities estimated from a bootstrap sample of individuals and a random subset of SNPs. HLA types were then predicted by averaging the posterior probabilities from all generated classifiers, and calls with a posterior probability above 0.9 were retained. A total of 16 sets of parameter estimates were generated, covering HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1, and -DPB1 at both two-field and three-field resolution (8).
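To make the ensemble step concrete, here is a minimal Python sketch of how averaged posterior probabilities could be turned into genotype calls. It is not the HIBAG R API: the classifier outputs are hypothetical dictionaries of made-up probabilities, and treating 0.9 as a call threshold on the averaged probability is our reading of the procedure described above. The last lines enumerate the 16 locus-by-resolution parameter sets.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[str, ...]  # unordered allele pair, stored in sorted order


def make_genotype(allele1: str, allele2: str) -> Genotype:
    """Normalize an allele pair so that A/B and B/A map to the same key."""
    return tuple(sorted((allele1, allele2)))


def average_posteriors(outputs: List[Dict[Genotype, float]]) -> Dict[Genotype, float]:
    """Average each genotype's posterior over all classifiers (absent = probability 0)."""
    if not outputs:
        raise ValueError("need at least one classifier output")
    totals: Dict[Genotype, float] = defaultdict(float)
    for output in outputs:
        for genotype, prob in output.items():
            totals[genotype] += prob
    return {g: s / len(outputs) for g, s in totals.items()}


def call_genotype(outputs: List[Dict[Genotype, float]],
                  threshold: float = 0.9) -> Optional[Tuple[Genotype, float]]:
    """Return the most probable genotype if its averaged posterior exceeds threshold."""
    avg = average_posteriors(outputs)
    best = max(avg, key=lambda g: avg[g])
    return (best, avg[best]) if avg[best] > threshold else None


# Eight loci at two resolutions give the 16 parameter sets (8 x 2).
LOCI = ["A", "B", "C", "DRB1", "DQA1", "DQB1", "DPA1", "DPB1"]
RESOLUTIONS = ["two-field", "three-field"]
MODELS = [(locus, res) for locus in LOCI for res in RESOLUTIONS]
```

For example, three hypothetical classifiers giving a B*40:01/B*46:01 genotype posteriors of 0.95, 0.90, and 1.00 average to 0.95, so that genotype is called; if the averaged maximum were only 0.65, `call_genotype` would return `None` and the subject would be left uncalled at that locus.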

Statistical Analysis
For the baseline characteristics, continuous data are presented as means ± standard deviation, and categorical data as proportions. We used t-tests to compare the means of continuous variables and chi-squared tests to compare the frequencies of categorical variables between the two groups. All tests were two-sided, and P values < 0.05 were considered statistically significant. Statistical analyses were performed using SPSS software (version 21.0; IBM, Armonk, NY, USA) and R version 3.4.4 (11).
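As an illustration of the chi-squared comparison of categorical frequencies, the sketch below computes a Pearson chi-squared test for a 2 × 2 table of genotype carriers and non-carriers in cases versus controls, using only the Python standard library. It assumes no continuity correction (SPSS and R can also report Yates-corrected values), and the helper name `chi2_2x2` and any counts passed to it are hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) for the table

        [[a, b],   # group 1 (e.g. cases):    carriers, non-carriers
         [c, d]]   # group 2 (e.g. controls): carriers, non-carriers

    Returns (statistic, P value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the test is undefined")
    # Shortcut formula for a 2 x 2 table: n (ad - bc)^2 / (product of margins).
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi2_2x2(10, 20, 20, 10)` gives a statistic of about 6.67 and a P value just under 0.01, which would count as significant at the 0.05 level used here.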

RESULTS
A total of 2,998 subjects with GD were identified using the EMRs of the CMUH. Demographic and clinical information is shown in Table 1. More than 70% of the subjects were women, 66.4% were aged between 30 and 60 years, and 51.8% were obese or overweight. With regard to the comorbidities, 5.4%, 6.6%, and 9.2% of the subjects with GD had hypertension, heart disease, stroke, and diabetes, respectively.
Furthermore, the associations between the identified HLA genotypes and different comorbidities among subjects with GD were investigated. HLA-A*11:01-*33:03 (13.7% vs. 7.3%, P value = 0.042) and *02:07-*11:01 (2.7% vs. 12.9%, P value = 0.010), and 2.3%, P value = 0.009) genotypes differed in frequency between subjects with GD who had diabetes and those who did not (Table 4). However, none of these associations remained significant after correction for multiple testing.
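The methods do not state which multiple-testing correction was applied to the comorbidity analyses. As one common option, the sketch below applies a Bonferroni correction; the choice of Bonferroni, the helper name `bonferroni`, and the example P values and number of tests are assumptions for illustration, not the study's data.

```python
from typing import List, Optional, Tuple


def bonferroni(p_values: List[float], n_tests: Optional[int] = None,
               alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Return (adjusted P value, significant after correction) for each raw P value.

    n_tests defaults to len(p_values); pass a larger value if more tests were
    performed than are listed here.
    """
    m = len(p_values) if n_tests is None else n_tests
    if m < len(p_values):
        raise ValueError("n_tests cannot be smaller than the number of P values")
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # same decision as comparing p with alpha / m
        results.append((adjusted, adjusted < alpha))
    return results
```

With 10 hypothetical tests, the per-test threshold becomes 0.05 / 10 = 0.005, so `bonferroni([0.001, 0.02, 0.04], n_tests=10)` keeps only the first value significant (adjusted P values 0.01, 0.2, and 0.4).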
Our study had several strengths. We used the HIBAG algorithm, an ensemble haplotype-based classifier that imputes HLA types using attribute bagging; it makes predictions by averaging HLA-type posterior probabilities over an ensemble of classifiers built from different samples (30,31). HIBAG has been successfully used to investigate the association between HLA alleles and rheumatoid arthritis in a Taiwanese population (31). In the present study, using these imputed HLA alleles, we identified several HLA alleles that were notably associated with GD in a population for which genome-wide SNP array data were available. Another strength of our study was the use of EMR data, which provide comprehensive information on disease diagnoses, medication use, and laboratory results for each individual and allow cases and controls to be clearly defined. Moreover, the large sample size increased the statistical power of the study. However, our study also had a few limitations. First, it was performed using EMRs from a single medical center, which may limit the generalizability of our findings. Second, lifestyle-associated factors, such as physical activity, alcohol use, and tobacco smoking, were not analyzed from the EMRs; therefore, we could not exclude potential residual confounding by these factors.
In conclusion, we used a powerful imputation method, HIBAG, to predict HLA subtypes in subjects with available genome-wide SNP array data, and used linked EMR data to identify several HLA alleles that were markedly associated with GD and its comorbidities, including heart disease, hypertension, and diabetes. To the best of our knowledge, this is the first study to report the influence of different HLA subtypes on GD-associated comorbidities. The functional roles of the relevant HLA alleles and the implications of these findings should be examined in future research in this field.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by China Medical University Hospital. The patients/participants provided their written informed consent to participate in this study.