HLA-A*11:01:01:01, HLA-C*12:02:02:01-HLA-B*52:01:02:02, Age and Sex Are Associated With Severity of Japanese COVID-19 With Respiratory Failure

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19), was announced as an outbreak by the World Health Organization (WHO) in January 2020 and as a pandemic in March 2020. The majority of infected individuals have experienced no or only mild symptoms, ranging from fully asymptomatic cases to mild pneumonic disease. However, a minority of infected individuals develop severe respiratory symptoms. The objective of this study was to identify susceptible HLA alleles and clinical markers that can be used in a risk prediction model for the early identification of severe COVID-19 among hospitalized COVID-19 patients. A total of 137 patients with mild COVID-19 (mCOVID-19) and 53 patients with severe COVID-19 (sCOVID-19) were recruited from the Center Hospital of the National Center for Global Health and Medicine (NCGM), Tokyo, Japan, between February and August 2020. High-resolution sequencing-based typing for eight HLA genes was performed using next-generation sequencing. In the HLA association studies, HLA-A*11:01:01:01 [Pc = 0.013, OR = 2.26 (1.27–3.91)] and HLA-C*12:02:02:01-HLA-B*52:01:01:02 [Pc = 0.020, OR = 2.25 (1.24–3.92)] were found to be significantly associated with the severity of COVID-19. After multivariate analysis controlling for other confounding factors and comorbidities, HLA-A*11:01:01:01 [P = 3.34E-03, OR = 3.41 (1.50–7.73)], age at diagnosis [P = 1.29E-02, OR = 1.04 (1.01–1.07)] and sex at birth [P = 8.88E-03, OR = 2.92 (1.31–6.54)] remained significant. The area under the curve of the risk prediction model utilizing HLA-A*11:01:01:01, age at diagnosis, and sex at birth was 0.772, with sensitivity of 0.715 and specificity of 0.717. To the best of our knowledge, this is the first article that describes associations of HLA alleles with COVID-19 at the 4-field (highest) resolution level.
Early identification of potential sCOVID-19 could help clinicians prioritize medical utility and significantly decrease mortality from COVID-19.


INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus responsible for the onset of COVID-19, and has so far infected more than 87 million people worldwide, with the global death toll surpassing 1.8 million (as of January 10, 2021). The majority of infected individuals have experienced only mild or minor symptoms, ranging from asymptomatic cases to mild pneumonic disease. However, a minority of infected individuals develop severe respiratory symptoms but eventually survive with the aid of mechanical ventilatory support.
As of January 10, 2021, mortality rates of COVID-19 (https://coronavirus.jhu.edu/data/mortality) were reported as highest in Mexico (8.8%), followed by Iran, Italy, Hungary, and Indonesia with approximate mortality rates of 3-4%, the United Kingdom, South Africa, Colombia, Canada, Spain, Brazil, France, Poland and Germany at around 2-2.9%, and other countries with reported mortality rates less than 2%. Differences in mortality rates can be caused by several factors, such as differences in the number of people tested, the capacity of local healthcare systems to handle increasing numbers of patients during the outbreak, the demographics of the population (with mortality rates known to be higher in older populations) and the different preventive measures attempted by local governments. Nonetheless, although Japan has one of the oldest populations in the world (with citizens 65 years or older accounting for approximately 28% of the total population), the country has suffered a mortality rate of only 1.9%. Emerging clinical evidence has suggested that factors such as Bacillus Calmette-Guérin (BCG) vaccination status (1), viral genomic strain (2,3) and host genetic factors (4,5) may be related to the severity and mortality of COVID-19.
The immunogenetic background of the host, such as HLA diversity, is well known to play an essential role in determining host responses to emerging infectious diseases such as TB (6), HBV (7-10), HCV (11,12), HIV (13,14), SARS-CoV-1 (15-19) and SARS-CoV-2 (20-22). In cases of viral infection, viruses have successfully breached the early layers of the innate defense system (early nonspecific responses) such as fever, phagocytosis and inflammation. The human second line of defense is heavily reliant on the HLA-restricted T-cell response mechanism, in which viral epitopes are presented by dendritic cells to CD8+ T lymphocytes through interactions with HLA class I alleles and to CD4+ T lymphocytes through HLA class II alleles. Viral epitope presentation by HLA class I leads to clonal expansion of HLA-restricted CD8+ cytotoxic T lymphocytes (CTLs), which are primed to perform antiviral defense during acute infection. Subsequently, latent reinfection and reactivation of the virus are controlled by memory CTLs.
The 2002-2003 SARS-CoV-1 outbreak is known to have had higher pathogenicity and higher mortality rates, while SARS-CoV-2 infections show higher transmissibility. Experimental HLA association studies have indicated that carriers of HLA class I and class II alleles such as B*46:01 (15), B*07:03 (16), Cw*08:01 (17), and DRB1*12:02 (18) are more susceptible to SARS-CoV-1 infection. Early studies of SARS-CoV-2 associations with HLA alleles generally used small sample sizes, but B*15:27 and C*07:29 have nonetheless been reported as associated with SARS-CoV-2 susceptibility in the Chinese population (20). A SARS-CoV-2 susceptibility study with a larger sample size in China reported associations of A*11:01, B*51:01 and C*11:01 with severity of COVID-19 (23). Due to emerging evidence that the diversity of HLA allele distributions across different populations is related to the susceptibility to and severity of COVID-19, this study was designed to examine HLA alleles associated with the susceptibility to and severity of COVID-19 in Japanese patients.

High-Resolution HLA Genotyping by Next-Generation Sequencing
The AllType assay (One Lambda, West Hills, CA) was designed to cover the full length of five HLA genes (HLA-A, -C, -B, -DQA1 and -DPA1) and to partially cover six HLA genes (HLA-DRB1, -DRB3, -DRB4, -DRB5, -DQB1 and -DPB1). Experimental protocols were carried out following vendor instructions, consisting of HLA target amplification, HLA library preparation, HLA template preparation, and HLA library loading onto an Ion 530v1 chip (Thermo Fisher Scientific, Waltham, MA) in the Ion Chef (Thermo Fisher Scientific), followed by final sequencing on the Ion S5 machine (Thermo Fisher Scientific).

HLA Genotype Assignment
Demultiplexing of barcodes and base-calling were carried out in Torrent Suite version 5.8.0 (Thermo Fisher Scientific). Raw fastq reads were extracted using the FileExporter function in Torrent Suite version 5.8.0. HLA genotype assignments were carried out using two different software packages, namely HLATypeStream Visual (TSV v2.0; One Lambda, West Hills, CA), the default software for the AllType™ NGS Assay, and NGSengine® (v2.18.0.17625; GenDX, Utrecht, the Netherlands). The default analysis parameters and health metrics thresholds were applied for TSV v2.0, while we applied the "ignore regions" function in NGSengine to eliminate known sequencing error sites in the Ion S5 system. For the fully covered HLA genes (HLA-A, -C, -B, -DQA1 and -DPA1), 4-field HLA alleles were determined after comparing the reads with the IMGT 3.40.0 database; 3-field HLA alleles were determined for partially covered HLA genes (HLA-DRB1, -DRB345, -DQB1 and -DPB1). Novel HLA alleles that are absent from the IMGT 3.40.0 database and ambiguous HLA alleles were subjected to PacBio Sequel sequencing by H.U. Group Research Institute (Tokyo, Japan). After confirming the presence of novel HLA alleles using PacBio Sequel sequencing, PacBio consensus reads were submitted to GenBank and the IMGT nomenclature committee for the official naming of HLA alleles.

Statistical Analysis
Case-control HLA allele association tests, HLA haplotype estimations, case-control HLA haplotype association tests, Hardy-Weinberg equilibrium tests, and HLA amino acid association tests were performed using the Bridging ImmunoGenomics Data Analysis Workflow Gaps (BIGDAWG) R package (24). The default parameters of BIGDAWG were used except for the manual specification of HLA haplotypes for testing. Logistic regression, univariate, and multivariate analyses of COVID-19 comorbidities were performed using R statistics software v4.0.1 (R Foundation for Statistical Computing, Vienna, Austria).
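As a rough illustration of what a single case-control allele association test involves, the following minimal Python sketch computes an odds ratio, a Wald 95% confidence interval, a chi-square P-value (1 degree of freedom) and a Bonferroni-corrected Pc from a 2x2 table of allele counts. It is not the BIGDAWG implementation; the function name, the counts and the number of tests are hypothetical values chosen only for demonstration.

```python
import math

def allele_association(case_with, case_without, ctrl_with, ctrl_without, n_tests=1):
    """2x2 allele association test (illustrative sketch, not BIGDAWG).

    Inputs are allele (chromosome) counts carrying / not carrying the allele.
    All four cells must be non-zero, otherwise the OR and its CI are undefined.
    """
    a, b, c, d = case_with, case_without, ctrl_with, ctrl_without
    n = a + b + c + d

    # Odds ratio and Wald 95% confidence interval on the log scale
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 df
    p = math.erfc(math.sqrt(chi2 / 2))

    # Bonferroni correction: multiply by the number of tests, cap at 1
    pc = min(1.0, p * n_tests)
    return {"odds_ratio": odds_ratio, "ci_low": ci_low, "ci_high": ci_high,
            "p": p, "pc": pc}

# Hypothetical counts: 106 case chromosomes, 846 control chromosomes,
# and an assumed 10 alleles tested at the locus.
result = allele_association(20, 86, 80, 766, n_tests=10)
print(result)
```

The Wald interval and the uncorrected chi-square test are assumptions made for brevity; an exact test may be preferable when expected counts are small.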
To evaluate associations of risk HLA alleles and comorbidities with the severity of COVID-19, we applied univariate analysis (generalized linear model) to the risk HLA alleles, age, sex, and comorbidities such as high blood pressure, type 2 diabetes, obesity, respiratory diseases (chronic obstructive pulmonary disease [COPD], asthma, tuberculosis [TB]), hyperuricemia and dyslipidemia.
We developed a risk prediction model for identifying potential sCOVID-19 patients. Samples were given a score of 2 for homozygous risk-HLA allele carriers, 1 for heterozygous risk-HLA allele carriers, or 0 for non-risk HLA-allele carriers. We created the risk prediction model using unconditional logistic regression, in which each regression coefficient (weight) is multiplied by the corresponding predictor value (the number of risk-HLA alleles carried by the individual or the value of the associated factor). The receiver operating characteristic (ROC) curve was plotted using the true-positive rate (sensitivity) against the false-positive rate (1-specificity). The area under the curve (AUC) was used to evaluate the capability of the prediction model to distinguish between sCOVID-19 patients and non-sCOVID-19 patients. Positive predictive value (PPV) and negative predictive value (NPV) were also calculated. These analyses were carried out using RStudio v1.3.1093 (RStudio Team, Boston, MA) with the pROC package v1.16.2 (25).
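The scoring and ROC evaluation described above can be sketched in a few lines of Python. This is a minimal sketch, not the pROC workflow used in the study. The weights are taken as ln(OR) of the multivariate odds ratios reported in the abstract, treating the HLA-A*11:01:01:01 odds ratio as a per-allele weight, which is an assumption. The intercept, the coding of sex (1 for the higher-risk sex at birth, 0 otherwise) and the eight example patients are hypothetical.

```python
import math

# Weights = ln(OR) from the reported multivariate analysis (per-allele use is an assumption)
WEIGHTS = {"risk_alleles": math.log(3.41), "age": math.log(1.04), "sex": math.log(2.92)}
INTERCEPT = -4.0  # hypothetical: the intercept is not reported in the text

def risk_probability(risk_alleles, age, sex):
    """Logistic model: each weight is multiplied by its predictor value.

    risk_alleles: 0, 1 or 2 copies of the risk allele
    sex: 1 for the higher-risk sex at birth, 0 otherwise (coding is an assumption)
    """
    z = (INTERCEPT + WEIGHTS["risk_alleles"] * risk_alleles
         + WEIGHTS["age"] * age + WEIGHTS["sex"] * sex)
    return 1 / (1 + math.exp(-z))

def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every distinct score used as cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Cut-off maximizing sensitivity + specificity - 1 (Youden's J)."""
    return max(roc_points(scores, labels), key=lambda r: r[1] + r[2] - 1)

# Hypothetical patients: (risk_alleles, age, sex, severe)
patients = [(0, 35, 0, 0), (0, 42, 1, 0), (1, 50, 0, 0), (0, 29, 0, 0),
            (0, 61, 1, 1), (1, 58, 1, 1), (2, 70, 1, 1), (1, 66, 0, 1)]
scores = [risk_probability(g, a, s) for g, a, s, _ in patients]
labels = [y for *_, y in patients]
print("AUC:", auc(scores, labels))
print("threshold, sensitivity, specificity:", youden_threshold(scores, labels))
```

Choosing the threshold by Youden's J is one common way to pick an "optimal" sensitivity and specificity pair; the text does not state which criterion was used.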

No Association of HLA Alleles With All COVID-19 (aCOVID-19 vs. Controls)
For the identification of HLA alleles associated with SARS-CoV-2 infection, all 190 aCOVID-19 patients (comprising both mCOVID-19 and sCOVID-19) were compared with the 423 Japanese healthy controls. None of the HLA alleles in the 8 HLA genes tested showed significance after correction for multiple testing (Supplementary Table 1).

No Association of HLA Alleles With Mild COVID-19 (mCOVID-19 vs. Controls)
To identify potential HLA alleles associated with mild symptoms of COVID-19, the 137 mCOVID-19 patients were compared with the healthy controls. No HLA alleles remained significant after correction for multiple testing (Supplementary Table 2). This result indicated that HLA frequencies for mCOVID-19 were similar to those of healthy controls.

Associations of HLA Alleles With Severe COVID-19 (sCOVID-19 vs. Controls)
Severity of COVID-19 is directly related to the management of COVID-19 infection and also contributes to mortality from COVID-19. Association studies of the 8 HLA genes were carried out by comparing HLA allele frequencies in the 53 sCOVID-19 patients and 423 Japanese healthy controls (Table 1).

Risk Prediction Model Using Associated HLA alleles and Comorbidities
To evaluate the association of HLA-A*11:01:01:01 and HLA-C*12:02:02:01-HLA-B*52:01:01:02 haplotypes with various comorbidities, we performed univariate logistic regression analyses (  (Figure 3). PPV and NPV are illustrated in Figure 4, evaluating the distribution of cases and controls against log(OR) values. The threshold of the plot was calculated from the optimal sensitivity and specificity in the AUC. A significant difference (P = 5.22E-08) was observed between the risk and non-risk groups among cases and controls in this study, with an odds ratio of 6.37 (95% CI = 3.15-12.86).
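PPV and NPV depend not only on sensitivity and specificity but also on the proportion of severe cases in the evaluated group. The short Python sketch below shows that relationship using Bayes' rule. The function name is hypothetical, and the prevalence of 53/(53 + 137) assumes that the model is evaluated on the severe and mild patients of this cohort; the resulting numbers are therefore illustrative and are not the values plotted in Figure 4.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a group with the given prevalence."""
    tp = sensitivity * prevalence              # true positives (as a fraction of the group)
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Assumption: severe cases make up 53 of the 190 hospitalized patients
prevalence = 53 / (53 + 137)
# Sensitivity and specificity as reported for the risk prediction model
ppv, npv = predictive_values(0.715, 0.717, prevalence)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because severe cases are the minority in this setting, the NPV comes out higher than the PPV for the same sensitivity and specificity; a different prevalence would shift both values.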

DISCUSSION
We conducted a high-resolution sequencing-based typing of eight HLA genes to evaluate the association of HLA alleles/haplotypes with SARS-CoV-2 infection susceptibility and the severity of COVID-19. A total of 53 sCOVID-19 patients and 137 mCOVID-19 patients were recruited from the NCGM, Tokyo, Japan, from February to August 2020, and 423 previously recruited healthy controls were used as healthy comparators in this study.

Table footnote: Only HLA alleles with frequencies >5% are shown. Significant HLA alleles after multiple correction are highlighted in bold. OR, odds ratio; 95% CI, 95% confidence interval; Pc-value, multiple testing corrected p-value; bin, rare HLA alleles with expected count < 5 are combined into a common class.
To evaluate HLA alleles potentially associated with susceptibility to SARS-CoV-2 infection, aCOVID-19 (Supplementary Table 1) and mCOVID-19 (Supplementary Table 2) patients were each compared with healthy controls. We found that no HLA alleles remained significant after Bonferroni correction and that no disease-specific association was observed in the HLA genes, suggesting that no HLA alleles were either protective against or conferred susceptibility to SARS-CoV-2 infection.
Severe COVID-19 is tightly associated with COVID-19 mortality. We observed that the HLA- However, we could not replicate results for B*46:01, which has been predicted to have the fewest binding peptides for SARS-CoV-2, so that individuals carrying B*46:01 are predicted to be more vulnerable to SARS-CoV-2 infection (27). Further, we could not confirm the findings of an HLA-binding prediction study (3) that predicted HLA-A*11:01, HLA-A*02:06, and HLA-B*54:01 as protective against SARS-CoV-2. The two studies mentioned above are based solely on imperfect algorithms predicting binding between host HLA alleles and SARS-CoV-2 sequences; moreover, peptides predicted to have high binding scores are not necessarily immunogenic in the host.
Subsequently, we developed a risk prediction model (Figures 3, 4) that includes HLA-A*11:01:01:01 carrier status, age, and sex for the early detection of potential sCOVID-19 patients in the hospital. The AUC value of the risk prediction model was 0.772 with optimal sensitivity of 0.715 and specificity of 0.717, suggesting that the current prediction model could be further improved by incorporating additional clinical and genetic factors associated with the severity of COVID-19. Nevertheless, this study demonstrated that genetic factors and comorbidities are useful in identifying potential sCOVID-19 cases. Further investigation and validation are needed to confirm the applicability of the prediction model to a larger sample in Japan and its generalizability to other Asian populations. All samples used in this study were collected in the Center Hospital of the NCGM from February to August 2020, when mutant strains of SARS-CoV-2 were not yet prevalent in Japan. Further studies are therefore needed to evaluate correlations between emerging mutant strains of SARS-CoV-2 and severity of COVID-19.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in HLA COVID-19 repositories (http://www.hlacovid19.org/).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the National Center for Global Health and Medicine.
The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
S-SK and KT contributed to the writing of the manuscript. YO, NN, MaS, and S-SK contributed to the experimental aspect of this study. NK, TS, MiS, SS, SI, MH, NO, and MM contributed to the sample collection of this study. All authors contributed to the article and approved the submitted version.

FUNDING
This research is supported by the Japan Agency for Medical Research and Development (AMED) under Grant Numbers JP20kk0205012 and JP20fk0108104 and the NCGM Intramural Research Fund 20A2002D.

Figure caption (fragment): The dotted line representing the threshold of the plot is obtained from the AUC with optimal sensitivity and specificity. A significant difference (P = 5.22E-08) was observed between the risk and non-risk groups among cases and controls in this study, with an odds ratio of 6.37 (95% CI = 3.15-12.86).