Host Genetic Liability for Severe COVID-19 Associates with Alcohol Drinking Behavior and Diabetic Outcomes in Participants of European Descent

Risk factors and long-term consequences of COVID-19 infection are unclear but can be investigated with large-scale genomic data. To distinguish correlation from causation, we performed in-silico analyses of three COVID-19 outcomes (N > 1,000,000). We show genetic correlation and putative causality with depressive symptoms, metformin use (genetic causality proportion (gĉp) with severe respiratory COVID-19 = 0.576, p = 1.07 × 10⁻⁵ and hospitalized COVID-19 = 0.713, p = 0.003), and alcohol drinking status (gĉp with severe respiratory COVID-19 = 0.633, p = 7.04 × 10⁻⁵ and hospitalized COVID-19 = 0.848, p = 4.13 × 10⁻¹³). COVID-19 risk loci were associated with several hematologic biomarkers. These comprehensive findings inform genetic contributions to COVID-19 epidemiology, molecular mechanisms, risk factors, and potential long-term health effects of severe response to infection.


INTRODUCTION
Host genetic liability to severe COVID-19 (coronavirus disease 2019) following SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection is of immediate clinical interest (Ji et al., 2020). While preexisting comorbidities, including hypertension, type 2 diabetes, and asthma, have been characterized in large population cohorts, it remains unclear which long-term health consequences may arise following COVID-19 infection (Atkins et al., 2020). Furthermore, the direction of epidemiological observations is confounded by many external factors such that bidirectional effects have been reported for comorbidities such as type 2 diabetes (Atkins et al., 2020; Rubino et al., 2020).
To better understand the association of biological measurements, lifestyle indicators, biomarkers, and health and medical records with COVID-19 susceptibility, we performed analyses to distinguish genetic correlation from genetically informed causal effects using large-scale genomic data of COVID-19 outcome severity in over 1,000,000 participants from the COVID-19 Host Genetics Initiative.
For two COVID-19 phenotypes with h² z-scores > 4 (the threshold recommended by the LDSC developers; Bulik-Sullivan et al., 2015b), severe respiratory COVID-19 and hospitalized COVID-19, we then estimated their genetic correlation (r_g) with 4,083 phenotypes from the UK Biobank (UKB, see http://www.nealelab.is/uk-biobank). LDSC analyses were based on linkage disequilibrium information from the 1,000 Genomes Project (1kGP) European reference population. When available for continuous traits, we restricted our analyses to genome-wide association statistics generated from inverse-rank normalized phenotypes. Multiple testing correction was applied to genetic correlation results using the false discovery rate method (FDR q < 0.05) based on the number of COVID-19 outcomes (two traits with h² z-score > 4) and the number of suitably powered UK Biobank phenotypes (772 traits with h² z-scores > 4) against which r_g was tested.
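As an illustration of the power filter described above, the following minimal Python sketch computes a heritability z-score as the h² estimate divided by its standard error and retains traits exceeding the threshold of 4. The function names and the example (h², SE) values are hypothetical and are not taken from this study; the analyses themselves relied on LDSC output.

```python
from typing import Dict, List, Tuple


def h2_z_score(h2: float, se: float) -> float:
    """Heritability z-score: point estimate divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return h2 / se


def suitably_powered(
    traits: Dict[str, Tuple[float, float]], threshold: float = 4.0
) -> List[str]:
    """Return the names of traits whose h2 z-score exceeds the threshold."""
    return [
        name
        for name, (h2, se) in traits.items()
        if h2_z_score(h2, se) > threshold
    ]


# Hypothetical (h2, SE) pairs for illustration only; not values from this study.
example = {
    "trait_a": (0.10, 0.02),
    "trait_b": (0.03, 0.01),
    "trait_c": (0.25, 0.05),
}
print(suitably_powered(example))
```

In this toy input, trait_b has a z-score of about 3 and would be excluded from genetic correlation testing.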

Latent Causal Variable Analysis
To distinguish between genetic correlation and causative effects, we applied the Latent Causal Variable (LCV) approach to all nominally significant genetic correlations (O'Connor and Price, 2018). Under the assumption of a single effect-size distribution in per-trait GWAS, LCV tests for the presence of a single latent trait connecting COVID-19 outcomes to UKB phenotypes. LCV was performed in R using the 1kGP European reference LD panel and genome-wide association statistics for SNPs with minor allele frequencies > 5%. Variants in the major histocompatibility complex region of the genome were excluded because of its complex LD structure. LCV genetic causality proportion (gĉp) estimates were only interpreted for trait pairs where both traits exhibited LCV-calculated h² z-scores ≥ 7, as recommended by the LCV developers (O'Connor and Price, 2018). The magnitude of the gĉp estimate ranges from 0 to 1, with values near zero indicating partial causality and values approaching 1 indicating full causality. The sign of the gĉp indicates the direction of the causal relationship. In this study, positive gĉp indicates that the COVID-19 outcome causes the second phenotype while negative gĉp indicates that the second phenotype causes the COVID-19 outcome. LCV developers indicated that |gĉp| > 0.7 can be interpreted as evidence of a strong causal relationship between trait pairs (O'Connor and Price, 2018). Multiple testing correction was applied using the FDR method (q < 0.05) based on the number of COVID-19 outcomes (N = 2) and the number of suitably powered UK Biobank phenotypes (N = 188 traits with h² z-scores ≥ 7) against which r_g was also tested.
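The interpretation rules stated above (h² z-score ≥ 7 for both traits, sign giving direction, and magnitude above 0.7 taken as strong) can be summarized in a short Python sketch. This is only an illustration of the decision logic as described in the text: the function name and the h² z-scores in the example call are hypothetical, the sketch does not run LCV itself, and it does not replace the significance testing and FDR correction applied to each gĉp estimate.

```python
def interpret_gcp(
    gcp: float,
    h2_z_covid: float,
    h2_z_trait: float,
    min_h2_z: float = 7.0,
    strong: float = 0.7,
) -> str:
    """Label a gcp estimate using the sign convention adopted in this study.

    Positive gcp: the COVID-19 outcome causes the second phenotype.
    Negative gcp: the second phenotype causes the COVID-19 outcome.
    """
    if not -1.0 <= gcp <= 1.0:
        raise ValueError("gcp must lie between -1 and 1")
    # Estimates are only interpreted when both traits are sufficiently powered.
    if h2_z_covid < min_h2_z or h2_z_trait < min_h2_z:
        return "not interpreted (h2 z-score below threshold)"
    if gcp == 0:
        return "no direction indicated"
    direction = (
        "COVID-19 outcome -> phenotype"
        if gcp > 0
        else "phenotype -> COVID-19 outcome"
    )
    strength = "strong" if abs(gcp) > strong else "partial"
    return f"{strength} genetic causality, {direction}"


# The gcp value is the alcohol drinking status estimate reported in the abstract;
# the two h2 z-scores are hypothetical placeholders.
print(interpret_gcp(0.848, h2_z_covid=8.0, h2_z_trait=10.0))
```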

Replication
Significant genetic correlations and latent causal effects were replicated using the FinnGen resource (release 5; accessed September 2021). A total of 20 traits could be mapped to FinnGen for replication.
We analyzed genome-wide association statistics generated from the analysis of 7,218 phenotypes in six ancestries: European (N = 420,531), Central/South Asian (N = 8,876), African (N = 6,636), East Asian (N = 2,709), Middle Eastern (N = 1,599), and Admixed American (N = 980). Pan-UKB traits were analyzed if they had 100 cases in European ancestry or 50 cases in all other ancestries. Association statistics were covaried with sex, age, age², sex×age, sex×age², and the first ten within-ancestry principal components. A detailed description of the methods used to generate these data is available at https://pan.ukbb.broadinstitute.org/. Multiple testing correction was performed for the number of phenotypes (N = 7,218) and ancestry groups (N = 6) using the p.adjust (method = "fdr") function of R.
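For readers unfamiliar with the FDR adjustment named above, the sketch below reimplements the Benjamini-Hochberg step-up procedure in plain Python; to our understanding this is the procedure that R's p.adjust applies when method = "fdr". It is an illustration only (the analyses were run in R), and the function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum can
    # enforce monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
q = bh_adjust(raw)
significant = [p for p, adj in zip(raw, q) if adj < 0.05]
print(q, significant)
```

In the setting described here, the vector passed to the adjustment would contain one p-value per phenotype and ancestry combination tested.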

Statistical Comparison of Effect Estimates
To compare the magnitude of genetic correlation and genetic causality proportion between COVID-19 outcomes, we performed two-sided Z-tests. P-values for each Z-test were corrected for multiple testing using the false discovery rate method.
Tables S1-S2). LDSC developers recommend that r_g be estimated with traits whose h² z-score > 4, permitting tests of r_g for severe respiratory COVID-19 (h² z-score = 5.38) and hospitalized COVID-19 (h² z-score = 4.11). We next tested for pleiotropy between COVID-19 risk loci and 4,083 phenotypes from the UK Biobank using the LDSC method. Severe respiratory COVID-19 and hospitalized COVID-19 were genetically correlated with 127 and 174 phenotypes, respectively (Figure 1; Supplementary Table S3), reflecting 188 traits. A total of 111/184 phenotypes were genetically correlated with both COVID-19 outcomes (FDR q < 0.05) and there were no differences in r_g magnitude between each phenotype and the COVID-19 outcomes. The most significant genetic correlate of COVID-19 outcomes was waist circumference (severe respiratory COVID-19 r_g = 0.272, p = 2.18 × 10⁻⁹ and hospitalized COVID-19 r_g = 0.342, p = 1.66 × 10⁻⁸).
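The two-sided Z-test used to compare effect estimates between COVID-19 outcomes can be sketched as follows. The exact formula is not given in the text, so this sketch assumes the common form in which the difference between two estimates is divided by the square root of the sum of their squared standard errors, which treats the estimates as independent. The function name and the standard errors in the example are hypothetical; only the two r_g point estimates come from the waist circumference result reported above.

```python
import math
from typing import Tuple


def two_sided_z_test(
    est1: float, se1: float, est2: float, se2: float
) -> Tuple[float, float]:
    """Compare two estimates, assuming they are independent.

    Returns the z statistic and its two-sided p-value under a standard normal.
    """
    pooled_se = math.sqrt(se1 ** 2 + se2 ** 2)
    if pooled_se == 0:
        raise ValueError("at least one standard error must be non-zero")
    z = (est1 - est2) / pooled_se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# r_g point estimates for waist circumference as reported in the text;
# the standard errors (0.05) are hypothetical placeholders.
z, p = two_sided_z_test(0.342, 0.05, 0.272, 0.05)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Because the COVID-19 outcomes may share participants, a comparison that accounts for the covariance between estimates could be more appropriate; the independence assumption is made here only to keep the illustration short.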

Replication of r g and LCV in FinnGen
In FinnGen release 5, we mapped 188 significant genetic correlates to 18 phenotypes. Note that 1-to-1 trait matching was not possible for most traits. For example, anthropometric measurements make up 22% of the significant genetic correlates but they are not available in FinnGen. Genetic correlation between two COVID-19 outcomes and 18 FinnGen traits identified seven replicated pleiotropic effects (Supplementary Table S3). All replicated genetic correlations were in the same direction as the discovery r_g in UKB and there were no differences in magnitude when comparing UKB and FinnGen. We replicated (p < 0.05) the genetic correlation between hospitalized COVID-19 and COPD, depression, diabetes, pain, tramadol use, and paracetamol use. We replicated a single genetic correlation for severe respiratory COVID-19, namely with hypertension. Of these seven traits, three had a significant gĉp with either severe respiratory COVID-19 or hospitalized COVID-19 in the discovery phase with UKB: pain, diagnosed diabetes, and current depression.

DISCUSSION
FIGURE 3 | Phenome-wide association study (PheWAS) of risk loci from three COVID-19 outcomes: A2: very severe respiratory confirmed COVID-19 versus population, B2: hospitalized COVID-19 versus population, and C2: COVID-19 versus population. Each facet details the pleiotropic effects of loci detected by GWAS of the indicated COVID-19 outcome. Each data point corresponds to a single trait assessed in UK Biobank participants of European descent. For the top associations of interest, the association between SNP and phenotypes across all ancestries is described. Details of the effect of each SNP in six populations from the Pan-ancestry UKB are provided in Supplementary Table S5.
Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 765247
In light of the 2020 COVID-19 pandemic and ongoing 2021 interpersonal distancing protocols, understanding host genetic susceptibility to severe responses to SARS-CoV-2 infection is critical. We used genome-wide data to uncover overlap and putative causal relationships between genetic liability to COVID-19 severity, preclinical risk factors (e.g., alcohol consumption; Ko et al., 2020), and long-term consequences of infection (e.g., diabetes; Del Rio et al., 2020). Our most notable findings reflect (1) causal consequences of cigarette smoke exposure and alcohol consumption on COVID-19, (2) causal consequences of COVID-19 on diabetes, and (3) ABO blood type effects on COVID-19 severity across ancestry (Ellinghaus et al., 2020). Exposure to cigarette smoke and tobacco products through either direct consumption or as an environmental exposure in community spaces has garnered considerable attention. Smokers had a significantly worse COVID-19 prognosis than nonsmokers (Peng et al., 2021). Smoking may cause a greater abundance of respiratory epithelial angiotensin-converting enzyme-2 (ACE2) receptors, contributing to greater viral load, as ACE2 is the major binding point for SARS-CoV-2 via spike protein interaction (Brake et al., 2020; Leung et al., 2020). Furthermore, as SARS-CoV-2 is transmitted via aerosolized salivary particles, environmental exposure to the virus may contribute to higher transmission (Ahmed et al., 2020).
The relationships of alcohol and diabetes with COVID-19 severity demonstrate that the epidemiologic observations linking them are due, in part, to putative causal effects (Saengow et al., 2020). Persons with diabetes have been identified as among the highest-risk individuals for COVID-19 and there are several instances of spontaneous diabetes onset following COVID-19 recovery (Chee et al., 2020). As risk factors, diabetes and alcohol consumption accentuate two prominent mechanistic hypotheses leading to severe COVID-19 and higher mortality: irregular blood viscosity and hyperglycemia. Increased glucose levels directly increase SARS-CoV-2 replication and proliferation (Lim et al., 2021). This relationship is so pronounced that antidiabetic medications and other glucose-lowering medications may be viable treatments to reduce mortality in COVID-19 positive diabetics (Luo et al., 2020; Sardu et al., 2020).
In line with other studies (Hawkins et al., 2020; Mena et al., 2021), we hypothesize that socioeconomic status likely mediates many of the effects observed here, such as those between COVID-19 outcomes and being adopted as a child, home location, and workplace conditions. For example, in our previous work, we demonstrated that regions of the genome associated with household income attenuated the effect of body mass index on severe respiratory COVID-19 (Cabrera-Mendoza et al., 2021). The genes associated with household income have been linked to brain regions associated with educational attainment such as the anterior cingulate cortex and cerebellum. Furthermore, medium spiny neurons and serotonergic neurons in these regions appear to play a role in higher household income (Hill et al., 2019a). Therefore, genetic studies of socioeconomic variables may capture genetic and environmental contributions to brain structure and function that may be relevant for other social factors associated with health, such as BMI or the childhood adversity and poor work condition information detected here (Hill et al., 2019b; Polimanti et al., 2019). The LCV method used here is not suited for multivariable analyses and future work is necessary to untangle which of the detected effects are independent of various measures of socioeconomic status (O'Connor and Price, 2018).
With single-SNP measures, we recapitulate the relationship between COVID-19 severity and diabetes outcomes by detecting a consistent negative association between rs8176719 (ABO locus) and alkaline phosphatase, an enzyme with evidence of protective effects against diabetes when present in sufficient concentrations (Malo, 2015).
Our findings have two primary limitations. First, since we investigated datasets generated from participants of European descent, our findings may not translate to other ancestries. Second, the methods used herein do not accommodate multivariable latent causal factors, such as socioeconomic status, which has a documented history of confounding causal inference studies and disproportionately influencing COVID-19 infection (Hawkins et al., 2020; Muniz Carvalho et al., 2020; Mena et al., 2021). These findings reflect potential measures to refine and/or improve the accuracy and generalizability of COVID-19 severity outcomes with epidemiological and self-report information (Ji et al., 2020). The detected risk factors and potential chronic outcomes have critical public health consequences for the long-term economic burden of the COVID-19 pandemic (Miller et al., 2020).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.