Gene-Environment Interaction Analysis Incorporating Sex, Cardiometabolic Diseases, and Multiple Deprivation Index Reveals Novel Genetic Associations With COVID-19 Severity

Increasing evidence indicates that specific genetic variants influence the severity of outcomes after infection with COVID-19. However, it is not clear whether the effect of these genetic factors is independent of the risk due to more established non-genetic demographic and metabolic risk factors such as male sex, poor cardiometabolic health, and low socioeconomic status. We sought to identify interactions between genetic variants and non-genetic risk factors influencing COVID-19 severity via a genome-wide interaction study in the UK Biobank. Of 378,051 unrelated individuals of European ancestry, 2,402 were classified as having experienced severe COVID-19, defined as hospitalization or death due to COVID-19. Exposures included sex, cardiometabolic risk factors [obesity and type 2 diabetes (T2D), tested jointly], and multiple deprivation index. Multiplicative interaction was tested using a logistic regression model, conducting both an interaction test and a joint test of genetic main and interaction effects. Five independent variants reached genome-wide significance in the joint test, one of which also reached significance in the interaction test. One of these, rs2268616 in the placental growth factor (PGF) gene, showed stronger effects in males and in individuals with T2D. None of the five variants showed effects on a similarly-defined phenotype in a lookup in the COVID-19 Host Genetics Initiative. These results reveal potential additional genetic loci contributing to COVID-19 severity and demonstrate the value of including non-genetic risk factors in an interaction testing approach for genetic discovery.


INTRODUCTION
Epidemiological research has uncovered multiple risk factors for COVID-19 severity, including sex, metabolic conditions such as type 2 diabetes and obesity, and socioeconomic status. Male sex is independently associated with higher mortality and worse COVID-19 outcomes (Palaiodimos et al., 2020;Park et al., 2020). Cardiometabolic (CM) conditions, such as Type 2 diabetes (T2D) and obesity, are also associated with increased COVID-19 susceptibility and severity (Barron et al., 2020;Zhu et al., 2020). Additionally, associated comorbidities of obesity, such as deregulated immune response, chronic inflammation, metabolic dysfunction, and compromised cilia on airway epithelial cells may put individuals at higher risk of severe COVID-19 (Ritter et al., 2020). Minoritized communities are disproportionately impacted by of COVID-19 and may be predisposed to worse conditions due to environmental factors, limited healthcare access, and other societal factors (Tai et al., 2021). Furthermore, housing and neighborhood density and increased work-related exposure may put low-income groups at higher risk (Burström and Tao, 2020). Additionally, the greater prevalence of underlying chronic conditions among individuals with lower socioeconomic status puts this group at greater risk of severe outcomes.
Genetic investigations, such as that from the Host Genetics Initiative (HGI) consortium, have demonstrated that specific genomic regions are associated with COVID-19 severity. The HGI global meta-analysis identified 13 genome-wide significant loci, 9 of which were associated with increased risk of severe symptoms for hospitalized COVID (Niemi et al., 2021). Several loci were further associated with interstitial lung disease and autoimmune and inflammatory diseases, possibly predisposing individuals to greater immune response and worse outcomes. Genetic studies have also been helpful in clarifying the causal impacts of highly-correlated body mass index (BMI) and cardiometabolic risk factors on COVID-19 severity (Leong et al., 2021).
It is not clear whether genetic factors impact the relationship between these key risk factors and COVID-19 severity, or whether these interactions can uncover novel genetic loci impacting this outcome. We sought to understand the interactions between genetic variants and previously reported risk factors, in order to gain novel understanding of the underlying mechanisms impacting COVID-19 severity and add an important dimension to the current epidemiological literature on COVID-19. We undertook a series of three genome-wide gene-environment interaction studies in the United Kingdom Biobank, conducting both interaction effect tests, to find genetic impacts on the risk factor-severe disease relationship, and joint tests of genetic main and interaction effects, to discover new genetic loci while accounting for heterogeneity due to risk factor interactions. The "environmental" exposures included sex, CM health (obesity and type 2 diabetes status), and social determinants of health (as quantified by the multiple deprivation index). The binary outcome was severe COVID-19 (as defined by hospitalization or death due to  while the rest of the population was used as a control group. Using GxE analyses and GWAS postprocessing methods, we found 5 genome-wide significant loci that provide insight into the biological mechanisms of severe COVID-19 outcomes.

UK Biobank Dataset
The UK Biobank (UKB) is a population-based cohort including over 500,000 individuals living in England, Wales, and Scotland. The sub-population of interest for this study included unrelated individuals of European ancestry in order to minimize genetic heterogeneity. Sample sizes varied depending on available phenotypes across these populations. COVID-19 test results were downloaded from the UKBB data portal on January 1, 2020 including all diagnostics available since April 2020, when the test results became publicly available. The severe COVID-19 phenotype for was defined as laboratory confirmed SARS-CoV-2 infection plus hospitalized COVID-19, with the rest of the population serving as controls versus the rest of the population. This definition was designed to mirror that of the "B2" phenotype used by the COVID-19 Host Genetics Initiative team (Niemi et al., 2021) (COVID-19 Host Genetics Initiative, 2020 and is outlined in Supplementary Figure S1. Genotype preprocessing was primarily performed centrally by the UKB with filters at the marker and sample level (Bycroft et al., 2018). Genotypes were further subsetted to common variants (minor allele frequency >0.05) for analysis.

Exposures of Interest
Risk factors used as exposures were measures of geneticallydetermined sex, CM health, and social determinants of health (SDH). For CM measures, BMI was used as a measure of obesity and T2D status was determined based on self-reported medical history and medication use ("probable" or "possible" algorithmic definitions described by Eastwood an colleagues (Eastwood et al., 2016). BMI and T2D were tested jointly, and then individually as a sensitivity analysis. The multiple deprivation index (MDI) was used as a measure of social determinants of health (SDH). The MDI is composed of metrics including economic stability, physical environment, and education; details can be found at https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id 76. The MDI is a measure of relative deprivation for small areas (or neighbourhoods) in England. The MDI ranks every small area in England from 1 (most deprived area) to 32,844 (least deprived area). Deprivation "deciles" and percentages are published alongside ranks; percentages were used in this study. Naïve collation of these indices across UK countries can be problematic since the scores are constructed differently. So, for the MDI analyses, only the subset of the population living in England was used in order to reduce heterogeneity. We note that this still retains a majority of the population in question (86%).

Statistical Analysis
A genome-wide scan was performed based on a logistic regression model including gene-environment interaction terms: log it y ∼ g + exp osure + gp exp osure + covariates Y was the binary severe COVID-19 indicator (defined above). The three genome-wide scans used the following exposures: sex, CM conditions (BMI and T2D), and MDI. For the CM conditions, two environmental terms and two interaction terms were tested jointly. To test T2D exposure effect, GxT2D interaction obese and non-obese stratified analyses were run. Covariates included age, five genetic principal components, and sex. Genome-wide analysis was conducted using GEM (Gene-Environment interaction analysis in Millions of samples) v1.2 (Westerman et al., 2021) with model-based standard errors. For each variant, two statistical tests were derived: an interaction test and a joint test of the interaction term(s) plus the genetic main effect.
Interaction and joint analyses were conducted on the Terra cloud platform. Phenotype definitions and population summaries were created in interactive Jupyter notebooks with an R 3.6 kernel. Genome-wide interaction study (GWIS) analyses were submitted as workflows using a Workflow Description Language (WDL) script implementing GEM. Post-GWIS summarization and visualizations were created in a separate Jupyter notebook. These notebooks can be viewed on GitHub (https://github.com/ manning-lab/ukb-covid-gxe).

Variant Biology Investigation
Top variants were further investigated for trait associations, eQTLs, and linkage disequilibrium using dbSNP (NCBI), PhenoScanner (Kamat et al., 2019;Staley et al., 2016), RegulomeDB (Boyle et al., 2012), Type 2 Diabetes Knowledge portal (https://t2d.hugeamp.org/), and LDlink (Myers et al., 2020). Colocalization between interactions and eQTLs was performed using the coloc package (Giambartolomei et al., 2014) along with blood-based eQTL summary statistics from the eQTLGen Consortium (Võsa et al. , 2018). This analysis tests the hypothesis that the genetic association signal (here, of genetic interaction) shares a common causal variant with the eQTL at that locus, which increases confidence that the associated expression effect could be mediating the genetic effect. It produces a series of posterior probabilities, one of which corresponds to the hypothesis ("Hypothesis 4") that the causal variant is shared, for which a value > 0.9 would be considered strong support for colocalization.

RESULTS
The UKB population is described in Table 1, with subjects categorized into those having experienced severe COVID-19 (hospitalization or death from COVID-19; see Methods) and the remaining population (regardless of infection status). While the overall population had a greater proportion of females, males were more likely to be present in the case group (54 vs. 46% in controls, p 2.3 × 10 −11 ). Cases also had a greater prevalence of T2D (p 8.2 × 10 −48 ), higher BMI (p 6.7 × 10 −62 ) and higher MDI (p 9.7 × 10 −45 ), which is interestingly, less deprived.
We conducted a GWIS for each of the following exposures: sex, CM traits (BMI and T2D, tested jointly), and MDI. Top index variants after pruning are displayed in Supplementary Tables  S2-S4. Across all scans, five variants (rs2268616, rs182113773, rs148793499, rs11115199, and chr2:218260234) passed a genome-wide significance (GWS) threshold (p < 5 × 10 −8 ) in the joint test (one with sex as the exposure, three with CM traits, and one shared between the two). One of these five (rs11115199) was additionally found to be GWS in the CM interaction test (Figure 1; Table 2 and Table 3). Two of these variants (rs148793499, rs11115199) passed a study-wide significance threshold (p < 5 × 10 −8 /3 exposures 1.6 × 10 −8 ). No variants passed the GWS threshold in the MDI analysis. Of the five variants, a GWS marginal effect (i.e., from the identical statistical model but excluding the interaction term) was identified for only rs2268616 (p 1.08 × 10 −8 ) and rs182113773 (p 1.39 × 10 −8 ). This result shows that the joint test discovered variants that would not have been found via a standard GWAS in this population.
These five GWS variants were compared to genetic main effects from the HGI meta-analysis (with UKB omitted) testing the equivalent "B2" phenotype (hospitalized COVID-19 vs. population). A significant genetic main effect would constitute a partial replication of the joint test (genetic plus interaction effect) hypothesis. Neither of the two variants directly tested in the HGI meta-analysis showed nominal replication (both p > 0.05). For the remaining three, neither the variants nor close genetic proxies (r 2 > 0.5 using European-based linkage disequilibrium patterns) were available in the HGI dataset.
Next, we explored these top variants and interactions to understand their potential biological function. One variant of interest, rs2268616 (MAF 0.018), was genome-wide significant in the joint sex analyses (p 2.67 × 10 −8 ) and joint CM diseases (p 3.87 × 10 −8 ). This variant sits in an intron of the placental growth factor (PGF) gene and is modestly associated with testosterone in GWAS analyses (Prins et al., 2017). It is also a putative enhancer in lung and other tissues, and is an eQTL for EIF2B2 (a gene in a family of proteins that regulate viral mRNA translation) in whole blood. However, colocalization analysis using whole blood eQTL statistics from eQTLGen Consortium did not support the hypothesis of a shared causal variant with either PGF or EIF2B (posterior probabilities <0.1%).  Figure 2. T2D-stratified tests also showed a greater genetic effect on severe COVID-19 in individuals with T2D (OR 2.01 [1.22-3.32]) compared to those without T2D (OR 1.6 [1.33-1.9).
The additional GWS variants also indicated genetic effects on COVID-19 severity mediated through interaction effects. rs182113773 was found in the CM joint test (p 2.71 × 10 −8 ) and found in the intron for MACC1. This variant sits in an enhancer within neutrophils, monocytes, and B cells and has a RegulomeDB score of 0.59, suggesting a regulatory role in transcription.

DISCUSSION
Exploring the interplay of genetics and sex offers novel understanding of the underlying mechanisms impacting COVID-19 severity and adds an important dimension to the current epidemiological literature on COVID-19. In this genomewide gene-environment interaction analysis, we found five significant genomic regions (p < 5 × 10 −8 ) that interact with well-established risk factors to influence COVID-19 severity. Sex-dimorphic transcripts and hormones, as well as differences in environmental factors between the sexes, contribute to differential immune responses between sexes (Klein and Flanagan, 2016) and may mediate the established association of male sex with greater COVID-19 severity. In our analysis, rs2268616 was statistically significant in the joint analyses for sex and CM diseases (p < 5 × 10 −8 ). This variant has been associated with testosterone and placental growth factor gene in GWAS analyses, suggesting that this variant interacts with sex to mediate worse COVID-19 outcomes. Interestingly, this variant is also an eQTL for EIF2B2, a gene within a family of proteins that mediate viral mRNA translation. Moreover, prior studies have found an increased risk of death and significantly increased levels of inflammatory markers in male COVID-19 positive hospitalized patients compared with women (Lau et al., 2021). The EIF2B2 variant is linked to a strong transcription chromatin state in the cells of the lung, spleen, and B-cells, perhaps mediating the robust inflammatory response in males that is associated with worse COVID outcomes. Furthermore, rs2268616 sits within an enhancer in lung tissue, suggesting a role of this variant on transcription and respiratory complications after SARS-CoV-2 infection. Variant rs2268616 also shows a modest positive association with coronary artery disease and negative association with HOMA-B based on lookups in the Type 2 Diabetes Knowledge Portal (https://t2d.hugeamp.org/), indicating a potential influence on metabolic traits in general. Our findings suggest that this genetic variant may modify the relationship between biological differences and associated worse COVID-19 outcomes primarily through regulating viral RNA clearance immune response and lung cell transcription.
Comorbidities associated with CM health such as obesity and T2D have been implicated in mediating worse COVID-19 outcomes (Ritter et al., 2020). Our findings show four variants that were genome-wide significant in our CM joint tests: rs182113773, rs11115199, rs148793499, and rs2268616. Located within the intron for MACC1, a gene associated with BMI-adjusted waist circumference and BMI-adjusted waist-hip ratio, rs182113773 is an enhancer within neutrophils, monocytes, and B cells. This variant also has high gene expression in EBVtransformed lymphocytes and is a likely regulatory variant (RegulomeDB score of 0.59), which further suggests that the interaction of this variant with CM health has a regulatory role on immune response. Studies found that increased neutrophil count in T2D groups are associated with clinical severity and may mediate the positive association between T2D and COVID-19 severity (Zhu et al., 2020). Thus, this MACC1 variant may be interacting with CM health to mediate greater COVID-19 severity. Furthermore, obese adipose tissues overexpress receptors and proteases that enable the entry of SARS-CoV-2, possibly contributing to the severe inflammation and immune response of individuals with obesity (Ritter et al., 2020). Alongside decreased immune response mediated by testosterone, rs2268616 may also play a role in the deflated immune response seen in cases with CM disease status. This variant has a positive association with coronary artery disease and a negative association with HOMA-B (a method that assesses β-cell function from basal fasting glucose and insulin). For CM diseases, well controlled blood glucose and smaller glycemic variability have been associated with lower mortality during hospitalization due to COVID-19 (Zhu et al., 2020). Therefore, this variant may help explain the COVID-19 biology that increases the risk for individuals with T2D. Interactions between these genetic factors and deregulated immune response, chronic inflammation, metabolic dysfunction, and other comorbidities of obesity and T2D may be placing individuals at greater risk for worse outcomes of COVID-19.
Beyond rs2268616, other genome-wide significant variants identified in the CM analysis are of potential biological interest. The rs11115199 variant is an eQTL for METTL25, a gene that has known genetic links to BMI but which has minimal transcription in memory T cells and B cells. Thus, this locus may instead modify immune response via interactions with obesity. Meanwhile, the genetic effect of rs148793499 appears to be most directly modulated by metabolic status; in stratified CM tests, the variant showed strong effects in individuals with T2D but no major differences in effect in individuals with obesity.
Our analysis focusing on social determinants of health did not identify any significant variants. This may be a function of the noise associated with the MDI measurement and the difficulty in using this measurement to represent social determinants of health in a large diverse population. One study leveraged an Index of Multiple Deprivation and Income Deprivation Affecting Older People Index to show higher incidence of COVID-19 related deaths in the most deprived quartiles (Bach-Mortensen and Degli Esposti, 2021). We subsetted our sample to participants from England to reduce heterogeneity, but this reduced the sample size (by 16.5%; 2,007 vs. 2,402 cases) and thus statistical power available for the MDI analysis. Additionally, there may simply be little signal to uncover: the effects of genetics and social determinants of health on COVID-19 severity may be approximately independent.
The results of this study may be limited due to linkage disequilibrium and heterogeneity caused by geographic location within our sample population. The case definition allows us to identify variants associated with severity, however these results need to be taken with caution given the possibility of collider bias. Analyzing UK Biobank data, the participants tested for COVID-19 were highly selected for a range of genetic, behavioral, cardiovascular, demographic, and anthropometric traits (Griffith et al., 2020). By subsetting our dataset to individuals of European ancestry, we reduce the heterogeneity but face a limited sample size. Nonetheless, the use of interaction analysis allowed us to uncover novel variants: the GEM marginal p-value did not pass the genome-wide significance threshold for three of the five variants, meaning that these variants would not have been detected via a standard GWAS in this population.
Our findings suggest that gene-environment interaction effects contribute to the differences in COVID-19 severity. Sex-associated differences in immune response and CM disease comorbidities that deregulate immune response may interact with the identified genetic variants and put individuals at higher risk for worse outcomes of COVID-19. Future studies investigating the stratified effects of sex, T2D and BMI, and social determinants of health on COVID-19 susceptibility, as well as similar analysis with a wider array of ancestries, may further reveal underlying the genetic interaction effects that place individuals at higher risk.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: There are restrictions prohibiting the provision of data in this article. The data were obtained from a third party, UK Biobank, upon application. Interested parties can apply for data from UK Biobank directly. UK Biobank will consider data applications from bona fide researchers for health-related research that is in the public interest. By accessing data from UK Biobank, readers will be obtaining it in the same manner as we did.
Requests to access these datasets should be directed to UK Biobank, http://www.ukbiobank.ac.uk