Mendelian randomization and Bayesian model averaging of autoimmune diseases and Long COVID

Background Following COVID-19, reports suggest Long COVID and autoimmune diseases (AIDs) in infected individuals. However, bidirectional causal effects between Long COVID and AIDs, which may help to prevent diseases, have not been fully investigated. Methods Summary-level data from genome-wide association studies (GWAS) of Long COVID (N = 52615) and AIDs including inflammatory bowel disease (IBD) (N = 377277), Crohn’s disease (CD) (N = 361508), ulcerative colitis (UC) (N = 376564), etc. were employed. Bidirectional causal effects were gauged between AIDs and Long COVID by exploiting Mendelian randomization (MR) and Bayesian model averaging (BMA). Results The evidence of causal effects of IBD (OR = 1.06, 95% CI = 1.00–1.11, p = 3.13E-02), CD (OR = 1.10, 95% CI = 1.01–1.19, p = 2.21E-02) and UC (OR = 1.08, 95% CI = 1.03–1.13, p = 2.35E-03) on Long COVID was found. In MR-BMA, UC was estimated as the highest-ranked causal factor (MIP = 0.488, MACE = 0.035), followed by IBD and CD. Conclusion This MR study found that IBD, CD and UC had causal effects on Long COVID, which suggests a necessity to screen high-risk populations.


Introduction
Autoimmune diseases (AIDs) are a group of diseases characterized by the body's immune response to autologous antigens resulting in damage to its own tissues, affecting 5%-10% of the global population (Trier and Houen, 2023) and increasing in incidence (Lerner et al., 2015).More than 80 AIDs have been identified, including urticaria, asthma, psoriasis, Type 1 diabetes mellitus (T1D), rheumatoid arthritis (RA), inflammatory bowel disease (IBD) (including Crohn's disease (CD) and ulcerative colitis (UC)), etc. AIDs can be divided into two subclasses: systemic and organ-specific (Marrack et al., 2001).Organ-specific AIDs produce an immune response only to the autoantigens of a specific organ, resulting in local damage (DiMeglio et al., 2018;Filippi et al., 2018), whereas systemic AIDs produce an immune response to a widely distributed autoantigen, resulting in widespread damage (Marrack et al., 2001;Rahman and Isenberg, 2008;Smolen et al., 2018;Ze et al., 2021).The presence of autoantibodies (autoAbs) is a common feature of AIDs and may increase a patient's rate of infection with other diseases (Puel et al., 2022).It has been found that autoAbs are associated with the incidence rate and prognosis of the pandemic COVID-19 (Pablos et al., 2020;Sawalha et al., 2020;Knight et al., 2021) and may even be related to the emergence of the Long COVID (Su et al., 2022;Taghadosi et al., 2023).Long COVID refers to the persistence of symptoms beyond 3 months of SARS-CoV-2 infection that continue for no less than 2 months and aren't caused by any other illness (Lechner-Scott et al., 2021;Mehandru and Merad, 2022;Raman et al., 2022).
Approximately 31%-69% of COVID-19 patients suffer from Long COVID (Groff et al., 2021), the pathogenesis of which is complex and hypotheses include (1) residual virus storage, (2) immune failure leading to slow clearance of the virus, (3) cross reactivity between SARS-CoV-2-specific antibodies and host proteins leading to autoimmunity, etc. (Ramakrishnan et al., 2021).There is still a large knowledge gap in the pathophysiology of Long COVID and a lack of targeted treatment (Ceban et al., 2022;Ceban et al., 2023).Therefore, finding the genetic association between AIDs and Long COVID may inspire the prevention and treatment of Long COVID.However, there is currently a lack of research on whether there is a causal relationship between the two or the direction of the causal relationship.
Mendelian randomization (MR) emerges as a more robust analytical approach for drawing reasonable etiological conclusions.MR leverages genetic variants as surrogates for the exposure, facilitating stronger causal inferences.MR offers a distinct advantage over other methods in terms of susceptibility to confounding factors (Wang et al., 2022).This is because germline genetic variations are assigned at random during meiosis, allowing them to serve as proxies for lifelong exposures without being influenced by the potentially biased effects of reverse causation.In addition, by extending the MR method, bidirectional MR has been employed for identifying the direction of causal effects between two interconnected traits (Sekula et al., 2016).MR-BMA is a multivariable MR approach that can be extended to the dimensions of high-throughput experiments and obtain risk factors from many candidate risk factors (Zuber et al., 2020).By reporting the model-averaged causal effect (MACE), MR-BMA can better and more consistently detect true risk factors for disease than either the inverse variance weighted (IVW) method or other variable selection methods (Zuber et al., 2020).
In this study, we utilized pooled statistics from genome-wide association studies (GWAS) for Long COVID and AIDs.The influence of several AIDs on Long COVID and the influence of Long COVID on AIDs were identified respectively and it was concluded that IBD, CD and UC have significant causal effects on Long COVID, which may be clinically helpful in identifying risk factors, screening high-risk groups and preventing disease progression.

Study design
Figure 1 presents a concise depiction of the bidirectional MR design.MR relies on three fundamental assumptions: 1) There is a strong and consistent correlation between the exposure and the chosen instrumental variable, representing the genetic variant.2) There is no link between genetic variation and confounding factors.
3) Genetic variations influence outcomes exclusively via exposure, rather than alternate pathways (Kennedy et al., 2020).Our research employed summary-level data derived from publicly available GWAS conducted on Long COVID and AIDs.Our approach involved three distinct steps.Initially, we carefully identified genetic variants associated with AIDs to establish a causal inference from AIDs to Long COVID.Subsequently, we utilized genetic variants linked to Long COVID to infer the causality of Long COVID to AIDs.Finally, we performed MR-BMA for the positive results of univariable MR.

Genetic instrumental variables for AIDs and Long COVID
Summary statistics for AIDs were obtained from the FinnGen biobank analysis round 9 datasets (Table 1) (10,175 urticaria cases and 364,583 controls; 8,967 T1D cases and 308,373 controls; 12,555 RA cases and 240,862 controls; 9,631 asthma cases and 210,122 controls; 5,759 psoriasis cases and 364,071 controls; 7,625 IBD cases and 369,652 controls; 1,581 CD cases and 359,927 controls; 5,034 UC cases and 371,530 controls) (Kurki et al., 2023).Individuals of ambiguous sex, high genotypic deletion rate (>5%), high heterozygosity (+−4SD) and non-Finnish origin were excluded during the sample quality control step.Variants with high deletion rates (>2%), low Hardy-Weinberg Equilibrium p-values (<10 -6 ), and low minor allele counts (<3) were excluded during the variant quality control step.To account for potential confounding factors, such as age, gender, ten principal components and genotype batch, the researchers at FinnGen employed mixed-model logistic regression techniques during their analyses (Wei et al., 2023).We want to acknowledge the participants and investigators of the FinnGen study.
We acquired summary-level data for Long COVID from the most recently published GWAS available (Lammi et al., 2023), consisting of 6,407 cases (Long COVID after test-verified, selfreported or clinician-diagnosed SARS-CoV-2 infection) and 46,208 controls (individuals that had SARS-CoV-2 but did not develop Long COVID).Participants are defined as Long COVID cases when the criteria below are met.For studies with questionnaires, self-reported COVID-19 symptoms that cannot account for other diagnoses and reported continued significant influence on daily activities were adopted.For cohorts having electronic health records, diagnosis codes for Long COVID (Post COVID-19 condition, ICD-10 code U09 (.9)) were included.The study was reported in Nature's news section (Ledford, 2023).

Determination of genetic instrumental variables
To adhere to the hypotheses outlined in Figure 1 of our study, we carefully selected single-nucleotide polymorphisms (SNPs) that fulfilled two important criteria: robustness and independence as predictors.This selection was based on the significant genome-wide results (p < 5 × 10 −8 ) obtained from the published GWAS.Building upon existing research insights, we utilized SNP clumping techniques to identify distinct and independent loci.This involved establishing a linkage disequilibrium (LD) threshold of r 2 = 0.01 and implementing a clumping window of 5000 kb (Xie et al., 2023).
Rigorous data harmonization procedures were meticulously performed to ensure that the SNP's impact on both the exposure and outcome aligned with the identical allele.In cases where SNPs exhibited various effect alleles from distinct strands, we diligently fixed the strand orientation to maintain consistency in effect alleles across datasets.Nevertheless, the harmonization of palindromic SNPs posed greater challenges since their alleles were identical on both strands.To mitigate any ambiguity in reporting the effect allele in the exposure and outcome GWAS, we decided to exclude these palindromic SNPs from the analysis (Bowden et al., 2017;Guo et al., 2023).
Finally, an assessment was conducted on the potency of each SNP using the F-statistics, an indicator dependent on the magnitude and accuracy of the genetic impact on the characteristic under consideration.SNPs displaying F-statistics below 10 were eliminated, as F-statistics exceeding 10 indicated adequate potency to uphold the credibility of the SNPs (Vaucher et al., 2018).

MR analysis
The primary method used to assess the relationship between AIDs and Long COVID was the IVW approach with multiplicative random effects.This method incorporates the Wald estimator of SNPs to estimate the effect.To ensure the validity of the IVW results, each SNP must satisfy the assumption of MR, specifically level-free pleiotropy.Since we used 8 exposures in the MR analysis, we applied a Bonferroni correction, meaning that a significance level of p < 0.00625 (0.05/8) was used.p values ranging from 0.00625 to 0.05 were regarded as suggestively significant.
Aside from the IVW method, several sensitivity analyses were employed as supplementary analysis methods, including MR-Egger and weighted median.
To examine the exclusion restriction assumption, we employed the MR-Egger regression intercept along with its corresponding 95% confidence intervals (CIs) to evaluate the extent of bias in causal Assumptions and framework of the bidirectional Mendelian Randomization (MR) investigation on the links between autoimmune diseases (AIDs) and Long COVID.estimates attributable to directional pleiotropy (Bowden et al., 2015).By employing MR-Egger analysis, we assessed the presence of pleiotropy in instrumental variables, wherein a nonzero intercept signifies potential bias in the IVW estimate (Burgess and Thompson, 2017).We employed heterogeneity indicators derived from the IVW approach (using a significance threshold of p < 0.05) to signify the presence of potential horizontal pleiotropy.Furthermore, the intercept derived from the MR-Egger regression served as an indicator for directional pleiotropy (where p < 0.05 indicated the presence of directional pleiotropy) (Vaucher et al., 2018).We used MRPRESSO to detect outliers and rerun the analysis after excluding them.The analyses were carried out using the TwoSampleMR package (version 0.5.7)(https://mrcieu.github.io/TwoSampleMR)and MRPRESSO (version 1.0) (https://github.com/rondolab/MR-PRESSO)within the R environment (version 4.2.1).We used Sangerbox Tools (Shen et al., 2022) for the figure.

MR-BMA
We utilized MR-BMA to further strengthen the reliability of our MR analysis and find out the best model for the causal effect of AIDs on Long COVID.Exposures that have causal effects on Long COVID were included in the model of the multivariate framework if they don't result in multicollinearity and have a strong correlation with one of the model's instrumental variables (IVs).Given there is a high degree of overlap in genetic risk factors between AIDs, we did not remove AIDs for multicollinearity to avoid deletion of important data.Posterior probability (PP) for every model and marginal probability of inclusion (MIP) for every risk factor were estimated to identify the best model.The direction and importance of the risk factors were determined using the MACE for each model.
Unlike MR (using MR-Egger and MRPRESSO for pleiotropic variants), MR-BMA allows pleiotropic effects (Zuber et al., 2020).Pleiotropic variants can be identified as outliers in the model fit in MR-BMA (Zuber et al., 2020).We used the Q-statistic and Cook's distance for sensitivity analyses to detect outliers and influential variants.Any genetic variant with a Q-statistic>10 or a Cook's distance > N exposure/N SNPs) is marked with the name of the variant in the diagnostic plots.After excluding them, we performed the analysis again.The analyses were carried out using the TwoSampleMR package (version 0.5.7)(https:// mrcieu.github.io/TwoSampleMR)within the R environment (version 4.2.1).More details for MR-BMA can be found in previous studies (Zuber et al., 2020).

Determination of genetic instrumental variables
We selected a total of 8 SNPs that were significantly associated with urticaria, 108 for T1D, 50 for RA, 23 for asthma, 36 for psoriasis, 62 for IBD, 7 for CD and 48 for UC, as listed in Supplementary Tables S1-S8.The F-statistics for these SNPs were greater than 100, indicating a strong association.
The MR-Egger regression intercept term showed no clear pleiotropy among the SNPs in both datasets, with all p-values exceeding 0.05(Supplementary Table S9).No evidence of heterogeneity was observed in genetic variants associated with AIDs and Long COVID, as all p-values were above 0.05(Supplementary Table S9).The directions of the sensitivity analyses and the predicted impacts of the primary research were the same, including MR-Egger, weighted median, weighted mode and simple mode methods (Supplementary Table S9).The evidence presented demonstrated the stability of the outcomes derived from the IVW method.No outliers were detected with MRPRESSO.
However, there was no significant association observed between genetically predicted other AIDs (including urticaria, T1D, RA, asthma and psoriasis) and the risk of Long COVID.No evidence of horizontal pleiotropy or heterogeneity was observed, with all p-values exceeding 0.05(Supplementary Table S9).No outliers were detected with MRPRESSO.
There were no SNPs found to be genome-wide significantly associated with Long COVID.To ensure analytical accuracy, we opted not to use a relaxed p-value threshold.

MR-BMA
To ensure the reliability of the findings considering complicated genetic effects, MR-BMA analysis was employed for assessing the relationship of causality between Long COVID and AIDs.As genetically predicted IBD, CD and UC showed a significant association with the risk of Long COVID, we included IBD, CD and UC in the MR-BMA.We selected a total of 69 SNPs that were significantly associated with IBD, CD and UC (Supplementary Table S10).
UC was estimated as the highest-ranked causal factor (MIP = 0.488, MACE = 0.035), followed by IBD and CD (Table 2).UC was also the highest-ranked causal model (PP = 0.464), followed by IBD and CD (Table 3).
As for the sensitivity analyses, any genetic variant with a Q-statistic>10 or a Cook's distance >0.04 (3/69) is marked with the name of the variant.No outliers and influential variants were detected (Figures 3, 4).

Discussion
Based on GWAS data, our study applied a bidirectional MR analysis and MR-BMA to reveal the causal associations between AIDs and Long COVID.MR analysis indicated IBD, CD and UC may increase the risk of Long COVID, which was robust in different sensitivity analyses.MR-BMA further suggested that among the three AIDs, UC was most connected with Long COVID, followed by IBD and CD.In a longitudinal COVID-19 cohort study, AIDs were found to be significantly associated with the risk of Long COVID (Jacobs et al., 2023), which aligns with our findings.As far as we know, this is the first research focused on the causality between AIDs and Long COVID utilizing MR analysis and MR-BMA.
The presence of autoAbs is a common feature of AIDs (Damoiseaux et al., 2015).On one hand, during COVID-19 infection autoAbs may generate and cause AIDs (Liu et al., 2021;Sacchi et al., 2021).On the other hand, autoAbs can be physiological, as a transient tool to eliminate degraded self-and foreign antigen which causes no clinical disease (Avrameas and Selmi, 2013;Panda and Ding, 2015).The presence of autoAbs in COVID-19 patients correlates with increased antiviral humoral immune responses and inflammatory immune signature (Taeschler et al., 2022).Previous research shows that diverse functional autoAbs across a wide range of tissues and immunological and physiological functions were identified and validated in COVID-19 patients and that some of these autoAbs probably predated infection, whereas others were induced after infection (Davis et al., 2023).It has also shown the persistence of autoAbs in post-COVID individuals (Davis et al., 2023;Sin, 2023).There is a study demonstrating that Associations between genetically predicted AIDs and Long COVID with SNPs.autoAbs have an anti-correlation with anti-SARS-CoV-2 antibodies and are associated with distinct patterns of Long COVID (Su et al., 2022).In conclusion, the previous works mentioned above demonstrate that autoAbs generated before or after the infection of COVID-19, some of which are associated with AIDs, may lead to severe COVID-19 and Long COVID, suggesting a possibility that autoAbs associated with AIDs contribute to Long COVID.Other factors such as immune cells and cytokines may also be crucial parts of the relationship between AIDs and Long COVID.study of longitudinal clinical and autoantibody analyses found that a proinflammatory state was observed in patients with Long COVID featured by up or downregulation of various immune cells and cytokines, indicating a need to evaluate the role of immunomodulation in the treatment of Long COVID (Acosta-Ampudia et al., 2022).However, in general, research on whether autoAbs, immune cells and cytokines lead to Long COVID is still rarely seen.Until now researchers mainly focused on whether and how COVID-19 induces the production of autoAbs and the occurrence of AIDs but not on a reverse relation.
IBDs including CD and UC are AIDs.UC is limited to the colon and featured by mucosal inflammation.By contrast, CD, commonly associated with complications such as abscesses, fistulas and strictures, can cause transmural inflammation and affect any part of the gastrointestinal tract (Zhang and Li, 2014).AutoAbs such as antineutrophil cytoplasmic antibody (ANCA), anti-Saccharomyces cerevisiae antibody (ASCA), anti-intestinal goblet antibody (GBA) and anti-pancreatic antibody (PAB) exist in the serum of IBD patients (Fiocchi, 1998).ASCA is mainly seen in CD patients (Forcione et al., 2004), while ANCA is more common in UC patients and is used to distinguish UC and CD (Klebl et al., 2003).UC (Aydın and Taşdemir, 2021;Cayley, 2022;Xia et al., 2023) and CD (Senthamizhselvan et al., 2021;Tursi and Nenna, 2022) after COVID-19 have been reported.However, there is no other research indicating that UC, CD or ANCA, ASCA, GBA and PAB may serve as risk factors for Long COVID up to date.Immune cells and cytokines are also part of the relationship between COVID-19 and IBDs.For instance, a study found that interferon-γ (INF-γ) was the dominant driver of the expression of signaling lymphocytic activation molecule family member 7 (SLAMF7, CD319), whose engagement drove a strong wave of inflammatory cytokine expression in macrophages, and that SLAMF7-induced gene programs existed in gut macrophages from patients with active CD and in lung macrophages from patients with severe COVID-19 (Simmons et al., 2022).A cohort study found a significantly higher risk of COVID-19 related hospitalization and admission to intensive care in patients with CD and UC compared with background population with COVID-19 (Attauabi et al., 2022).A two-sample MR study found no evidence to support that IBDs increased the risk of COVID-19 susceptibility or severity (Ai and Yang, 2023).To our knowledge, our study is the first MR study working on the relationship between IBDs and Long COVID.
A study shows that Angiotensin-Converting Enzyme 2 (ACE2) serves as a receptor of SARS-CoV-2 (Hoffmann et al., 2020).Given that enhanced expressions of ACE2 in ileal and colonic tissue were observed in IBD (including CD and UC) (Garg et al., 2020), it is reasonable to draw a possible conclusion that IBD patients may suffer a higher risk of Long COVID mediated by ACE2.
IL-17, mainly produced by Th17 cells, plays an important role in IBD, specifically by affecting epithelial barrier function and innate immune response, promoting neutrophil recruitment and activation, and inducing the production of other pro-inflammatory cytokines and chemokines, thereby exacerbating intestinal inflammation (Moschen et al., 2019).Higher expression levels of IL-17 were observed in intestinal mucosa and peripheral blood of IBD patients compared with those of healthy controls (Sugihara et al., 2010;Song et al., 2013).As a large amount of IL-17 can be produced in cytokine storms induced by COVID-19 infection, some researchers regard that IL-17-dependent pathways may be common molecular mechanism between IBD and COVID-19 infection (Tao et al., 2022).
Some studies focused on the relationship between AIDs and COVID-19.However, a conclusion has not been reached.For instance, a study conducted in Hubei, China, indicated that patients with AIDs might be more susceptible to COVID-19 infection (Zhong et al., 2020).However, a study in northeast Italy suggested that in comparison with the general population, AIDs patients had a similar COVID-19 infection rate (Zen et al., 2020).A study in the United Kingdom showed that patients with severe AIDs had prominently higher mortality following COVID-19, but no differences were identified in the clinical endpoints among individuals with AIDs compared with those without AIDs who were hospitalized for COVID-19 during the initial surge of the pandemic (Arachchillage et al., 2022).
There are several advantages in our study based on GWAS summary-level data on a large scale.First, MR analysis overcame the issues of detecting confounders and reverse causality in typical observational and longitudinal research by generating practically unbiased causal estimates (Hemani et al., 2018).Second, the use of sensitivity analyses enabled us to further interrogate MR assumptions, which ensured notable changes in results could be found.Furthermore, the MR-BMA approach was used for identifying major causal AID while controlling for pleiotropy, which helped to evade the restrictions of linear regression-based MR approaches and strengthened our results' robustness (Zuber et al., 2020).Effect estimates were given by MR-BMA biased towards the Null when a causal effect exists.However, MR-BMA trades bias for lessened variance, which means it can better and more stably detect true causal risk factors than either the conventional IVW method or other variable selection methods (Zuber et al., 2020).
There are limitations to consider in our MR study.First, the summary-level GWAS data utilized in this research is not stratified according to gender, age and disease severity, thus it was impossible to conduct a stratified analysis.Summary-level data also limits access to individual patient data which might provide richer insights.Second, our study heavily relies on data from specific populations (FinnGen for AIDs and the most recent GWAS for Long COVID), which may limit the external validity of the findings to other populations.Third, since the number of cases in the GWAS was relatively small, the analysis for Long COVID might be underpowered, leading to missing true associations or overestimating weak effects.IBD and CD have only suggestive significant causal effects on Long COVID.Although some sources of error in conventional observational cohort studies can be avoided, MR studies are affected by varying levels of influence from horizontal pleiotropy (Verbanck et al., 2018) and bias including measurement bias (Pierce and VanderWeele, 2012), weak instrumental variable bias (Burgess et al., 2011) and selection bias (Efron, 2011).The results need further validation in larger studies.Fourth, the analysis suggested that AIDs might increase the risk of Long COVID, but it is also possible that Long COVID could trigger or worsen AIDs.We were unable to explore the reverse causality (Long COVID triggering AIDs) in detail for lack of IVs.Also, the presence of autoAbs and immune response might be potential mechanisms linking AIDs to Long COVID as discussed above, with immune cells and cytokines likely playing a role in this relationship.Our study cannot definitively rule out these possibilities.
In conclusion, our MR study provides evidence of the causal effects of IBD, CD and UC respectively on Long COVID, which might be mediated by autoAbs, immune cells and cytokines.These results may shed fresh light on the bidirectional causal relationships between AIDs and Long COVID, especially attracting researchers' and clinicians' attention to the causal effect of AIDs on Long COVID but not only the reverse causal relation.

FIGURE 3
FIGURE 3 Diagnostic plots for outliers for the top three MR-BMA models.Any genetic variant with a Q-statistic>10 is marked with the name of the variant.(A) UC, (B) IBD, (C) CD.

FIGURE 4
FIGURE 4 Diagnostic plots for influential genetic variants for the top three MR-BMA models.Predicted associations (x-axis) are plotted against observed associations (y-axis) for Long COVID.Any genetic variant with a Cook's distance >0.04 (3/69) is marked with the name of the variant.(A) UC, (B) IBD, (C) CD.

TABLE 1
Summary statistics for AIDs.

TABLE 2
AIDs ranked by their MIP.

TABLE 3
Best individual AID models based on their PP.