Investigating the Causal Effect of Brain Expression of CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10 Genes on Multiple Sclerosis: A Two-Sample Mendelian Randomization Approach

Multiple Sclerosis (MS) exhibits considerable heterogeneity in phenotypic expression, course, prognosis and response to therapy. This suggests this disease involves multiple, as yet poorly understood, causal mechanisms. In this work we assessed the possible causal link between gene expression level of five selected genes related to the pro-inflammatory NF-κB signaling pathway (i.e., CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10) in ten different brain tissues (i.e., cerebellum, frontal cortex, hippocampus, medulla, occipital cortex, putamen, substantia nigra, thalamus, temporal cortex and intralobular white matter) and MS. We adopted a two-stage Mendelian Randomization (MR) approach for the estimation of the causal effects of interest, based on summary-level data from 20 multiplex Sardinian families and data provided by the United Kingdom Brain Expression Consortium (UKBEC). Through Radial-MR and Cochrane’s Q statistics we identified and removed genetic variants which are most likely to be invalid instruments. To estimate the total causal effect, univariable MR was carried out separately for each gene and brain region. We used Inverse-Variance Weighted estimator (IVW) as main analysis and MR-Egger Regression estimator (MR-ER) and Weighted Median Estimator (WME) as sensitivity analysis. As these genes belong to the same pathway and thus they can be closely related, we also estimated their direct causal effects by applying IVW and MR-ER within a multivariable MR (MVMR) approach using set of genetic instruments specific and common (composite) to each multiple exposures represented by the expression of the candidate genes. Univariate MR analysis showed a significant positive total causal effect for CCL2 and NFKB1 respectively in medulla and cerebellum. MVMR showed a direct positive causal effect for NFKB1 and TNFRSF1A, and a direct negative causal effect for CCL2 in cerebellum; while in medulla we observed a direct positive causal effect for CCL2. Since in general we observed a different magnitude for the gene specific causal effect we hypothesize that in cerebellum and medulla the effect of each gene expression is direct but also mediated by the others. These results confirm the importance of the involvement of NF-κB signaling pathway in brain tissue for the development of the disease and improve our understanding in the pathogenesis of MS.

Multiple Sclerosis (MS) exhibits considerable heterogeneity in phenotypic expression, course, prognosis and response to therapy. This suggests this disease involves multiple, as yet poorly understood, causal mechanisms. In this work we assessed the possible causal link between gene expression level of five selected genes related to the pro-inflammatory NF-κB signaling pathway (i.e., CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10) in ten different brain tissues (i.e., cerebellum, frontal cortex, hippocampus, medulla, occipital cortex, putamen, substantia nigra, thalamus, temporal cortex and intralobular white matter) and MS. We adopted a two-stage Mendelian Randomization (MR) approach for the estimation of the causal effects of interest, based on summary-level data from 20 multiplex Sardinian families and data provided by the United Kingdom Brain Expression Consortium (UKBEC). Through Radial-MR and Cochrane's Q statistics we identified and removed genetic variants which are most likely to be invalid instruments. To estimate the total causal effect, univariable MR was carried out separately for each gene and brain region. We used Inverse-Variance Weighted estimator (IVW) as main analysis and MR-Egger Regression estimator (MR-ER) and Weighted Median Estimator (WME) as sensitivity analysis. As these genes belong to the same pathway and thus they can be closely related, we also estimated their direct causal effects by applying IVW and MR-ER within a multivariable MR (MVMR) approach using set of genetic instruments specific and common (composite) to each multiple exposures represented by the expression of the candidate genes. Univariate MR analysis showed a significant positive total causal effect for CCL2 and NFKB1 respectively in medulla and cerebellum. MVMR showed a direct positive causal effect for NFKB1 and TNFRSF1A, and a direct negative causal effect for CCL2 in cerebellum; while in medulla we observed

INTRODUCTION
Multiple Sclerosis (MS) is a multifactorial disease with progressive neurodegeneration characterized by chronic inflammation and demyelination in the Central Nervous System (CNS) (Nylander and Hafler, 2012).
Many molecular events contribute to MS susceptibility and all these events are widely distributed across the many different cellular components of both the innate and adaptive immune system. Oligodendrocytes, the myelinating cells of the CNS, and neurons are mainly targeted by this wave of inflammation, that leads to cell death, that is closely associated with the manifestation of clinical symptoms (Mc Guire et al., 2013;Ghasemi et al., 2017). MS pathogenesis has both genetic and environmental factors. Research on genetic susceptibility to MS has been fueled by recent genome-wide association studies (GWASs), fine-mapping and meta-analysis (Hafler et al., 2007;International Multiple Sclerosis Genetics, 2011;International Multiple Sclerosis Genetics Consortium, 2019;Patsopoulos et al., 2011Patsopoulos et al., , 2013; International Multiple Sclerosis Genetics Consortium, Beecham et al., 2013;Andlauer et al., 2016;George et al., 2016). These studies have also highlighted loci never previously implicated in MS, which represents an opportunity to generate novel insight into the biological pathways involved in the disease. Most of the GWAS-highlighted loci that appear to operate via gene expression regulation, rather than protein coding, spurred the researchers to perform expression quantitative trait loci (eQTL) analyses that highlighted genes whose expression in specific tissues is regulated by loci associated with the disease, thus showing the potential functional consequences of certain MS associated variant (Edwards et al., 2013;Gibson et al., 2015).
The fact that other GWAS signals fall within specific signaling cascades, suggests that the understanding of how single variants with small odds ratio act in disease susceptibility could lie in the alteration in pathways, rather than in individual genes (Housley et al., 2015). More specifically, a GWAS conducted by International Multiple Sclerosis Genetic Consortium (IMSGC) in 2013 (International Multiple Sclerosis Genetics Consortium, Beecham et al., 2013) reported the nuclear factor kappalight-chain-enhancer of activated B cells (NF-κB) pathway as significant in MS pathology.
NF-κB acts on many immune cells, and its constitutive activation leads to an increase inflammation in inflammatory and autoimmune diseases, such as MS (Yan and Greer, 2008). It is crucial for B and T lymphocytes' development and proliferation, for production of pro-inflammatory cytokines, inflammatory mediators by dendritic cells, and neurotoxic mediators in microglia and astrocytes (Leibowitz and Yan, 2016). Many studies have reported the activation of NF-κB in the brain tissue of MS patients (Gveric et al., 1998;Bonetti et al., 1999;Lock et al., 2002). Approximately 20% of genome-wide MS susceptibility variants fall within and/or proximal to NF-κB signaling genes, including NFKB1 or p105/p50 and TNFRSF1A (TNFR1), which cause decreased expression of the negative regulators of NF-κB (De Jager et al., 2009;Housley et al., 2015). As shown in a GWAS noise reduction (GWAS-NR), an approach to detect novel associations beyond those detected by traditional GWAS, variants regulating the activation and proliferation of immune effector cells, comprising key regulators of NF-κB signaling, are involved in the genetic susceptibility to MS (Hussman et al., 2016).
The available biological evidence led us to here focus on a priori selected set of genes, CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10, on the basis of their involvement in MS risk and in NF-κB signaling pathway. Should we conclude that these genes cause the disease through changes in their expression? To assess the possible causal links between their expression levels in ten different brain tissues [i.e., cerebellum (CRBL), frontal cortex (FCTX), hippocampus (HIPP), medulla (MEDU), occipital cortex (OCTX), putamen (PUTM), substantia nigra (SNIG), thalamus (THAL), temporal cortex (TCTX) and intralobular white matter (WHMT)] and MS we adopted a twosample Mendelian Randomization (MR) approach. MR uses measured variation in DNA sequence, Z, to assess the possible causal effect of a biological phenotype, X, on a response variable, Y (Davey Smith and Ebrahim, 2003;Davey Smith and Hemani, 2014;Burgess et al., 2017), without requiring any experimental intervention on X to assess causality. The variable Z is said, in this case, to be an instrument for the estimation of the effect of interest. Ideally, to be a valid instrumental variable (IV) the MR approach requires the instrument, Z, to satisfy three validity conditions, precisely to be: (i.) associated (not necessarily in a causal way) with X; (ii.) independent of Y conditional on X and on the confounders (U) of the relationship between X and Y (no pleiotropy); and (iii.) furthermore independent of those confounders.
So far MR have been mainly used to investigate the causal effect of a high-level phenotype (e.g., X = body mass) on a specific medical outcome. Here we shift the causal exploration to the level of individual genes, which we do by letting X represent the level of expression in brain tissues of a specific gene and by letting the instrument Z consists of a collection of Single Nucleotide Polymorphisms (SNPs) that regulate the expression of that gene. In this way, thanks to MR, we may use the information provided by genotyping to address the question whether changes in the level of expression of the gene of interest represent causal influences on the disease, in the hope that this will then point to biological pathways of pathogenic relevance.
In this study the associations between the genetic variants and the disease and between the same genetic variants and the exposure were estimated respectively in two independent and non-overlapping samples: Sardinian multiplex families (Fazia et al., 2017) and data provided by the United Kingdom Brain Expression Consortium (UKBEC) (Trabzuni et al., 2011;Ramasamy et al., 2014).
In this application we are also interested to estimate the direct effect of multiple gene expressions belonging to the same NF-κB signaling pathway because it is plausible to believe that they are closely related. For the sake of argument, if we consider the expression of NFKB1 as the exposure of major interest, we may want also to investigate if, beyond its direct effect, the effect of the genes belonging to the same pathway may mediate the relationship between NFKB1 expression and disease. In other words, we want to estimate the direct effect of NFKB1. For this reason, in addition to the estimation of total causal effect via univariable MR, we also applied a multivariable MR (MVMR) in which a set of genetic instruments is used to predict a set of exposure variables, which are the expression levels in the different brain regions of the genes belonging to the same pathway, for the estimation of their direct effect. The same argument applies to all the genes in the pathway.
Thus, our aim here is to improve our understanding of the pathogenesis of MS, which is yet to be clarified, pointing to important candidate genes, in specific brain tissues, to be prioritized for further studies and potentially drug discovery.

Data Sources
The required instrument-outcome (i-o) associations and instrument-exposure (i-e) associations were provided by two separate and independent datasets.
(i) Dataset 1 (Sardinia data): i-o summary statistics of association (regression coefficients and standard errors) obtained in an our previous study in which we analyzed Immunochip genotyping data from 15 multiplex Sardinian families (75 affected and 254 unaffected members), plus 94 unrelated healthy subjects from the same population using generalized estimating equation to allow for nonindependence between members in the same family (Fazia et al., 2017). (ii) Dataset 2 (UKBEC transcriptome data): Immunochip genotyping and gene expression measurements made in a collection of autoptic tissue from ten brain regions (i.e., CRBL, FCTX, HIPP, MEDU, OCTX, PUTM, SNIG, THAL, TCTX and WHMT) of a sample of 134 confirmed control individuals free of any neuropathological disease of European descent. These data were provided by the UKBEC http://www.braineac.org (Trabzuni et al., 2011;Ramasamy et al., 2014). We obtained from these data the required i-e association statistics (regression coefficients and standard errors).

Gene/Instruments Selection
We applied MR method to a restricted selected a priori set of genes (CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10) identified as plausible MS risk genes on the basis of previous, independent studies and of their involvement on NF-κB signaling pathway.

CCL2
C-C chemokine ligand 2 (CCL2 or MCP-1) is a pro-inflammatory chemokine that, through an interaction with its receptor CCR2, attracts dendritic cells, monocytes, T cells, and natural killer cells at inflammatory sites (Sorensen et al., 2004;Deshmane et al., 2009). CCL2 expression is largely dependent on NF-κB signaling pathway (Hayden and Ghosh, 2012;Nakatsumi et al., 2017;Wang et al., 2018). CCL2 mRNA was found significantly increased in demyelinated MS hippocampus with discrepancies in the spatial and quantitative distribution with respect to gray matter and white matter lesions (Prins et al., 2014). Furthermore, CCL2-induced monocyte migration results in blood brain barrier breakdown through the downregulation of endothelial tight junction proteins (dos Santos et al., 2005).

MAPK14
MAPK14 (p38α) belongs to mitogen-activated protein kinase (MAPK) family. Signaling via the MAPK14 pathway is important in the regulation of inflammatory response in multiple cell type and the production of specific cytokines and chemokines (Lo et al., 2015). Under CNS inflammatory condition, p38α-deficient mice show a reduced reactivity of astrogliosis and impairment in the formation of astroglial barrier, thus revealing the importance of p38α signaling in maintaining the barrier of CNS microcirculation (Lo et al., 2015). p38a signaling in astrocytes critically regulates specific subsets of cytokines (TNFα, IL-6) and chemokines (CXCL1, CXCL2, CXCL10, CCL2, CCL4). The result is that without the astroglial barrier an elevated number of macrophages/microglia in the CNS contributes to an increase in the upregulation of chemokines and cytokines leading to an uncontrolled inflammation.

NFKB1
NFKB1 encodes the p50 subunit of NF-κB transcriptional complex. Several studies have found genetic variants within or near this gene to be associated with MS (International Multiple Sclerosis Genetics, 2011; International Multiple Sclerosis Genetics Consortium, Beecham et al., 2013;Hussman et al., 2016). Two in vivo studies showed that NFKB1-deficient mice are significantly resistant to EAE (Hilliard et al., 1999;Leibowitz and Yan, 2016), while mice knockout for the inhibitor of p50, IκBα, are characterized by the constitutive activation of NF-κB in microglia/macrophages during EAE, developing an exacerbated EAE disease course with enhanced inflammatory infiltration and demyelination in the CNS (Yue et al., 2018).

TNFRSF1A
This gene encodes (TNF) receptor-1, an important player in MS susceptibility (Tienari and Hohlfeld, 2013). TNFRSF1A leads to the activation of different signaling pathways, such as NF-κB or MAPK pathways, two important pathways associated with MS susceptibility (Housley et al., 2015). GWAS have identified MS-associated variants within or proximal to TNFRSF1A (De Jager et al., 2009;International Multiple Sclerosis Genetics, 2011; International Multiple Sclerosis Genetics Consortium, Beecham et al., 2013;Andlauer et al., 2016;Hussman et al., 2016). The MR approach requires that each of the five selected genes fulfill two important conditions: (i). to have independent SNPs (r 2 < 0.20) in cis and in trans throughout the genome associated with their levels of expression (analyzing UKBEC transcriptome data), (ii). to have the same exposure-significant SNPs also genotyped in the Sardinia data and for which we have summary-level data of their association with MS.
We selected as IVs those variants whose regression coefficients of association with the gene expression level achieves a p-value < 5 × 10 −4 . The transcript-level expression profile used is estimated as the Windsorized mean of the exon-level probesets. The association test between each SNP and the transcript-level expression profile was performed for each of the ten available brain regions, by modeling the effect of genotype as additive linear. Because gene expression may be highly specific for a particular area of the brain, the importance of sampling different areas of the brain cannot be understated. In fact, in the analysis of each selected gene, we separately assessed the possible causal link between MS and changes in expression in each available different area of the brain. In each tissue, gene expression values were natural-log transformed and then standardized to have a more logical and useful interpretation of the causal effect, which represents the increase of the log-odds ratio due to an increase equal to 1 standard deviation (SD) of the log transformed gene expression.

Univariable Mendelian Randomization
We estimated the total causal effect, that is composed by the direct and indirect effects, i.e., mediated by the others, of each gene in each brain region, using three MR methods: the Inverse-Variance Weighted estimator (IVW), the MR-Egger Regression estimator (MR-ER) and the Weighted Median estimator (WME). We use IVW for main analysis and MR-ER and WME for sensitivity analysis.
Here below we discuss IVW in detail being the main analysis and leave the reader to look at the cited literature for MR-ER and WME (Egger et al., 1997;Han, 2008;Bowden et al., 2016).
In brief, each j-th instrument, or variant Z j , with j = 1, . . . J, contributes a separate IV estimate of the causal effect of X on Y, given by the ratioβ YZ j /β XZ j , whereβ YZ j is the estimated coefficient of Z j in a univariate linear regression of Y on that instrument (log-odds ratio if Y is binary) andβ XZ j is the estimated coefficient of X on the same instrument Z j (log-odds ratio if X is binary). The IVW estimator of the causal effect of X on Y combines the IV estimates from multiple independent instruments that are assumed to satisfy all the IV conditions we described in the introduction, via the formula: where σ Yj is the standard error of the gene-exposure association estimate for instrument j (Burgess et al., 2013;Bowden et al., 2016). If all genetics variants satisfy the IV assumptions, then the IVW estimate is a consistent estimate of the causal effect (e.g., it converges to the true value as the sample size increases), as it is a weighted mean of the individual ratio estimates. However, in the IVW method if only one genetic variant is not a valid IV, then the estimator is typically biased. Hence, two methods were used as a sensitivity analysis: MR-ER and WME. MR-ER allows all genetic variants to violate pleiotropy assumption, and the intercept from the MR-ER analysis can be interpreted as the average pleiotropic effect of genetic variants included in the analysis and a measure of potential bias affecting IVW estimate (Egger et al., 1997). However, this method requires all genetic variants to satisfy an alternative assumption (InSIDE assumption), valid only if pleiotropic effects are independently distributed from i-e associationβ XZ j . In contrast, WME can be obtained using inverse-variance weights in a weighted median function; it relaxes MR assumptions and gives consistent estimates assuming that IVs representing over 50% of the weight are non-pleiotropic (Han, 2008;Bowden et al., 2016). The assumptions underlying each method are summarized in Table 1.
To assess the validity of the IVs, Cochrane's Q test (heterogeneity test) was used to check the presence of heterogeneity, that indicates a possible violation of the IV assumptions (Greco et al., 2015). If the Cochrane's Q statistics is large (yielding correspondingly small p-values), the estimated causal effects of the exposure may vary across the population or between variants. To identify IVs which give the largest contribution to Cochran's Q statistic, IVW Radial-MR was performed. IVW Radial-MR weights causal estimates with the

IVW
All instruments must be mutually independent, non-pleiotropic, and satisfy conditions (i-ii) WME All instruments must be mutually independent and satisfy conditions (i-ii). The no-pleiotropy condition must be satisfied by a subset of instruments that accounts for at least 50% of the total weight (minor weight condition).

MR-ER
All instruments must be mutually independent and satisfy conditions (i-ii). InSIDE must hold.
square root of the actual weight each SNP receives in the IVW analysis (Bowden et al., 2018). Q statistic obtained for each IV represents the contribution Q j to the overall Q statistic. If the contribution was statistically significant at α = 0.05 then the IV was considered as highly potential invalid and thus removed, and the analysis repeated without the outlier to obtain a more reliable result.
To identify potentially invalid IVs, forest plots and scatter plots were used as visual tools: the forest plot shows the causal estimates (i.e., IVW and WME) expressed as log-odds ratio for each IVs with their 95% Confidence Interval (CI). The scatter plot showsβ XG j andβ YG j respectively on X-axis and Y-axis, with θ IVW , θ WME and θ MR−ER as slopes allowing to evaluate MR-Egger intercept.
For all the analyses, we have reported causal estimates for IVW, MR-ER, and WME. The effect size was calculated as the effect of a 1 SD change in natural-log-transformed gene expression level, since this metric is more interpretable than an arbitrary difference.

Multivariable Mendelian Randomization
In polygenic MR investigations, as ours, in which both cis-and trans-acting IVs are involved, while a high number of instruments results in an higher statistical power, the chance of including a pleiotropic variant is higher than using cis-variants from a single region (Brion et al., 2013). Pleiotropy is commonly classified as vertical and horizontal. In the vertical pleiotropy, genetic variants are associated with multiple exposures belonging to the same causal pathways, while in the horizontal pleiotropy, genetic variants are associated with multiple exposures belonging to different causal pathways. As discussed elsewhere pleiotropy may lead to biased estimate of causal effects (Bowden et al., 2015;.
Given our studied genes belong to the same pathway they might have few instruments affecting more than one exposure and the use of MVMR allows us to control for vertical pleiotropy.
Furthermore, in our application we are also interested in estimating, in addition to the total effect, the direct effect of each specific gene expression. This is motivated by their likely relatedness due to being in the same pathway and exhibiting correlation as shown in Supplementary Figure S1 and Supplementary Table S1. As a consequence, many of them may exert a causal effect on the outcome, one mediating the effect of the other(s). MVMR by jointly estimating the causal effects of all exposures on the outcome allows: (i.) to estimate the direct effect of an exposure (within a mediation scenario), (ii.) to mitigate a possible vertical pleiotropy bias, due to instruments affecting more than one exposure.
The assumptions underlying MVMR are quite similar to those of standard MR i.e., each variant: (i) is associated with one or more exposures, (ii) is not associated with any confounders, and (iii) affects the outcome only via one or more of the exposures included in the model Burgess et al., 2019). MVMR requires as instruments a composite set of SNPs, Z, associated with the exposures included in the model and that do not affect the outcome other than through these exposures. In our MVMR analysis we used as instruments the genetic instruments specific and common (composite) to the investigated exposures.
Within the MVMR scenario using summary-level data, IVW can be generalized by fitting the model with the intercept set to zero: Whereβ YZ j is the summary-level data of the association between each j genetic variant and the outcome,β 1,Z j is the genetic effect of Zj on X 1 ; while θ 1 is the direct causal effect of interest for X 1 . The MVMR Egger (MVMR-ER) uses the same regression model but allowing the intercept to be estimated. To assess instrument validity we used an adjusted version of the Cochran Q statistic (Q A ) proposed by Sanderson (Sanderson et al., 2019). Excessive heterogeneity in Q A brings assumptions ii) and iii.) (detailed in methods section) into doubt. To identify specific genetic variants as outliers we used each variant's contribution q, defined as the weighted squared difference between the observed and predicted association with the outcome, to the overall Q A (Zuber et al., 2020). If the q contribution was statistically significant different from zero (p-value < 0.05) then the genetic variant was considered as highly potential invalid instrument and thus removed, and the analysis conducted without that variant to obtain a more reliable result.
All the analyses were performed using TwoSampleMR (Yavorska and Burgess, 2017), RadialMR (Bowden et al., 2018) and MendelianRandomization  R packages. Benjamini-Hochberg correction (Yekutieli and Benjamini, 2001), fixing the False Discovery Rate (FDR) at α < 0.05 was used to account for multiple comparison. To further confirm our results, we used a bootstrap procedure for both univariable and multivariable analysis. Since we are using summary-data, to obtain MR univariable bootstrap confidence intervals we generated 1000 datasets, in which each i-e associationβ X,Z j has been sampled from a normal distribution with mean equal toβ X,Z j and standard deviation equal to se(β X,Zj ). In the multivariable scenario we generated 1000 datasets, in which each i-e associationβ X k ,Z j , for each of the k = 5 genes, has been sampled from a normal multivariable distribution with means equal to the vector of the 5 i-e associationsβ X k ,Z j and covariance matrix equal to the covariance matrix between gene expressions in that specific tissue. MR (and MVMR in the multivariable scenario) analysis was conducted for each of the 1000 datasets generated, so that 1000 IVW estimates were obtained. Using the percentile method, the confidence interval is given by 95% central in the distribution of bootstrap replications. That is, the plausible IVW estimates are those falling between the 2.5-th percentile and the 97.5-th percentile of the bootstrap distribution estimates. We considered as statistically significant those results reaching FDR adjusted p-value < 0.05 and whose bootstrap confidence intervals does not contain the estimate under the null.
Flowchart of the analysis is reported in Figure 1, while causal diagram for (A) univariable MR and (B) MVMR scenario are reported Figure 2.

Univariable MR
We analyzed five a priori selected genes (i.e., CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10) in ten different brain regions (i.e., CRBL, FCTX, HIPP, MEDU, OCTX, PUTM, SNIG, THAL, TCTX and WHMT). Table 2 reports for each gene in each brain tissue the number of IVs, the IVW, WME and MR-ER causal effect estimates, their corresponding standard error and p-value. Table 2 also reports in the last column the IVW estimate and its 95% confidence intervals obtained via bootstrap. In Supplementary Table S2 detailed information for the selected IVs in each brain tissue (e.g., summary statistics of the association between the instrument and the disease and between the instrument and gene expression, Q statistics, and f statistics) are reported.
An estimate was considered as statistically significant, if the FDR corrected p-value was < 0.05, and its 95% bootstrap confidence interval did not include the value of the estimate under the null. According to this criterion CCL2 gene in MEDU and NFKB1 gene in CRBL turned out statically significant. These results are discussed in detail in the following part of the text.
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org FIGURE 2 | Directed acyclic graph illustrating (A) the univariable and (B) the multivariable Mendelian Randomization problem. In (A) the potential violation of IV assumption represented by the direct pleiotropic effect of Zj on Y is indicated by a dotted line. In (B) causal graph illustrating multivariable Mendelian randomization assumptions for a set of genetic variants Z, five risk factors X 1 , X 2 , X 3 , X 4 and X 5 , and the binary outcome Y. The direct pleiotropic effect of Z on Y is represented by a dotted line. The node W represents the possibility that each risk factor, X, could influence each other.
with exp (θ IVW ) leading to an OR = 1.31 (95% CI [1.12; 1.54]). This suggests that an increase of natural log-transformed CCL2 expression by 1 SD (0.15) in MEDU causes an increase in the odds to develop MS. Sensitivity analysis with WME led to an almost identical estimate (θ WME = 0.276) that resulted statistically significant different from zero (p-value < 0.039). MR-ER does not give enough evidence of a directional pleiotropy since the intercept estimate was not statistically significant being its value close to zero (α = −0.046, p-value > 0.05); while causal effect estimate was slightly lower, but still similar in magnitude (θ MR−ER = 0.217). Cochrane's Q test did not give evidence of heterogeneity between the instruments (Q = 7.90, df = 10, p-value = 0.64). Radial-MR did not show any instrument to statistically significant contribute to heterogeneity. Thus, these results are robust to the sensitivity analysis and hence give a hint of a potential total causal effect on MS of CCL2 expression in MEDU.
In Figure 3 the forest plot and the scatter plot of causal estimates of CCL2 expression in MEDU on MS are shown.

NFKB1 Expression in Cerebellum
MR analysis was initially carried out using 10 IVs. IVW estimate of NFKB1 causal effect wasθ IVW = 0.281 (95% CI [0.085;0.477]); sensitivity analysis with WME led to a higher estimate (θ WME = 0.417), while MR-ER gave a similar estimate, (θ MR−ER = 0.237) with no evidence of a directional pleiotropy [intercept estimate was not statistically significant (α = −0.027, p-value > 0.05)]. Radial-MR identified an instrument contributing to heterogeneity (i.e., rs6109595 with Q = 3.98, p-value = 0.046). This instrument appears to be potentially invalid, as also confirmed by the forest plot and the scatter plot, where the causal estimate of this SNP appears to   be far from the others, and to "move" the IVW estimate toward the null. Thus, we considered this instrument as pleiotropic and, given the statistical evidence by Radial-MR, and we removed it from the analysis. In Figure 4 the forest plot and the scatter plot of causal estimates of NFKB1 expression in CRBL on MS with the presence of the potentially invalid IV are reported. After the outlier's removal, IVW estimateθ IVW = 0.332 (95% CI [0.129;0.534]) and WME estimate (θ WME = 0.421) turned out more similar in magnitude. WME estimated obtained by including the outlier IV was very similar to that obtained by excluding it; this can be explained by the fact that WME relaxes (ii) and (iii) assumptions so that IVs representing more than 50% of the weight need to be valid IVs. Since the outlier SNP did not weight more than 50%, its removal did not affect the causal estimate. After outlier's removal, the MR-ER estimate (θ MR−ER = 0.452) turned out being very close to the WME estimate, thus indicating potential residual bias in IVW estimate due to presence of pleiotropic SNPs. The scatter plot clearly shows how the slopes of the WME and MR-ER are almost perfectly overlapping while IVW slope is slightly lower. Moreover, there is no evidence of directional pleiotropy (MR-ER intercept close to 0, α = −0.073, p-value > 0.05) or heterogeneity (Q statistic = 6.93, p-value = 0.54). Exp (θ IVW ) corresponding to an OR = 1.39 (95% [1.14;1.71]), suggests that an increase of natural log transformed NFKB1 expression by 1 SD (0.04) in CRBL causes an increased odds to develop MS. We conclude that these results show evidence of a potential total causal effect of NFKB1 expression in CRBL on MS, since all the estimators give very similar causal effect estimates.
Bootstrap IVW estimate was also equal to 0.312 with 95% CI [0.103;0.543]. Figure 5 reports the forest plot and scatter plot of the causal estimates of NFKB1 expression in CRBL on MS after removing the outlier. Table 3 reports the significant results for CCL2 and NFKB1, which turned out as statistically significant after multiple testing correction.

Multivariable MR
The fives selected genes (i.e., CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10) were subsequently analyzed jointly in a MVMR framework. Table 4 reports for each gene in each brain tissue the number of IVs, the MVMR-IVW and MVMR-ER direct causal effect estimates, their corresponding standard errors and unadjusted p-values; the last columns report the MVMR-IVW causal estimate and 95% confidence intervals obtained via bootstrap.
Significant positive direct causal effects, also according to bootstrap estimates and 95% CIs, were obtained in CRBL for NFKB1 (θ MVMR−IVW = 0.749, p-value = 0.004, OR = 2.12, 95% CI As regard MVMR-ER, there is no evidence of directional pleiotropy since the intercept is close to 0 and p-value is > 0.05. Heterogeneity tests do not show statistically significance, indicating that there is not variability in the genetic associations with the disease and likely there is no residual pleiotropy. In Supplementary Table S3 detailed information for the selected IVs is reported. Table 5 reports for CCL2 in MEDU and for NFKB1 in CRBL the gene expression mean level and the level of gene expression increased by 1 and 2 SD, that leads to an increased odds to develop MS. For CCL2 expression in MEDU the difference between total and direct effects is relatively moderate (31% vs 50%), while there is a big difference in NFKB1 expression (39% vs 111%). The direct effect of CCL2 and TNFRSF1A in CRBL is also reported: an increase by 1 SD in CCL2 gene expression in CRBL seems to cause a decreased odds in developing the disease by 41%, while an increase in TNFRSF1A causes an increased odds of 44%. These values have been calculated on log-scale and then reverted back to the original scale.

DISCUSSION
The genetic architecture of MS susceptibility has been further characterized in the recently published genomic map of MS comprising 32 loci in the MHC and 200 non-MHC autosomal loci. This study has shown that these loci exert their effect  in multiple different peripheral immune cells as well as in microglia, highlighting the importance of several cells of the peripheral and brain resident immune system. Thus the MS genetic risk is characterized as "an autoimmune inflammatory process that targets CNS and triggers a neurodegenerative cascade" (International Multiple Sclerosis Genetics Consortium, 2019). Large GWASs have identified a wealth of SNPs responsible of genetic susceptibility to complex diseases, such as MS, and many of these signals could operate via gene expression regulation, given the majority of them are located in non-coding genomic regions (Barr and Misener, 2016).
In the last years understanding how associated genetic variations contribute to the biological pathways involved in the progression and pathogenesis of diseases has become an achievable goal for genetic research given the availability of public gene expression database in different tissues and genotyping database, together with the development of appropriate statistical methods. More specifically, causal inference framework plays an important role in this respect since it allows to answer questions such as "does variation in the expression of a given gene influences the disease risk?" and not simply "the expression of a given gene influences the disease". The identification of a disease-associated SNP being at the same time an eQTL does not imply that the expression of the involved gene is causally related to the disease risk. In our strategy we applied MR approach to investigate the causal relationship between the expression of a given gene and MS.
In our work, in the effort to investigate and understand MS etiology, we focused on five genes related to the proinflammatory NF-κB signaling pathway (i.e., CCL2, NFKB1, MAPK14, TNFRSF1A, CXCL10), given the observation that this pathway is out of balance in MS and could represent a valid therapeutic target via inhibiting proper molecules implicated in it. NF-κB acts on many immune cells, and its constitutive activation leads to an increased inflammation in inflammatory and autoimmune diseases, such as MS (Yan and Greer, 2008). Furthermore, NF-κB emerged in the pathway enrichment for the prioritized genes of the 200 non-MHC loci in the last IMSGC paper (International Multiple Sclerosis Genetics Consortium, 2019).
We investigated the effect of each of these specific genes on MS susceptibility by performing a two-sample MR analysis, using summary-level information from two distinct datasets (i.e., Sardinia and UKBEC data). Sardinia data consists in the summary statistics of association between Immunochip SNPs and MS, while UKBEC data consists in transcript-level expression profile data on ten different brain tissues in addition to Immunochip data for 134 subjects free of neurological disorders. We performed both a univariable and a multivariable MR: the first approach was used to estimate the total causal effect of each exposure and the second to estimate the direct causal effect of each exposure by allowing multiple risk factors to be modeled at once, thus also allowing for measured pleiotropy.
In the univariable analysis two gene expressions showed significant results after multiple testing correction: CCL2 in MEDU and NFKB1 in CRBL tissue. These findings, also emerged as robust with sensitivity analysis and further confirmed using bootstrap procedure, are consistent with previous literature discussed below.
CCL2 is a chemokine involved in immunoregulatory and inflammatory processes, and its expression in astrocytes during EAE was found to be functionally significant, playing a role in the infiltration of macrophage and T cell in the CNS, and in the diffuse activation of astrocytes and microglia in both white and gray matter. Higher CCL2 expression correlates also with relapses, more severe EAE clinical scores, demyelination and axonal loss (Kim et al., 2014). A less severe disease course was observed after induction of EAE in mice with a conditional, astrocyte-specific gene deletion of CCL2 (Ponath et al., 2018b), while endothelial CCL2 knockout mice were observed to be resistant to EAE (Ge et al., 2009). Our analysis shows that an increased expression of CCL2 in MEDU is causally related with MS (OR = 1.31; 95% CI [1.12;1.54]), consistent with the above described results, and also ascribing a causative role to CCL2 expression, rather than a simple association with MS.
NFKB1 gene encodes p50 subunit of NF-κB transcriptional protein complex, an important complex involved in immune responses. NFKB1 variants were found to be associated in GWASs with MS, and, in particular, two variants were found to increase gene expression (Housley et al., 2015;Ponath et al., 2018a). Furthermore, NFKB1-deficient mice are significantly resistant to EAE (Hilliard et al., 1999;Leibowitz and Yan, 2016), and mice knockout for the inhibitor of p50, IκBα, are characterized by the constitutive activation of NF-κB in microglia/macrophages during EAE, developing an exacerbated EAE disease course with enhanced inflammatory infiltration and demyelination in the CNS (Yue et al., 2018). Our results are consistent with these biological evidences and attribute a causative role of increased NFKB1 expression in CRBL on MS (OR = 1.39; 95% CI [1.14;1.71]).
A possible explanation of both CCL2 and NFKB1 expression causative role could be an abnormal response of astrocytes to myelin damage. More specifically, NF-κB pathway is an active pathway in astrocytes during myelin repair (with CCL2 as one of the products of this pathway). Furthermore, during myelin debris damage, if astrocytes-mediated response leads to increased abnormal levels of p50, or elevated levels of CCL2, then a dysregulated immune response could start, leading to an increased lymphocytes recruitment and widespread inflammatory response which are well-known signs of MS.
In order to further investigate which exposure from our selected set of closely related candidate genes are causally related MS we further apply a MVMR approach. MVMR allows to estimate the direct casual effect of each exposure in turn, i.e., each gene expression level, on MS when all the other exposures in the model are held constant, analogously to a multivariable regression. Furthermore, by including multiple exposures into a single model MVMR allows genetic variants to have pleiotropic effects on the exposure, i.e., measured pleiotropy (Rees et al., 2017;Zuber et al., 2020). If mediating effects between the gene expression levels are present, this method allows to identify the gene whose expression level has the greatest direct effect on the disease.
In our study we investigated the causal effect of genes related, in some way, to the same signaling pathway for which a mediated effect is likely.
The MVMR analysis identified in the cerebellum region three genes with the largest direct effect within our five selected genes in the NF-κB signaling pathway: NKFB1, CCL2 and TNFRSF1A. As for these genes the direct effect in CRBL is different in magnitude from the total effect calculated in the univariable MR, we can hypothesized for these gene a mediation scenario in which the effect of NFKB1 is in some way mediated by the effect of the other genes in the pathway, with an estimated direct casual OR = 2.11 with 95% CI [1.27;3.51] (more than double in magnitude compared to the total effect), the effect of CCL2 is mediated by the effect of the other genes in the pathway, with a direct causal OR = 0.59 and 95% CI [0.42;0.83], and that the effect of TNFRSF1A is mediated by the effect of the other genes in the pathway, with a direct causal OR = 1.44 and 95% CI [1.08;1.92]. These results are also consistent with the biological evidence for which TNFRSF1A leads to the activation of different signaling pathways including NF-κB signaling pathway.
As regards MEDU, MVMR showed for CCL2 a significant direct causal OR equal to 1.50 with 95% CI [1.04;2.17] greater in magnitude with respect to total effect suggesting also in this case a possible mediating effect of the other genes in the pathway.
In summary, our study aims at drawing a causal inference on expression of genes belonging to the same pathway, relevant for MS, both in term of total and direct causal effect. Direct and indirect effect in term of OR for the 5 genes in the cerebellum and medulla are summarized in Table 6.
The strengths of our analysis are: (i.) this is a rare application in which 5 exposures, are studied simultaneously, (ii.) MR allows to overcome potential confounding and reverse causation that may bias estimates in observational studies; (iii.) using data from two non-overlapping datasets allowed to avoid winner's curse bias and bias toward the observational estimate, and to obtain more precise estimates than if individual-level data from one study were used. (iv) Finally, here we studied the causal effect of a lifelong exposure of gene expression levels with MS in the general European population, and, such an effect could not be studied with another study design including randomized controlled trial. However, this study has some limitations: (i.) sample size is relatively small so that the study is underpowered to detect small but interesting causal effects; (ii.) the possibility that residual pleiotropy could bias the estimates; even if the main findings turned out to be robust in the sensitivity analyses, thereby decreasing the probability of bias due to pleiotropy; (iii.) like in most MR studies, it was not possible to directly assess whether canalization (compensatory feedback interactions) may have influenced our results. However, since canalization assumes that other physiological mechanisms may attenuate the effect of gene expression, such feedback interactions would tend to bias results toward the null. As a matter of fact, this study has generated results that are very far from the null. (iv) Limitations due to the use of post-mortem tissue may result from cell damages and DNA/RNA degradation; (v.) another limitation of our study could concern the difference between UKBEC and Sardinian data: UKBEC study includes individuals of European descent, while Sardinian dataset includes individuals from the founder population of Sardinia. This potential problem could arise if IVs found in the European population are not valid IVs in the Sardinian population, or if causal SNPs in linkage disequilibrium (LD) with the IV in UKBEC are not in LD with the causal SNP also in Sardinia population. However, as in canalization bias, this limitation tends to bias results toward the null, thus our findings could be at most underestimated. (vi.) as to statistical methodology we haven't used novel methodology, but we applied the approaches suggested in literature to check the assumptions underlying an MR analysis and implemented a bootstrap approach, using a modified version of mr_rucker_bootstrap code, found in TwoSampleMR R package, where confidence intervals were estimated through the percentile method rather than assuming a Student's t-distribution for the bootstrap distribution; (vii.) we reckon that the current methodology allowed us just to draw only general conclusion about the possible mediation effect between the different genes in the pathway without going into depth about the underlying biological mechanism.
In conclusion, our study supports the evidence that NF-κB pathway, already extensively studied in literature, plays a role not only in the peripheral immune system, but also in brain cells and in a causative way. This study shows that as expected genes belonging to the same pathway can mediate the effect of each other. This study provides a rationale to further investigate which role could play CCL2 expression in MEDU, NFKB1, TNFRSF1A and CCL2 expression in CRBL in the pathogenesis of MS, and eventually to understand why these tissues, are more susceptible to increased expression of these genes also taking into account that the brain displays remarkable cellular heterogeneity even within distinct brain regions. It is worthwhile noting that the strongest effect both total and direct has been found in CRBL that is commonly affected by MS, and whose signs significantly contribute to MS symptoms clinical disability (Wilkins, 2017).

DATA AVAILABILITY STATEMENT
The summary-level data for this study are available in the Supplementary Material.

ETHICS STATEMENT
The study was approved by the ethics committee of the Azienda Sanitaria of Nuoro and was conducted in conformity with the 1954 Declaration of Helsinki in its currently applicable version and applicable Italian laws. All study participants gave written informed consent.

AUTHOR CONTRIBUTIONS
LB, CB, and TF conceived and supervised the study. TF, AN, and DG performed the statistical analysis and interpreted the results. AB, JM, and carried out the experiments. MP, AT, PB, and VS collected and stored samples and performed the clinical evaluation. TF, AN, and LB wrote the original draft. All authors participated in revising and editing the manuscript.