White Matter Integrity and Nicotine Dependence: Evaluating Vertical and Horizontal Pleiotropy

Tobacco smoking is an addictive behavior that supports nicotine dependence and is an independent risk factor for cancer and other illnesses. Its neurogenetic mechanisms are not fully understood but may act through alterations in the cerebral white matter (WM). We hypothesized that the vertical pleiotropic pathways, where genetic variants influence a trait that in turn influences another trait, link genetic factors, integrity of cerebral WM, and nicotine addiction. We tested this hypothesis using individual genetic factors, WM integrity measured by fractional anisotropy (FA), and nicotine dependence-related smoking phenotypes, including smoking status (SS) and cigarettes per day (CPDs), in a large epidemiological sample collected by the UK Biobank. We performed a genome-wide association study (GWAS) to identify previously reported loci associated with smoking behavior. Smoking was found to be associated with reduced WM integrity in multiple brain regions. We then evaluated two competing vertical pathways: Genes → WM integrity → Smoking versus Genes → Smoking → WM integrity and a horizontal pleiotropy pathway where genetic factors independently affect both smoking and WM integrity. The causal pathway analysis identified 272 pleiotropic single-nucleotide polymorphisms (SNPs) whose effects on SS were mediated by FA, as well as 22 pleiotropic SNPs whose effects on FA were mediated by CPD. These SNPs were mainly located in important susceptibility genes for smoking-induced diseases NCAM1 and IREB2. Our findings revealed the role of cerebral WM in the maintenance of the complex addiction and provided potential genetic targets for future research in examining how changes in WM integrity contribute to the nicotine effects on the brain.


INTRODUCTION
Tobacco smoking is a complex addictive behavior and is the chief modifiable causal factor for cancer, coronary and cerebrovascular disorders, chronic obstructive pulmonary disorder, hypertension, and many other illnesses (McBride, 1992;Ziedonis et al., 2008). Additive genetic factors explain up to ∼75% of the variance on traits that quantify nicotine dependence and smoking-related behaviors in population studies, thus supporting the hypothesis of a strong genetic susceptibility for nicotine addiction (Kendler et al., 1999;Vink et al., 2005). Genome-wide association studies (GWASs) localized and replicated multiple genetic variants conferring susceptibility to smoking/nicotine addiction including those regulating nicotinic acetylcholine receptors (nAChRs) (True et al., 1999;Erzurumluoglu et al., 2019). However, its neurogenetic mechanisms remain unknown. Neuroimaging studies reported observational (Zhang et al., 2010;Gons et al., 2011;Gray et al., 2020) and direct associations (Kochunov et al., 2013) between smoking and nicotine administration and white matter (WM) integrity, assessed by the fractional anisotropy (FA) of water diffusion in diffusion tensor imaging (DTI) (Kochunov et al., 2007). Cerebral WM is hypothesized to be involved in both positive and negative reinforcement mechanisms of nicotine addiction (Benowitz, 2010). The positive effect of nicotine is associated with cognitive and mood enhancement, which could be driven through the transient elevation in WM integrity following nicotine administration (Perkins, 1999;Kochunov et al., 2013). The nAChRs have been found in the cerebral WM of both human and nonhuman primates in the previous PET ligand and histological binding studies (Fujita et al., 2003;Ding et al., 2004;Pimlott et al., 2004;Hillmer et al., 2011). Their functions have been studied in peripheral (Lang et al., 2003) and central (Zhang et al., 2011a) nervous systems. The negative effect of nicotine in avoiding withdrawal symptoms may also be caused by reduction in WM integrity due to cerebrovascular and neurodegenerative risks associated with heavy chronic smoking (Hudkins et al., 2010;Kim et al., 2010;Gons et al., 2011;Liao et al., 2011;Zhang et al., 2011a).
Vertical and horizontal pleiotropic pathways are chief causal mechanisms explaining the phenomenon of genetic variants affecting multiple traits (Tyler et al., 2009;Paaby and Rockman, 2013;Jordan et al., 2019). Vertical pleiotropy is observed when a trait influenced by genetic factors has, in turn, influenced another trait by acting as a mediator. Horizontal pleiotropy refers to two traits being independently influenced by the same genetic factors. In the first vertical pleiotropic pathway (Figure 1: model 1), we assume that genetic factors underpinning susceptibility to nicotine dependence do so through alterations in WM integrity. In the second vertical pleiotropic pathway (Figure 1: model 2), we alternatively hypothesize that genetic factors underpin nicotine addiction directly, and the reduction in WM integrity occurs due to a multitude of harmful effects of chronic or heavy smoking. Lastly, we also test horizontal pleiotropy (Figure 1: model 0) where genetic factors affect both WM integrity and nicotine addiction independently. The proposed models are complementary to the widely used Mendelian randomization (MR) models that estimate the causal influence between the traits by relaxing the conditional independence assumption in instrumental variable (IV) analysis (Davey Smith and Hemani, 2014;Hemani et al., 2018a,b). We then develop rigorous analytical approaches to determine and validate the best causal model for pleiotropic genetic variants associated with both WM integrity and nicotine addiction.
We analyzed two smoking phenotypes for their biological relevance to nicotine dependence, smoking status (SS) and cigarettes per day (CPD), to test the proposed causal pathways in large-scale epidemiological data from the UK Biobank (UKBB). We used FA to measure WM integrity. Smoking (being current smoker or having higher CPD) was found to be associated with reduced WM integrity in multiple brain regions. Our data showed that the genetic effects on FA and the two smoking phenotypes were not independent, so the horizontal pleiotropy does not hold. Since model 1 and model 2 with a vertical pleiotropic relationship are two mutually exclusive causal pathways, we performed mediation analysis and used Bayes factor to select the optimal model. We identified 272 pleiotropic variants associated with both SS and FA whose effects on SS were mediated by FA (model 1 preferred). On the contrary, 22 pleiotropic variants were found to be associated with both CPD and FA, where CPD acts as a mediator for the genetic effects on FA (model 2 preferred). The identified variants mainly reside in two genes NCAM1 and IREB2. Their relationship to the smokinginduced brain mechanisms will need to be further examined for their functionality in future studies.

UK Biobank Cohort
The data used to test our causal pathways are from the UKBB, a large prospective study that recruited ∼500,000 participants aged between 40 and 69 years in 2006-2010 in 22 assessment centers throughout the United Kingdom (Sudlow et al., 2015). UKBB data consist of phenotypic, genotypic, and imaging details about its participants collected from questionnaires, physical measures, multimodal imaging, genome-wide genotyping, and longitudinal follow-up for health-associated outcomes (Sudlow et al., 2015). In our analysis, UKBB data from all sites and all phases are included. UKBB carried out an automated processing pipeline for sample retrieval, data collection, and quality control to convert raw data to reliable processed data for use by all researchers (UK Biobank, 2007;Alfaro-Almagro et al., 2018;Bycroft et al., 2018). We restricted our analysis to only participants with white ethnicity backgrounds (British, Irish, and any other white background) and with both genotype and nicotine dependence phenotype data available. For causal pathway analysis, we further narrowed down to participants who have genotype, nicotine dependence, and WM integrity phenotype data available. The number of participants included at each analytic step is summarized in Supplementary Figure 1.

Nicotine Dependence-Related Smoking Phenotypes
Supplementary Table 1 summarizes the number of participants by smoking-related phenotype codes in UKBB. We chose to analyze the following two phenotypes due to their FIGURE 1 | Three competing vertical and horizontal pleiotropy pathways proposed to understand the causal relationship between genetics, white matter integrity, and nicotine dependence. Model 0 represents a horizontal pleiotropic relationship, while models 1 and 2 represent vertical pleiotropic relationships.
biological relevance to nicotine dependence (Donny et al., 2008;Benowitz, 2010): (1) Smoking Status (SS; current vs. never smokers). SS was a binary trait describing subjects who were either current smokers or never smokers (past smokers excluded). Current and never smokers were defined using phenotype code 20116 (smoking status) in UKBB.
(2) Cigarettes Per Day (CPD; average number of cigarettes smoked per day by participants who are either current or past smokers). CPD was a quantitative trait that measures the heaviness of smoking among ever smokers (never smokers excluded). It was defined using phenotype codes 2887 (number of cigarettes previously smoked daily), 3456 [number of cigarettes currently smoked daily), and 6183 (number of cigarettes previously smoked daily (current cigar/pipe smokers)] in UKBB. The CPD values of participants who smoked less than one CPD were recoded to 0; and CPD values of those who smoked more than 60 CPD were recoded to 60.
Note that SS and CPD are both related to nicotine dependence but target different smoker categories. SS focuses on the difference between current smokers and never smokers, while CPD focuses on the heaviness of smoking among current or past smokers.

White Matter Integrity Phenotype Measured by Fractional Anisotropy
The UKBB consists of multimodal braining imaging data covering structural, functional, and diffusion imaging (Miller et al., 2016). In this study, we concentrated on the WM FA measure derived from diffusion MRI data, a common measure of WM integrity whose association with smoking addiction behavior has been reported in previous studies (Gogliettino et al., 2016). The UKBB database provides 40 FA measures from multiple brain regions (full region names listed in Supplementary Table 2 The data were preprocessed using pipeline similar to that developed by Enhancing Neuro Imaging Genetics Meta Analysis (ENIGMA) consortium . Before conducting the causal pathway analysis for each FA, we further performed univariate association analysis by regressing FA measures on the two smoking phenotypes and kept only those FA measures with lower values among current smokers or negatively associated with CPD (β < 0, p < 0.05), motivated by the findings from previous studies (Savjani et al., 2014;Umene-Nakano et al., 2014;Gray et al., 2020).

Genotype Data and Genome-Wide Association Study (GWAS)
The genotype data of UKBB cohort came from two platforms, Affymetrix UK BiLEVE Axiom and UKBB Axiom R arrays, which captured over 90 million single-nucleotide variants (SNVs) of ∼500,000 subjects (Bycroft et al., 2017). We first removed variants with minor allele frequency (MAF) below 0.01, Hardy-Weinberg equilibrium p-value below 0.001, and missing genotype rate at 5% and removed individuals with more than 2% missing genotypes. We conducted principal component analysis (PCA) method to adjust for population stratification and chose the top 10 principle components (PCs) as recommended by PLINK (version 2.0) 1 (Chang et al., 2015) and in previous studies Kang et al., 2009;Price et al., 2010;Warren et al., 2017). We then performed GWAS separately on the two smoking phenotypes SS and CPD adjusting for age, gender, body mass index, genotyping chip type, and the acquired top 10 PCs using PLINK (version 1.9 2 ) (Chang et al., 2015). The most significantly associated loci (i.e., p < 5 × 10 −8 ) were used for the causal pathway analysis. Considering the strong linkage disequilibrium (LD) of neighboring single-nucleotide polymorphisms (SNPs), we also included SNPs in the genomic regions ± 250 kb around the peak boundaries ( Supplementary  Figures 2-7). GWAS was not performed on FA measures, which is underpowered due to the relatively small number of participants having neuroimaging data available. In the causal pathway analysis, we further selected pleiotropic SNPs that are associated with both smoking phenotypes and FA measures.

Causal Pathway Discovery
Denote by G the genotype, M the FA measures, Y the smoking phenotypes (SS or CPD), and Z the potential covariates. In this study, we included age and gender as covariates to be adjusted in the model. Given the directed graph structures in Figure 1, we can represent the three competing models by factorizing their joint distributions: Model 0 assumes genetics to be a common cause of both FA and smoking phenotypes independently ("SNP → Smoking, " "SNP → FA" and FA ⊥ Smoking given SNP and confounders) and represents a horizontal pleiotropic relationship. Model 1 and model 2 are two alternative mediation models that represent a vertical pleiotropic relationship. In model 1 ("SNP → FA → Smoking"), FA measures are regarded as the mediators that mediate the effect of SNPs on smoking. In contrast, model 2 ("SNP → Smoking → FA") considers the long-term effect of chronic smoking on the brain structure and regards smoking as the mediator that mediates the effect of SNPs on FA.
We performed automatic causal pathway discovery analysis (Heckerman et al., 1999;Spirtes and Zhang, 2016;Glymour et al., 2019) to identify the optimal pleiotropic pathway. Our analyses started by identifying the potentially pleiotropic variants of FA and smoking for each FA measure separately. We then evaluated the association between each FA measure and smoking given the SNP effects to distinguish horizontal pleiotropy from vertical pleiotropy. For variants with a vertical pleiotropic relationship, we further conducted causal mediation analysis to choose the best mediation model that explains the relationship between SNP, FA, and smoking. Below, we describe the analytical steps in details.
Step 1: Identification of Pleiotropic Variants Suppose we start with a set of g 0 SNPs gained by GWAS for n subjects. Let G ij denote the genotype of the ith subject in the jth SNP (1 ≤ i ≤ n, j ∈ g 0 ), M il denote the lth continuous FA measure (1 ≤ l ≤ 40) of ith subject, Y i denote the SS/CPD value, and Z i denote the covariates of the ith subject. We assume an additive genetic model, and let G ij = 0, 1, or, 2 represent the number of copies of minor alleles. In the first step, we look for SNPs that are associated with both FA measures and SS/CPD (i.e., potentially pleiotropic variants). Notably, this step is also a necessary condition to establish mediation for both model 1 and model 2, where the mediator and the outcome are simply switched in the two models (Judd and Kenny, 1981;James and Brett, 1984;Baron and Kenny, 1986). First, we fit a linear regression model on each SNP for each FA measure separately adjusting for the covariates Z to look for SNP-FA association. Then we fit a logistic regression or a linear regression model on each SNP for SS or CPD, adjusting for the covariates Z to look for SNP-smoking association: (1a) Regress M on G and Z: where α 1 and α 2 are the intercepts, γ 1 and γ 2 are the effects of covariates, and β 1j and β 2j correspond to the genetic effect of the jth SNP on M and Y (subscript l was omitted for α 1 and β 1j for simplicity). The cutoff for statistical significance is chosen to control for the overall false discovery rate (FDR < 0.15) in identifying the potentially pleiotropic variants in the shared subset that meets both SNP-FA and SNP-smoking association criteria (see Supplementary Methods). We then used the Functional Annotation of Variants-Online Resource (FAVOR)  to annotate the identified pleiotropic variants for more biological insights.
Step 2: Distinguish Horizontal From Vertical Pleiotropy Model 0 assumes a horizontal pleiotropic relationship, while models 1 and 2 assume a vertical pleiotropic relationship. The Frontiers in Neuroscience | www.frontiersin.org main feature of horizontal pleiotropy is that the two traits (smoking and FA) are independent given the SNP effect. In this step, we conducted association analysis between FA and SS/CPD conditioning on the SNP. If the conditional independence holds (conditional independence test p > 0.05), the variants demonstrate horizontal pleiotropy; otherwise, they demonstrate vertical pleiotropy.
Step 3: Selection of the Best Mediation Model for Vertical Pleiotropy Model 1 and model 2 are two mutually exclusive mediation models under the vertical pleiotropy assumption. Mediation analysis investigates how a third variable affects the relation between two other variables and is a useful tool in discovering the hidden mechanism in many biological fields (MacKinnon et al., 2007). We conducted exploratory mediation analysis and selected the best mediation model for each pleiotropic SNP using Bayes factor criteria (Kass and Raftery, 1995;Denison et al., 2002;Bernardo and Smith, 2009). We validated the selected model by checking the major causality assumptions and testing and categorizing the mediation effects.
Suppose that the set of SNPs that meet the criteria of vertical pleiotropy from step 1 and 2 is g 1 , to determine the best mediation model from the two candidates, and we regress the outcome on both exposure variable and mediator for model 1 (outcome = SS/CPD) and model 2 (outcome = FA), respectively, adjusting for covariates Z: (2a) Regress Y on G, M, and Z: (2b) Regress M on G, Y, and Z: where α 3 and α 4 are the intercepts; γ 3 and γ 4 are the effects of covariates; β 31j and β 41j represent the direct effects of SNPs on outcomes Y and M in models 1 and 2, respectively; and β 1j β 32 and β 2j β 42 represent the indirect effects of SNPs on outcome via the mediators M or Y in models 1 and 2, respectively.
To select the mediation model that best explains the causal relationship for each SNP j, we propose to use the penalized likelihood Bayesian information criterion (BIC) score as a model selection criterion (Schwarz, 1978). By definition, BIC = −2log(L) + plog (n), whereL denotes the maximized value of the likelihood, p is the number of parameters, and n is the sample size. Since the two mediation models have exactly the same number of parameters, we are directly comparing the maximum likelihoods of the two models. The maximum likelihoods of model 1 and model 2 can be derived from their joint distribution combining step 1 and step 2 (model 1: 1a+2a; model 2: 1b+2b), with the parameters evaluated at maximum likelihood estimation (MLE): The BIC-based model selection performs exploratory mediation analysis to determine the favored mediation model for each potentially causal variant. We then validated the mediation model selected by carefully checking the causal mediation assumptions (see Supplementary Methods). Once the model assumptions are checked, the causal mediation has been established. For model 1, β 31j represents the direct effect, β 1j β 32 represents the indirect/mediation effect, and β 2j represents the total effect. For model 2, β 41j represents the direct effect, β 2j β 42 represents the indirect/mediation effect, and β 1j represents the total effect. Zhao et al. (2010) classified mediation into three types according to significance and direction of the direct effect when mediation effect is significant: when the direct effect is also significant and has the same sign, the mediation is called a complementary mediation; if they point to the opposite directions, the mediation is called a competitive mediation; lastly, if the direct effect is not significant, the mediation is indirect-only mediation. We followed this classification to interpret the final mediation results for each variant.
All statistical analyses were conducted using R (R Core Team, 2020). An R package "mediation" (Tingley et al., 2014) was used for model checking in mediation analysis.

Genome-Wide Association Study and Selection of Smoking Associated Loci
Genome-wide association study were conducted separately for SS and CPD (N = 293,759 and 142,202, respectively). Numerous important smoking behavior associated loci previously reported were reproduced in our study (Gelernter et al., 2006;Thorgeirsson et al., 2008;Keskitalo et al., 2009;Bloom et al., 2014;Bidwell et al., 2015a;Erzurumluoglu et al., 2019;Xu et al., 2020) as highlighted in the circular Manhattan plots (Figure 2). Notably, the significant loci identified for each of the two traits have little overlap, implying the different genetic bases of the two nicotine dependence-related smoking phenotypes. The loci associated with SS are mainly located in regions on chromosomes 9, 10, and 11 marked by genes FAM163B, SARDH, CNNM2, and NCAM1 and non-coding RNA LOC101928847, while loci associated with CPD are located in regions on chromosomes 8, 15, and 19 marked by genes FIGURE 2 | A concentric circular Manhattan plot of the GWAS results for smoking status (SS) and cigarette per day (CPD) for chromosomes 1-22. Each dot represents an SNV; x-and y-axis refer to genomic locations and -log10(p-value). The SNVs with -log10(p-value) are larger than 12 for chr15 and chr19 of CPD, and larger than 20 for chr9 and chr11 of SS were not included in the plot. The three highest signals mapped on genes CHRNB3, CHRNA3, CHRNB4, IREB2, CYP2A6, and RAB4B of CPD and the four highest signals mapped on genes FAM163B, SARDH, CNNM2, and NCAM1 and lncRNA LOC101928847 of SS are labeled in the plot. GWAS, genome-wide association study; CPD, cigarette per day; SNV, single-nucleotide variant.
Next, we performed causal pathway analysis on the significantly associated loci (p < 5e-8) for SS and CPD. Considering the strong LD among nearby loci, we also included loci in the extended genomic regions by 250 kb both upstream and downstream of the peak regions (Supplementary Figures 2-7). This covers genomic regions including a total of 5,828 SNPs for SS and 4,420 SNPs for CPD ( Table 1). In the causal pathway analysis stage, we focused on participants who have genotype, FA measure, and smoking phenotype data available (N = 23,624 for SS and N = 8,830 for CPD) and further narrowed down the analysis to pleiotropic variants that were also associated with FA measures (see Methods of Causal pathway analysis Step 1).

Causal Pathway Analysis for Smoking Status
Univariate association analysis found 29 FA measures from various brain regions that show significantly lower FA values among current smokers than never smokers (β < 0 , p < 0.05; Supplementary Table 3), supporting the findings in previous literature (Savjani et al., 2014;Umene-Nakano et al., 2014;Gray et al., 2020). We proceeded with the causal pathway analysis for each of these 29 FA measures. Only the FA measure in the right tract of the ALIC (ALIC-R) had pleiotropic variants that passed the statistical significance thresholds (overall FDR < 0.15); thus, we continued with this FA measure only in the subsequent steps. We observed 272 SNPs having a pleiotropic effect on SS and ALIC-R (detailed annotation information from FAVOR is included in Supplementary Table 5). The significant association between ALIC-R and SS held given the genetic effects of any of the SNPs, implying a vertical pleiotropic relationship (Supplementary Table 4), so we no longer proceed with model 0. In comparing the two mediation models of vertical pleiotropy, model 1 where FA mediates the genetic effect on SS is favored. Key mediation assumptions were checked for the chosen model. A majority of those variants (244 out of 272) resided in gene NCAM1 (Supplementary Table 5). The role of NCAM1 in addiction has been found in recent studies (Bidwell et al., 2015b;Muskiewicz et al., 2018). SNPs, single-nucleotide polymorphisms; GWAS, genome-wide association study. Zhu et al. (2016) discussed about two possible explanations for an observed association between genotype and trait, including true causality or strong LD with true causal variants in the same locus. Here, we did not distinguish the 244 variants for true causality vs. linkage and treated all of them as candidate causal variants. Figure 3 shows how these SNPs impacted SS via ALIC-R. Current smokers carry a significantly larger number of minor allele copies of these SNPs than never smokers on average (β 41 > 0). The indirect effect of carrying more minor alleles via FA in ALIC-R is in the opposite direction of the direct effect (β 2β42 < 0), thus regarded as competitive mediation (see Supplementary Table 6 for a complete mediation analysis results).

Causal Pathway Discovery for Cigarette per Day
Univariate association found 29 FA measures that show significantly negative association with CPD (β <0 , p < 0.05; Supplementary Table 3). Among these 29 FA measures, only two FA measures in the ALIC-R and left tract of the PCR (PCR-L) had pleiotropic variants that passed the statistical significance thresholds (overall FDR < 0.15); thus, we continued with these two FA measures only in the subsequent steps. We observed 22 SNPs having a pleiotropic effect on CPD and FA measures ALIC-R and PCR-L (Supplementary Table 5). CPD and FA measures are dependent on each other given the genetic effects (Supplementary Table 4), implying a vertical pleiotropic relationship. Model 2 (CPD mediates the genetic effect on FA) was chosen as the best mediation model for these 22 SNPs, and the key mediation assumptions were checked for the chosen model. These variants are located in the exonic, intronic, and untranslated regions of gene IREB2 (Supplementary Table 5). Figure 4 shows how these SNPs impacted the two regional FA measures ALIC-R and PCR-L via CPD. The minor alleles of these SNPs appear to have a protective direct effect on brain structure (β 41 > 0), while exerting an adverse effect associating with higher CPD (β 2β42 < 0), regarded as competitive mediation (see Supplementary Table 6 for a complete mediation analysis results).

DISCUSSION
In this study, we used novel causative imaging genetics analyses to test neurogenetic mechanisms of nicotine dependence through altered WM integrity. We hypothesized and tested three pleiotropic models to explain the complex causal relationship among genetics, WM integrity, and nicotine dependence. Our GWAS on SS and CPD identified two different sets of associated genetic variants including many reported to be related to smoking in previous large-scale GWASs or meta-analyses (Hardin et al., 2012;Wain et al., 2015;Zeng et al., 2020). We also found smoking (being current smoker or having higher CPD) to be associated with lower WM integrity measured by FA in multiple brain regions. The causal pathway analysis identified 272 pleiotropic SNPs associated with FA in ALIC(R) and SS, and 22 pleiotropic SNPs associated with FA in ALIC(R) and PCR(L) and CPD. These SNPs are mainly located in genes NCAM1 and IREB2. NCAM1 was found to influence risk of nicotine addiction (Gelernter et al., 2006;Muskiewicz et al., 2018). IREB2, which regulates iron mechanism in the cell, was a susceptibility gene for both neurodegeneration and smoking-induced diseases (DeMeo et al., 2009;Cooper et al., 2019). Interestingly, these two sets of SNPs favored different vertical pleiotropic pathways: Gene → FA → SS vs. Gene → CPD → FA. The basic genetic components of addiction might have produced a pattern change in WM among smokers, reinforcing the addiction behavior. Chronic severe smoking (reflected in, e.g., CPD) will have negative impact to overall health, which in turn reduces the WM integrity. We have used an imaging genetics approach to test the neurogenetic mechanisms. Traditional imaging genetics treat neuroimaging as the mediator of the genetic effect on behavior through a unidirectional model (Meyer-Lindenberg and Weinberger, 2006;Bogdan et al., 2017). We use novel causal mediation models to evaluate vertical and horizontal pleiotropy pathways, SNP → behavior → brain vs. SNP → brain → behavior vs. SNP → behavior and SNP → brain, in understanding the neurogenetic mechanism of nicotine addiction behavior. Constraint-based methods and score-based methods are two main categories of conventional causal discovery methods (Glymour et al., 2019). Constraint-based methods identified causal links by conducting conditional independence tests, while score-based methods selected the model with the optimal score from multiple candidate causal models (Spirtes and Zhang, 2016). Here, we proposed to perform conditional independence test to distinguish vertical pleiotropy from horizontal pleiotropy, and we used a BIC score-based method to select the optimal model with the larger score from the two competing vertical pleiotropic models represented by directed acyclic graph (DAG). Such hybrid approach exploited principled ways to combine advantages of both methods and was a computationally efficient strategy to learn the causal structure in a wide range of real-life applications (Wong et al., 2002;Tsamardinos et al., 2006;Scutari et al., 2018). In our data, such an approach indeed resulted in a DAG with higher likelihood values.
Vertical pleiotropy and horizontal pleiotropy are two competing types of pleiotropy in complex polygenic traits (Paaby and Rockman, 2013). Conventional MR framework treats pleiotropic genetic factors as IV to elucidate how one phenotype (modifiable exposure) causally relates to another phenotype (the outcome) (Davey Smith and Hemani, 2014), but it only considers the vertical pleiotropy with no direct causal link between SNP and outcome (known as exclusion restriction assumption) (Hemani et al., 2018a). We developed a model that evaluates both vertical and horizontal pleiotropy pathways simultaneously and used a rigorous likelihoodbased approach to determine the optimal model. We found a significant direct effect of SNPs on the outcomes using our data, which violates the exclusion restriction assumption (Davey Smith and Hemani, 2014). Such extension from the MR framework to tolerate SNPs violating the exclusion restriction assumption of IV analysis has also been seen in recent genetic literature (Bowden et al., 2015;Verbanck et al., 2018). In addition, to establish causality, we also carefully checked the main causal mediation model assumptions of the optimal model. The biophysical mechanisms that link smoking and lower WM integrity are unknown. Most studies in chronic and heavy smokers reported reduced FA values when compared with those in nonsmokers (Swan and Lessov-Schlaggar, 2007;Hudkins et al., 2010;Kim et al., 2010;Cullen et al., 2011;Gons et al., 2011;Liao et al., 2011;Zhang et al., 2011a,b). Absolute WM FA values are sensitive to many parameters including myelin content, intravoxel axonal crossing, and axonal fiber density and diameter (Beaulieu, 2002). However, long-term changes in regional FA values were shown to be mainly (r > 0.8) driven by changes in regional cerebral myelin concentrations and myelin packing (Song et al., 2003(Song et al., , 2005Madler et al., 2008). Our findings strongly implicate cerebral WM in the maintenance of this complex addiction and provide genetic targets for further analyses. This finding should encourage future research to examine how changes in WM integrity may or may not contribute to the overall nicotine effects on brain and cognition. Heavy chronic smoking increases the risk of the development of abnormalities in vascular endothelial morphology and function, which may cause cerebral perfusion leading to poorer neurocognition (Pittilo, 2000). Additionally, chronic smoking can impact the vasomotor reactivity/responsivity of the cerebrovascular through upregulation of Ca 2+ channels and/or modulation of nitric oxide, resulting in the reduction of regional cerebral blood flow (Zubieta et al., 2001;Domino et al., 2004) and the development of WM disease (Liao et al., 1997;Ding et al., 2003;Jeerakathil et al., 2004).
Our study population consists of individuals with white ethnic background, and we pooled the UKBB data from different assessment sites and phases together in the analyses without considering the heterogeneity across sites and phases. Future studies are needed to evaluate the generalization of our conclusion to individuals from other races or ethnicities and to assess whether the conclusions are related to site and phase effects. Our study focused on the most significant variants from GWAS for the causal pathway analysis. Considering the high dimensionality of SNPs and the complex polygenic architecture of both smoking and WM integrity, SNPs with weak signals may not be identified. Further investigation in large independent cohorts is needed to validate current causal analysis results and provide a full picture of the complex genetic architecture of smoking and brain structure, which will in turn improve our understanding of the neurogenetic mechanism of nicotine addiction behavior.

DATA AVAILABILITY STATEMENT
The raw genetic and phenotypic data used in the current study are available from the UK Biobank (UKBB), which can be accessed via https://www.ukbiobank.ac.uk/.

AUTHOR CONTRIBUTIONS
ZY, CM, and SL performed the analysis and wrote the manuscript. TM, SC, and PK supervised the project and took the