Determination of genetic correlation between tobacco smoking and coronary artery disease

Background Tobacco smoking is an important risk factor for coronary artery disease (CAD), but the genetic mechanisms linking smoking to CAD remain largely unknown. Methods We analyzed summary data from the genome-wide association study (GWAS) of the UK Biobank for CAD, plasma lipid concentrations (n = 184,305), and smoking (n = 337,030) using several biostatistical methods, including LD score regression and Mendelian randomization (MR). Results We identified SNPs shared by CAD and at least one smoking behavior; the genes in which these SNPs are located were significantly enriched in processes related to lipoprotein metabolism, chylomicron-mediated lipid transport, and lipid digestion, mobilization, and transport. The MR analysis revealed a positive correlation between smoking cessation and decreased risk for CAD when smoking cessation was considered as exposure (p = 0.001), and a negative correlation between the increased risk for CAD and smoking cessation when CAD was considered as exposure (p = 2.95E-08). This analysis further indicated that genetic liability for smoking cessation increased the risk of CAD. Conclusion These findings inform the concomitant conditions of CAD and smoking and support the idea that genetic liabilities for smoking behaviors are strongly associated with the risk of CAD.


Introduction
Tobacco smoking is one of the most important public health problems worldwide, accounting for 9% of deaths (1). Several epidemiologic studies have shown that tobacco smoking is a major risk factor for many diseases, such as lung cancer and cardiovascular and respiratory diseases (1). Lung cancer is the most prevalent smoking-associated cause of death, followed by ischemic heart disease and chronic airway obstruction. Tobacco smoking and its impact on the respiratory system cause an estimated 8 million deaths per year, with more than 10% of these deaths related to second-hand smoke (2). In the United States, tobacco smoking is associated with 30% of all CAD-related deaths each year (3) and also doubles the risk of premature cardiovascular death (4).
Zhu et al. 10.3389/fpsyt.2023.1279962. Frontiers in Psychiatry, frontiersin.org
Several epidemiologic studies have revealed that tobacco smoking increases the incidence of fatal CAD and is associated with various cardiovascular diseases (5–8). Extensive clinical evidence supports the idea that tobacco smoking causes multiple genetic and epigenetic abnormalities in the respiratory epithelium (9, 10). In a previous study, Sabater-Lleal and colleagues identified a genetic locus that influences both lung function and CAD (11), although the analysis was not genome-wide in scale and was underpowered because of a small sample size.
Tobacco contains more than 4,000 chemicals (12), and the exact toxic components and the mechanisms involved in tobacco-related CAD and cardiovascular dysfunction are still unknown. Recently, a genetic predisposition to atherogenesis in individuals exposed to cigarette smoke has been reported. The commonly documented examples are the CYP1A1 MspI polymorphism and certain endothelial NO synthase intron 4 polymorphisms. Both increase the susceptibility to cigarette smoke exposure-related atherosclerotic diseases, including multi-vessel CAD and myocardial infarction (MI) (13, 14). Given that much of the available data were derived from observational studies, which cannot fully account for confounding and reverse causation, the genetic correlation and causal relations between smoking and cardiovascular diseases remain to be determined.
The principle of Mendelian randomization (MR) relies on the basic laws of Mendelian genetics: segregation and independent assortment. When these principles hold at a population level, the influence of confounding factors is reduced because religion, growth, environment, and other potential confounders are expected to be distributed randomly with respect to genotype (15). Given that alleles are randomly allocated and become fixed at conception, MR studies are less susceptible to reverse causality than are observational studies.
In this study, we examined the pleiotropic effect of tobacco smoking and CAD using publicly available GWAS summary statistics. Then, we used a bidirectional MR method to reveal the nature of the causal relations between CAD and tobacco smoking. Finally, we determined the biological processes or pathways involved in the comorbidity of these two conditions.

GWAS summary data sets
The GWAS summary data for CAD and plasma lipid concentrations have been described in a previous report (16). Briefly, the summary statistics of a large GWAS meta-analysis comprising more than 120,000 CAD cases and 339,115 controls were obtained from the CARDIoGRAMplusC4D Consortium website (http://www.cardiogramplusc4d.org/data-downloads/). A total of 9,149,595 variants were included either in the CARDIoGRAMplusC4D 1000 Genomes-imputed GWAS or the MIGen/CARDIoGRAM Exome chip study. The smoking data were obtained from the large GSCAN summary statistics (17). The GSCAN investigated four smoking-related phenotypes: age at initiation of regular smoking (AgeSmk; n = 341,427), whether an individual had ever smoked regularly (SmkInit; n = 1,232,091), cigarettes smoked per day (CPD; n = 337,334), and smoking cessation (SmkCes; n = 547,219). The GWAS summary statistics for the different smoking phenotypes can be found at https://conservancy.umn.edu/handle/11299/201564.
Further, we obtained published GWAS meta-analysis association data for lipid concentrations from the Center for Statistical Genetics; this joint analysis examined 188,577 individuals from multiple studies whose genomic DNA samples were genotyped with two platforms (18). Complete GWAS summary statistics were downloaded from http://csg.sph.umich.edu/willer/public/lipids2013/.

Estimation of genetic correlation by LD score regression (LDSC)
The genetic correlations (r_g) between CAD and smoking behaviors were estimated by LDSC (19). Pairwise LD (r²) among SNPs was taken from pre-computed LD scores based on the 1000 Genomes Project reference panel of subjects of European ancestry. Quality control steps followed the LDSC default procedures, including imputation quality >0.9 and minor allele frequency >0.1. Moreover, all SNPs retained for further analysis were merged with SNPs in the HapMap 3 reference panel. Correlations were considered significant at a Bonferroni-corrected p value of <0.05.
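For orientation, the cross-trait LD score regression model behind such estimates can be written as below. This is the standard form from the LDSC literature rather than a formula stated in this paper, and the symbols are introduced here only for illustration: z_{1j} and z_{2j} are the association z-scores of SNP j in the two GWAS, N_1 and N_2 the sample sizes, M the number of SNPs, \ell_j the LD score of SNP j, \rho_g the genetic covariance, \rho the phenotypic correlation among the N_s overlapping samples, and h_1^2 and h_2^2 the SNP heritabilities of the two traits.

```latex
\mathbb{E}\left[ z_{1j}\, z_{2j} \right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

The slope of the regression of z_{1j} z_{2j} on \ell_j carries the genetic covariance, whereas sample overlap only affects the intercept, which is why the method can be applied to summary statistics from partly overlapping cohorts.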

Mendelian randomization (MR)
We extracted the effect estimates and standard errors (SEs) from the relevant GWAS and used the TwoSampleMR (v.0.4.22) R package to assess the potential causal effects between smoking and CAD (20). The following strategy was used to identify genetic instruments. First, we filtered the GWAS summary datasets to retain susceptibility loci shared by smoking and CAD. Variants showing genome-wide significance (p < 5 × 10⁻⁸) in the GWAS for CAD were considered candidate variants, and we then checked the significance of these loci separately in the four smoking behaviors from the GSCAN study: smoking initiation, smoking cessation, CPD, and age at initiation. The common SNPs were harmonized using the default parameters of the built-in harmonise_data function and then pruned with PLINK (v.1.07) to obtain independent risk variants for each disease (21).
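The idea behind allele harmonization can be illustrated with a minimal Python sketch. It is a simplified stand-in and not the TwoSampleMR implementation: the function name, the data layout, and the SNP identifiers are hypothetical, and palindromic (A/T, C/G) SNPs and strand flips are deliberately ignored.

```python
def harmonise(exposure, outcome):
    """Align outcome effects to the exposure effect allele.

    Both inputs map a SNP id to (effect_allele, other_allele, beta).
    Hypothetical helper for illustration; strand issues are not handled.
    """
    harmonised = {}
    for snp, (ea_x, oa_x, beta_x) in exposure.items():
        if snp not in outcome:
            continue  # SNP not available in the outcome GWAS
        ea_y, oa_y, beta_y = outcome[snp]
        if (ea_x, oa_x) == (ea_y, oa_y):
            # Same effect allele in both studies: keep the sign
            harmonised[snp] = (beta_x, beta_y)
        elif (ea_x, oa_x) == (oa_y, ea_y):
            # Effect and other allele are swapped: flip the outcome sign
            harmonised[snp] = (beta_x, -beta_y)
        # Any other allele combination is incompatible and is dropped
    return harmonised


# Made-up example data (not from the study)
exposure = {"snp1": ("A", "G", 0.10), "snp2": ("C", "T", 0.20),
            "snp3": ("A", "C", 0.30), "snp4": ("G", "T", 0.05)}
outcome = {"snp1": ("A", "G", 0.02), "snp2": ("T", "C", 0.04),
           "snp3": ("A", "G", 0.01)}
print(harmonise(exposure, outcome))
```

In this toy input, snp2 has its outcome effect sign flipped, snp3 is dropped because the allele pairs do not match, and snp4 is dropped because it is absent from the outcome data.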
To start, instrumental variable estimates were generated for each SNP as the Wald ratio, i.e., the beta coefficient for the SNP-CAD association divided by the beta coefficient for the SNP-smoking behavior association. The per-SNP estimates were then combined into an overall causal estimate using the inverse-variance-weighted (IVW) method (22). In addition, we performed a series of sensitivity analyses, including the weighted median and MR Egger methods, to evaluate the reliability of our results.
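A minimal Python sketch of the per-SNP ratio estimate, a fixed-effect IVW combination, and an MR Egger regression is given below. It assumes harmonized summary statistics, uses a first-order approximation for the SE of each ratio, and is not the TwoSampleMR code; the function names and the numbers in the example are made up for illustration.

```python
import math


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted estimate from Wald ratios."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order SE of each ratio: se(by) / |bx|
    ratio_ses = [sy / abs(bx) for bx, sy in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ratio_ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Two-sided p value from a normal approximation
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p_value


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of outcome on exposure effects with an intercept.

    A non-zero intercept suggests directional pleiotropy.
    """
    # Orient every SNP so that its exposure effect is positive
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exposure]
    bx = [s * b for s, b in zip(signs, beta_exposure)]
    by = [s * b for s, b in zip(signs, beta_outcome)]
    w = [1.0 / sy ** 2 for sy in se_outcome]
    mean_x = sum(wi * xi for wi, xi in zip(w, bx)) / sum(w)
    mean_y = sum(wi * yi for wi, yi in zip(w, by)) / sum(w)
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, bx, by))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, bx))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up summary statistics for three SNPs
estimate, se, p_value = ivw([0.10, 0.20, 0.05], [0.02, 0.04, 0.01], [0.01, 0.01, 0.01])
print(estimate, se, p_value)
```

Because the IVW weights are proportional to the squared SNP-exposure effect divided by the outcome variance, SNPs that are strong instruments and precisely measured in the outcome GWAS dominate the pooled estimate.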

Gene and pathway analysis
The gene-based analysis that links SNPs to genes was conducted using MAGMA with default settings. To gain biological insights into shared genes, we used the WebGestalt tool (23) to assess enrichment of the identified shared gene set in the Gene Ontology (GO) biological processes, with redundant GO terms removed. Both analyses were based on shared genes that were identified from the cross-trait meta-analysis. Pathways with a false discovery rate (FDR) < 0.05 were considered significant.
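The FDR threshold can be illustrated with a short Python sketch of the Benjamini-Hochberg adjustment. This assumes the FDR was controlled with that procedure, which the text does not state explicitly, and the p values below are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical enrichment p values for five pathways
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

With these made-up inputs, only the first two pathways pass the FDR < 0.05 threshold, although four of the five raw p values are below 0.05.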

Susceptible loci shared by CAD and smoking behaviors
To investigate the genetic overlap between CAD and smoking behaviors, we used GWAS summary data from large-scale genome-wide studies (Supplementary Table S1). We detected a large number of significantly associated SNPs that overlapped between CAD risk loci and at least one smoking phenotype (Supplementary Table S2). Of the 2,091 SNPs at the CAD GWAS loci, 35.01% showed nominal significance for CPD, 526 SNPs (25.16%) showed nominal significance for smoking initiation, 317 SNPs (15.16%) for smoking cessation, and 85 SNPs (4.07%) for age at smoking initiation. Notably, 24 SNPs reached genome-wide significance (Figure 1).

Genetic correlations between CAD and smoking behaviors
We used the LDSC method to test for genome-wide correlations between CAD and smoking behaviors. Significant genetic correlations were found between CAD and all the smoking-related traits, with the smallest p values <1 × 10⁻¹⁸ (Figure 2). We observed significant positive genetic correlations between CAD and CPD, smoking initiation, and smoking cessation (r_g > 0.2), but a

FIGURE 1
Genetic susceptibility map for CAD and smoking behaviors. The outer ring defines the location of the 22 human autosomes. The scatter plots in the second and fourth rings are analogous to Manhattan plots of the association results for CAD and smoking behaviors, respectively. The altitude of each dot represents statistical significance as −log10(P). SNPs that reached genome-wide significance are colored red for CAD and green for smoking behaviors. Yellow bars in the third ring mark the 24 CAD risk loci at least nominally associated with smoking behaviors, and tag SNPs in these loci are labeled.

Mendelian randomization analysis
Considering the potential LD among the significant SNPs of interest, we performed p value-informed LD pruning to obtain independent GWAS SNPs. This led to the identification of 15, 63, 18, and 152 independent SNPs for age at initiation, CPD, smoking cessation, and smoking initiation, respectively. Bidirectional MR analysis provided strong evidence that smoking initiation increased the risk of CAD (IVW: β = 0.191; p = 2.59 × 10⁻⁶), with a consistent direction of effect in all three MR methods (Table 1). There was also consistent but weaker evidence for an effect of genetic liability for smoking cessation on CAD (IVW: β = 0.234; p = 0.001). Similar findings were observed with age at initiation as the exposure (IVW: β = 0.295; p = 0.039).
On the other hand, we observed a negative association between CPD and CAD, although the evidence was weaker: only the MR Egger estimate was significant. When treating CAD as an instrument, strong evidence of decreased risk of smoking cessation by

Biological pathway and enrichment analysis
We performed pathway analyses to identify biological pathways enriched for shared genetic loci related to smoking and CAD based on significant cross-trait meta-analysis results. For a detailed list of the overlapping genes and SNPs between CAD and smoking-related phenotypes, please refer to Table 3. Pathway analysis showed that the SNP-related genes were significantly enriched in lipoprotein metabolism, chylomicron-mediated lipid transport, and lipid digestion, mobilization, and transport (Table 4). The GO analysis suggested that shared genes in CAD and smoking behaviors were enriched in triglyceride-rich lipoprotein particle clearance, blood vessel development, and very-low-density lipoprotein particle clearance (Table 5).

Discussion
In this study, we revealed the genetic correlation and causal relations between smoking and CAD, providing a comprehensive evaluation of the shared genetic etiology of tobacco smoking and cardiovascular diseases. Our findings highlight that different smoking behaviors, specifically smoking initiation and smoking cessation, are strongly associated with CAD.
The MR approach rests on three assumptions: (1) the genetic marker is associated with the exposure; (2) the genetic marker is independent of any confounding factors; and (3) the genetic marker is associated with the outcome only through the exposure. However, it should be acknowledged that these assumptions are generally not easy to evaluate. The results of the present MR study, which were based on GWAS data, corroborate those of conventional prospective observational studies showing that tobacco smoking is a risk factor for CAD (5–8).
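Why a valid genetic instrument helps can be shown with a small simulation in Python. This is a toy sketch under stated assumptions, not an analysis from this study: the genotype frequency, effect sizes, sample size, and function names are all invented, and the instrument is constructed to be independent of the confounder and to affect the outcome only through the exposure.

```python
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=50000, true_effect=0.3, seed=1):
    """Compare a confounded regression slope with the Wald ratio estimate."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Allele count 0/1/2 with an assumed allele frequency of 0.3
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0.0, 1.0)
        y = true_effect * x + u + rng.gauss(0.0, 1.0)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    # Observational slope: biased upward because u drives both x and y
    observational = covariance(exposure, outcome) / covariance(exposure, exposure)
    # Wald ratio: SNP-outcome association divided by SNP-exposure association
    wald = covariance(genotype, outcome) / covariance(genotype, exposure)
    return observational, wald


observational, wald = simulate()
print(observational, wald)
```

In this setup the observational slope overestimates the true effect of 0.3 because the confounder raises both exposure and outcome, whereas the ratio estimate stays close to 0.3. If the instrument were also allowed to influence the outcome directly, the ratio estimate would be biased as well, which is what the third assumption rules out.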
To our knowledge, this study represents one of a few large-scale genome-wide analyses to investigate the genetic overlap between smoking and CAD (24–26). Similar to the findings from these reports, our analyses also revealed strong associations between smoking initiation, smoking cessation, and CAD. Further, we found a significant positive association between smoking initiation and CAD when smoking initiation was considered as exposure (inverse-variance-weighted: β = 0.191; p = 2.59E-06; weighted median: β = 0.193; p = 3.35E-06), suggesting that smokers are more susceptible to CAD. In addition, we found a negative correlation between CAD and smoking cessation when CAD was considered as exposure (inverse-variance-weighted: β = −0.046; p = 2.95E-08; weighted median: β = 0.193; p = 3.35E-06). This indicates that patients with CAD are less likely to quit smoking, possibly because of tobacco addiction. Together, these findings demonstrate the presence of shared genetic etiologies between tobacco smoking and CAD.
Further, we found that smoking has significant associations with HDL and LDL. A positive correlation between CPD and LDL was observed when CPD was considered as exposure (inverse-variance-weighted: β = 0.06; p = 0.01; weighted median: β = 0.06; p = 0.02), and a negative correlation between CPD and HDL when CPD was considered as exposure (inverse-variance-weighted: β = −0.06; p = 0.005; weighted median: β = −0.06; p = 0.02). There was also a positive correlation between LDL and smoking cessation when LDL was considered as exposure (inverse-variance-weighted:
Taken together, these findings suggest that smoking increases the risk of CAD by affecting the regulation of LDL and HDL, which needs to be further investigated. We also performed GO enrichment and KEGG pathway analyses based on the genes in which the SNPs overlapping between CAD and smoking-related phenotypes are located. We found several functions and pathways related to lipoprotein metabolism and blood vessel development, all of which are closely associated with CAD. It has been reported that the APOE-APOC1-APOC2-APOC4 cluster is significantly related to lipoprotein-associated phospholipase A2 mass and activity and to CAD (33). Interestingly, the SNP rs4420638, located downstream of the APOC1 gene, was found to be significantly related to smoking cessation (p = 7.4E-6) (34). Moreover, rs4420638, a brain eQTL according to BRAINEAC, has been linked to Alzheimer's disease (35–40) and cognitive decline (41).
We used non-overlapping data sources for exposure and outcome in the summary-level MR analysis, which greatly improved confidence in the causal effect estimates. In addition, a range of sensitivity analyses yielded similar causal estimates and consistent causal inferences. However, this study also had limitations. First, there was a stark difference in sample sizes among the phenotypes, which might contribute to discrepancies in statistical power. Second, the summary-level GWAS data did not allow us to divide samples into subgroups, which prevented us from studying age-related heterogeneity.
In conclusion, this was a systematic analysis of the shared etiology and possible causal relations of smoking and CAD using large-scale GWASs. Genetic methods represent another option for assessing causality when there are too many confounding factors in randomized controlled trials, and our findings strongly support the hypothesis that smoking behavior is causally related to CAD risk. We found significant genetic overlap and correlations between CAD and smoking at the SNP level. Taken together, the data from this study enhance the understanding of the genetic etiology of the relations between CAD and smoking and might help to dissect smoking behaviors and develop preventive strategies to reduce the public health burden of cardiovascular disease.

TABLE 1
Effect of smoking behaviors on CAD using two-sample MR analysis.

TABLE 2
Effect of CAD on smoking behaviors using two-sample MR analysis.

TABLE 3
The overlapped genes and SNPs between CAD and smoking-related phenotypes.

TABLE 4
Detected shared pathway between smoking and CAD based on pathway analysis.