Dissection of genetic architecture of nine hazardous component traits of mainstream smoke in tobacco (Nicotiana tabacum L.)

Tobacco (Nicotiana tabacum L.) use is the leading cause of preventable death, due to deleterious chemical components and smoke from tobacco products, and therefore reducing harmful chemical components in tobacco is one of the crucial tobacco breeding targets. However, due to complexity of tobacco smoke and unavailability of high-density genetic maps, the genetic architecture of representative hazardous smoke has not been fully dissected. The present study aimed to explore the genetic architecture of nine hazardous component traits of mainstream smoke through QTL mapping using 271 recombinant inbred lines (RILs) derived from K326 and Y3 in multiple environments. The analysis of genotype and genotype by environment interaction (GE) revealed substantially greater heritability over 95% contributed mostly by GE interaction effects. We also observed strong genetic correlations among most studied hazardous smoke traits, with the highest correlation coefficient of 0.84 between carbon monoxide and crotonaldehyde. Based on a published high-density genetic map, a total of 19 novel QTLs were detected for eight traits using a full QTL model, of which 17 QTLs showed significant additive effects, six showed significant additive-by-environment interaction effects, and one pair showed significant epistasis-by-environment interaction effect. Bioinformatics analysis of sequence in QTL region predicted six genes as candidates for four traits, of which Nt21g04598.1, Nt21g04600.1, and Nt21g04601.1 had pleiotropic effects on PHE and TAR.


Introduction
Tobacco smoking is the world's leading cause of avoidable premature mortality (World Health Organization, 2023).Cigarette smoke is a complex, dynamic and reactive mixture that consists of more than 8000 chemical compounds (Rodgman and Green, 2003).When a cigarette is lit, the hot carbonaceous coal within the burning cigarette can reach peak temperatures exceeding 900°C during a puff (McAdam et al., 2016).Adjacent to the hot coal, thermolytic processes (including distillation, pyrolysis and combustion) act on the components of the tobacco to form various smoke constituents, which are released as mainstream smoke (Schwanz et al., 2019).
Due to complexity of tobacco smoke, the mechanism underlying harmfulness of smoke has not been fully understood.Therefore, various lists of toxicants have been proposed in an effort to identify the most relevant constituents responsible for smokingrelated diseases (Smith and Hansch, 2000;Liu et al., 2012), including lists of analytes from Hoffman (Hoffmann et al., 1997(Hoffmann et al., , 2001)), Rodgman and Green (Rodgman and Green, 2003), Talhout (Talhout et al., 2011) and FDA (Food and Drug Administration, 2012).Based on the previous lists and toxicological test methods, a simplified evaluation system was established by Xie et al. (Xie et al., 2009).Xie and his colleagues analyzed 29 hazardous constituents in smoke and four pharmacologic indexes for 163 cigarette samples sold in China.Through statistical analysis, seven smoke constituents, including hydrogen cyanide (HCN), ammonia (NH 3 ), phenol (PHE), benzo[a]pyrene (B[a]P), carbon monoxide (CO), crotonaldehyde (CRO), and 4-(methylnitrosamino)-1-(3pyridyl)-1-butanone (NNK), were selected out to establish a novel hazard index (HI).In addition, given the extreme harm of tar (TAR) in mainstream cigarette smoke to human health, tobacco industry has integrated TAR and HI for comprehensive assessments of hazardous smoke.Efforts are being made to minimize harm, with a focus on tar and the seven representative harmful substances.
There have been reports on the emission of smoke toxicants attributable to several factors such as the variety of chemical compositions in tobacco leaf and the reduction of harmful chemical components within the tobacco leaf is considered as a critical objective in tobacco breeding initiatives (Julio et al., 2006).However, little is currently known about the genetic architecture underlying these smoke-related traits which are considered as quantitative traits.Julio (Julio et al., 2006) was the first to show interest in quantitative trait locus (QTL) mapping and several QTLs of smoke properties (tar, benzo[a]pyrene and CO) were detected in a recombinant inbred line (RIL) population with a partial genetic map.No novel QTL was reported until Tong's study (Tong et al., 2021) for seven smoke substances, including benzo[a]pyrene, hydrocyanic acid, phenol, carbon monoxide, tar, nicotine and total particle matter, using a high-density genetic map constructed by single nucleotide polymorphisms (SNPs) in RIL population, mapping more QTLs on smoke hazardous components and understanding their interactions with environments is still necessary for efficient molecular genetic improvement of the traits.In this study, based on an integrated high-density linkage map and multi-environment phenotypic data of the RIL population, QTL mapping was conducted for nine hazardous smoke traits; the detected main-effect QTLs, epistastic QTLs and their interactions with environments will provide more insights into the genetic architecture of the traits and greatly facilitate the molecular improvements of breeding low-hazard tobacco varieties.

Plant materials and field trial
The RILs were generated from two elite flue-cured tobacco parents Y3 and K326.Y3 is a backbone cultivated variety that originated from Zimbabwe with elite agronomic traits and complicated parental sources.K326, whose genome has been assembled (Edwards et al., 2017), was introduced from America with high commercial quality and disease resistance but moderate agronomic performance.A total of 274 RILs were employed in this study, consisting of two parents, one F 1 generation hybrid (YKF 1 ; Y3 Â K326) and 271 F 7 generation individuals.The materials were planted at Shilin (N: 23.46; E: 103.17) field experiment stations using complete random design with 5 replications, and were cultivated according to local technical measures for quality tobacco production.The eight hazardous substances, including benzo[a]pyrene (B[a]P), carbon monoxide (CO)¸crotonaldehyde (CRO), hydrogen cyanide (HCN), ammonia (NH 3 ), 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), phenol (PHE) and tar (TAR) and were collected in the mainstream smoke of cigarettes produced using tobacco planted at Shilin in 2018, 2019 and 2020; and calculated hazard index (HI).Three combinations of location and year were treated as environments denoted as E1 (2018 Shilin), E2 (2019 Shilin) and E3 (2020 Shilin).Measurement and calculation of nine hazard constituents in mainstream smoke B[a]P was analyzed using the gas chromatography mass spectrometry (GC-MS) method, as described by the Chinese standard method GB/T 21130-2007(Standardization Administration of China, 2007).CO was determined in the vapor phase using a nondispersive infrared analyzer, as described by the Chinese standard method GB/T 23356-2009(Standardization Administration of China, 2009).CRO was analyzed using the high-performance liquid chromatography (HPLC) method, as described by the Chinese tobacco industry standard method YC/T 254-2008(Standardization Administration of China, 2008a).HCN was quantified using Ion Chromatography, the method as described by the Chinese tobacco industry standard method YC/T 403-2011 (Standardization Administration of China, 2011).NH 3 was analyzed using Ion Chromatography, the method as described by the Chinese tobacco industry standard method YC/T 377-2017 (Standardization Administration of China, 2017).NNK analysis was carried out using GC-TEA method, as described by the Chinese standard method GB/T 23228-2008(Standardization Administration of China, 2008b).PHE was analyzed using HPLC, the method as described by the Chinese tobacco industry standard method YC/T 255-2008(Standardization Administration of China, 2008c).TAR was analyzed using a routine analytical smoking machine, as described by the Chinese standard method GB/ T 19609-2004(Standardization Administration of China, 2004).HI was calculated by the following formula, HI = ( X CO 14:2 + X HCN 146:3 + X NNK 5:5 + X NH3 8:1 + X B½aP 10:9 + X PHE 17:4 + X CRO 18:6 ) Â 10=7 (Xie et al., 2009), where X CO represents emission level of CO (mg/cigarette), X HCN for HCN (mg/ cigarette), X NNK for NNK (ng/cigarette), X NH3 for NH 3 (mg/cigarette), X B½aP for B[a]P (ng/cigarette), X PHE for PHE (mg/cigarette) and X CRO for CRO (mg/cigarette).Three biological replicates were used for each assessment.

Statistical analysis of phenotypes
Variance components analysis and heritability estimation were performed based on the following linear model, where y khi is the phenotypic value of the i-th replication of the k -th line in the h-th environment; m is the population mean; g k is the genotypic value of the k-th genotype, random effect, ɡ k ∼ N(0, s 2 ɡ ); e h is the effect of the h-th environment, random, e h ∼ N(0, s 2 e ); ɡe kh is the interaction effect between the k-th genotype and the h-th environment, random, ɡe kh ∼ N(0, s 2 ɡe ); e khi is the residual effect of the individual, random, e khi ∼ N(0, s 2 e ).The mmer module of sommer R package (Covarrubias-Pazaran, 2016) was applied to estimate the variances of random effects ( ŝ 2 ɡ , ŝ 2 e , ŝ 2 ɡe , ŝ 2 e ) and to predict the random effects by BLUPs (best linear unbiased predictions, ĝ k , ê h , ĝe kh ) by solving the mixed model equation (MME).Heritability was estimated with the formula h 2 g = ŝ 2 ɡ =( ŝ 2 ɡ + ŝ 2 ɡe + ŝ 2 e ) and h 2 ge = s 2 ge =(s 2 ɡ + s 2 ɡe + s 2 e ) , where ŝ 2 ɡ was the estimated genotypic variance, ŝ 2 ɡe was the estimated variance due to gene-by-environment interaction, and ŝ 2 e was the estimated residual variance.The rcorr module of Hmisc R package (https://cran.r-project.org/web/packages/Hmisc/index.html)was employed to calculate Pearson correlation coefficients between six studied traits: (1) phenotypic correlation coefficients with y khi for each environment, respectively; (2) genetic correlation coefficients with ŷ k .(ŷ k = m + ĝ k , where ŷ k is the adjusted genotypic value of the k-th line by the environment effects, m is the estimated population mean, and ĝ k is the genotypic value of the k-th line predicted by BLUP).

Genetic linkage map
The high-resolution linkage map (Tong et al., 2023) that contained 46,324 markers, which were classified into 7,107 bins, a group of markers with least genotype missing rate and same genetic distance, distributed on the 24 linkage groups (LGs) and covered 3334.88 cM with an average genetic distance of 0.469 cM, was employed in this study.

Genetic model and statistical methods for QTL mapping
A QTL full model was adopted for modeling the genetic architecture of complex traits from multi-environment trials, which includes additive effect (a) of each QTL, additive-byadditive epistatic effect (aa) of each pair of epistatic QTL, treated as fixed effects, and their corresponding environment interaction effects (ae and aae) as random effects.Suppose a trait is controlled by s segregating QTLs, of which t pairs of QTLs are involved in epistasis.Then, the phenotypic value of the m-th replication of the k -th genotype in the h-th environment (y hkm ) can be expressed by the following mixed linear model (Yang et al., 2007), where, m is the population mean; a i is the additive effect of the i-th QTL with indicator variable x ik , fixed effect; aa ij is the additiveby-additive epistatic effect of the i-th QTL and the j-th QTL with indicator variable x ik x jk , fixed effect; e h is the effect of the h-th environment, random effect, e h ∼ (0, s 2 E ); ae hi is the additive-byenvironment interaction effect of the i-th QTL and the h-th environment with observation u hik ( = x ik ), random effect, ae hi ∼ (0, s 2 A i E ); aae hij is the interaction effect of the aa ij and the h-th environment with observation u hijk ( = x ik x jk ), random effect, aae hij ∼ (0, s 2 AA ij E ); and ϵ hkm is the residual effect of the individual, random, ϵ hkm ∼ (0, s 2 ϵ ).
QTLNetwork 2.0 software were employed to detect QTLs by the mixed-linear-model-based composite interval mapping (MCIM) method (Yang et al., 2008).One-and two-dimensional genome scans for QTLs were performed with configurations of 10 cM testing window, 1 cM walking step and 10 cM filtration window size.To control the experiment-wise type I error rate, a critical Fvalue based on the Henderson III method was determined by the permutation test with 1,000 times for each tested locus at the significance level of 0.05.Based on the significant QTL, a QTL full model was established and used to estimate each parameter based on the samples generated by Markov chain Monte Carlo (MCMC) with 20,000 Gibbs sampler iterations.

Candidate genes prediction
The physical positions of the marker interval with QTL were determined using Nucleotide BLAST module of NCBI (https:// blast.ncbi.nlm.nih.gov/Blast.cgi),which utilized sequence information of two adjacent bin markers in the linkage map.Variants including SNPs and Indels located within the QTL regions were selected for subsequent filtration.These variants were annotated by software SnpEff (http://pcingola.github.io/SnpEff/)based on K326 reference genome (https://solgenomics.net/ftp/genomes/Nicotiana_tabacum/ e d w a r d s _ e t _ a l _ 2 0 1 7 / a s s e m b l y / N i t a b -v 4 .5 _ g e n o m e _ Chr_Edwards2017.fasta), and those showing HIGH or MODERATE impact on related protein effectiveness were retained, according to annotation results.Then, the eligible variants with P value less than 0.05 were identified by performing single-marker regressions with ŷ k as response variable (as mentioned in the Statistical analysis of phenotypes).Before performing the enrichment analysis of genes with variants above, the protein sequences of the K326 reference genome (https://solgenomics.net/ftp/genomes/Nicotiana_tabacum/ edwards_et_al_2017/annotation/Nitab-v4.5_proteins_Edwards2017.fasta)should be uploaded to the eggNOG-mapper website (http://eggnog-mapper.embl.de/)for functional annotation.Gene Ontology (GO) and KEGG pathway enrichment analyses were carried out using the clusterProfiler R package (https://github.com/YuLab-SMU/clusterProfiler) as criteria for predicting candidate genes.

Phenotypic performance of nine smokerelated traits
For the nine smoke-related traits, the estimated heritability of genetic effects h 2 ɡ ranged from 11.03% for CRO to 20.00% for CO, displaying a limited stability across environments for these traits (Table 1).Furthermore, gene-by-environment effects were found to contribute significantly more to phenotypic variation with most estimated interaction heritability ( ĥ 2 ɡe ) nearly over 80%, indicating selection of these traits should design specific strategy for different environment conditions.Most phenotypic correlations of the traits were positive and reached statistical significance (a = 0:05), for example, with coefficients over 0.8 between CO and CRO in three environments (Figures 1A-C).Nevertheless, strong negative correlations were observed between NNK and other three constituents: B[a]P, CO, and CRO, while, weak negative correlations between B[a]P and PHE, and between NNK and TAR.In general, the phenotypic correlations in three environments showed similar pattern as well as the genetic correlations (Figure 1D), indicating a strong and reliable underlying genetic relationship among these traits that make further exploration necessary.However, inconsistencies in the performance of the traits across three environments could be observed, with E2 exhibiting a higher frequency of exceptions.For instance, the phenotypic correlations between CO and PHE were significantly positive in E1 (Figure 1A) and E3 (Figure 1C), but significantly negative in E2 (Figure 1B).The genetic correlation coefficients, calculated using the estimated genotypic values, exhibited the highest correlation of 0.84 between CO and CRO, followed by 0.79 between NH 3 and HI, and several other strong correlations above 0.7, such as NH 3 and PHE (Figure 1D).The existence of high underlying genetic correlations between these traits might help to explain their strong phenotypic correlations.Further, correlations between HI and the other traits were all displayed significantly positive except B[a]P, which is consistent with the fact that HI was a composite index comprising seven representative smoke traits.

Additive and additive-by-environment interaction effects
A total of 19 QTLs were identified responsible for the nine smoke-related traits.Among them, NH 3 had four QTLs; CO, NNK, PHE and TAR each had three QTLs; CRO, HCN and HI each only detected one QTL (Table 2).These QTLs distributed on seven linkage groups (LGs), with LG15 containing the highest number of QTLs (7 QTLs), followed by LG6 (6 QTLs) and LG1 (2 QTLs), while each of LG4, LG5, LG10 and LG19 only harbored one QTL.
It is noteworthy that qPHE1 and qTAR1 located in the same marker interval with fl anking marker PT61201 and SNP_0912209_288, similarly, qPHE6 and qTAR6 in the same marker interval ranged by SNP_0002499_215080 and SNP_0011326_16206, qNH 3 15-2 and qTAR15 in the same interval ranged by SNP_0002539_133 and SNP_0000535_1549 (Table 2).These co-location indicated the potential pleiotropic effect of QTL on PHE and TAR or on NH 3 and TAR, which requires further investigation for revealing the molecular mechanism of genetic correlation between traits.Our inference on existence of pleiotropic QTL was enhanced by the significant and relatively high genetic correlations estimated between the studied traits (Figure 1), which were 0.74 between NH 3 and TAR, and 0.56 between PHE and TAR as we mentioned above.
Totally, 17 QTLs with additive (a) main effects were detected for nine traits, of which six QTLs also showed additive-by-environment interaction (ae) effects (Table 3).Most of the QTLs exhibited small additive effects, which were regarded as minor-effect QTLs and accounted for approximately 2% phenotypic variance.The average Phenotypic and genetic correlations between studied traits in the RIL population.Heat map (A-C) showed phenotypic correlation coefficients between nine traits in E1 (2018 Shilin), E2 (2019 Shilin), E3 (2020 Shilin) in turn, the heat map (D) showed genetic correlation coefficients between nine traits.*, **, *** denote significance level at 0.05, 0.01 and 0.001, respectively.Traits abbreviations are same as those in Table 1.
Significant additive by environment interaction effects (ae) were found for less than half of QTLs, and their contributions to phenotypic variation (h 2 ae ) were around 1%, with most being lower than their corresponding additive effects.However, there was an exception that the interaction between qNNK6 and two environments (E1, E2) accounted for 1.17% of the phenotypic variance, which was higher than 0.80% explained by corresponding additive effects.Moreover, the ae effects of QTL could take same or opposite effect direction as their main effects (a), and also showed different effect direction across environments.However, the ae effects of different QTLs for same trait may show consistent effect direction in same environment.For example, the ae effects of qNNK6 and qNNK15-1 both contributed negative effects in E1.It is widely recognized that QTLs with no significant ae effects have crucial application potential in breeding new varieties with strong environment adaptability.Of significance, the absence of ae effects of QTLs for HCN, NH 3 , TAR, and HI suggested that these four traits may exhibit stable performance across various environments.

Additive-by-additive epistasis and epistasisby-environment interaction effects
Two-dimensional genome scan found a digenic epistatic QTL pair (qNH 3 5-qNH 3 10) with epistasis-by-environment interaction effects (aae) for NH 3 , but no paired epistatic QTLs for other traits (Table 3).This epistatic QTLs didn't contribute additive-additive epistatic effects (aa), only positive aae effects in E1 and negative in E2, accounting for 1.51% of total phenotypic variation.Additionally, the epistatic QTL pair qNH 3 5 on LG5 and qNH 3 10 on LG10, each had no individual additive effect.a QTL: named in the form of "q","Trait","LG","-Rank", for example, qCO6 denotes a QTL of CO which is the first QTL on the LG6; b M-, M+: flanking markers, of which markers whose name begin with TM and PT denote SSR.c Support interval of a QTL: determined by following strategy: firstly, search the first left and right tested positions whose P-values increase to ten times of that of the QTL, then select their nearest markers for the support interval of the QTL.d Type: A, AE and AAE denote QTL with additive effects, additive-by-environment interaction effects and epistasis-by-environment interaction effects, respectively.
Overall, environmental effect (E) explained the largest part of the phenotypic variance which ranged from to 53.83%, followed by genetic main effect (G) and gene-by-environment interaction effect (GE) (Supplementary Table S1).For all traits, the G effects, which were entirely composed of additive effects of one to three QTLs, explained 1.04% to 8.00% of the phenotypic variance.Besides, for CO, CRO, NH 3 , NNK, PHE, the contribution of GE to the total variation was small, with some in the form of ae effects and others in the form of aae effects.In summary, our analysis showed that these traits were primarily controlled by a single major gene or polygenes.

Prediction of superior genotype for smoke-related traits
To explore the potential of the identified QTLs in improving smoke-related traits through genetic and molecular manipulation, we undertook QTL genotype design and assessed the trait values potentially achieved by the general superior homozygous line (GSL (-)) and the superior homozygous line for each year (SL(-)) based on the genetic effects of these QTLs.Notably, for the HCN, TAR, and HI, there are no QTL involved in interaction with environment, the designed superior genotypes GSL(-) and SL(-) of each trait remained consistent across all environments (Table 4).On the CO, CRO, NH 3 , NNK, and PHE, although they were influenced by geneenvironment effects, their superior lines (GSL, SL) have identical homozygous genotypes for each trait across three environments, respectively, except the SL(-)1 with QQ at qNNK6 for NNK.In particular, the superior lines of HCN, NNK and PHE were constructed by all homozygous genotypes with alleles from the parent Y3 (qq) at QTLs except qNNK6 for NNK, which indicated that the pyramid of genes from Y3 at these QTLs can effectively decrease the levels of hazardous smoke constituents, HCN, NNK, and PHE.Meanwhile, this advantageous genotype remained valid under all environments.This discovery provided valuable b ae: additive-by-environment interaction effects, of which ae 1 denotes the interactions between a and environment E1. c aa: additive-by-additive epistatic effect.d aae: epistasis-by-environment interaction effects, of which aae 1 denotes the interaction between aa and environment E1. e PVE(%), the proportion of phenotypic variance explained; PVE(A), the proportion of phenotypic variance explained by the additive QTL; PVE(AE), the proportion of phenotypic variance explained by the additive-by-environment interaction; PVE(AA), the proportion of phenotypic variance explained by the additive-additive epistatic effects; PVE(AAE), the proportion of phenotypic variance explained by the epistasis-by-environment interaction.*, **, *** denote significance level at 0.05, 0.01 and 0.001, respectively.Abbreviations of traits are same as those in Table 1.
indication for efficient simultaneous improvements of HCN, NNK, and PHE by utilization of elite genes Y3.

Enrichment analysis and prediction of candidate gene for smoke-related traits
Annotation of variants (SNP/Indel) was conducted using SnpEff based on the K326 reference genome, and 559,513 variants were identified within the putative physical positions of the 17 additive-QTL regions.Among these, only variants with a HIGH or MODERATE impact on protein function were retained, as they were considered potentially functional in candidate genes, as a result, 5,268 variants were selected in total.Then simple regression analysis of the predicted genetic values on these variants was performed, and 600 significant variants in 76 genes were selected (P < 0:05) (Supplementary Table S2).These prioritized genes demonstrated enrichment in five GO biological processes, four GO molecular functions, and two KEGG pathways.Specifically, the enrichment was observed in cellular processes (biological process), catalytic activity (molecular function), and signaling molecules metabolic pathways (Table 5).For instance, "callose deposition in cell wall", "polysaccharide localization" and "callose localization" were associated with carbohydrate biosynthesis.
The genes Nt21g04598.1, Nt21g04600.1 and Nt21g04601.1 were predicted as potential pleiotropic candidate genes for qPHE1 (PHE) and qTAR1 (TAR), BLASTP results indicated their putative functions.Nt21g04598.1 was shown to encode a protein highly homologous to non-specific phospholipase C4 in Arabidopsis thaliana, which functioned as plasma membrane bound and promoted tolerance to phosphate deficiency and hyperosmotic stress (Nakamura et al., 2005;Peters et al., 2010;Wimalasekera et al., 2010;Kocourkováet al., 2011;Pejchar et al., 2015;Yang et al., 2021).Nt21g04600.1 and Nt21g04601.1 were predicted to code a protein with high homology to transcription factor bHLH91 in Arabidopsis thaliana, which regulated the transcriptional expression, thereby regulating the plant's adaptive responses (Qian et al., 2021).The gene Nt20g03473.1 was pinpointed as a candidate gene for qCO6 (CO), which encoded a protein highly homologous to phospholipase D (PLD) delta in Arabidopsis thaliana.PLD delta has been proposed to play a role in many cellular processes such as signal transduction, membrane trafficking, cytoskeletal rearrangements, and membrane degradation (Distefano et al., 2012;Guo et al., 2012;Uraji et al., 2012;Jia et al., 2013); for example, it was involved in H 2 O 2 and abscisic acid (ABA)-induced stomatal closure, nitric oxide (NO) signaling and ABA-promoted senescence.Moreover, gene Nt16g00273.1 and Nt16g00284.1 have been identified as candidate  Abbreviations of traits are the same as Table 1.
genes for qNNK6 (NNK), which have been predicted to encode uncharacterized serine-rich protein C215.13 and type II inositol polyphosphate 5-phosphatase 15 isoform in Nicotiana tabacum, respectively, awaiting further annotation in the future.

Discussion
In comparison to traditional technologies used to reduce harmful ingredients in cigarette smoke, targeting the molecular mechanisms underlying the production of hazardous substances would be an innovative approach to improve safety and quality.Julio et al. (Julio et al., 2006) was the first to apply QTL analysis to explore smoke toxicants.They constructed a partial genetic map using 138 low-throughput markers, including amplified fragment length polymorphism (AFLP), inter simple sequence repeat (ISSR), sequence specific amplified polymorphism (SSAP) and sequence characterized amplified region (SCAR), which were assigned to 18 linkage groups.Then, a total of five QTLs were identified for TAR, CO, and B[a]P in a RIL population.Furthermore, Tong (Tong et al., 2021) conducted QTL studies using a high-density genetic map with 45,081 SNPs, which was constructed by whole-genome sequencing data of a tobacco population of 274 individuals.They detected several major QTLs of PHE, CO and TAR distributed in LG6 from 123.28 to 158.72 cM, and the close linkage of these QTLs were in accord with the strong positive correlations among these traits.Similarly, our study also detected some QTLs of PHE and TAR in the same region, pleiotropic effects or linkage of which may lead to high genetic correlation between the traits.
Compared with previous QTL studies on hazardous smokerelated traits, our study possesses following advantages.First, we utilized a up-to-date published integrated linkage map with 46,324 polymorphic markers, including SNPs, Indels and SSRs, distributed on 24 linkage groups and covered 3334.88 cM with an average genetic distance of 0.469 cM (Tong et al., 2023).This highresolution linkage map represents the most comprehensive map of tobacco to date.Second, we employed a QTL full model with effects of additive, additive-additive epistasis, and their interaction with environments, which is more rational to depict the genetic properties of quantitative traits where gene-gene and geneenvironment interaction are mostly involved.As a result, we identified a total of 19 QTLs for the studied hazardous smokerelated traits, of which 17 QTLs showed significant additive effects, six showed significant additive-by-environment interaction effects, one pair showed significant epistasis-by-environment interaction effects for NH 3 .Notably, only one QTL was detected for CRO, HCN and HI, but more QTLs for CO, NH 3 , PHE and TAR, respectively, probably due to relatively lower general heritability (h 2 g ) and interaction heritability (h 2 ge ) of CRO, HCN and HI.Besides, we included the comprehensive index HI in QTL analysis to explore whether any pleiotropic QTLs could be detected with the eight direct smoke-related traits.It turned out that qHI6 was found to be located on the same linkage group as qNNK6, qCO6, qHCN6 and qPHE6.This is in concert with the high genetic correlations over 0.7 between HI and HCN, NH 3 and PHE.In addition, considering of no QTL detected for B[a]P and its relatively small genotypic variance and genotype by environment interaction variance, we inferred that the B[a]P may be controlled by minor-effect QTLs which couldn't be detected by the program because of too small effect magnitude, unlike major QTLs with relatively large effects that are more likely to be detected (Heffner et al., 2009;Beavis, 2019).
Our investigation pinpointed the qPHE1 and the qTAR1 were co-localized in the same chromosome, demarcated by the genetic markers PT61201 and SNP_0912209_288.Similarly, qPHE6 and qTAR6 were found to co-localize in another region, flanked by SNP_0002499_215080 and SNP_0011326_16206.These findings suggested the possibility that pleiotropic genes, may be nested within these genomic segments.Furthermore, our hypothesis on existence of pleiotropic gene was reinforced by the substantial genetic correlation with coefficient of 0.56 between the corresponding traits, PHE and TAR.The strong and stable trait Abbreviations of traits are the same as Table 1.
correlation across environments further indicated the existence of shared pleiotropic candidate genes in these identified genomic With subsequent comprehensive bioinformatics analyses, Nt21g04598.1, Nt21g04600.1 and Nt21g04601.1 were anchored at and predicted as pleiotropic candidate genes for qPHE1 and qTAR1.According to the results of NCBI-BLASTP, Nt21g04598.1 was predicted to encode a protein highly homologous to non-specific phospholipase C4 in Arabidopsis thaliana, Nt21g04600.1 and Nt21g04601.1 were predicted to code a protein with high homology to transcription factor bHLH91 in Arabidopsis thaliana.Coupled with results of GO and KEGG enrichment analyses, we speculated that these candidate genes might be involved in the signaling or regulatory processes of carbohydrate biosynthesis and transformation, leading to different performances of PHE and TAR.
In our study, identifying candidate genes relied on the rough physical positions of QTL intervals, which might encompass hundreds or thousands of genes.In other words, distinguishing target trait genes from annotated genes within a QTL remains challenging for primary mapping populations such as RILs used in our study.Therefore, fine mapping using secondary mapping populations (e.g., SSSLs, NILs, CSSLs), are needed to narrow down target QTLs, eliminating genetic background interference for accurate candidate gene prediction.Additionally, further studies are needed for functional validation of these candidate genes with cutting-edge molecular biology techniques on the levels of gene expressions, proteins and metabolites, so that, the pleiotropic genes could be effectively used in synchronous improvement of PHE and TAR.

TABLE 1
Variance components analysis and estimated heritability (%) of genetic effects for smoke-related traits.

TABLE 2
QTLs detected for smoke-related traits in RIL population.

TABLE 3
Effects and the proportion of phenotypic variance explained by QTL.
a a: additive effect.

TABLE 4
Superior lines predicted by full genetic model for smoke-related traits.

TABLE 5
Significantly enriched Gene Ontology (GO) terms and enriched KEGG pathway terms of potential genes.
a Enrichment analysis: Gene Ontology (GO) enrichment analysis, including biological process (BP), molecular function (MF) and cellular component (CC);