Genome-Wide Association Mapping for Tomato Volatiles Positively Contributing to Tomato Flavor

Tomato volatiles, mainly derived from essential nutrients and health-promoting precursors, affect tomato flavor. Taste volatiles present a major challenge for flavor improvement and quality breeding. In this study, we performed genome-wide association studies (GWAS) to investigate potential chromosome regions associated with the tomato flavor volatiles. We observed significant variation (1200x) among the selected 28 most important volatiles in tomato based on their concentration and odor threshold importance across our sampled accessions. Using 174 tomato accessions, GWAS identified 125 significant associations (P < 0.005) among 182 SSR markers and 28 volatiles (27 volatiles with at least one significant association). Several significant associations were co-localized in previously identified quantitative trait loci (QTL). This result provides new potential candidate loci affecting the metabolism of several volatiles.


INTRODUCTION
The perception of the tomato flavor, including aroma and taste, is a result of the interaction of sugars, acids, and many aromatic volatile compounds (Goff and Klee, 2006;Klee and Tieman, 2013). Plants produce a large variety of fruit flavor volatiles. These volatiles are defining elements of the distinct flavors of fruits and are mainly derived from essential compounds, including amino acids, fatty acids, carotenoids (Goff and Klee, 2006;Klee and Tieman, 2013). However, selecting for the quality of fruit aroma has never been a high priority for plant breeders (Goff and Klee, 2006;El Hadi et al., 2013). In fact, research that is focused on improving fruit size, yield, quality, and shelf life has often led to unintended, negative outcome for both aroma and, consequently, flavor (Kader et al., 1977;Ratanachinakorn et al., 1997).
The effect of a volatile on flavor perception is determined by its concentration and perceptible aroma, or odor threshold. As few as 20 of more than 400 volatiles in tomatoes have sufficient concentrations and odor thresholds to contribute to tomato flavor (Baldwin et al., 2000). However, the biosynthetic pathways and regulatory networks are known for volatiles of the greatest economic importance, such as the 20 volatiles aforementioned (Klee, 2010;Klee and Tieman, 2013). Important genes that regulate volatiles can be broadly subdivided into two classes: genes encoding enzymes responsible for synthesis of the end products and genes encoding factors that regulate the pathway (Klee, 2010). The regulation of the pathways of most volatiles is very poorly understood. In fact, even our knowledge on the 20 most important volatiles in tomato is limited.
The genetic and molecular bases of volatiles are also poorly understood. This is primarily due to the polygenic nature and biochemical complexity of flavor/aroma traits and our limited ability to quantify those volatiles (Klee and Tieman, 2013). Improvements in quantification techniques, as well as the capacity for high-throughput genotyping (Agarwal et al., 2008), provide the basis for quantitative trait locus (QTL) analysis of aroma components. Such QTL studies of aroma have already been performed with apple (Zini et al., 2005), grape (Doligez et al., 2006;Obando-Ulloa et al., 2010), and melon (Obando-Ulloa et al., 2010). In tomato, more than 100 quantitative trait loci (QTLs) affecting volatiles and their precursors have been identified (Saliba-Colombani et al., 2001;Causse et al., 2004;Schauer et al., 2006;Tieman et al., 2006;Mathieu et al., 2008;Zanor et al., 2009). Some QTLs specifically alter single volatiles while others can affect several related or even unrelated volatiles (Mathieu et al., 2008).
Genome-wide association studies (GWAS) is a method for mapping the loci responsible for natural variations in a target phenotype (Saidou et al., 2014;Matsuda et al., 2015;Pers et al., 2015). It is based on the identification of significantly associated genetic polymorphisms in a large population (Brachi et al., 2011). GWAS is a reliable, preliminary approach for identifying the locations of QTLs and has been conducted on major fruit quality traits in tomato (Mazzucato et al., 2008;Ranc et al., 2012;Xu et al., 2013;Ruggieri et al., 2014;Zhang et al., 2015). However, few association mapping studies have been performed on QTLs for the main flavor-enhancing volatiles in tomatoes or other crop plants (Kumar et al., 2015). In this study, we performed a GWAS to locate QTLs for flavor-affecting volatiles in tomato. In particular, we detected volatiles in a collection of 174 diverse tomato accessions using gas chromatography-mass spectrometry (GC/MS). We also detected a large number of loci to link their volatile composition with their genotypes. Our results confirmed some previous volatile QTLs, and we identified some chromosome regions that could be important in controlling volatile metabolism.

Plant Materials
The tomato diversity panel consisted of 174 tomato accessions comprised of 123 cherry tomato accessions (Solanum lycopersicum var. cerasiforme) and 51 large-fruit cultivars (S. lycopersicum) ( Table S1). All accessions were grown during the springs of 2013 and 2014 in a completely randomized block design with three replicates (10 plants per replicate), at the research greenhouse of the Tomato Research Group (34 • 24_N, 108 • 07_E) according to standard agronomic practices (Zhang et al., 2015).

Volatile Determinations
We performed analyses of fruit volatiles as described in Tikunov et al. (2005), with minor modifications. We combined all red, ripe fruit produced on the 10 accessions representing each horticultural type and immediately placed them in liquid nitrogen. We then kept them at −80 • C. We transferred 15 g of the finely powdered tomato samples into a 40 mL Teflon cap vial (Thermo Fisher Scientific) with 5 g of NaCl. Ten microliters of 2-octanone (0.125 mg/mL in ethyl alcohol) were then added as an internal standard. We then sealed the vials using a silicone/PTFE septum and a magnetic cap. The closed vials were agitated (500 rpm) and sonicated for 10 min, and incubated at 50 • C for 10 min prior to HS-SPME-GC-MS analysis. We performed three independent reactions for each sample. We extracted headspace volatiles by exposing each sample to a 75 µm carboxen-polydimethylsiloxane SPME fiber (Supelco, USA) for 30 min under continuous agitation (500 rpm) and heating at 40 • C. The fiber was then inserted into an ISQ GC-MS (Thermo Scientific instruments, USA) injection port and the volatiles were desorbed for 3 min at 250 • C. We performed chromatography on an HP-INNOWAX column (60 m × 0.25 mm × 0.25 µm) with helium as the carrier gas, at a constant flow of 1.0 mL min −1 . The temperature of both the GC interface and MS source was 230 • C. The GC temperature program began at 40 • C (2.5 min), and then was raised to 160 • C (5 • C min −1 ) and to 230 • C (10 • C min −1 ) before being held at 230 • C for 5 min. The total run time, including oven cooling, was 40 min. Mass spectra were evaluated in the 35-450 m/z range at a scanning speed of 70 scans s −1 and an ionization energy of 70 eV. We processed the raw data obtained from GC-MS with Xcalibur and AMDIS software and identified the volatile compounds on the basis of the NIST/EPA/NIH Mass Spectral Library (NIST 2008) and Wiley Registry of Mass Spectral Data 8th edition. The retention index (RI) was calculated with a homologous series of n-alkanes (C7-C30, Sigma-Aldrich). The volatiles were semi-quantified according to the method of Baek and Cadwallader (1996) and Hopfer et al. (2012). We calculated the relative concentration of each volatile using the ratio of the areas of the target peak and the internal standard (2-octanone, 0.125 mg/mL, 10 µL) in a total ion current chromatogram with the following equation:

=
Peak area of particular compound Peak area of internal standard (IS) × Concentration of IS We used the relative concentrations to compare the differences in volatile profiles among the 174 accessions.

Genotyping
We extracted DNA from the 174 tomato accessions from fresh leaf tissue following the method of Fulton et al. (1995). All SSR markers were mainly selected from the SOL Genomics Network (http://solgenomics.wur.nl/) and the VegMarks database (http:// vegmarks.nivot.affrc.go.jp/). We used the protocol of Sun et al. (2012) to amplify the markers. Only markers with minor allele frequency (MAF) > 0.05 were genotyped with the whole accessions (Zhang et al., 2015). Finally, we selected a set of 182 polymorphic simple sequence repeat (SSR) for further association mapping (Zhang et al., 2015).

Population Structure
We used the above set of 182 SSR markers to estimate the population structure of the 174 tomato accessions via STRUCTURE 2.3.3 software (Pritchard et al., 2000). We set the number of hypothetical subpopulations (K) at 2-10 in order to evaluate the population structure with an admixture model. We performed 200,000 replicates of the Markov Chain Monte Carlo with a burn-in length of 100,000. We used Evanno transformation method to infer the optimal K of populations (Evanno et al., 2005). The kinship matrix was calculated via SPAGeDi software (Hardy and Vekemans, 2002). We set the diagonal of the matrix to two and negatives values to zero (Yu et al., 2005).

Association Mapping
Decay of LD and the corresponding significance level (P-value) were calculated using TASSEL 2.1 software (Bradbury et al., 2007). We also calculated associations between volatiles and SSR markers using TASSEL 2.1 software (Bradbury et al., 2007). Mixed linear model (MLM) was used in order to reduce false positive associations. We used P < 0.005 as the value to detect associations. We also took P < 0.0003 as the significant value to reduce false positive associations. The amount of phenotypic variation explained by each marker was estimated by R 2 .

Statistics
We used either the SAS 8.1 program (SAS institute, Cary, NC) or the R statistical Software (http://www.r-project.org) version 3.0.2 for statistical analyses. We replaced the values of zero (undetectable) for all volatiles by the smallest non-zero value in the whole dataset (Mathieu et al., 2008). Then, we log 2transformed the volatile quantity values (ng g −1 fresh weight h −1 ) before performing a Two-way ANOVA analysis for all traits. The resulting raw P-values were also corrected for multiple tests using the Benjamini and Hochberg FDR test (Benjamini and Hochberg, 1995). We estimated genetic variance, genetic by environment interaction variance, technical variance, heritability values according to the method of Xu et al. (2013). We developed correlation heat map via HemI 1.0., based on the analysis result among volatiles and accessions.

Variation in Volatiles
Of the more than 400 volatiles reported in tomato, fewer than 20 have been predicted to contribute to the unique tomato flavor based on the concentrations and odor thresholds (Buttery et al., 1987;Goff and Klee, 2006;Klee, 2010;Tieman et al., 2012). Here, we further investigated 28 volatiles that are in sufficient quantities to impact the tomato flavor. We found large variations of up to 1200 x in volatile contents across all of the sampled accessions ( Table 1).
We calculated heritability values for all 28 volatiles based on 2 years of phenotypic characterization. Of these volatiles, nine had a value lower than 0.5 (Table 1). Therefore, volatile values in 2013 and 2014 were evaluated separately for further genome-wide association analyses.
Pearson correlation coefficients (r) among the 28 volatiles were relatively low, based on the mean value of 2 years data (the springs of 2013 and 2014) (lower than 0.5) (Figure 1). However, we observed significant coefficients among some volatiles (Table  S2). For instance, beta-cyclocitral and (Z)-2-heptenal were positively correlated with 1-pentanol (r = 0.464 and 0.417, respectively). In addition, methyl salicylate was negatively FIGURE 1 | Heat map showing the correlation analysis between 28 volatiles in the 174 tomato accessions. Regions in red and yellow indicate positive or negative correlations between traits, respectively (The complete data is available in Table S2). correlated with eugenol, (Z)-2-heptenal, and (f)-2,4-heptadienal (r = −0.406, −0.266, and −0.241, respectively). We observed that volatiles derived from the same pathway had a tendency to cluster together (Figure 2). In addition, we observed large variation among the 28 volatiles in the cluster analysis for all of the accessions (Figure 2). The largest variations were mainly observed for 1-hexenol, hexanal, (Z)-3-hexen-1-ol, (Z)-3-hexenal. Additionally, these four volatiles were all linolenic acid-derived flavor molecules and were closely clustered. (Z)-3-hexen-1-ol can be derived from (Z)-3-hexenal by alcohol dehydrogenase (ADH). Similarly, 1-hexenol can also be derived from hexanal via ADH. Volatiles, structures, identification and their precursors of the selected 28 volatiles used in this research are presented in Table S3.

Molecular Polymorphism
The 174 sampled accessions were genotyped with 182 SSRs. We selected only markers with MAF > 5% for association mapping (Zhang et al., 2015). The distributions of the MAF were different among the three groups of accessions. The average MAF for all of the accessions, S. l. cerasiforme accessions and S. lycopersicum accessions has been described by Zhang et al. (2015).
LD decay was analyzed for all markers on all chromosomes for the 174 accessions. Pairwise r 2 was plotted according to the chromosome genetic distance between loci. Non-linear regression was fitted to the decay of LD over genetic distance. LD on the whole genome for all accessions extended on average over 8 cM for r 2 = 0.2 (Figure 3).

Population Structure
We assessed population structure of the 174 tomato accession using STRUCTURE 2.3.3 software with 182 SSR markers. We found an optimal K = 2 inferred according to Evanno method (Evanno et al., 2005). The inferred population was congruent with S. l. cerasiforme and S. Lycopersicum accessions, respectively (Zhang et al., 2015).

Genome-wide Association Analysis
In order to reduce false positive associations, we used the K+Q model (kinship matrix and genetic structure) to detect associations between the selected 28 volatiles and 182 SSR markers. We analyzed the phenotypic data in 2013 and 2014 separately. In total, we detected 125 marker-trait associations (MTAs) on 28 selected volatiles in 2013 and 2014 (P < 0.005) FIGURE 2 | Cluster analyses of the 28 selected volatiles among the whole accessions. The names of volatiles (right) and the accession codes (bottom) are shown (The accession codes can be seen in Table S1). ( Table 2). Among these, 52 MTAs were detected in both years. Twenty-nine of them are significant associations (P < 0.0003). In 2013, 2014, we detected 82, 95 MTAs, respectively. We detected at least one MTA for each volatile. The only exception is for eugenol and we detected no MTAs for this volatile. The most significant MTA for all volatiles in both year was detected on 6-methyl-5-hepten-2-one. This MTA was detected on TES344 on chromosome 11 (Chr11), explaining 36.38, 33.25% of the phenotypic varation, in 2013 and 2014, respectively. The other most significant MTA was detected for (E)-2-hexen-1-ol. This MTA was detected on SSR 287 (Chr2), explaining 44.51, 42.13% of the phenotypic variation in 2013 and 2014, respectively.

Carotenoid-derived Volatiles
Of the 28 volatiles selected for association mapping, there are three important open chain carotenoid-derived volatiles, 6-methyl-5-hepten-2-one, 6-methyl-5-hepten-2-ol and geranylacetone. There are another three cyclic carotenoidderived volatiles, including beta-ionone, beta-cyclocitral, and beta-damascenone. For 6-methyl-5-hepten-2-one, derived by oxidative cleavage of lycopene, five MTAs were detected. Among these, the associated marker TES344 (Chr11) showed the most significant association value (P = 1.51E-26, in 2013; P = 1.84E-24, in 2014) among all significant MTAs detected for 28 volatiles. For 6-methyl-5-hepten-2-ol, which is directly biochemically linked with 6-methyl-5-hepten-2-one via ADH, we detected six significant MTAs. Among these, three MTAs were detected in both years. The most significant associated marker was TOM166 (Chr3) in 2013, which explained about 19.69% of the total phenotypic variation. We also detected significant association on this marker in 2014. The significantly level is relatively lower, explaining 7.85% of the phenotypic variation. This marker was also significantly associated with 6-methyl-5-hepten-2-one, and explained approximately 9.79, 10.23% of the phenotypic variation, in 2013 and 2014, respectively. Marker TGS827 (Chr3) was also associated with 6-methyl-5-hepten-2-one and 6-methyl-5-hepten-2-ol, and explained 4.86 and 8.56% of the phenotypic variation in 2013, respectively. However, we observed no significant MTAs on both volatiles on this marker in 2014. For geranylacetone, we found 10 MTAs, and five of them were detected in both 2013 and 2014. The most significant of these was with marker SSR122 (Chr6), which explained 16.92, 15.65% of the phenotypic variation, respectively. The other most significant MTA of these was with SSR142 (Chr9). This MTA could explain 19.02, 17.65% of the phenotypic variation, in both years, respectively.   Significant MTAs were also observed for three cyclic carotenoid-derived volatiles, with five, six and five MTAs, respectively. The most significant MTA for beta-cyclocitral was detected on marker TGS1548 (Chr2), both in 2013 and 2014. This MTA explained 14.3, 14.9% of the phenotypic variation, respectively. The most significant association for beta-damascenone was detected on TES816 (Chr6), explaining 13.55, 13.64% of the phenotypic variation, in 2013 and 2014, respectively.

Lipid-derived Volatiles
Of the 28 volatiles in sufficient quantities to impact the tomato flavor, we discovered MTAs for 11 lipid-derived volatiles ( Table 2). For (E)-2-hexen-1-ol, we found six MTAs. The most significantly associated marker was SSR287 (Chr2). This MTA represented one of the most significant MTAs for all 28 volatiles, and explained 44.51, 42.13% of the total phenotypic variation, in 2013 and 2014, respectively. Only two MTAs were detected for (E)-2-hexenal, with one MTA in each year, respectively. For 1-penten-3-one, four MTAs were detected either in 2013 or 2014. For 1-penten-3-ol, which is directly biochemically linked with 6-methyl-5-hepten-2-one via ADH, only two MTAs were detected. For 1-pentanol, six MTAs were detected and two had a high association value. The two associated markers were SSR345 (Chr12) and SSR133 (Chr4). The associated marker SSR345 explained 22.84, 22.15% of the phenotypic variation, in 2013, 2014, respectively. Marker SSR133 accounted for 26.72, 25.34% of the phenotypic variation, respectively. For (Z)-2-heptenal, we discovered nine significant MTAs, and six of them were detected both in 2013 and 2014. Among the nine MTAs, five of which were located on Chr9 in the region from 30.1 to 56.86 cM. The most significant associated marker was TGS1032, located at about 30.1 cM on Chr9. This MTA which explained 18.76, 16.25% of the phenotypic variation, in 2013, 2014, respectively.

Amino Acid-derived Volatiles
For 3-methylbutanol, a leucine-derived flavor volatile, we found 10 MTAs. The two most significant MTAs involved with marker SSR92 (Chr1) and SSR13 (Chr5). These two MTAs responsible for 30.69 and 32.25% of the total phenotypic variation in 2013, respectively. In 2014, they accounted for 21.71, 21.76% of the total phenotypic variation. For 2-phenylethanol, a phenylalaninederived volatile, we observed five MTAs in either 2013 and 2014. The most significant marker was TES1521 (Chr7), explaining 9.19, 6.2% of the phenotypic variation, respectively.

Terpenoid-derived Volatiles
The two most important terpenoid-derived volatiles in tomato are neral and geranial, which are primarily localized in tomato leaves and stems (Buttery and Ling, 1993). However, we still observed these two volatiles in fruits in many tomato accessions. We observed four MTAs for neral and six for geranial in either year. For neral, the most significant MTA was detected on SSR92 (Chr1) in 2014, explaining about 14% of the phenotypic variation. No significance was observed on this marker in 2013. For geranial, the most significant MTA was also detected in 2014. This MTA was detected on marker SSR150 (Chr4), accounting for 3.45% of the phenotypic variation.

DISCUSSION
Genome-wide association study (GWAS) is a useful tool to detect candidate loci responsible for the natural variations in a targeted phenotype. This tool can identify significant associations between polymorphic molecular markers and targeted traits in a large natural population (Weigel, 2012;Matsuda et al., 2015). However, many factors can impact the results of association mapping, including type and size of mapping population, targeted traits, number of environments and years for phenotypic evaluations and the type and genome coverage of molecular markers (Ruggieri et al., 2014). Thus, we used a large sample of cherry and large fruited tomato accessions and a MLM to reduce false positive associations in GWAS (Zhao et al., 2007;Bernardo, 2008). The population in our study composes 123 cherry tomato accessions and 51 large fruited accessions and we think the size of our collection was adequate for GWAS (Ranc et al., 2012;Xu et al., 2013;Ruggieri et al., 2014).

Phenotypic and Genetic Diversity
This whole studied population composes 123 cherry tomato and 51 large fruited accessions, representing a large diversity ( Table 1). We found that this population has a large phenotyic diversity, such as fruit weight, soluble solid content, and lycopene content, etc. (Zhang et al., 2015). The large variations of the selected 28 crucial volatiles up to 1200x confirmed this ( Table 2). The studied population could be mainly divided into two subgroups, cherry and large-fruited (Zhang et al., 2015). The higher MAF value among the studied population confirmed that S. l. cerasiforme (cherry tomato) is a mosaic of S. lycopersicum (large fruited tomato) and S. pimpinellifolium (wild species) (Frary et al., 2000;Zhao et al., 2007;Ranc et al., 2008;Xu et al., 2013;Zhang et al., 2015). The linkage disequilibrium of the whole genome decays at about 8 cM, which is consistent with previous studies (van Berloo et al., 2008;Xu et al., 2013;Zhang et al., 2015). Therefore, using a large collection of cherry tomato accessions together with cultivated tomato accessions is useful to overcome the high linkage disequilibrium value of tomato genome (Xu et al., 2013;Zhang et al., 2015).

Associations Confirmed Identified Volatile QTLs
The tomato is an excellent model for investigating the molecular basis of flavor using association mapping (Klee and Tieman, 2013). To date, few association mapping studies have focused on volatiles in major crops (Kumar et al., 2015). Here, we conducted GWAS between 28 most volatiles in tomato and SSR markers and found significant MTAs for most of the studied volatiles ( Table 2). In tomato, over 50 QTLs affecting volatile levels have been identified, mainly using recombinant inbred lines (RIL) or introgression lines (IL) (Saliba-Colombani et al., 2001;Tieman et al., 2006;Mathieu et al., 2008;Zanor et al., 2009). However, the size of introgressed regions is large large (about 10-40 cM) (Zanor et al., 2009) and the results among prior studies differed. For example, prior studies have found different QTLs for the 6-emthyl-5-hepten-2-one, an important carotenoid-derived volatile. In particular, one major QTL mhn4.1 impacting 6emthyl-5-hepten-2-one was detected on chromosome 4 (Chr4) using an introgression line (Saliba-Colombani et al., 2001). However, two different QTLs were detected on 2A, 3C, and 12D in other IL populations (Tieman et al., 2006;Mathieu et al., 2008;Figure 4). In our research, we found six significant associations (P < 0.005) for 6-emthyl-5-hepten-2-one. Among them, the most significant associations were detected on Chr11 (TES344) and Chr4 (SSR188). These two associations were also found in the near region of two QTLs for 6-emthyl-5-hepten-2-one in a previous study by Tieman et al. (2006).
Twenty-five significant MTAs were detected for 15 volatiles on Chr4 ( Table 2). The associations observed on Chr4 showed support for five previously-identified QTLs, including QTLs for3-methylbutanol and (E)-2-pentenal by Tieman et al. (2006); QTLs detected for 2-phenythanol and 1-penten-3-ol by Mathieu et al. (2008); and one QTL detected for eugenol by Zanor  Tieman et al. (2006) were indicated by adding [2] to the end of volatiles; QTLs identified by Mathieu et al. (2008) were indicated by [3] to the end of volatiles; QTLs identified by Zanor et al. (2009) were indicated by [4] to the end of volatiles. CCD, carotenoid cleavage dioxygenases. CCD1A and CCD1B are two genes from the tomato genome data and were indicated by [5] to the end of genes. In Simkin et al. (2004), these two genes were listed as LeCCD1A and LeCCD1B. et al. (2009). However, only a few co-localized QTLs with the significant associations were observed on Chr4 (Figure 4). This could be mainly due to the limited volatiles sampled in previous studies and the limited molecular markers (Saliba-Colombani et al., 2001;Tieman et al., 2006;Mathieu et al., 2008;Zanor et al., 2009). In fact, among the28 volatiles that we sampled, only about 10 volatiles were mentioned in previous studies. Based on previous GWAS on tomato, the LD of tomato genome decayed at approximately 5-20 cM (Mazzucato et al., 2008;van Berloo et al., 2008;Ranc et al., 2012;Xu et al., 2013). Therefore, a 5-10 cM genome coverage should be enough to detect positive associations, especially by using SSRs. Our research revealed more polymorphic loci impacting tomato volatile profiles. However, among the detected 125 MTAs, only 52 of them were detected in both years. The overall significance value is still relatively low. A 5.2 cM genome coverage could detect positive associations. Still, more markers are needed to have a higher genome resolution to detected more candidate QTLs or genes. In addition, the tomato genome data has been available. In order to conduct more efficient association mapping, marker assisted selection and fine mapping of QTLs, high-throughput SNP chips via conducting re-sequencing on the core tomato accessions would greatly promote our further research.

Volatile Biosynthesis Pathways
Tomato volatiles are mainly derived from four pathways, including the fatty acid, amino acid, terpenoid, and carotenoid pathways. The metabolic pathways of the selected volatiles in this study are shown in Figure 5. All volatiles used in this study are directly or indirectly linked with the tricarboxlic acid cycle (TCA), indicating the fundamental significance of primary metabolism. However, our understanding of the biosynthesis pathways and regulatory networks is only known for a small FIGURE 5 | Summary of metabolic pathways leading to the 28 important flavor-associated volatile synthesis. Volatiles used in this study are shown in blue. The precursor for 2-isobutylthiazole is not clear and is not listed in this summary. Dashed lines indicate multiple step reactions. Enzymes or genes involved in some volatile synthesis are listed. BSMT, benzoic acid and salicylic acid carboxyl methyltransferase; PAAS, phenylacetaldehyde synthase; PAL, phenylalanine ammonia-lyase; CCD, carotenoid cleavage dioxygenase; LOX, lipoxygenase; ADH, alcohol dehydrogenase; HPL, hydroperoxide lyaser; 3Z,2E-EI, 3Z,2E-enal isomerase; IPP, isopentenyl pyrophosphate; GPP, geranyl diphosphate. portion of the most economically significant voaltiles (Klee, 2010). Even for some most important volatiles in tomato, the synthetic pathways have been recently established or remain unknown. For instance, the precursor and the corresponding regulation pathways is unknown for 2-isobutylthiazole, one important sulfur-containing compound in tomato (Iranshahi, 2012). Using significant correlations between traits could be used to build the network structure of the poorly known pathways (Carli et al., 2009).

Volatile Biosynthetic Genes
Knowledge of synthetic pathways and the regulatory networks can greatly facilitate the identification of genes encoding biosynthetic enzymes. This can be accomplished by exploiting the whole genomic or expressed sequence databases (Klee, 2010). For instance, Klee and Tieman (2013) reviewed several genes with validated functions in the metabolism of tomato volatiles, including PAR, phenylacetaldehyde reductase; loxC, 13-lipoxygenase; CCD1, carotenoid cleavage dioxygenase; and CXE1, carboxylesterase. LoxC catalyzes the first step in the metabolic pathway that converts 18:2 and 18:3 fatty acids to C6 volatiles, including (Z)-3-hexenal, hexanal, (Z)-3-hexen-1-ol, hexyl alcohol, and hexylacetate (Chen et al., 2004). Volatile terpenoid compounds, including neral, geranial, limonene and beta-cyclocitral, etc. could potentially be derived from carotenoids. These volatiles are all important component of flavor and aroma in tomato (Simkin et al., 2004). LeCCD1A (82,184,195,219 bp) and LeCCD1B (82,194,212,510 bp) are two closely related genes located on chromosome 1 potentially encoding carotenoid cleavage dioxygenases and LeCCD1B. LeCCD1A had a great impact on the concentration of beta-ionone and geranylacetone and LeCCD1B had a high expression level in ripening fruit (Simkin et al., 2004). In our search, we identified one significant MTA for 6-methyl-5-hepten-2-ol, one important volatile derived from lycopene and another significant MTA for geraylacetone. These two associations were both associated with marker TGS1156. However, the significant level was relatively low and not all carotenoid-derived volatiles were associated with this marker. This could due to the limited markers in this region or the weak marker polymorphic linkage with these two genes. At least five lipoxygenases (TomloxA, TomloxB, TomloxC, TomloxD, and TomloxE) in tomato have been identified. They can greatly impact the generation of C6 aldehyde and alcohol volatiles derived from fatty acids, such as n-hexanal, (Z)-3-hexenal, (E)-2hexenal, and (Z)-3-hexenol, in both fruit and leaf tissues (Chen et al., 2004). However, researchers have not yet established the biosynthetic pathways for many of the most important volatiles. Here, we selected the 28 most important volatiles in tomato to perform genome-wide association mapping. Our research points to some chromosome regions that may play a significant role in tomato volatile metabolism. However, the marker coverage was relatively small. Combining with the availability of the tomato genome data, and higher density of SSR marker coverage (or SNPs), this research will promote the isolation of novel genes impacting volatiles.

CONCLUSIONS
Phenotypic evaluation on the 28 most important tomato volatiles detected by GC-MS revealed a broad phenotypic variability within diverse accessions across tomato. GWAS between the selected 28 volatiles and 182 SSR markers allowed detection of 125 significant MTAs (P < 0.005). We detected at least one MTA for 27 volatiles. Notably, some associations had a very high significant value. For instance, we found a highly significant association between 6-methyl-5-hepten-2-one. This MTA accounted for up to 30% of the phenotypic variation in both year. The other most significant association was detected between (E)-2-hexen-1-ol and SSR287 (Chr2). Some associations were colocalized with previously identified QTLs. We identified several chromosome regions that could greatly impact tomato volatile metabolism. Our results represent a step toward accelerating the rate of flavor related gene discovery.

AUTHOR CONTRIBUTIONS
JZ and ZZ designed the study. JZ and JTZ carried out the main GC-MS analysis and molecular mapping, analyzed the data, and drafted the manuscript. YL provided the tomato seeds and participated in its design. ML participated in its design and the GC-MS analysis. YX, JL, PC, and FY participated in the phenotype analysis and molecular mapping. All authors corrected and approved the final version.