ORIGINAL RESEARCH article

Front. Plant Sci., 09 January 2018

Sec. Plant Breeding

Volume 8 - 2017 | https://doi.org/10.3389/fpls.2017.02232

Genome-Wide Association Study Identifying Candidate Genes Influencing Important Agronomic Traits of Flax (Linum usitatissimum L.) Using SLAF-seq

  • 1. Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences, Changsha, China

  • 2. Institute of Industrial Crops, Heilongjiang Academy of Agricultural Sciences, Harbin, China

  • 3. College of Agriculture, Northeast Agricultural University, Harbin, China

Abstract

Flax (Linum usitatissimum L.) is an important cash crop, and its agronomic traits directly affect yield and quality. Molecular studies on flax remain inadequate because relatively few flax genes have been associated with agronomic traits or have been identified as having potential applications. To identify markers and candidate genes that can potentially be used for genetic improvement of crucial agronomic traits, we examined 224 specimens of core flax germplasm; specifically, phenotypic data for key traits, including plant height, technical length, number of branches, number of fruits, and 1000-grain weight were investigated under three environmental conditions before specific-locus amplified fragment sequencing (SLAF-seq) was employed to perform a genome-wide association study (GWAS) for these five agronomic traits. Subsequently, the results were used to screen single nucleotide polymorphism (SNP) loci and candidate genes that exhibited a significant correlation with the important agronomic traits. Our analyses identified a total of 42 SNP loci that showed significant correlations with the five important agronomic flax traits. Next, candidate genes were screened in the 10 kb zone of each of the 42 SNP loci. These SNP loci were then analyzed by a more stringent screening via co-identification using both a general linear model (GLM) and a mixed linear model (MLM) as well as co-occurrences in at least two of the three environments, whereby 15 final candidate genes were obtained. Based on these results, we determined that UGT and PL are candidate genes for plant height, GRAS and XTH are candidate genes for the number of branches, Contig1437 and LU0019C12 are candidate genes for the number of fruits, and PHO1 is a candidate gene for the 1000-seed weight. We propose that the identified SNP loci and corresponding candidate genes might serve as a biological basis for improving crucial agronomic flax traits.

Introduction

Flax (Linum usitatissimum L.) is one of the oldest plants cultivated for fiber and edible oil and remains an important cash crop worldwide. Breeding selection for fiber flax or linseed flax has resulted in two plant types, which differ considerably in agronomic performance (Diederichsen and Ulrich, 2009). Compared with linseed flax cultivars, fiber flax plants are typically taller, with fewer branches and fruits and lower seed production (Booth et al., 2004). Therefore, agronomic traits directly affect the seed yield of linseed flax and the bast fiber quality of fiber flax. In recent years, traditional breeding methods have been employed to introduce genetic changes that improve the agronomic traits of flax. However, agronomic features are complex, quantitative traits that are controlled by multiple genes. Consequently, traditional breeding approaches do not satisfy the demand for improving flax traits. Thus far, a number of genetic studies on agronomic traits of flax have been reported. For example, amplified fragment length polymorphism (AFLP) and simple-sequence repeat (SSR) markers were used to perform QTL analysis for four flax traits, revealing several yield-related QTLs (Gehringer et al., 2006). In addition, 464 SSR markers were employed to perform QTL analysis for nine traits in a natural population composed of 390 flax germplasm resources that were planted in eight environments (Soto-Cerda et al., 2014); in that study, the authors identified 12 markers that were closely linked to six traits. A genome-wide scan was performed using 407 core germplasm resources and 448 SSR markers before association mapping was conducted to elucidate the non-neutral genomic regions potentially underlying divergent selection between fiber and linseed cultivars, and the candidate genes involved in the biosynthesis of the cell wall, lignin, and fatty acids were analyzed (Soto-Cerda et al., 2013). Furthermore, Deng et al. (2014) used 61 pairs of SSR primers, 91 pairs of expressed sequence tag (EST)-SSR primers, and 102 pairs of genomic-SSR primers to perform association analysis for yield-related traits in 182 core germplasm resources of flax; the authors identified 57 high-quality allelic variations, including 31 showing yield-enhancing effects and 26 showing the opposite. Nevertheless, these association studies were all based on common molecular markers (e.g., SSR, EST-SSR) that might not be sufficient in light of the rapid development of new sequencing technologies. Furthermore, the investigations cited above were not enough to elucidate the genes related to the complicated agronomic traits of flax.

In recent years, genome-wide association studies (GWAS) based on next-generation sequencing (NGS) technology have become a new approach for improving crop traits. GWAS can make use of phenotypic data collected in multiple environments, which reduces environment-induced errors and improves the accuracy of the results (Hall et al., 2010). Once the genotype data of a population are available, GWAS can be performed to examine multiple traits (Atwell et al., 2010). As such, the approach provides a basis for elucidating the genetic structures of the complex traits of a crop. The resulting association alleles can be used for marker-assisted molecular breeding and are crucial for innovating germplasm resources and improving cultivars. So far, GWAS related to important agronomic traits have been reported in several crops, including maize (Xue et al., 2013; Farfan et al., 2015), rice (Huang et al., 2010; Han et al., 2016), and soybean (Sonah et al., 2015; Zhang et al., 2015). However, an NGS-based GWAS analysis examining agronomic traits in flax (L. usitatissimum) has not been reported.

Previous GWAS have mostly been based on SNP array technology, which can detect known SNP loci but not new loci (Vilkki et al., 2013; Zhang et al., 2016). In light of this limitation, a high-throughput sequencing-based technology known as specific locus-amplified fragment sequencing (SLAF-seq) was developed (Sun et al., 2013). In comparison with other technologies, SLAF-seq has the following advantages: (i) generation of high-density SNP loci numbering in the millions after one sequencing reaction, (ii) capability of detecting novel SNP loci in unknown mutation-harboring loci compared with SNP arrays, (iii) suitability for any species regardless of the presence of a reference genome, and (iv) a higher rate of identified SNP loci that become genuine association markers. As a consequence, the technology has been applied to many crops including rice, soybean, sesame, cucumber, Brassica napus, etc. (Zhang et al., 2013; Xu et al., 2014; Geng et al., 2016; Han et al., 2016; Li et al., 2016).

Here, we examined 224 core germplasm resources of flax grown under different environmental conditions for the main agronomic traits of plant height, technical length, number of branches, number of fruits, and 1000-grain weight. Subsequently, SLAF-seq was employed to perform GWAS and recover potential alleles controlling these traits. To our knowledge, this is the first SLAF-seq-based GWAS with the goal of identifying SNP loci and candidate genes linked to important agronomic flax traits. The results provide a basis for marker-assisted breeding that targets the main agronomic traits of flax and for the improvement of those traits.

Materials and methods

Experimental materials and survey of traits

The core germplasm resources of 224 flax accessions were collected from institutions in China and other countries (Table S1). They were sown at the Harbin Experimental Base of Heilongjiang Academy of Agricultural Sciences (45°65′N, 126°68′E), Harbin, China, in April of 2015 and 2016 (2015HRB, 2016HRB) as well as at the Lanxi Experimental Base (6°27′N, 126°28′E), Lanxi, China, in April of 2016 (2016LX). The average annual rainfall from seeding to harvest was 390.1 mm in 2015HRB and 453 mm in 2016HRB, and the average annual temperature was 5.26°C and 4.99°C, respectively. The average annual rainfall from seeding to harvest was 504.4 mm in 2016LX, and the average annual temperature was 3.11°C. The experiment at each location used a randomized complete block design with three replicates. Each cultivar was planted in triplicate in 2-m lines, with a 20-cm inter-line gap. The field management was the same as the local field management. At the maturing stage, ten plants were selected from each replicate for phenotyping. The five agronomic traits (plant height, technical length, branch number, fruit number, 1,000-grain weight) were investigated. Plant height was measured as the distance between the cotyledon scar and the top of the first-degree branch. Technical length was measured as the distance between the cotyledon scar of the flax plant and the base of the first-degree branch below the inflorescence. The branch number was the number of first-degree branches on the top of the main stem. The fruit number was the number of all fruits that had seeds on the top of the main stem. The 1,000-grain weight was the absolute weight of 1,000 seeds (water content 9%) that were mature, full and clean.

Extraction of genomic DNA

Genomic DNA was isolated from fresh leaves harvested from 20-day-old seedlings of flax. The Tiangen plant total genomic DNA extraction kit (Tiangen Biotech Co. Ltd., Beijing, China) was used for the genomic DNA extraction. A NanoDrop 2000 (Thermo Scientific, Massachusetts) was used to determine the DNA concentration and quality to ensure that DNA samples met the requirements of the sequencing reaction (concentration ≥ 18 ng/μL; volume ≥ 30 μL).

Modification of the genomic DNA

First, the restriction enzyme HaeIII was used to digest the genomic DNA in a 50 μL aqueous solution containing 500 ng of genomic DNA, 41 μL of NEB Buffer (10 ×), and 0.12 μL of HaeIII (1 U/μL). This solution was incubated at 37°C for 15 h. The resulting DNA was column-purified using a QIAGEN kit and solubilized in 50 μL of EB (0.01 mol/L). The sticky ends of the digested DNA fragments were filled in, and their 5′ ends were then phosphorylated using a 100 μL solution containing 30 μL of purified DNA (50 ng/μL), 10 μL of T4 DNA Ligase Buffer (containing 10 mmol/L final concentration ATP), 4 μL dNTP Mix (10 mmol/L), 5 μL of T4 DNA polymerase (5 U/μL), 1 μL of Klenow fragment (5 U/μL), and 5 μL of T4 polynucleotide kinase (10 U/μL). This solution was incubated at 20°C for 30 min in a thermocycler. The DNA was column-purified using a QIAGEN kit and solubilized in 33 μL of EB (0.01 mol/L). Next, a base was added to the 3′ ends of the 5′ phosphorylated DNA fragments, allowing them to connect to the Solexa adaptor, which has a T base at its 5′ end, using the following conditions: 32 μL of purified DNA (50 ng/μL), 5 μL of Klenow Buffer (10 ×), 10 μL of dATP (1 mmol/L), and 3 μL of Klenow Exo (5 U/μL). The solution was placed in a 37°C water bath for 30 min. The DNA was column-purified using a QIAGEN kit and solubilized in 10 μL of EB (0.01 mol/L). Finally, the Solexa adapter was attached to the DNA fragments to allow them to hybridize in the flow cells in the sequencing reactions. The reaction conditions were as follows: the solution containing 10 μL of purified DNA (50 ng/μL), 25 μL of DNA Ligase Buffer (2 ×), 10 μL of Adapter (5 pmol/μL), and 5 μL of DNA Ligase (5 U/μL) was incubated at 20°C for 15 min in the polymerase chain reaction (PCR) system. The DNA was column-purified using a QIAGEN kit and solubilized in 30 μL of EB (0.01 mol/L).

PCR amplification and sequencing

Based on the restriction analysis of PAProC (http://www.paproc.de/) (Nussbaum et al., 2001), DNA fragments of 500–580 bp were gel purified and PCR amplified using a forward primer (5′-AATGATACGGCGACCACCGA-3′) and a reverse primer (5′-CAAGCAGAAGACGGCATACG-3′). PCR amplification was performed in a 40 μL aqueous solution containing 8 μL of purified DNA (50 ng/μL), 1.5 μL of forward primer (50 pmol/μL), 1.5 μL of reverse primer (50 pmol/μL), 9 μL of dNTP mix (10 mmol/L) and 20 μL of Phusion DNA polymerase (2 U/μL). The amplification procedure was as follows: pre-denaturation at 98°C for 30 s, followed by 18 amplification cycles of denaturation at 98°C for 10 s, annealing at 65°C for 30 s, and extension at 72°C for 30 s, before a final extension at 72°C for 5 min. After the reaction, the DNA was column-purified using a QIAGEN kit and solubilized in 30 μL of EB (0.01 mol/L). Purified DNA samples were quantified using the Qubit system before bridge amplification was performed on the surface of the flow cells to generate DNA clusters. The PCR products were re-purified and then prepared for paired-end sequencing on an Illumina HiSeq 2500 sequencing platform (Illumina, San Diego, CA, USA).

Data processing and data submission information

Raw sequencing reads were separated using barcode sequences: Illumina SLAF libraries were barcoded with standard Illumina multiplex adaptors and pooled for sequencing in sets of three samples to generate an average of 6-fold sequence coverage per sample (Purcell et al., 2007; Healey et al., 2014). Low-quality reads (QC score < 20) were removed before SOAP 2.20 (Sun et al., 2013) was employed to align the resulting reads with the reference genome of Linum usitatissimum v1.0 (https://phytozome.jgi.doe.gov/pz/portal.html#!search?show=KEYWORD) (Wang et al., 2012). A read was considered valid if both ends mapped onto the genome and could be used to define SLAF markers. Based on the results of alignment and correction, groups with a mean sequencing depth of 4 were recruited to define SLAF markers. Next, the number of SLAF markers per 100 K genome was recorded to obtain the distribution of SLAF markers in the scaffolds. Finally, SNP loci were detected in the collection of the 224 specimens using pre-defined SLAF markers, whereby the number of SNP loci per 100 K genome was documented. Raw Illumina sequences were deposited in the National Center for Biotechnology Information (NCBI) and can be accessed in the database (https://www.ncbi.nlm.nih.gov/) under accession SRP116365 or SRS2474942 for leaf.
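The per-100-kb marker counts described above can be sketched as a simple binning step. This is a minimal illustration, not the pipeline used in the study: the function name `markers_per_window` is hypothetical, positions are assumed to be 1-based, and the sample input merely reuses a few scaffold positions that appear in SNP names in this article.

```python
from collections import Counter

WINDOW = 100_000  # 100-kb bin size, matching the "per 100 K genome" counts


def markers_per_window(positions, window=WINDOW):
    """Count markers in each (scaffold, window index) bin.

    positions: iterable of (scaffold, position_bp) tuples with 1-based positions.
    Window index 0 covers bp 1..100000, index 1 covers bp 100001..200000, etc.
    """
    counts = Counter()
    for scaffold, pos in positions:
        counts[(scaffold, (pos - 1) // window)] += 1
    return dict(counts)


# Sample input only: positions taken from SNP names in this article.
example = [
    ("scaffold112", 114241),
    ("scaffold112", 184204),
    ("scaffold156", 641874),
    ("scaffold156", 761294),
    ("scaffold156", 1203677),
]
density = markers_per_window(example)
```

Here the two scaffold112 positions fall into the same window (index 1), whereas the three scaffold156 positions fall into three different windows.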

Population structure analysis and significant SNP discovery

The population structure analysis used 146,959 SNPs to infer the genetic background of an accession that belongs to a cluster under a given number of populations (K). The number of genetic clusters was predefined as K = 1–5 for all accessions and was calculated using Admixture software (Hardy and Vekemans, 2002; Alexander et al., 2009).

LD (linkage disequilibrium) between pairs of SNPs was estimated by using squared allele frequency correlations (r²) in Tassel version 3.0 (Bradbury et al., 2007). Each significant SNP was evaluated for the extent of local LD. The region was defined as extending to where LD between nearby SNPs and the lead SNP decayed to r² > 0.8, MAF > 0.05. Only SNPs with an MAF more than 0.05 and <10% missing data were used. The SNP nomenclature used in this study is based on the number of scaffolds that contained an SNP plus the position of the polymorphism in the scaffold.
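The SNP filter (MAF > 0.05, < 10% missing data) and the pairwise r² statistic can be illustrated with a short sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and it computes r² as the squared Pearson correlation of those counts. This is a common approximation and is not necessarily the exact estimator implemented in Tassel. All function names are hypothetical.

```python
def maf(genotypes):
    """Minor allele frequency from 0/1/2 alternate-allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missing_rate(genotypes):
    """Fraction of accessions without a genotype call."""
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_filter(genotypes, min_maf=0.05, max_missing=0.10):
    """Apply the thresholds stated in the text: MAF > 0.05 and < 10% missing."""
    return maf(genotypes) > min_maf and missing_rate(genotypes) < max_missing


def r_squared(a, b):
    """Squared Pearson correlation of two SNPs over accessions called at both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov * cov / (var_x * var_y)


# Made-up genotypes for ten accessions, for illustration only.
snp_a = [0, 0, 2, 2, 0, 2, 0, 2, 0, 0]
snp_b = [0, 0, 2, 2, 0, 2, 0, 0, 0, 0]
rare_snp = [0] * 39 + [2]  # alternate allele frequency 0.025
ld = r_squared(snp_a, snp_b)
```

With these toy genotypes, `snp_a` and `snp_b` pass the filter, `rare_snp` does not, and the two common SNPs differ at a single accession.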

Genome-wide association analyses

The efficient model was performed with both GLM and MLM using Tassel software. The population structure matrix generated from Admixture was used as the Q matrix for the GLM model. P-values of P ≤ 1.268 × 10⁻⁵ (P = 0.01/n; n = total markers used, which is roughly a Bonferroni correction, corresponding to −log10(P) = 5, red line) and P ≤ 1.268 × 10⁻⁶ (P = 0.1/n; n = total markers used, which is roughly a Bonferroni correction, corresponding to −log10(P) = 6, blue line) were defined as the genome-wide control threshold and suggestive threshold, respectively. The genes within 10 kb of a significant SNP's flanking region were reported as candidate genes.
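The candidate-gene rule (genes within 10 kb of a significant SNP) amounts to an interval lookup on the same scaffold. The sketch below is one way to express it. The `Gene` class, the function name, and all coordinates are hypothetical, and the flank labels are purely positional because assigning "upstream" or "downstream" would require an annotation convention that is not specified here.

```python
from dataclasses import dataclass

FLANK_BP = 10_000  # 10-kb flanking window stated in the text


@dataclass
class Gene:
    name: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidate_genes(snp_scaffold, snp_pos, genes, flank=FLANK_BP):
    """Return (gene name, relation, distance in bp) for genes within `flank` of the SNP."""
    hits = []
    for g in genes:
        if g.scaffold != snp_scaffold:
            continue
        if g.start <= snp_pos <= g.end:
            hits.append((g.name, "interior", 0))
        elif g.end < snp_pos and snp_pos - g.end <= flank:
            hits.append((g.name, "left flank", snp_pos - g.end))
        elif g.start > snp_pos and g.start - snp_pos <= flank:
            hits.append((g.name, "right flank", g.start - snp_pos))
    # Nearest gene first, so hits[0] is the "nearest gene" of the result tables.
    return sorted(hits, key=lambda h: h[2])


# Hypothetical annotation, for illustration only.
genes = [
    Gene("geneA", "scaffoldX", 48_000, 51_000),  # contains the SNP
    Gene("geneB", "scaffoldX", 30_000, 45_000),  # ends 5 kb before the SNP
    Gene("geneC", "scaffoldX", 61_000, 63_000),  # starts 11 kb after the SNP
    Gene("geneD", "scaffoldY", 49_000, 52_000),  # different scaffold
]
hits = candidate_genes("scaffoldX", 50_000, genes)
```

In this toy case only `geneA` and `geneB` are reported; `geneC` lies just outside the 10-kb window and `geneD` is on another scaffold.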

Results

Descriptive statistics of agronomic traits

Under three different environmental conditions, plant height had a minimum value of 42.20 cm, a maximum value of 125.40 cm, and a maximum coefficient of variation of 18.09%; technical length had a minimum value of 27.60 cm, a maximum value of 103.20 cm, and a maximum coefficient of variation of 22.76%; number of branches had a minimum value of 2, a maximum value of 12, and a maximum coefficient of variation of 55.57%; number of fruits had a minimum value of 2, a maximum value of 39, and a maximum coefficient of variation of 53.47%; and 1,000-grain weight had a minimum value of 3.18 g, a maximum value of 9.21 g, and a maximum coefficient of variation of 16.27%. The results therefore indicated that the test germplasm resources of flax contained extraordinary genetic variation (Table 1).

Table 1

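| Environment | Trait | Minimum value | Maximum value | Range | Mean value | Standard deviation | Coefficient of variation (%) |
|---|---|---|---|---|---|---|---|
| 2015HRB | Plant height (cm) | 42.20 | 109.50 | 67.30 | 81.94 | 14.82 | 18.09 |
| 2015HRB | Technical length (cm) | 27.60 | 94.80 | 67.20 | 64.38 | 14.27 | 22.19 |
| 2015HRB | Number of branches (unit) | 2.00 | 12.00 | 10.00 | 4.50 | 1.08 | 24.00 |
| 2015HRB | Number of fruits (unit) | 4.00 | 29.00 | 25.00 | 9.00 | 3.73 | 41.44 |
| 2015HRB | 1,000-grain weight (g) | 3.94 | 8.91 | 4.97 | 5.00 | 0.77 | 15.32 |
| 2016HRB | Plant height (cm) | 49.50 | 120.10 | 70.60 | 89.77 | 15.65 | 17.43 |
| 2016HRB | Technical length (cm) | 35.20 | 103.20 | 68.00 | 70.52 | 16.05 | 22.76 |
| 2016HRB | Number of branches (unit) | 2.00 | 12.00 | 10.00 | 4.00 | 1.13 | 28.25 |
| 2016HRB | Number of fruits (unit) | 2.00 | 30.00 | 29.00 | 6.16 | 3.12 | 50.64 |
| 2016HRB | 1,000-grain weight (g) | 3.18 | 8.92 | 5.74 | 4.96 | 0.79 | 15.92 |
| 2016LX | Plant height (cm) | 54.20 | 125.40 | 71.2 | 91.45 | 14.46 | 15.81 |
| 2016LX | Technical length (cm) | 30.30 | 100.40 | 70.1 | 65.81 | 14.92 | 22.67 |
| 2016LX | Number of branches (unit) | 2.00 | 10.00 | 8.00 | 5.47 | 3.04 | 55.57 |
| 2016LX | Number of fruits (unit) | 4.00 | 39.00 | 35.00 | 10.06 | 5.38 | 53.47 |
| 2016LX | 1,000-grain weight (g) | 3.82 | 9.21 | 5.39 | 5.10 | 0.83 | 16.27 |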

Results of five important agronomic traits derived from the flax germplasm resources.
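The descriptive statistics in Table 1 follow from standard formulas, with the coefficient of variation defined as the standard deviation divided by the mean, expressed as a percentage. A minimal sketch is given below. Whether the study used the sample (n − 1) or population standard deviation is not stated, so the use of `statistics.stdev` is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(sd, mean_value):
    """Coefficient of variation (%) = standard deviation / mean x 100."""
    return 100 * sd / mean_value


def describe(values):
    """Summary statistics for one trait in one environment."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (assumption, see lead-in)
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": m,
        "sd": sd,
        "cv_percent": cv_percent(sd, m),
    }


# Check against the 2015HRB plant-height row of Table 1 (mean 81.94, SD 14.82).
cv_2015_height = round(cv_percent(14.82, 81.94), 2)

# Toy measurements, for illustration only.
toy = describe([2, 4, 6])
```

Applied to the mean and standard deviation reported for plant height in 2015HRB, the formula reproduces the tabulated coefficient of variation of 18.09%.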

Sequencing results

SNP detection was performed in the collection of 224 germplasm resources of flax based on the predefined 346,639 SLAF tags, which generated a total of 584,987 SNP loci (MAF ≥ 0.05). Considering both the SLAF and SNP data, we defined the SLAF markers associated with SNPs as polymorphic SLAF markers to thereby examine the SLAF polymorphisms. Our analysis yielded a total of 146,959 polymorphic SLAF markers with a mean depth of 7.2. After quality control, there were 34,932 SNP loci used for subsequent GWAS analyses (Table S2).

Analysis of population structure

The Admixture software was used to analyze the population clustering and structure of the 224 germplasm resources (Figures 1A,B). Specifically, clustering was first performed assuming that the number of clusters (K) was between 1 and 10. Then, the results were cross-validated to determine that the optimal K-value was 3 (according to the valley of the error rates of cross-validation). In other words, our results implied that the collection most likely originated from three ancestors. Given that population stratification might affect the accuracy of association analysis, we generated QQ plots of individual traits (Supplementary Figures 1–5). The results indicated that the observed values (ordinate) generally matched the corresponding expected values (abscissa), suggesting that the association analysis did not produce false positives due to population stratification. Hence, the GWAS results were reliable.
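Choosing K at the "valley" of the cross-validation error is simply a matter of taking the K with the lowest error. The sketch below assumes the errors are available as lines of the form `CV error (K=3): 0.49`, which is how ADMIXTURE is generally understood to report them when cross-validation is enabled. The log text and every error value in it are invented for illustration and are not the study's numbers.

```python
import re

# Assumed log line format: "CV error (K=3): 0.49"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Extract {K: cross-validation error} from concatenated log output."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log excerpt; the values are made up.
sample_log = """\
CV error (K=1): 0.62
CV error (K=2): 0.55
CV error (K=3): 0.49
CV error (K=4): 0.51
CV error (K=5): 0.53
"""
errors = parse_cv_errors(sample_log)
chosen_k = best_k(errors)
```

With these invented values the minimum falls at K = 3, mirroring the shape of the result described above.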

Figure 1

Genome-wide association analysis

SNP loci displaying significant correlation with plant height

The GLM and MLM models of TASSEL were employed to perform GWAS, which revealed that nine SNP loci were significantly associated with plant height (P < 1.26E-06). The relevant Manhattan plots and QQ plots of the two models and three environments are shown in Supplementary Figure 1. The GLM generated nine SNP loci in the three environments, including six in 2015HRB, one in 2016HRB, and two in 2016LX. However, there was not a single SNP that occurred in more than one environment. In comparison, the MLM only generated two SNP loci (scaffold344_309662 and scaffold51_1349321) in 2015HRB, both of which were also identified by the GLM in 2015HRB. The genes closest to the two SNP loci include UGT (UDP-glycosyltransferase) and PL (Pectate lyase). Moreover, the genes closest to the other seven SNP loci were CBP (Calcineurin B-like protein), PI-PLC X (PI-PLC X domain-containing protein), SPP (Squamosa promoter-binding-like protein), PPR (PPR repeat family), PSP (Pectate lyase superfamily protein), UF (Ubiquitin family), and CS (Cellulose synthase) (Table 2).

Table 2

| Model | Environment | SNP | Scaffold | Location (bp) | P-value | Nearest gene | Distance to SNP (kb) |
|---|---|---|---|---|---|---|---|
| GLM | 2015HRB | scaffold112_114241 | scaffold112 | 114241 | 7.43E-07 | Calcineurin B-like protein (CBP) | upstream 0.27 |
| GLM | 2015HRB | scaffold1491_318496 | scaffold1491 | 318496 | 1.91E-07 | PI-PLC X domain-containing protein (PI-PLC X) | interior |
| GLM | 2015HRB | scaffold31_1800846 | scaffold31 | 1800846 | 6.65E-07 | Squamosa promoter-binding-like protein (SPP) | downstream 0.642 |
| GLM | 2015HRB | scaffold344_309662 | scaffold344 | 309662 | 1.11E-07 | UDP-glycosyltransferase 1 (UDP) | downstream 4.558 |
| GLM | 2015HRB | scaffold51_1349321 | scaffold51 | 1349321 | 8.08E-07 | Pectate lyase (PL) | downstream 8.566 |
| GLM | 2015HRB | scaffold59_572553 | scaffold59 | 572553 | 1.06E-07 | PPR repeat family (PPR) | upstream 0.52 |
| GLM | 2016HRB | scaffold156_641874 | scaffold156 | 641874 | 7.27E-07 | Pectate lyase superfamily protein (PSP) | downstream 0.51 |
| GLM | 2016LX | scaffold147_367986 | scaffold147 | 367986 | 1.10E-07 | Ubiquitin family (UF) | upstream 8.3 |
| GLM | 2016LX | scaffold859_123972 | scaffold859 | 123972 | 4.22E-07 | cellulose synthase (CS) | downstream 0.56 |
| MLM | 2015HRB | scaffold344_309662 | scaffold344 | 309662 | 1.11E-07 | UDP-glycosyltransferase (UDP) | downstream 4.558 |
| MLM | 2015HRB | scaffold51_1349321 | scaffold51 | 1349321 | 8.08E-07 | Pectate lyase (PL) | downstream 8.566 |

Associated single nucleotide polymorphisms (SNPs) and the nearest genes for plant height traits of flax.

SNP loci displaying significant correlation with technical length

Three SNP loci, identified only by the GLM in two environments, were found to be significantly associated with technical length (P < 1.26E-06) (Supplementary Figure 2). Among these, scaffold297_275113 and scaffold361_14957 were identified in 2015HRB, whereas scaffold273_68457 was identified in 2016HRB. Genes closest to the three SNP loci included HP (Hypothetical protein), VTP (Vesicle transport protein), and MIF (Macrophage migration inhibitory factor) (Table 3).

Table 3

| Model | Environment | SNP | Scaffold | Location (bp) | P-value | Nearest gene | Distance to SNP (kb) |
|---|---|---|---|---|---|---|---|
| GLM | 2015HRB | scaffold297_275113 | scaffold297 | 275113 | 3.96E-07 | hypothetical protein (HP) | downstream 1.83 |
| GLM | 2015HRB | scaffold361_14957 | scaffold361 | 14957 | 9.44E-07 | Vesicle transport protein (VTP) | upstream 1.19 |
| GLM | 2016HRB | scaffold273_68457 | scaffold273 | 68457 | 1.18E-06 | Macrophage migration inhibitory factor (MIF) | interior |

Associated single nucleotide polymorphisms (SNPs) and the nearest genes for the technical length trait of flax.

SNP loci displaying significant correlation with number of branches

Twenty-one SNP loci exhibited a significant association with the number of branches (P < 1.26E-06). The relevant Manhattan plots and QQ plots of the two models and three environments are shown in Supplementary Figure 3. GLM identified nine SNP loci in 2015HRB, two SNP loci in 2016HRB, and nine SNP loci in 2016LX. Among these, eight SNP loci were identified in both 2015HRB and 2016LX (scaffold116_30201, scaffold156_1203677, scaffold1863_545, scaffold353_773806, scaffold42_494571, scaffold464_754364, scaffold635_43971, and scaffold977_784147). MLM identified six SNP loci in 2015HRB, two SNP loci in 2016HRB, and seven SNP loci in 2016LX. Among these, six SNP loci were identified in both 2015HRB and 2016LX (scaffold116_30201, scaffold156_1203677, scaffold1863_545, scaffold353_773806, scaffold464_754364, and scaffold977_784147). There were eight SNP loci co-identified by both the GLM and MLM (scaffold116_30201, scaffold156_1203677, scaffold1863_545, scaffold353_773806, scaffold464_754364, scaffold977_784147, scaffold359_282990, and scaffold359_289139). In addition, six SNP loci displayed associations in both models and were identified in two environments (2015HRB and 2016LX) (scaffold116_30201, scaffold156_1203677, scaffold1863_545, scaffold353_773806, scaffold464_754364, and scaffold977_784147). The genes closest to the six SNP loci were GRAS (GRAS domain family), GST (Glutathione S-transferase), PORR (Plant organelle RNA recognition domain), PIP5K (Phosphatidylinositol-4-phosphate 5-Kinase), XTH (Xyloglucan endotransglucosylase/hydrolase), and DDR (DNA-damage-repair) (Table 4).

Table 4

| Model | Environment | SNP | Scaffold | Location (bp) | P-value | Nearest gene | Distance to SNP (kb) |
|---|---|---|---|---|---|---|---|
| GLM | 2015HRB | scaffold116_30201 | scaffold116 | 30201 | 3.86E-11 | GRAS domain family (GRAS) | upstream 9.57 |
| GLM | 2015HRB | scaffold156_1203677 | scaffold156 | 1203677 | 2.29E-11 | Glutathione S-transferase (GST) | downstream 0.52 |
| GLM | 2015HRB | scaffold1863_545 | scaffold1863 | 545 | 8.39E-11 | Plant organelle RNA recognition domain (PORR) | upstream 6.46 |
| GLM | 2015HRB | scaffold212_601171 | scaffold212 | 601171 | 1.63E-07 | Cytochrome P450 (P450) | upstream 4.36 |
| GLM | 2015HRB | scaffold353_773806 | scaffold353 | 773806 | 7.04E-11 | Phosphatidylinositol-4-phosphate 5-Kinase (PIP5K) | downstream 6.62 |
| GLM | 2015HRB | scaffold42_494571 | scaffold42 | 494571 | 1.79E-07 | Glycerophosphodiester phosphodiesterase (GP) | interior |
| GLM | 2015HRB | scaffold464_754364 | scaffold464 | 754364 | 7.77E-07 | xyloglucan endotransglucosylase/hydrolase (XTH) | interior |
| GLM | 2015HRB | scaffold635_43971 | scaffold635 | 43971 | 1.08E-06 | Ricinus communis acid phosphatase (RCAP) | interior |
| GLM | 2015HRB | scaffold977_784147 | scaffold977 | 784147 | 2.69E-10 | DNA-damage-repair (DDR) | downstream 1.65 |
| GLM | 2016HRB | scaffold212_216830 | scaffold212 | 216830 | 6.81E-07 | Transferase family (TF) | upstream 6.80 |
| GLM | 2016HRB | scaffold359_282990 | scaffold359 | 282990 | 7.47E-12 | Aldehyde dehydrogenase (AD) | interior |
| GLM | 2016LX | scaffold116_30201 | scaffold116 | 30201 | 3.86E-11 | GRAS domain family (GRAS) | upstream 9.57 |
| GLM | 2016LX | scaffold156_1203677 | scaffold156 | 1203677 | 2.29E-11 | Glutathione S-transferase (GST) | downstream 0.52 |
| GLM | 2016LX | scaffold1863_545 | scaffold1863 | 545 | 8.39E-11 | Plant organelle RNA recognition domain (PORR) | upstream 6.46 |
| GLM | 2016LX | scaffold353_773806 | scaffold353 | 773806 | 7.04E-11 | Phosphatidylinositol-4-phosphate 5-Kinase (PIP5K) | downstream 6.62 |
| GLM | 2016LX | scaffold42_494571 | scaffold42 | 494571 | 1.79E-07 | Glycerophosphodiester phosphodiesterase (GP) | interior |
| GLM | 2016LX | scaffold464_754364 | scaffold464 | 754364 | 7.77E-07 | xyloglucan endotransglucosylase/hydrolase (XTH) | interior |
| GLM | 2016LX | scaffold635_43971 | scaffold635 | 43971 | 1.08E-06 | Ricinus communis acid phosphatase (RCAP) | interior |
| GLM | 2016LX | scaffold977_784147 | scaffold977 | 784147 | 2.69E-10 | DNA-damage-repair (DDR) | downstream 1.65 |
| GLM | 2016LX | scaffold359_289139 | scaffold359 | 289139 | 2.30E-08 | Protein of unknown function | upstream 1.25 |
| MLM | 2015HRB | scaffold116_30201 | scaffold116 | 30201 | 3.86E-11 | GRAS domain family (GRAS) | upstream 9.57 |
| MLM | 2015HRB | scaffold156_1203677 | scaffold156 | 1203677 | 2.29E-11 | Glutathione S-transferase (GST) | downstream 0.52 |
| MLM | 2015HRB | scaffold1863_545 | scaffold1863 | 545 | 8.39E-11 | Plant organelle RNA recognition domain (PORR) | upstream 6.46 |
| MLM | 2015HRB | scaffold353_773806 | scaffold353 | 773806 | 7.04E-11 | Phosphatidylinositol-4-phosphate 5-Kinase (PIP5K) | downstream 6.62 |
| MLM | 2015HRB | scaffold464_754364 | scaffold464 | 754364 | 7.77E-07 | xyloglucan endotransglucosylase/hydrolase (XTH) | interior |
| MLM | 2015HRB | scaffold977_784147 | scaffold977 | 784147 | 2.69E-10 | DNA-damage-repair (DDR) | downstream 1.65 |
| MLM | 2016HRB | scaffold977_469888 | scaffold977 | 469888 | 3.79E-07 | Lus10031183.BGIv1.0 | upstream 1.2 |
| MLM | 2016HRB | scaffold359_282990 | scaffold359 | 282990 | 7.47E-12 | Lus10013155.BGIv1.0 | interior |
| MLM | 2016LX | scaffold116_30201 | scaffold116 | 30201 | 3.86E-11 | GRAS domain family (GRAS) | upstream 9.57 |
| MLM | 2016LX | scaffold156_1203677 | scaffold156 | 1203677 | 2.29E-11 | Glutathione S-transferase (GST) | downstream 0.52 |
| MLM | 2016LX | scaffold1863_545 | scaffold1863 | 545 | 8.39E-11 | Plant organelle RNA recognition domain (PORR) | upstream 6.46 |
| MLM | 2016LX | scaffold353_773806 | scaffold353 | 773806 | 7.04E-11 | Phosphatidylinositol-4-phosphate 5-Kinase (PIP5K) | downstream 6.62 |
| MLM | 2016LX | scaffold464_754364 | scaffold464 | 754364 | 7.77E-07 | xyloglucan endotransglucosylase/hydrolase (XTH) | interior |
| MLM | 2016LX | scaffold977_784147 | scaffold977 | 784147 | 2.69E-10 | DNA-damage-repair (DDR) | downstream 1.65 |
| MLM | 2016LX | scaffold359_289139 | scaffold359 | 289139 | 2.30E-08 | Lus10013156.BGIv1.0 | upstream 1.25 |

Associated single nucleotide polymorphisms (SNPs) and the nearest genes for the number of branches trait of flax.
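The stricter screening step (loci detected by both models and in at least two environments) can be written as a set operation over the hits in Table 4. The sketch below reads the rule as "at least two environments in which both GLM and MLM detected the SNP", which is one reasonable interpretation and not a stated definition. The function name `stable_snps` is hypothetical.

```python
from collections import defaultdict

# SNP hits transcribed from Table 4 (number of branches).
SHARED = ["scaffold116_30201", "scaffold156_1203677", "scaffold1863_545",
          "scaffold353_773806", "scaffold464_754364", "scaffold977_784147"]
GLM_EXTRA = ["scaffold42_494571", "scaffold635_43971"]
hits = {
    ("GLM", "2015HRB"): SHARED + GLM_EXTRA + ["scaffold212_601171"],
    ("GLM", "2016HRB"): ["scaffold212_216830", "scaffold359_282990"],
    ("GLM", "2016LX"): SHARED + GLM_EXTRA + ["scaffold359_289139"],
    ("MLM", "2015HRB"): SHARED,
    ("MLM", "2016HRB"): ["scaffold977_469888", "scaffold359_282990"],
    ("MLM", "2016LX"): SHARED + ["scaffold359_289139"],
}


def stable_snps(hits, models=("GLM", "MLM"), min_envs=2):
    """SNPs detected by every model in at least `min_envs` shared environments."""
    envs = defaultdict(set)  # (snp, model) -> environments with a hit
    for (model, env), snps in hits.items():
        for snp in snps:
            envs[(snp, model)].add(env)
    all_snps = {snp for snp, _ in envs}
    keep = []
    for snp in sorted(all_snps):
        # Environments in which all models agree on this SNP.
        shared_envs = set.intersection(*(envs.get((snp, m), set()) for m in models))
        if len(shared_envs) >= min_envs:
            keep.append(snp)
    return keep


final_loci = stable_snps(hits)
```

Run on the Table 4 hits, this returns the six loci named in the text; scaffold359_282990 and scaffold359_289139 are dropped because each was detected by both models in one environment only.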

SNP loci displaying significant correlation with number of fruits

Nine SNP loci exhibited significant associations with the number of fruits (P < 1.26E-06). The corresponding Manhattan plots and QQ plots of the two models and three environments are shown in Supplementary Figure 4. The GLM identified three SNP loci in 2015HRB, two SNP loci in 2016HRB, and four SNP loci in 2016LX. Among these, scaffold137_111000 and scaffold225_427119 were identified in both 2015HRB and 2016LX. In addition, scaffold156 and scaffold413 were both identified in 2016HRB and 2016LX, but each had different association loci in the two environments. MLM identified association SNP loci in only two environments, including three SNP loci in 2015HRB and four SNP loci in 2016LX. Among these, scaffold137_111000 and scaffold225_427119 were identified in both 2015HRB and 2016LX. An overview of the results showed that five SNP loci were identified by both models (scaffold137_111000, scaffold225_427119, scaffold687_123666, scaffold156_1203677, and scaffold413_388319). The genes closest to the five SNP loci were TATP (Transmembrane amino acid transporter protein), Contig1437 (Linum usitatissimum clone Contig1437 microsatellite sequence), LU0019C12 (Linum usitatissimum clone LU0019C12 mRNA sequence), FH (Fumarate hydratase), and RP (Ribosomal protein) (Table 5).

Table 5

| Model | Environment | SNP | Scaffold | Location (bp) | P-value | Nearest gene | Distance to SNP (kb) |
|---|---|---|---|---|---|---|---|
| GLM | 2015HRB | scaffold137_111000 | scaffold137 | 111000 | 6.28E-08 | Transmembrane amino acid transporter protein (TATP) | upstream 0.65 |
| GLM | 2015HRB | scaffold225_427119 | scaffold225 | 427119 | 1.91E-07 | Linum usitatissimum clone Contig1437 microsatellite sequence | downstream 1.53 |
| GLM | 2015HRB | scaffold687_121617 | scaffold687 | 121617 | 7.22E-07 | Linum usitatissimum clone LU0019C12 mRNA sequence | upstream 0.36 |
| GLM | 2016HRB | scaffold156_761294 | scaffold156 | 761294 | 2.76E-07 | Lus10040627.BGIv1.0 | downstream 0.54 |
| GLM | 2016HRB | scaffold413_1116527 | scaffold413 | 1116527 | 2.97E-07 | Fumarate hydratase (FH) | interior |
| GLM | 2016LX | scaffold137_111000 | scaffold137 | 111000 | 6.28E-08 | Transmembrane amino acid transporter protein (TATP) | upstream 0.65 |
| GLM | 2016LX | scaffold225_427119 | scaffold225 | 427119 | 1.91E-07 | Linum usitatissimum clone Contig1437 microsatellite sequence | downstream 1.53 |
| GLM | 2016LX | scaffold156_1203677 | scaffold156 | 1203677 | 1.14E-12 | Ribosomal protein (RP) | upstream 1.14 |
| GLM | 2016LX | scaffold413_388319 | scaffold413 | 388319 | 1.03E-06 | Chitinase class I (CCI) | upstream 1.02 |
| MLM | 2015HRB | scaffold137_111000 | scaffold137 | 111000 | 6.28E-08 | Transmembrane amino acid transporter protein (TATP) | upstream 0.65 |
| MLM | 2015HRB | scaffold225_427119 | scaffold225 | 427119 | 1.91E-07 | Linum usitatissimum clone Contig1437 microsatellite sequence | downstream 1.53 |
| MLM | 2015HRB | scaffold687_123666 | scaffold687 | 123666 | 3.22E-07 | Linum usitatissimum clone LU0019C12 mRNA sequence | interior |
| MLM | 2016LX | scaffold137_111000 | scaffold137 | 111000 | 6.28E-08 | Transmembrane amino acid transporter protein (TATP) | upstream 0.65 |
| MLM | 2016LX | scaffold225_427119 | scaffold225 | 427119 | 1.91E-07 | Linum usitatissimum clone Contig1437 microsatellite sequence | downstream 1.53 |
| MLM | 2016LX | scaffold156_1203677 | scaffold156 | 1203677 | 1.14E-12 | Lus10040728.BGIv1.0 | upstream 1.14 |
| MLM | 2016LX | scaffold413_388319 | scaffold413 | 388319 | 1.03E-06 | Lus10028377.BGIv1.0 | upstream 1.02 |

Associated single nucleotide polymorphisms (SNPs) and the nearest genes for the number of fruits trait of flax.

SNP loci showing significant correlation with 1,000-grain weight

Twenty-three SNP loci exhibited significant associations with 1,000-grain weight (P < 1.26E-06). The corresponding Manhattan plots and QQ plots of the two models and three environments are shown in Supplementary Figure 5. The GLM identified ten SNP loci in 2015HRB, five SNP loci in 2016HRB, and eight SNP loci in 2016LX. Among these, four loci, namely scaffold112_184204, scaffold1143_190268, scaffold1317_154716, and scaffold1519_272169, were repeatedly identified in the three environments; scaffold123_1191347 was repeatedly identified in both 2015HRB and 2016HRB. The MLM identified eight SNP loci in 2015HRB, seven SNP loci in 2016HRB, and four SNP loci in 2016LX. Among these, four loci, namely scaffold112_184204, scaffold1155_171787, scaffold132_713877, and scaffold1519_272169, were identified in all three environments; eight SNP loci were identified by both models (scaffold112_184204, scaffold123_1191347, scaffold1317_154716, scaffold1519_272169, scaffold1155_171787, scaffold132_713877, scaffold1491_58878, and scaffold15_1207948). The genes closest to the eight SNP loci were HP (Hypothetical protein), NAD-DEF (NAD dependent epimerase/dehydratase family), TS (Terpene synthase), TPI (Trypsin and protease inhibitor), STK (Serine/threonine protein kinase), CAP (CDP-alcohol phosphatidyltransferase), PHO1 (SPX and EXS domain-containing protein), and ARP (Autophagy-related protein) (Table 6).

Table 6

| Model | Environment | SNP position (bp) | Scaffold | Location | P-value | Nearest gene | Distance to SNP (kb) |
|---|---|---|---|---|---|---|---|
| GLM | 2015HRB | scaffold101_354340 | scaffold101 | 354340 | 3.68E-09 | Uncharacterized protein (UP) | interior |
|  |  | scaffold112_184204 | scaffold112 | 184204 | 4.55E-09 | hypothetical protein (HP) | interior |
|  |  | scaffold1143_190268 | scaffold1143 | 190268 | 2.83E-07 | serine/threonine-protein kinase (STK) | downstream 0.04 |
|  |  | scaffold1155_171787 | scaffold1155 | 171787 | 1.23E-08 | NAD dependent epimerase/dehydratase family (NAD-DEF) | interior |
|  |  | scaffold123_1191347 | scaffold123 | 1191347 | 5.48E-08 | Probable terpene synthase (TS) | interior |
|  |  | scaffold1317_154716 | scaffold1317 | 154716 | 7.62E-10 | Trypsin and protease inhibitor (TPI) | upstream 0.04 |
|  |  | scaffold132_713877 | scaffold132 | 713877 | 1.52E-11 | serine/threonine-protein kinase MPS1-like (STK-MPS1) | interior |
|  |  | scaffold1491_58878 | scaffold1491 | 58878 | 3.67E-10 | CDP-alcohol phosphatidyltransferase (CAP) | upstream 3.01 |
|  |  | scaffold15_1207948 | scaffold15 | 1207948 | 3.65E-08 | SPX and EXS domain-containing protein (PHO1) | interior |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-10 | Autophagy-related protein (ARP) | interior |
|  | 2016HRB | scaffold112_184204 | scaffold112 | 184204 | 5.32E-09 | Lus10018116.BGIv1.0 | interior |
|  |  | scaffold1143_190268 | scaffold1143 | 190268 | 1.61E-07 | Serine/threonine protein kinase (STK) | downstream 0.04 |
|  |  | scaffold123_1191347 | scaffold123 | 1191347 | 3.51E-08 | Terpene synthase (TS) | interior |
|  |  | scaffold1317_154716 | scaffold1317 | 154716 | 8.00E-09 | Trypsin and protease inhibitor (TPI) | upstream 0.04 |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-09 | Autophagy-related protein (ARP) | interior |
|  | 2016LX | scaffold101_354340 | scaffold101 | 354340 | 3.68E-09 | Uncharacterized protein | interior |
|  |  | scaffold112_184204 | scaffold112 | 184204 | 4.55E-09 | hypothetical protein (HP) | interior |
|  |  | scaffold1143_190268 | scaffold1143 | 190268 | 2.83E-07 | serine/threonine-protein kinase (STK) | downstream 0.04 |
|  |  | scaffold1155_171787 | scaffold1155 | 171787 | 1.23E-08 | NAD dependent epimerase/dehydratase family (NAD-DEF) | interior |
|  |  | scaffold1317_154716 | scaffold1317 | 154716 | 7.62E-10 | Trypsin and protease inhibitor (TPI) | upstream 0.04 |
|  |  | scaffold132_713877 | scaffold132 | 713877 | 1.52E-11 | serine/threonine-protein kinase MPS1-like (STK-MPS1) | interior |
|  |  | scaffold1491_58878 | scaffold1491 | 58878 | 3.67E-10 | CDP-alcohol phosphatidyltransferase (CAP) | upstream 3.01 |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-10 | Autophagy-related protein (ARP) | interior |
| MLM | 2015HRB | scaffold112_184204 | scaffold112 | 184204 | 4.55E-09 | hypothetical protein (HP) | interior |
|  |  | scaffold1155_171787 | scaffold1155 | 171787 | 1.23E-08 | NAD dependent epimerase/dehydratase family (NAD-DEF) | interior |
|  |  | scaffold123_1191347 | scaffold123 | 1191347 | 5.48E-08 | Probable terpene synthase (TS) | interior |
|  |  | scaffold1317_154716 | scaffold1317 | 154716 | 7.62E-10 | Trypsin and protease inhibitor (TPI) | upstream 0.04 |
|  |  | scaffold132_713877 | scaffold132 | 713877 | 1.52E-11 | serine/threonine-protein kinase MPS1-like (STK-MPS1) | interior |
|  |  | scaffold1491_58878 | scaffold1491 | 58878 | 3.67E-10 | CDP-alcohol phosphatidyltransferase (CAP) | upstream 3.01 |
|  |  | scaffold15_1207948 | scaffold15 | 1207948 | 3.65E-08 | SPX and EXS domain-containing protein 1 (PHO1) | interior |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-10 | Autophagy-related protein (ARP) | interior |
|  | 2016HRB | scaffold112_184204 | scaffold112 | 184204 | 5.32E-09 | Lus10018116.BGIv1.0 | interior |
|  |  | scaffold123_1191347 | scaffold123 | 1191347 | 3.51E-08 | Lus10042202.BGIv1.0 | interior |
|  |  | scaffold1317_154716 | scaffold1317 | 154716 | 8.00E-09 | Lus10007888.BGIv1.0 | upstream 0.04 |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-09 | Lus10007527.BGIv1.0 | interior |
|  |  | scaffold132_713877 | scaffold132 | 713877 | 1.52E-11 | serine/threonine-protein kinase MPS1-like (STK-MPS1) | interior |
|  |  | scaffold1491_58878 | scaffold1491 | 58878 | 3.67E-10 | CDP-alcohol phosphatidyltransferase (CAP) | upstream 3.01 |
|  |  | scaffold15_1207948 | scaffold15 | 1207948 | 3.65E-08 | SPX and EXS domain-containing protein 1 (PHO1) | interior |
|  | 2016LX | scaffold112_184204 | scaffold112 | 184204 | 4.55E-09 | hypothetical protein (HP) | interior |
|  |  | scaffold1155_171787 | scaffold1155 | 171787 | 1.23E-08 | NAD dependent epimerase/dehydratase family (NAD-DEF) | interior |
|  |  | scaffold132_713877 | scaffold132 | 713877 | 1.52E-11 | serine/threonine-protein kinase MPS1-like (STK-MPS1) | interior |
|  |  | scaffold1519_272169 | scaffold1519 | 272169 | 2.52E-10 | Autophagy-related protein (ARP) | interior |

Associated single nucleotide polymorphisms (SNPs) and the nearest genes for the 1,000-grain weight trait of flax.
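The repeated-identification counts reported for 1,000-grain weight can be re-derived from Table 6 with plain set operations. The following is a minimal Python sketch rather than the authors' pipeline: the SNP identifiers are transcribed from Table 6, while the dictionary layout and the names `hits`, `ids`, and `loci_by_model` are illustrative assumptions.

```python
# SNP loci significant for 1,000-grain weight, transcribed from Table 6.
# The data layout and names below are illustrative, not the authors' code.
ENVIRONMENTS = ("2015HRB", "2016HRB", "2016LX")


def ids(text):
    """Expand 'N_position' tokens into full SNP identifiers."""
    return {"scaffold" + token for token in text.split()}


hits = {
    ("GLM", "2015HRB"): ids("101_354340 112_184204 1143_190268 1155_171787 "
                            "123_1191347 1317_154716 132_713877 1491_58878 "
                            "15_1207948 1519_272169"),
    ("GLM", "2016HRB"): ids("112_184204 1143_190268 123_1191347 1317_154716 "
                            "1519_272169"),
    ("GLM", "2016LX"): ids("101_354340 112_184204 1143_190268 1155_171787 "
                           "1317_154716 132_713877 1491_58878 1519_272169"),
    ("MLM", "2015HRB"): ids("112_184204 1155_171787 123_1191347 1317_154716 "
                            "132_713877 1491_58878 15_1207948 1519_272169"),
    ("MLM", "2016HRB"): ids("112_184204 123_1191347 1317_154716 1519_272169 "
                            "132_713877 1491_58878 15_1207948"),
    ("MLM", "2016LX"): ids("112_184204 1155_171787 132_713877 1519_272169"),
}

# Loci the GLM detected in every environment.
glm_all_envs = set.intersection(*(hits[("GLM", env)] for env in ENVIRONMENTS))

# Loci each model detected in at least one environment.
loci_by_model = {
    model: set.union(*(hits[(model, env)] for env in ENVIRONMENTS))
    for model in ("GLM", "MLM")
}

# Loci co-identified by both models.
both_models = loci_by_model["GLM"] & loci_by_model["MLM"]

print(sorted(glm_all_envs))
print(len(both_models))
```

Run as written, the sketch returns the four GLM loci shared by all three environments and the eight loci shared by both models, matching the counts given in the text.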

Candidate gene prediction

In this study, a total of 42 SNP loci were found to display significant association with the five important agronomic traits (P < 1.26E-06). The Manhattan plots of the SNP loci (Supplementary Figures 1–5) as well as Tables 2–6 revealed that relatively more SNP loci were linked to the number of branches and 1,000-grain weight (over twenty for each trait). In comparison, only nine SNP loci showed significant association with plant height or number of fruits, and only three with technical length. Next, candidate genes were screened in the 10 kb zone of each of the significant SNP loci. The resulting candidates were then screened further, requiring co-identification by both the GLM and MLM as well as co-occurrence in at least two of the three environments, whereby 15 final candidate genes were obtained (Table S3).

Discussion

A genome-wide association study (GWAS) uses molecular markers, typically SNPs, that are distributed throughout the genome to perform direct association analysis for complex traits. It is considered an effective method for identifying molecular markers that influence crucial traits (Gu et al., 2011; Liu et al., 2013). Consequently, SLAF-seq-based GWAS has been applied in several crops, such as maize, rice, and soybean. Zhao et al. (2015) used this approach to examine 330 soybean cultivars to identify genes related to resistance against sclerotinia stem rot: the dominant locus Oswm13-1 was identified and four resistance candidate genes were acquired. Likewise, the method was used to analyze 440 soybean germplasm resources of various origins to identify genes related to resistance against soybean cyst nematode (SCN, Heterodera glycines Ichinohe) (Han et al., 2015); the authors identified 19 SNP loci significantly associated with SCN resistance. Su et al. (2016) used SLAF-seq to identify 81,675 SNP loci in cotton, performing GWAS on 355 cotton germplasm resources to identify 11 SNP loci associated with five earliness-related traits. Moreover, Yang et al. (2016) used this approach to analyze 419 core germplasm resources of rice, identifying the novel gene LAC6 as associated with amylose content and showing LOC_Os06g11340 to be a likely candidate gene for LAC6. These results indicate that SLAF-seq-based GWAS is a well-developed technology for identifying high-quality alleles. In comparison with rice, soybean, maize, and cotton, flax has received little GWAS-based research aimed at identifying high-quality alleles (no GWAS have yet been reported for flax). In this study, we employed NGS technology coupled with SLAF-seq to perform GWAS, identified SNP loci associated with important traits, and determined their candidate genes.

Population stratification and genetic relatedness are two key factors affecting the accuracy of association analysis. We used Admixture software to analyze the population structure of flax. The results showed that, at K = 3, the flax accessions were clearly divided into three groups (oil-use, fiber-use, and oil-fiber dual-use groups), indicating strong divergence between the different flax groups. Consequently, if the influence of population structure is not considered, the stratification effect may be misinterpreted as a genetic signal, leading to false positives in the association analysis. Hence, QQ plots of the five important traits under different environmental conditions were generated to validate the population correction. The QQ plots showed that, overall, the observed values matched the expected values except for a few outliers at the ends. In other words, the correction for population structure produced reliable results; thus, the association analysis is unlikely to have produced false associations because of population stratification.
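The QQ-plot check compares the sorted observed P-values with the quantiles expected when no SNP is associated with the trait. The sketch below shows one common way to compute the plotted points; the function name `qq_points` and the plotting position (rank - 0.5)/n are assumptions made for illustration, and the software used in the study may compute them differently.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10(P), smallest P first.

    Expected quantiles assume uniformly distributed P-values under the null
    hypothesis and use the plotting position (rank - 0.5) / n.
    All P-values must be greater than zero.
    """
    n = len(pvalues)
    observed = sorted(pvalues)
    return [
        (-math.log10((rank - 0.5) / n), -math.log10(p))
        for rank, p in enumerate(observed, start=1)
    ]


# Synthetic example: perfectly uniform P-values fall on the diagonal,
# whereas systematically smaller P-values rise above it.
n = 1000
uniform = [(rank - 0.5) / n for rank in range(1, n + 1)]
inflated = [p ** 2 for p in uniform]

on_diagonal = qq_points(uniform)
above_diagonal = qq_points(inflated)
print(on_diagonal[0], above_diagonal[0])
```

Points that follow the diagonal for most SNPs and lift off only in the extreme tail are the pattern described above; a curve that departs from the diagonal early suggests uncorrected stratification.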

GLM and MLM are the most commonly used algorithmic models in GWAS. The advantage of the GLM is that it is more comprehensive and can detect more SNPs associated with the traits, but its accuracy in identifying SNP loci is lower than that of the MLM (Huang et al., 2010; Yang et al., 2010; Zhang et al., 2010; Liu et al., 2016). The MLM can improve the accuracy of the analysis but can also miss some important SNP loci due to its strict screening conditions. Multiple algorithmic models should therefore be used for GWAS data analysis in practice (Dhanapal and Crisosto, 2013; Hecht et al., 2013; Zhang et al., 2017). In our study, the observed P-values from the GLM deviated greatly from the expected P-values, whereas the P-values from the MLM were close to the expected values (Supplementary Figures 1–5). These results indicate that false positives were well controlled by the MLM in our study.
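For orientation, the two models are often written as follows. This is the common textbook formulation, and the symbols are assumptions introduced here rather than notation taken from this article: y is the vector of phenotypes, X the SNP genotype matrix with fixed effects β, Q the population-structure matrix with fixed effects v, Z an incidence matrix, u the polygenic random effects with kinship matrix K, and e the residuals.

```latex
\begin{aligned}
\text{GLM:}\quad & y = X\beta + Qv + e \\
\text{MLM:}\quad & y = X\beta + Qv + Zu + e,
  \qquad u \sim N(0,\ \sigma_g^{2} K),
  \quad e \sim N(0,\ \sigma_e^{2} I)
\end{aligned}
```

The additional random term Zu, whose covariance is governed by kinship, lets the MLM absorb relatedness that the GLM would otherwise attribute to individual SNPs. Scaling conventions for K differ between software packages.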

The two association models (i.e., GLM and MLM) and the phenotypic data derived from three environments (i.e., 2015HRB, 2016HRB, and 2016LX) were used to perform GWAS for five important flax traits, generating a total of 107 loci (42 individual SNP loci) that displayed significant association with the five important flax traits (P < 1.26E-06). Afterwards, a more stringent screening was performed, in which the 42 SNP loci were subjected to analyses of co-identification by both the GLM and MLM as well as co-occurrence in 2015HRB, 2016HRB, and 2016LX. Ultimately, we identified two SNP loci associated with plant height, six SNP loci associated with the number of branches, five SNP loci associated with the number of fruits, and eight SNP loci associated with the 1,000-grain weight. Given that the aforementioned SNP loci displayed repeated occurrences (in both models and/or in at least two environments), they potentially have pivotal influences on the relevant agronomic flax traits. As such, they can be recruited as candidate genetic markers impacting these five important flax traits. The remaining SNP loci only had single occurrences of association (in only one model and in only one environment); therefore, their reliability must be investigated further.

Next, candidate genes were screened in the 10 kb zone of each of the SNP loci, which generated 15 potential candidate genes. Among these, there were two candidate genes for plant height, UGT and PL. It was reported that the overexpression of UGT84B1 and UGT74E2 in Arabidopsis thaliana (A. thaliana) causes phenotypes with shorter stature and more shoot branches. UGT84B1 overexpressors also have wrinkled leaves and reduced root gravitropism (Jin et al., 2013). In addition, UGT74D1 has been shown to modulate the metabolic pathway of auxin (IAA) in A. thaliana to influence its development (Tanaka et al., 2014). Transgenic rice lines ectopically overexpressing the cZOGT1 and cZOGT2 genes exhibit short shoot phenotypes, delayed leaf senescence, and a decrease in crown root number. These results suggest that cZOGT activity has a physiological impact on the growth and development of rice (Kudo et al., 2012). As such, UGT, in a similar fashion to its homologs in other plants, is likely involved in developmental regulation in flax, thereby affecting plant height. In addition, studies in rice and A. thaliana have shown that PL is intricately associated with plant development (Palusa et al., 2007; Leng et al., 2017) and that PL promotes plant growth and development by adjusting the cell division rate and cell wall relaxation (Sun and Nocker, 2010; Sun et al., 2010). These reports are consistent with our finding that PL is a candidate gene for plant height in flax.

Association analysis of technical length identified only one gene, MIF. Previous studies revealed that MIF is related to human disease and immunity (Roberts et al., 2017; Shin et al., 2017); if the gene is overexpressed, it may lead to the expansion and proliferation of cancer cells. However, the gene has not been examined in plants; thus, further studies are needed to determine whether MIF indeed affects the technical length of flax.

Association analysis of the number of branches identified four candidate genes. Among these, GRAS is a transcription factor unique to plants and plays pivotal roles in development and signal transduction. LS and MOC1 are both members of the GRAS protein family. Lack of expression of LS prevents the formation of the axillary meristem and in turn decreases the number of axillary buds (Schmitz and Theres, 1999; Greb et al., 2003). Lack of expression of MOC1 results in the almost complete loss of tillering in rice plants because the gene is responsible for regulating that biological process by promoting the cell cycle (Li et al., 2003; Sun et al., 2010). These findings therefore suggest that GRAS family members such as MOC1 are involved in branch formation and directly affect the number of branches, making GRAS a crucial candidate gene for this trait in flax. In plants, a major function of GST is to detoxify exogenous toxins and harmful endogenous metabolites. Specifically, GST catalyzes the conjugation of the thiol (mercapto) group of glutathione to a variety of endogenous electrophilic compounds and lipophilic substrates (Dixon et al., 2002; Moons, 2003). However, the involvement of GST in the branching or development of plants has not been reported; thus, further studies are needed to verify its association with the number of branches in flax. In A. thaliana, XTH9 is expressed in the shoot apical meristem of flower buds and flower stalks and is related to the elongation of these tissues; its loss of expression results in a phenotype of short internodal cell length (Hyodo et al., 2003). Moreover, the overexpression of its Brassica campestris homolog, BcXTH1, in A. thaliana leads to the elongation of flower stalks and an increase in plant height (Shin et al., 2006). Hence, the findings in A. thaliana suggest that XTH possibly plays an important role in determining the number of branches in flax and are consistent with our result that XTH is a candidate gene for this trait.

Of the three genes showing association with the number of fruits, TATP has not been reported in plants. In addition, Contig1437 and LU0019C12 were both cloned from flax, but no studies have examined their functions. Therefore, they remain to be validated by functional studies.

Of the five candidate genes possibly associated with 1,000-grain weight, PHO1 encodes a protein with an SPX domain and a C-terminal EXS domain, the domain architecture of the PHO1 family. In A. thaliana and rice, PHO1 participates in the transfer of phosphate from roots to the aboveground parts and in phosphate signal transduction (Hamburger et al., 2002; Svistoonoff et al., 2007; Secco et al., 2010). Because the uptake of phosphorus can improve seed yield in crops, these previous studies on PHO1 are consistent with our finding that it is an important candidate gene for 1,000-grain weight. However, HP, TS, CAP, and STK have not been previously associated with seed yield.

Conclusion

In this study, we employed SLAF-seq to perform GWAS for five important agronomic traits in 224 germplasm resources of flax. Using two models (i.e., GLM and MLM) for flax grown in three environments (i.e., 2015HRB, 2016HRB, and 2016LX), we identified a total of 42 SNP loci displaying a significant association (P < 1.26E-06), including 15 SNP loci having co-identification either by both models or by co-occurrence in two or more environments. Next, candidate genes were screened in the 10 kb zone of each of the 15 SNP loci to identify 15 candidate genes possibly related to the five important agronomic traits. Our subsequent analyses determined that UGT and PL are candidate genes for plant height, GRAS and XTH are candidate genes for the number of branches, Contig1437 and LU0019C12 are candidate genes for the number of fruits, and PHO1 is a candidate gene for 1,000-grain weight. These SNP loci and candidate genes may serve as a biological basis for improving these important traits of flax.

Statements

Author contributions

ZD, ZY, LZ, and QT carried out most of the experimental work, and this study was conceived by JSu. Collections of flax germplasm resources were performed by DZ and XY. DX and JSun designed the research and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgments

This work was supported by the fund for the 59th Batch of Certificate of China Postdoctoral Science Foundation (2016M591302), China Agriculture Research System (CARS-19-E01), Agriculture Scientific and Technological Innovation Project of Chinese Academy of Agricultural Sciences (ASTIP-IBFC01), and Introduction of Doctor's Personnel Scientific Research and Development Fund (201507-43). We also thank the National Bast Fiber Crops Germplasm Improvement Center of Flax Branch Center for kindly supplying the experimental platform.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2017.02232/full#supplementary-material

References

1. Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. doi: 10.1101/gr.094052.109
2. Atwell, S., Huang, Y. S., Vilhjálmsson, B. J., Willems, G., Horton, M., Li, Y., et al. (2010). Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627–631. doi: 10.1038/nature08800
3. Booth, I., Harwood, R. J., Wyatt, J. L., and Grishanov, S. (2004). A comparative study of the characteristics of fibre-flax (Linum usitatissimum). Ind. Crops Prod. 20, 89–95. doi: 10.1016/j.indcrop.2003.12.014
4. Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., and Buckler, E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635. doi: 10.1093/bioinformatics/btm308
5. Deng, X., Qiu, C. S., Chen, X. B., Long, S. H., Guo, Y., Hao, D. M., et al. (2014). Multiple analysis of relationship of agronomic traits and yield formation in flax (Linum usitatissimum L.). Southwest China J. Agric. Sci. 27, 535–540. doi: 10.16213/j.cnki.scjas.2014.02.038
6. Dhanapal, A. P., and Crisosto, C. H. (2013). Association genetics of chilling injury susceptibility in peach (Prunus persica (L.) Batsch) across multiple years. 3 Biotech 3, 481–490. doi: 10.1007/s13205-012-0109-x
7. Diederichsen, A., and Ulrich, A. (2009). Variability in stem fibre content and its association with other characteristics in 1177 flax (Linum usitatissimum L.) genebank accessions. Ind. Crops Prod. 30, 33–39. doi: 10.1016/j.indcrop.2009.01.002
8. Dixon, D. P., Lapthorn, A., and Edwards, R. (2002). Plant glutathione transferases. Genome Biol. 3:reviews3004-1. doi: 10.1186/gb-2002-3-3-reviews3004
9. Farfan, I. D., Gn, D. L. F., Murray, S. C., Isakeit, T., Huang, P. C., Warburton, M., et al. (2015). Genome wide association study for drought, aflatoxin resistance, and important agronomic traits of maize hybrids in the sub-tropics. PLoS ONE 10:e0117737. doi: 10.1371/journal.pone.0117737
10. Gehringer, A., Friedt, W., Lühs, W., and Snowdon, R. J. (2006). Genetic mapping of agronomic traits in false flax (Camelina sativa subsp. sativa). Genome 49, 1555–1563. doi: 10.1139/g06-117
11. Geng, X., Jiang, C., Yang, J., Wang, L., Wu, X., and Wei, W. (2016). Rapid identification of candidate genes for seed weight using the SLAF-seq method in Brassica napus. PLoS ONE 11:e0147580. doi: 10.1371/journal.pone.0147580
12. Greb, T., Clarenz, O., Schafer, E., Muller, D., Herrero, R., Schmitz, G., et al. (2003). Molecular analysis of the lateral suppressor gene in Arabidopsis reveals a conserved control mechanism for axillary meristem formation. Genes Dev. 17, 1175–1187. doi: 10.1101/gad.260703
13. Gu, X., Feng, C., Ma, L., Song, C., Wang, Y., Da, Y., et al. (2011). Genome-wide association study of body weight in chicken F2 resource population. PLoS ONE 6:e21872. doi: 10.1371/journal.pone.0021872
14. Hall, D., Tegström, C., and Ingvarsson, P. K. (2010). Using association mapping to dissect the genetic basis of complex traits in plants. Brief. Funct. Genomics 9, 157. doi: 10.1093/bfgp/elp048
15. Hamburger, D., Rezzonico, E., Macdonald-Comber, P. J., Somerville, C., and Poirier, Y. (2002). Identification and characterization of the Arabidopsis PHO1 gene involved in phosphate loading to the xylem. Plant Cell 14, 889–902. doi: 10.1105/tpc.000745
16. Han, Y., Zhao, X., Cao, G., Wang, Y., Li, Y., Liu, D., et al. (2015). Genetic characteristics of soybean resistance to HG type 0 and HG type 1.2.3.5.7 of the cyst nematode analyzed by genome-wide association mapping. BMC Genomics 16:598. doi: 10.1186/s12864-015-1800-1
17. Han, Z., Zhang, B., Zhao, H., Ayaad, M., and Xing, Y. (2016). Genome-wide association studies reveal that diverse heading date genes respond to short and long day lengths between indica and japonica rice. Front. Plant Sci. 7:1270. doi: 10.3389/fpls.2016.01270
18. Hardy, O. J., and Vekemans, X. (2002). SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol. Ecol. Resour. 2, 618–620. doi: 10.1046/j.1471-8286.2002.00305.x
19. Healey, A., Furtado, A., Cooper, T., and Henry, R. J. (2014). Protocol: a simple method for extracting next-generation sequencing quality genomic DNA from recalcitrant plant species. Plant Methods 10, 1–8. doi: 10.1186/1746-4811-10-21
20. Hecht, B. C., Campbell, N. R., Holecek, D. E., and Narum, S. R. (2013). Genome-wide association reveals genetic basis for the propensity to migrate in wild populations of rainbow and steelhead trout. Mol. Ecol. 22, 3061–3076. doi: 10.1111/mec.12082
21. Huang, X., Wei, X., Sang, T., Zhao, Q., Feng, Q., Zhao, Y., et al. (2010). Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42:961. doi: 10.1038/ng.695
22. Hyodo, H., Yamakawa, S., Takeda, Y., Tsuduki, M., Yokota, A., Nishitani, K., et al. (2003). Active gene expression of a xyloglucan endotransglucosylase/hydrolase gene, XTH9, in inflorescence apices is related to cell elongation in Arabidopsis thaliana. Plant Mol. Biol. 52, 473–482. doi: 10.1023/A:1023904217641
23. Jin, S. H., Ma, X. M., Han, P., Wang, B., Sun, Y. G., Zhang, G. Z., et al. (2013). UGT74D1 is a novel auxin glycosyltransferase from Arabidopsis thaliana. PLoS ONE 8:e61705. doi: 10.1371/journal.pone.0061705
24. Kudo, T., Makita, N., Kojima, M., Tokunaga, H., and Sakakibara, H. (2012). Cytokinin activity of cis-zeatin and phenotypic alterations induced by over-expression of putative cis-zeatin-O-glucosyltransferase in rice. Plant Physiol. 160, 112. doi: 10.1104/pp.112.196733
25. Leng, Y., Yang, Y., Ren, D., Huang, L., Dai, L., Wang, Y., et al. (2017). A rice pectate lyase-like gene is required for plant growth and leaf senescence. Plant Physiol. 174, 1151–1166. doi: 10.1104/pp.16.01625
26. Li, X., Qian, Q., Fu, Z., Wang, Y., Xiong, G., Zeng, D., et al. (2003). Control of tillering in rice. Nature 422, 618. doi: 10.1038/nature01518
27. Li, Y., Zeng, X. F., Zhao, Y. C., Li, J. R., and Zhao, D. G. (2016). Identification of a new rice low-tiller mutant and association analyses based on the SLAF-seq method. Plant Mol. Biol. Rep. 35, 1–11. doi: 10.1007/s11105-016-1002-2
28. Liu, R., Sun, Y., Zhao, G., Wang, F., Wu, D., Zheng, M., et al. (2013). Genome-wide association study identifies loci and candidate genes for body composition and meat quality traits in Beijing-You chickens. PLoS ONE 8:e61172. doi: 10.1371/journal.pone.0061172
29. Liu, X., Huang, M., Fan, B., Buckler, E. S., and Zhang, Z. (2016). Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 12:e1005767. doi: 10.1371/journal.pgen.1005767
30. Moons, A. (2003). Osgstu3 and osgstu4, encoding tau class glutathione S-transferases, are heavy metal- and hypoxic stress-induced and differentially salt stress-responsive in rice roots. FEBS Lett. 553, 427–432. doi: 10.1016/S0014-5793(03)01077-9
31. Nussbaum, A. K., Kuttler, C., Hadeler, K. P., Rammensee, H. G., and Schild, H. (2001). PAProC: a prediction algorithm for proteasomal cleavages available on the WWW. Immunogenetics 53, 87–94. doi: 10.1007/s002510100300
32. Palusa, S. G., Golovkin, M., Shin, S. B., Richardson, D. N., and Reddy, A. S. (2007). Organ-specific, developmental, hormonal and stress regulation of expression of putative pectate lyase genes in Arabidopsis. New Phytol. 174, 537–550. doi: 10.1111/j.1469-8137.2007.02033.x
33. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. doi: 10.1086/519795
34. Roberts, S., Leng, L., Soroka, C. J., Boyer, J. L., Bucala, R., and Assis, D. N. (2017). FRI-393-Macrophage migration inhibitory factor (MIF) modulates T-cell proliferation and hepatic inflammation in a model of autoimmune liver disease. J. Hepatol. 66, S363–S364. doi: 10.1016/S0168-8278(17)31067-X
35. Schmitz, G., and Theres, K. (1999). Genetic control of branching in Arabidopsis and tomato. Curr. Opin. Plant Biol. 2, 51–55. doi: 10.1016/S1369-5266(99)80010-7
36. Secco, D., Baumann, A., and Poirier, Y. (2010). Characterization of the rice PHO1 gene family reveals a key role for OsPHO1;2 in phosphate homeostasis and the evolution of a distinct clade in dicotyledons. Plant Physiol. 152, 1693–1704. doi: 10.1104/pp.109.149872
37. Shin, M. S., Kang, Y., Leng, L., Bucala, R., and Kang, I. (2017). Macrophage migration inhibitory factor serves as an upstream regulator of NLRP3 expression and subsequent IL-1beta production in human monocytes in response to lupus U1-snRNP immune complex. J. Immunol. 198, 210.
38. Shin, Y. K., Yum, H., Kim, E. S., Cho, H., Gothandam, K. M., Hyun, J., et al. (2006). BcXTH1, a Brassica campestris homologue of Arabidopsis XTH9, is associated with cell expansion. Planta 224, 32–41. doi: 10.1007/s00425-005-0189-5
39. Sonah, H., O'Donoughue, L., Cober, E., Rajcan, I., and Belzile, F. (2015). Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol. J. 13, 211–221. doi: 10.1111/pbi.12249
40. Soto-Cerda, B. J., Diederichsen, A., Ragupathy, R., and Cloutier, S. (2013). Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types. BMC Plant Biol. 13:78. doi: 10.1186/1471-2229-13-78
41. Soto-Cerda, B. J., Duguid, S., Booker, H., Rowland, G., Diederichsen, A., and Cloutier, S. (2014). Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J. Integr. Plant Biol. 56, 75–87. doi: 10.1111/jipb.12118
42. Su, J., Pang, C., Wei, H., Li, L., Liang, B., Wang, C., et al. (2016). Identification of favorable SNP alleles and candidate genes for traits related to early maturity via GWAS in upland cotton. BMC Genomics 17:687. doi: 10.1186/s12864-016-2875-z
43. Sun, F. L., Zhang, W. P., Xiong, G. S., Yan, M. X., Qian, Q., Li, J. Y., et al. (2010). Identification and functional analysis of the MOC1 interacting protein 1. J. Genet. Genomics 37, 69–77. doi: 10.1016/S1673-8527(09)60026-6
44. Sun, L. X., and Nocker, S. V. (2010). Analysis of promoter activity of members of the pectate lyase-like (PLL) gene family in cell separation in Arabidopsis. BMC Plant Biol. 10:152. doi: 10.1186/1471-2229-10-152
45. Sun, X. W., Liu, D. Y., Zhang, X. F., Li, W. B., Liu, H., Hong, W. G., et al. (2013). SLAF-seq: an efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing. PLoS ONE 8:e58700. doi: 10.1371/journal.pone.0058700
46. Svistoonoff, S., Creff, A., Reymond, M., Sigoillot-Claude, C., Ricaud, L., Blanchet, A., et al. (2007). Root tip contact with low-phosphate media reprograms plant root architecture. Nat. Genet. 39, 792. doi: 10.1038/ng2041
47. Tanaka, K., Hayashi, K., Natsume, M., Kamiya, Y., Sakakibara, H., Kawaide, H., et al. (2014). UGT74D1 catalyzes the glucosylation of 2-oxindole-3-acetic acid in the auxin metabolic pathway in Arabidopsis. Plant Cell Physiol. 55, 218–228. doi: 10.1093/pcp/pct173
48. Vilkki, J., Iso-Touru, T., Schulman, N. F., Dolezal, M. A., Bagnato, A., Soller, M., et al. (2013). "Revisiting QTL affecting clinical mastitis by high-density GWAS and resequencing in the Finnish Ayrshire dairy cattle," in International Plant and Animal Genome Conference XXI (San Diego, CA).
49. Wang, Z., Hobson, N., Galindo, L., Zhu, S., Shi, D., McDill, J., et al. (2012). The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 72, 461–473. doi: 10.1111/j.1365-313X.2012.05093.x
50. Xu, X., Xu, R., Zhu, B., Yu, T., Qu, W., Lu, L., et al. (2014). A high-density genetic map of cucumber derived from specific length amplified fragment sequencing (SLAF-seq). Front. Plant Sci. 5:768. doi: 10.3389/fpls.2014.00768
51. Xue, Y., Warburton, M. L., Sawkins, M., Zhang, X., Setter, T., Xu, Y., et al. (2013). Genome-wide association analysis for nine agronomic traits in maize under well-watered and water-stressed conditions. Theor. Appl. Genet. 126, 2587–2596. doi: 10.1007/s00122-013-2158-x
52. Yang, X., Nong, B., Xia, X., Zhang, Z., Zeng, Y., Liu, K., et al. (2016). Rapid identification of a new gene influencing low amylose content in rice landraces (Oryza sativa L.) using genome-wide association study with specific-locus amplified fragment sequencing. Genome 60, 465–472. doi: 10.1139/gen-2016-0104
53. Yang, X., Yan, J., Shah, T., Warburton, M. L., Li, Q., Li, L., et al. (2010). Genetic analysis and characterization of a new maize association mapping panel for quantitative trait loci dissection. Theor. Appl. Genet. 121, 417–431. doi: 10.1007/s00122-010-1320-y
54. Zhang, H., Fan, X., Zhang, Y., Jiang, J., and Liu, C. (2017). Identification of favorable SNP alleles and candidate genes for seedlessness in Vitis vinifera L. using genome-wide association mapping. Euphytica 213:136. doi: 10.1007/s10681-017-1919-z
55. Zhang, J., Song, Q., Cregan, P. B., Nelson, R. L., Wang, X., Wu, J., et al. (2015). Genome-wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine max) germplasm. BMC Genomics 16:217. doi: 10.1186/s12864-015-1441-4
56. Zhang, T., Hu, Y., Wu, X., Ma, R., Jiang, Q., and Wang, Y. (2016). Identifying liver cancer-related enhancer SNPs by integrating GWAS and histone modification ChIP-seq data. BioMed Res. Int. 2016, 1–6. doi: 10.1155/2016/2395341
57. Zhang, Y., Wang, L., Xin, H., Li, D., Ma, C., Xia, D., et al. (2013). Construction of a high-density genetic map for sesame based on large scale marker development by specific length amplified fragment (SLAF) sequencing. BMC Plant Biol. 13:141. doi: 10.1186/1471-2229-13-141
58. Zhang, Z., Ersoz, E., Lai, C. Q., Todhunter, R. J., Tiwari, H. K., Gore, M. A., et al. (2010). Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42:355. doi: 10.1038/ng.546
59. Zhao, X., Han, Y., Li, Y., Liu, D., Sun, M., Zhao, Y., et al. (2015). Loci and candidate gene identification for resistance to Sclerotinia sclerotiorum in soybean (Glycine max L. Merr.) via association and linkage maps. Plant J. 82, 245–255. doi: 10.1111/tpj.12810

Summary

Keywords

flax (Linum usitatissimum L.), agronomic traits, GWAS, SLAF-seq, candidate genes

Citation

Xie D, Dai Z, Yang Z, Sun J, Zhao D, Yang X, Zhang L, Tang Q and Su J (2018) Genome-Wide Association Study Identifying Candidate Genes Influencing Important Agronomic Traits of Flax (Linum usitatissimum L.) Using SLAF-seq. Front. Plant Sci. 8:2232. doi: 10.3389/fpls.2017.02232

Received

12 August 2017

Accepted

19 December 2017

Published

09 January 2018

Volume

8 - 2017

Edited by

Luigi Cattivelli, Consiglio per la Ricerca in Agricoltura e l'analisi Dell'economia Agraria (CREA), Italy

Reviewed by

Leonardo Miguel Galindo-González, University of Alberta, Canada; Xiaoming Wu, Oil Crops Research Institute, The Chinese Academy of Agricultural Sciences, China

Copyright

*Correspondence: Jianguang Su

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

†These authors have contributed equally to this work.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
