Identification of Loci and Candidate Genes Responsible for Fiber Length in Upland Cotton (Gossypium hirsutum L.) via Association Mapping and Linkage Analyses

Fiber length (FL) is an important fiber quality trait in cotton. Although many fiber quality quantitative trait loci (QTL) responsible for FL have been identified, most cannot be applied to breeding programs, mainly due to unstable environments or large confidence intervals. In this study, we combined a genome-wide association study (GWAS) and linkage mapping to identify and validate high-quality QTLs responsible for FL. For the GWAS, we developed 93,250 high-quality single-nucleotide polymorphism (SNP) markers based on 355 accessions, and the FL was measured in eight different environments. For the linkage mapping, we constructed an F2 population from two extreme accessions. The high-density linkage maps spanned 3,848.29 cM, with an average marker interval of 1.41 cM. In total, 14 and 13 QTLs were identified in the association and linkage mapping analyses, respectively. Most importantly, a major QTL on chromosome D03 identified in both populations explained more than 10% of the phenotypic variation (PV). Furthermore, we found that a sucrose synthesis-related gene (Gh_D03G1338) was associated with FL in this QTL region. The RNA-seq data showed that Gh_D03G1338 was highly expressed during the fiber development stage, and the qRT-PCR analysis showed significant expression differences between the long fiber and short fiber varieties. These results suggest that Gh_D03G1338 may determine cotton fiber elongation by regulating the synthesis of sucrose. Favorable QTLs and candidate genes should be useful for increasing fiber quality in cotton breeding.


INTRODUCTION
Cotton (Gossypium L.) is one of the most important cash crops and is extensively cultivated in more than 80 countries, having an annual global economic impact of approximately $500 billion and accounting for 2.5% of arable land worldwide (Chen et al., 2007). Upland cotton (Gossypium hirsutum. L) is the most important species due to its high yield and wide adaptability and is used as a raw material in the textile industry. The fiber quality is considered a key indicator for breeding programs, and tremendous breeding efforts have focused on fiber length (FL) to increase fiber quality (Said et al., 2015). FL is one of the most important and highly heritable fiber quality traits in cotton (Jamshed et al., 2016) and is directly related to its spinning performance, as longer fibers are typically better for manufacturing fine yarns. Over the last few decades, FL has been successfully used for genetic analysis, such as QTL mapping and association analysis, and more than 490 QTLs for FL have been reported (Said et al., 2015). For example, Liu et al. (2018) constructed a high-density genetic map containing 4,729 SNPs and 122 simple sequence repeat (SSR) markers with an average interval of 0.51 cM and anchored 36 QTLs for FL on 21 chromosomes in 9 environments. Ali et al. (2018) identified 20 QTLs related to FL in a RIL population derived from two cultivars (Yumian 1 and CA3084) with distinct genetic backgrounds, and 12 QTLs were detected in more than two environments. In addition, Huang et al. (2017) employed association mapping techniques, which are different from biparental linkage mapping, using 1,1975 high-quality SNP markers in a set of 503 upland cotton accessions and identified 11 highly favorable SNP alleles for FL. Thus, a better understanding of the genetic architecture of FL could help breeders develop varieties with longer fibers.
Molecular markers are powerful tools in QTL analyses of major traits and the identification of genomic loci that could be used in marker-assisted selection (MAS) breeding (Park et al., 2005). In the past few decades, molecular markers, including amplified fragment length polymorphisms (AFLPs) (Lacape et al., 2003), restriction fragment length polymorphisms (RFLPs) (Paterson et al., 1993), random amplified polymorphic DNAs (RAPDs) (Iqbal et al., 1997), sequence-related amplified polymorphisms (SRAPs) (Lin et al., 2003) and SSR markers (Blenda et al., 2006), have been widely used in cotton QTL mapping. However, compared with traditional molecular markers, SNPs are more efficient in revealing genetic changes in complex traits in association analyses and biparental QTL mapping because SNPs are widely distributed, highly polymorphic and can be obtained at a low cost in crop genomes (Van Tassell et al., 2008;Ganal et al., 2009). To date, genomewide SNP discovery has been applied in multiple crops, including rice, maize, soybean, and oilseed rape. However, few QTLs have been discovered in cotton genetic studies using SNP markers compared with the number discovered in studies using traditional molecular markers (Said et al., 2015). For example, our laboratory previously published a high-density genetic map spanning 4,071.98 cM and identified 247 early-maturity QTLs based on restriction site-associated DNA sequencing (RADseq) . Subsequently, we used the genotyping by sequencing (GBS-seq) method to confirm a major QTL region on chromosome D03, providing valuable information for MAS breeding in early-maturity cotton . Recently, a candidate gene responsible for plant height has been detected through association mapping in upland cotton accessions by using specific locus amplified fragment sequencing (SLAF-seq) (Su et al., 2018). Furthermore, the CottonSNP63K (Hinze et al., 2017) and CottonSNP80K (Cai et al., 2017) arrays for hybridization have become popular among QTL mapping and genome-wide association study (GWAS) analysis for the detection of QTLs responsible for fiber quality (Huang et al., 2017;Tan et al., 2018).
Genome-wide association study analyses have recently become a popular approach for revealing the genetic basis of quantitative phenotypic variation and identifying linkage markers for MAS breeding Mao et al., 2015;Yano et al., 2016). Compared with biparental linkage mapping, GWAS have the advantage of a higher resolution, allow for the identification of genes responsible for multiple traits and do not require the generation of a mapping population over a long period (Huang and Han, 2014). However, the substructure of a population can yield false-positive QTLs between markers and traits in a GWAS (Zhao et al., 2007). To overcome this deficiency, a new approach employing GWAS and QTL mapping to complement one another in the identification of major QTLs responsible for important agronomic traits has been used in various crops (Sonah et al., 2015;Zhao et al., 2015;Li X. et al., 2016;Sun et al., 2016). This method increases the confidence in the significant loci identified by GWAS and has been validated by QTL mapping. In addition, traditional linkage analysis is a powerful tool for analyzing pairs of alleles at a low resolution, and association mapping provides a high-resolution evaluation of numerous alleles with uneven statistical power (Wilson et al., 2004). Due to these advantages, this approach is more accurate and efficient for evaluating major QTLs harboring target genes responsible for important agronomic traits.
To obtain better insight into the underlying genetic mechanisms of FL in cotton, SLAF-seq and GBS-seq were used to discover genome-wide SNP markers in natural and biparental populations, respectively. The FL phenotype was evaluated in multiple environments. Then, we used both GWAS and linkage mapping to complement one another and identify reliable QTLs associated with variations in FL. The stable QTLs verified in this study may be useful for MAS or genomic selection and gene cloning in cotton breeding programs.

Mapping Population and Field Experiments
In total, 355 of the upland cotton cultivars, including 328 representative cultivars developed in China and 27 cultivars introduced from abroad, investigated in this study in the association analysis were obtained from cotton germplasm collections at our laboratory and the low-temperature germplasm gene bank of the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (CRI-CAAS). Detailed information regarding all accessions is provided in Supplementary Table S1 Su et al. (2016) investigated four environments between 2014 and 2015 in Anyang and Shihezi. To enhance the accuracy of our phenotypic data, we added four environments and two additional locations

Phenotyping and Statistical Analysis
In total, 20 naturally opened bolls were hand-harvested from each line. The FL of approximately 10-15-g fiber samples was measured using an HVI-MF 100 instrument (User Technologies, Inc., USTER, Switzerland) at the Cotton Fiber Quality Supervision, Inspection and Testing Center of the Ministry of Agriculture, Anyang, China. The phenotypic data were analyzed using R software (R Core Team, 2013). The P-values of the correlation coefficients of the FL between each two environments were calculated with Pearson's correlation test using the cor.test() function in (R Core Team, 2013). The broad-sense heritability was calculated using the methods described by Knapp (Knapp et al., 1985). In addition, the best linear unbiased prediction of fiber length (FL-BLUP) in each line in the eight environments was calculated for the GWAS analyses with the R package "lme4" (Bates et al., 2014).

DNA Extraction and SNP Genotyping
For each cotton accession, the genomic DNA extraction was performed using the CTAB method proposed by Paterson et al. (1993) with modifications. The biparental populations were genotyped using a GBS-seq approach. GBS sequencing libraries were constructed for each accession based on a double digestion with the restriction enzymes Mse I and NIa III (New England Biolabs, NEB). The detailed protocols used for the library preparation and sequencing using the GBS strategy have been described by Zhou et al. (2016) and . The paired end 150-bp sequence reads in each library were generated using an Illumina HiSeq4000 (Illumina, San Diego, CA, United States). The clean reads from the parents and F 2 individuals were aligned to the reference genome (TM-1)  with Burrows-Wheeler Aligner (Li and Durbin, 2009). Only sequences with a mapping score of at least 20 and those aligning to the reference genome with fewer than two mismatches were used for the SNP discovery. The retained sequences were inputted into the Genome Analysis Toolkit software (McKenna et al., 2010), and a variant calling analysis was performed. The high-quality SNPs were filtered using vcftools (Danecek et al., 2011) and a Python script. The structural and functional annotations of the SNPs were analyzed using ANNOVAR  based on the GFF3 files of the G. hirsutum genome. Before the genetic map construction, all SNP markers were filtered using the criteria detailed by  to exclude those with more than 40% missing data in the progeny. For the natural population, the SNP genotyping of the 355 accessions was performed using the SLAF-seq method (Sun et al., 2013). The genomic DNAs from each accession were incubated with Rsa I and Hae III (New England Biolabs, NEB). The sequencing of the population using the Illumina HiSeq2500 platform (Illumina, San Diego, CA, United States) generated 96.10 Gb of data with ∼80-bp, paired-end, clean reads. The SNP calling was performed using Genome Analysis Toolkit software (McKenna et al., 2010) and SAMtools . Finally, the SNPs used for the GWAS were filtered using vcftools (Danecek et al., 2011) with a minor allele frequency (MAF) ≥ 0.05 and missing data < 0.2.

Genome-Wide Associations and Candidate Gene Identification
For the GWAS analysis, TASSEL (Bradbury et al., 2007) was used to determine the association between the high-quality SNPs and FL-BLUP values calculated from the eight environments. A P-value < 2.5 × 10 −5 was used as the threshold to determine whether a significant association with the markers existed. Candidate genes were identified in a significant LD block region on chromosome D03 (41.52-41.91 Mb). A Manhattan plot and a quantile-quantile plot were generated using the R package 1 "CMplot".

Genetic Map Construction and QTL Analyses
A genetic map of the F 2 populations was constructed using JoinMap 4.0 (Van Ooijen, 2006) with a regression approach and a logarithm of odds (LOD) threshold of 3-10. The recombinant ratio was converted to the genetic distance with the Kosambi map function (Kosambi, 2016). We divided the linkage groups according to their position on the G. hirsutum genome , and the markers in each linkage group were then sorted with a Python script. The detection of the FL QTLs was performed using the software IciMapping with the composite interval mapping (CIM) model (Meng et al., 2015). An LOD threshold of 2.5 was used to identify the presence of significant QTLs. A graphic visualization of the linkage groups was created using the R package ggplot2 (Wickham, 2016).

Gene Expression Level Analysis
The total RNA was extracted using an RNAprep Pure Plant Kit (Tiangen, China), and cDNA was reverse-transcribed with a PrimeScript TM RT Reagent Kit and gDNA Eraser (TaKaRa, Japan). A real-time PCR detection system (7500 Real-Time PCR System, Applied Biosystems, Foster City, CA, United States) was used with the UltraSYBR Mixture (CWBIO, China). The actin transcript was amplified as an internal reference gene to normalize the cDNA quantity added to each reaction. The gene expression levels were calculated using the 2 − CT method. For each sample, three technical replicates and three biological replicates were included in the analysis. All primers used in this study are listed in Supplementary Table S2.
For the RNA-seq analysis of cotton fiber development stage, the raw data were downloaded from TM-1 genome sequencing research . The clean reads were aligned to the TM-1 genome  using Bowtie2 (Langmead and Salzberg, 2012). The expression of each cotton gene was derived from the read alignments and normalized to fragments per kilobase of exon model per million (FPKM) in Cufflinks (Trapnell et al., 2012).

Sequence of Gh_D03G1338 in the Natural Population
The full-length ORF of Gh_D03G1338 was amplified and sequenced in 75 accessions selected from the natural population. The PCR products were ligated into a pMD18-T cloning vector (TaKaRa, Japan) for sequencing by GENEWIZ (Suzhou, China) and aligned using MEGA version 7 (Kumar et al., 2016). The cloning primers are shown in Supplementary Table S2. 1 https://github.com/YinLiLin/R-CMplot

Phenotypic Variation in Fiber Length Between Natural and Segregating Populations
The detailed descriptive statistics of the FL in the mapping populations are presented in Table 1 In the natural population, the FL ranged from 22.60 to 35.50 mm (average 28.63 mm), whereas in the segregating population, the FL ranged from 25.90 to 33.50 mm (average 29.99 mm). The standard deviation (SD) value in the natural and segregating populations was between 0.85 and 1.74, indicating that the experimental error was small. The absolute skewness and kurtosis values of FL in all populations across all environments were <1, suggesting that the data followed an approximately normal distribution. The broad-sense heritability (h 2 ) in the eight environments was estimated to be relatively high at 81%, suggesting that the FL was mainly controlled by genetic factors and was less affected by environmental effects. In addition, significant (P < 0.001) positive correlations were observed between each pair of environments, and the Pearson's correlation coefficients ranged from 0.56 to 0.82 (Figure 1). In conclusion, the high Pearson's correlations and stable heritability indicated that much of the FL variance was genetically controlled in the populations and suitable for a GWAS analysis and QTL mapping.

Genome-Wide Association Study of Fiber Length Based on 355 Accessions
The GWAS population used in this study was previously described (Su et al., 2018). Briefly, 93,250 high-quality SNPs (MAF ≥ 0.05 and missing data < 0.2), including 61,618 SNPs on the At subgenome and 31,632 SNPs on the Dt subgenome, were detected among the 355 accessions on 26 chromosomes in upland cotton. The average SNP density was approximately 22.35 kb per SNP. A genome-wide association analysis of FL was performed with a mixed linear model (MLM), which greatly reduces the false-positive rates, as shown in Figure 2. Using the FL-BLUP values, 14 significant association loci were identified on six chromosomes (A03, D02, D03, D04, D09, and D11) across the eight environments, explaining 9.34-14.37% of the PV based on the R 2 values ( Table 2). All significant SNPs were detected in more than two environments, and 10 (71%) SNPs accounted for more than 10% of the PV ( Table 2). Among the SNPs, only 1 SNP on the At subgenome and 13 SNPs on the Dt subgenome were recovered. On the At subgenome, one significant SNP (P-value = 5.39E-06) on chromosome A03 explaining 11.04% of the PV was detected in six environments. Notably, two significant SNPs (D03_41720764 and D03_41721072) on chromosome D03 that showed the strongest association with FL could explain 13.13-14.37% of the PV. These two SNPs generated the two haplotypes AA and TT. The accessions carrying the AA haplotype had a significantly shorter FL than those with the TT allele in the Frontiers in Plant Science | www.frontiersin.org  eight environments (P < 0.001) (Figure 3). In addition, six SNPs spanning 24.03-24.10 Mb on chromosome D11 were observed to be strongly associated with FL, explained 9.34-12.60% of the PV ( Table 2). In this region, ortholog of KRP6 (Gh_D11G1929), which is a member of the KRP family and encodes a KIP-related protein in Arabidopsis. thaliana. Interestingly, the expression levels of KRP5, which belongs to the KRP family, was reported to be significantly correlated with cell length in Arabidopsis (Wen et al., 2013). Furthermore, previous studies have demonstrated that overexpression of Gh_D11G1929 in Arabidopsis produced a significantly leaf trichrome length compared with the wild type (Ma et al., 2018). From the above results, we inferred that Gh_D11G1929 is a major gene on chromosome D11 for controlling FL and may have a potential role in the breeding process.

Genetic Map Construction and QTL Analysis of Fiber Length in the Segregating Generations
To further verify that these significant SNP loci associated with FL, we constructed an F 2 population (198 plants in total) using two extreme accessions selected from the natural population with contrasting FLs, i.e., CRI50 (27.52 mm) and CRI60 (31.22 mm). The GBS libraries of the 198 F 2 individuals and their parents were constructed for the Illumina HiSeq4000   Table S3). The raw data have been deposited in the NCBI Sequence Read Archive (SRA) 2 under accession SRP155335. In total, 20,698 aa × bb genotypes of SNP markers were used to screen the 198 F 2 individuals, accounting for 36.76% of all 56,299 SNP markers ( Figure 4A). After filtering for significant segregation distortion (P < 0.001) and with more than 40% missing data, 5,280 SNP markers (9.38%) were used to construct the final genetic map (Table 3 and Figure 4B). The total genetic map spanned a cumulative distance of 3,848.29 cM across 26 linkage groups with an average marker interval of 1.41 cM. The At subgenome contained 2,316 markers and covered 1,971.87 cM, whereas the Dt subgenome harbored 2,964 markers and spanned 2 https://www.ncbi.nlm.nih.gov/sra/ 1,876.41 cM. The SNP markers were unevenly distributed on the 26 chromosomes of upland cotton, in accordance with previous reports Zhang et al., 2016). The highest number of markers was identified on chromosome D08 (809) with an average density of 3.25 markers/cM, while the lowest number of markers (48), with an average density of 0.83 markers/cM, was found on chromosome D13. The length of the linkage groups varied from 57.96 cM on chromosome D13 to 249.28 cM on chromosome D08. The average marker interval was the lowest on chromosome A10 (0.43) and the highest on chromosome D08 (3.25 cM).

Identification of a Candidate Gene Potentially Underlying Fiber Length on Chromosome D03
To screen for reliable QTLs that can be used in gene function analyses and MAS, we compared the results of the GWAS and linkage mapping. According to the physical position of the SNP markers, an overlapping region on chromosome D03 was detected in both the GWAS analysis and linkage mapping that could explain the relatively high PV of 13.75 and 19.28%, respectively, indicating that a major gene may be responsible for FL in this genomic region (Figure 5). Based on the cotton gene annotation database 3 , 26 candidate genes contained in a significant LD block region on chromosome D03 (41.52-41.91 Mb) were identified (Supplementary Table S4). Among these genes, 22 candidate genes had annotation information, while that on four candidate genes was unknown. Interestingly, Gh_D03G1338 (F2KP) is involved in sucrose synthesis , which has been reported to play a critical role in the process of fiber cell elongation. Furthermore, the expression levels of 26 putative candidate genes were analyzed using RNA-seq data obtained during the cotton (TM-1) fiber development stage downloaded from NCBI SRA under accession number PRJNA248163. In total, 11 (42.31%) genes (Gh_D03G1316, Gh_D03G1318, Gh_D03G1319, Gh_D03G1325, Gh_D03G1326, Gh_D03G1330, Gh_D03G1331, Gh_D03G1332, Gh_D03G1337, Gh_D03G1338, and Gh_D03G1339) exhibited higher expression levels from fiber-5 dpa to fiber-20 dpa than other genes (Supplementary Figure S1). To confirm the accuracy of the RNA-seq analysis, we selected Gh_D03G1316, Gh_D03G1318, Gh_D03G1325, Gh_D03G1330, Gh_D03G1332, and Gh_D03G1339 and performed a qRT-PCR analysis to test the reliability of the transcription levels. The qRT-PCR results of these six genes showed trends similar to those observed in the deep sequencing data (Supplementary Figure S2).

Haplotype Analysis of Gh_D03G1338 in the Natural Population
Among these 26 genes, Gh_D03G1338 stood out as an F2KP homolog in the cotton genome. F2KP is a key regulator of carbohydrate metabolism in all eukaryotes (Draborg et al., 2001). In total, 21 exons have been found in Gh_D03G1338, which is similar to F2KP identified in Arabidopsis (Figure 6A). In addition, one non-synonymous variation between two haplotypes leads to a single amino acid substitution of proline (Pro) to serine (Ser) on exon 19 located in the histidine phosphatase superfamily domain by cloning the full coding region of Gh_D03G1338 from 75 accessions selected from natural populations ( Figure 6A). The varieties carrying haplotype A had positive phenotypic effects on the FL and showed a longer FL than the varieties carrying haplotype B (P = 0.0003) (Figure 6B). Further confirming the effect of this gene on FL, according to the qRT-PCR analysis, Gh_D03G1338 had a significantly higher expression in fiber development stages (5 and 10 dpa) of CG3020-3 (long fiber varieties) than that of Ken27-3 (short fiber varieties) (Figure 6C).

DISCUSSION
Single-nucleotide polymorphisms are the most abundant genetic variants in crop genomes (Rafalski, 2002;Ganal et al., 2009;McNally et al., 2009). The rapid development of next-generation sequencing technologies has led to tremendous progress in the development of numerous SNP markers for GWAS analyses and QTL mapping in various crop species, including rice (Han et al., 2016), maize , cotton (Su et al., 2018), and soybean (Sonah et al., 2015). To date, the application of SNP genotyping technology has successfully accelerated genetic research, revealing loci controlling various traits in cotton, including fiber quality , plant architecture (Su et al., 2018), disease resistance (Li T. et al., 2017), and other important agronomic traits . In addition, analyses of SNP haplotypes based on natural populations can be utilized to improve the selection of favorable alleles in breeding programs (Wen et al., 2015). In this study, over 90,000 high-quality SNPs were identified in a diverse set of 355 cotton accessions, and a high-density genetic map was constructed by GBS-seq using 5,280 SNP markers. Subsequently, GWAS and linkage analyses were successfully applied to FL with a greatly improved marker  (Jamshed et al., 2016;Nie et al., 2016). Most importantly, the low cost of using SNPs could effectively reduce labor-intensive and time-consuming processes, providing a convenient and effective alternative for the identification of molecular markers for MAS breeding in the future. Heritability is a main factor that significantly affects the accuracy of QTL analyses (Bernardo, 2004). As expected, most mapped QTLs corresponded to these characteristics with better genetic determination or stable heritability (Said et al., 2015). Generally, a heritability value < 20% is considered low, and a value > 50% is considered high (Stanfield, 1983). In our research, the broad-sense heritability (h 2 ) of FL was estimated to be relatively high at 81%. Similar results have been found in previous studies, and Wang et al. (2015), Jamshed et al. (2016) and Jia et al. (2016) also showed broad-sense heritabilities of 70, 93, and 89%, respectively. Although FL has a relatively high broad-sense heritability, due to the environmental instability of traits, detecting more reproducible QTLs in a few environments is challenging. Therefore, we performed a genome-wide association analysis in eight environments (four locations over 3 years) to detect stable QTLs under multiple environmental conditions and improve the accuracy of the QTLs in the genetic analysis of FL. All 14 significant loci detected in the present study were detected in more than two single environments and the combined analysis (FL-BLUP). Four loci, namely, D02_63765689, D02_63765702, D11_24102240, and D11_24067326, were detected in four environments and the combined analysis. Among these loci, D11_24102240 and D11_24067326 were also identified in the same location (Anyang) over three continuous years (2015)(2016), suggesting that D11_24102240 and D11_24067326 have environmental specificity and great potential for improving FL in the cotton-growing area of the Yellow River region. Notably, one stable QTL on chromosome A03 was detected in five environments, providing an opportunity to apply this locus to MAS breeding to improve fiber quality in various planting areas. In the present study, the normal distribution and high heritability of FL variations were detected in both the natural and biparental populations, suggesting that FL in cotton is a highly polygenic trait and is suitable for QTL analyses.
Previous reports have demonstrated that coexisting At/Dt subgenomes have been under asymmetric domestication and the Dt subgenomes have a substantial influence in determining fiber quality Wang et al., 2017). Over 1,000 fiber quality QTLs have been summarized in the CottonQTLdb database, and 455 (58%) QTLs have been identified in the Dt subgenome (Said et al., 2015). Previous comparative metaanalyses conducted by Rong et al. (2007), Lacape et al. (2010) and Said et al. (2013) also indicated that more QTLs responsible for fiber quality-related traits reside in the Dt subgenome, which has a greater impact than the At subgenome. Furthermore, more fiber quality-related genes, including GhPIS (Long et al., 2018), GhMML4 (Wu et al., 2018), and GhFL2 (Ma et al., 2018) on chromosomes D01, D08, and D11, respectively, have been identified in the Dt subgenome. Similarly, among the 27 QTLs identified in this study, only a few QTLs (19%) were identified in the At subgenome, and 22 (81%) QTLs were identified in the Dt subgenome. This result is consistent with that of previous reports and further supports the hypothesis that the Dt subgenome plays a more important role in determining fiber quality.
In the present study, we detected 14 and 13 QTLs for FL in the natural and biparental populations, respectively. To further screen for QTLs that can be used in MAS and gene cloning with a high accuracy, high stability, and smaller confidence intervals, we compared our results with published research according to the SSR markers (Shen et al., 2007;Zhang et al., 2012;Shao et al., 2014). Three reliable QTLs located on chromosome D02 and D03 were reported in previous studies. Two SNPs (D02_63765689 and D02_63765702) on chromosome D02 were mapped to an adjacent region of BNL2485, which was named qFL03.1 by Shao et al. (2014). qFL-D02-1 identified in 70 cM on chromosome D02 overlapped between NAU990 and NAU1529, which was reported by Shen et al. (2007). In particular, a tightly linked region including D03_41720764 and D03_41721072 was mapped in the vicinity of the common QTL qFL-D03-1/qFL-D03-2 in the F 2 and F 2:3 generations and could explain more than 15% of the observed PV. Furthermore, this region was also near NAU2297 as described by Zhang et al. (2012). Thus, these stable QTLs that are responsible for FL may provide valuable information for cotton breeders using MAS, and these findings provide validation that combining the results of GWAS with traditional QTL mapping can increase confidence in the identity of the main QTLs in cotton research. Over the past decade, numerous studies have reported that several important pathways, such as plant hormone (auxin, gibberellin, and brassinosteroid) signaling (Xiao et al., 2010;Chen et al., 2012), Ca 2+ /K + transporters (Ruan et al., 2001;Huang et al., 2008), vacuolar invertase (Wang and Ruan, 2010) and sucrose synthase , play critical roles in the process of rapid fiber cell elongation and have a significant effect on the molecular mechanisms associated with FL. The locus on chromosome D03 detected in both the GWAS analysis and linkage mapping could explain more than 10% of the variation in cotton FL, and 26 genes were located in a tightly linked LD block interval from 41.52 to 41.91 Mb (Figure 5). Moreover, GWAS analysis showed that haplotype TT had a significant increase in FL across multiple environments in this region (Figure 3). Therefore, we wonder whether there is a major gene associated with FL control that plays a determinate role in this genomic region. After carefully analyzing the 26 genes in this region, we found that the functional annotations of four of these genes are unknown, and three of them are highly expressed during fiber development stage compared with other genes (Supplementary Figure S1). However, the qRT-PCR analysis of these four genes did not significantly differ between the long fiber and short fiber varieties (Supplementary Figure S3). Gh_D03G1338 caught our attention based on the gene annotation of cotton, namely, this gene encodes the bifunctional enzyme fructose-6phosphate 2-kinase (F2KP) with two functional kinase domains. The ortholog of F2KP in A. thaliana is a key factor affecting photosynthetic carbon partitioning between sucrose and starch during photosynthesis (Draborg et al., 2001), and sucrose is the initial substrate required for cellulose synthesis and plays an important role in the development of cotton fibers (Haigler et al., 2001(Haigler et al., , 2007Xu et al., 2012). The results of the blast alignment show that the coding sequence identity of Gh_D03G1338 with the gene of F2KP is as high as 46%, and Gh_D03G1338 encodes a protein sharing 77.60% sequence identity with the F2KP protein. A non-synonymous SNP that caused a change from C (39 cultivars) to T (36 cultivars) at 6,867 bp in the genome region resulted in a change from proline to serine at amino acid position 1,011 in the histidine phosphatase superfamily domain ( Figure 6A). Furthermore, HapA was significantly associated with increased FL (P = 0.0003) (Figure 6B). The RNA-seq data showed that this gene was highly expressed during fiber development from 5 to 20 days post anthesis (dpa) (Supplementary Figure S1). Then, we selected two haplotype varieties (CC and TT) for a qRT-PCR analysis. At fiber-5 dpa and fiber-10 dpa, Gh_D03G1338 in the long fiber varieties with CC was more highly expressed than that in the short fiber varieties with TT ( Figure 6C). Hence, we speculate that this gene may promote cotton fiber elongation by regulating the synthesis of sucrose and diverting more carbon to fiber growth. These results imply that Gh_D03G1338 is the most likely candidate gene responsible for the QTL that controls the FL trait on chromosome D03.

AUTHOR CONTRIBUTIONS
SY, JH, and LL designed the experiments. CZ, QL, and LG collected the accessions. CZ and LL performed the experiments and wrote the manuscript. HlW and HtW revised the language. All authors read and approved the final manuscript.

FUNDING
This work was supported by the Foundation for State Key Laboratory of Cotton Biology (Grant No. CB2017C04).