GWAS Analysis and QTL Identification of Fiber Quality Traits and Yield Components in Upland Cotton Using Enriched High-Density SNP Markers

It is of great importance to identify quantitative trait loci (QTL) controlling fiber quality traits and yield components for future marker-assisted selection (MAS) and candidate gene function identifications. In this study, two kinds of traits in 231 F6:8 recombinant inbred lines (RILs), derived from an intraspecific cross between Xinluzao24, a cultivar with elite fiber quality, and Lumianyan28, a cultivar with wide adaptability and high yield potential, were measured in nine environments. This RIL population was genotyped by 122 SSR and 4729 SNP markers, which were also used to construct the genetic map. The map covered 2477.99 cM of hirsutum genome, with an average marker interval of 0.51 cM between adjacent markers. As a result, a total of 134 QTLs for fiber quality traits and 122 QTLs for yield components were detected, with 2.18–24.45 and 1.68–28.27% proportions of the phenotypic variance explained by each QTL, respectively. Among these QTLs, 57 were detected in at least two environments, named stable QTLs. A total of 209 and 139 quantitative trait nucleotides (QTNs) were associated with fiber quality traits and yield components by four multilocus genome-wide association studies methods, respectively. Among these QTNs, 74 were detected by at least two algorithms or in two environments. The candidate genes harbored by 57 stable QTLs were compared with the ones associated with QTN, and 35 common candidate genes were found. Among these common candidate genes, four were possibly “pleiotropic.” This study provided important information for MAS and candidate gene functional studies.


INTRODUCTION
Cotton is an important cash crop that provides major natural fiber supply for textile industry and human daily life. Four species in Gossypium, namely G. herbaceum (A1), G. arboreum (A2), G. hirsutum (AD1), and G. barbadense (AD2), are cultivated ones. G. hirsutum (2n = 4x = 52, genome size: 2.5 Gb) Wendel and Grover, 2015;Zhang et al., 2015), also called upland cotton, has a high yield potential, whereas fair fiber quality attributes (Cai et al., 2014), thus making it most widely cultivated and utilized worldwide, approximately accounting for 95% of global cotton fiber production (Chen et al., 2007). Along with the progress of technologies in textile industry and improvement of human living standard, the demand for cotton fiber supply not only increases in quantity but also is required in a diverse combination of various qualities such as high strength, natural color, various lengths, and fineness. Fiber quality traits and yield components are quantitative and controlled by multiple genes (Said et al., 2013), yet most of which were negatively correlated with each other (Shen et al., 2007;. Therefore, it is difficult to improve all these traits simultaneously by traditional breeding programs, even after time-consuming and laborious efforts were put (Shen et al., 2005;Lacape et al., 2009;Jamshed et al., 2016;Zhang et al., 2016). The rapid development of applied genome research provides an effective tool for improving plant breeding efficiency, a typical example of which is the marker-assisted selection (MAS) and genome selection through the molecular markers closely linked to target genes or quantitative trait loci (QTLs).
A genome-wide association studies (GWAS) is also an effective approach for connecting phenotypes and genotypes in plants, and helps us to avoid the difficulty of screening large biparental mapping populations, so it is widely applied to various studies (Thornsberry et al., 2001;Flint-Garcia et al., 2005;Maccaferri et al., 2005;Eizenga et al., 2006;Zhu et al., 2008;Jia et al., 2014;Nie et al., 2016) to identify quantitative trait nucleotides (QTNs) for complex traits (Zhao et al., 2011;Fernandes et al., 2012;Segura et al., 2012;Spindel et al., 2015). It has been successfully applied to Arabidopsis thaliana (Atwell et al., 2010;Horton et al., 2012), rice Zhao et al., 2011), corn (Kump et al., 2011;Samayoa et al., 2015), and soybean (Dhanapal et al., 2015;Zeng et al., 2017), and many QTNs and their candidate genes have been identified for various ecological and agricultural traits. More recently, it has also been used in cotton (Abdurakhmonov et al., 2008;Kantartzi and Stewart, 2008;Zeng et al., 2009;Cai et al., 2014;Mei et al., 2013;Zhang et al., 2013;Su et al., 2016;Huang et al., 2017;Sun et al., 2017). To better understand the genetic architecture of fiber quality traits and yield components in upland cotton, we genotyped an intraspecific recombinant inbred lines (RILs) using enriched high-density markers of both singlenucleotide polymorphisms (SNPs) based on the CottonSNP80K arrays (Cai et al., 2017) and simple sequence repeats (SSRs). To obtain reliable QTLs and their candidate genes, we tried to use two strategies. One was linkage-map-based QTL mapping, in which a high coverage genetic linkage map was constructed with HighMap software and QTLs were mapped using composite interval mapping (CIM); the other was GWAS along with four multilocus GWAS methods Tamba et al., 2017;Wen et al., 2017;. The results in the study could be worthy for further studies not only in molecular-assisted breeding through MAS but also in functional gene validations, which is of great significance to the improvement of cotton fiber quality and yield.

Plant Materials
An RIL population of 231 lines was developed from a cross between two homozygous upland cotton cultivars, Lumianyan28 (LMY28), a commercial transgenic cultivar with high yield potential and wide adaptability developed by the Cotton Research Center of Shandong Academy of Agricultural Sciences as a maternal line, and Xinluzao24 (XLZ24), a high fiber quality upland cotton cultivar with long-staple developed by XinJiang KangDi company as a paternal line.
The RIL development was briefed as follows: the cross between LMY28 and XLZ24 was made in the summer growing season in 2008 in Anyang, Henan Province. F 1 were planted and selfpollinated in the winter growing season in 2008 in Hainan Province. In the spring of 2009, 238 F 2 plants were grown and self-pollinated, and F 2:3 seeds were harvested in Anyang (Kong et al., 2011). Of the 238 F 2:3 lines, 231 were self-pollinated in each generation until F 2:6 . Then single plant selection was made from each of the 231 F 2:6 lines to form the F 6:7 population. The F 6:7 population was planted in plant rows and self-pollinated to construct the F 6:8 RIL population. All the generations beyond F 6:8 are regarded as F 6:8 for convenience of analysis. The target traits of the F 6:8 RIL population were evaluated in Henan (Anyang, 2013, designated as 13AY, 14AY, 15AY, and 16AY, respectively), Shandong (LinQing, 2013, Hebei (Quzhou 2013, designated as 13QZ), and Xinjiang (Kuerle 2014 and Alaer 2015, designated as 14KEL and 15ALE, respectively), and a randomized complete block design with two replications was adopted in all nine environmental evaluations. A single-row plot with 5-m row length, 0.8-m row spacing, and 0.25-m plant spacing was adopted in 13AY, 13LQ, 13QZ, 14AY, 14LQ, 15AY, and 16AY, whereas a two-narrow-row plot with 3-m row length, 0.66/0.10-m alternating row spacing, and 0.12-m plant spacing were adopted in 14KEL and 15ALE.

Phenotypic Detection and Data Analysis
Thirty naturally opened bolls from each plot were handharvested on the inner fruiting nods from middle to upper branches. Yield component traits, including boll weight (BW, g), lint percentage (LP, %), and seed index (SI, g), were evaluated. No less than 15 g fibers were sampled to evaluate the fiber quality traits, including fiber length (FL, mm), fiber strength (FS, cN tex −1 ), and fiber micronaire (FM). The evaluations were conducted using HFT9000 (Premier Evolvics Pvt. Ltd., India) instruments with HVICC Calibration in the Cotton Quality Supervision, Inspection and Testing Center, Ministry of Agriculture, Anyang, Henan Province, China.
One-way analysis of variance (ANOVA) between parents and the descriptive statistics for the RIL population was conducted using Microsoft Excel 2016, and correlation analysis was performed using SPSS 20.0 (SPSS, Chicago, IL, United States). Integrated ANOVA across nine environments along with the heritability of all the traits was conducted using ANOVA function in the QTL IciMapping software.

DNA Extraction and Genotyping
Genomic DNA was extracted from fresh leaves of parents and 231 RILs with a modified cetyltrimethyl ammonium bromide (CTAB) method (Song et al., 1998). The DNA was used both for SSR screening and CottonSNP80K array hybridization.
A total of 9668 pairs of SSR primer pool, which contained a variety of sources including NAU, BNL, DPL, CGR, PGML, SWU, and CCRI, were used to screen the polymorphisms between parents. The primer information was also available at the CottonGen Database 1 . PCR amplification and product detection were conducted according to the procedures described by Zhang et al. (2005). The polymorphic primers between the parents were used to genotype the population, and the SSR markers that were codominant and had a unique physical location in the reference genome were used to construct the linkage map.
The cottonSNP80K array, which contained 77,774 SNPs (Cai et al., 2017), was used to genotype the parents and the 231 RILs. The genotyping was conducted according to the Illumina suggestions (Illumina Inc., San Diego, CA, United States) (Cai et al., 2017). After genotyping, the raw data were filtered based on the following criteria : first, any or both of the SNP loci of parents were missing (69,395 SNPs were remained after filtering); second, the loci had no polymorphism between parents (15,128 loci were remained); third, the loci of any of the parent were heterozygous (7480 SNPs were remained); forth, the missing rate of SNPs in the population was more than 40% (Hulse-Kemp et al., 2015) (7479 loci were remained); and finally, the segregation distortion of SNPs reached criteria of P < 0.001 (5202 loci were remained). Subsequently, the remaining SNP markers were applied to the genetic map construction after converting into the "ABH" data format as SSR.

Genetic Map Construction
The remaining SSR and SNP markers were divided into the 26 chromosomes based on their position on the physical map of the upland cotton (TM-1) genome database . Then, the genetic linkage map was constructed using the HighMap software with multiple sorting and error-correcting functions . Map distances were estimated using Kosambi's mapping function (Kosambi, 1943).
The significance of segregation distortion markers (SDMs; P < 0.05) was detected using the chi-square test. The regions containing at least three consecutive SDMs were defined as segregation distortion regions (SDRs) . The distribution of SDMs and SDRs, and the size of SDRs on the map were analyzed.

QTL Mapping and Genome-Wide Association Studies
The Windows QTL Cartographer 2.5 software  was employed using the CIM method with a mapping step of 1.0 cM and five control markers (Zeng, 1994) for QTL identification. The threshold value of the logarithm of odds (LOD) was calculated by 1000 permutations at the 0.05 significance level. QTLs, identified in different environments and had fully or partially overlapping confidence intervals, were regarded as the same QTL. The QTL detected in at least two environments was regarded as a stable one. Nomenclature of QTL was designated following Sun's description (Sun F.D. et al., 2012). MapChart 2.3 (Voorrips, 2002) was used to graphically represent the genetic map and QTL.
Quantitative trait nucleotides for the target traits were identified by four multilocus GWAS methods. The first one is mrMLM , in which calculate Kinship (K) matrix model was used, with critical P-value of 0.01, search radius of the candidate gene of 20 kb, and critical LOD score for significant QTN of 3. The second one is FASTmrEMMA , with restricted maximum likelihood, in which calculate K matrix model was used, critical P-value of 0.005, and critical LOD score for significant QTN of 3. The third one is ISIS EM-BLASSO , with critical Pvalue of 0.01. The fourth one is pLARmEB ; each chromosome selected 50 potential associations at a critical LOD score of 2 with variable selection through LAR.

QTL Congruency Comparison With Previous Studies
Previous QTLs for the target traits were detected and downloaded in the CottonQTLdb database 2 (Said et al., 2015). The QTLs sharing similar genetic positions (spacing distance < 15 cM) were regarded as common or same QTL. The physical positions of a QTL were identified in the CottonGen database 3 . When a QTL in the current study shared the same physical region as the previous QTL, it was regarded as a repeated identification of the previous QTL; otherwise, the QTL in the current study was regarded as a new one.

The Candidate Genes Identification
Candidate genes harbored in the stable QTLs were searched and identified based on their confidence intervals in the following steps: The markers including the closest flanking ones in the confidence interval of a QTL were identified. The physical interval of that QTL was determined based on the physical position of its markers in the upland cotton (TM-1) genome 4 . All the genes in the physical interval were identified as candidate genes.
Candidate genes associated with QTNs in the multilocus GWAS analysis were confirmed based on the location of QTNs in the upland cotton (TM-1) reference genome . The gene in which the QTL was located was considered as the candidate gene. But when the physical location of a QTN was between two genes, both of the genes were considered as candidate genes.

Phenotypic Evaluation of the RIL Populations
The one-way ANOVA between parents in nine environments showed that a significant difference for FS at the 0.001 level and no significant differences for the other traits were observed ( Table 1). The descriptive statistical analysis showed that all traits in the RIL population performed transgressive segregations, with approximately normal distribution in all the nine environments ( Table 1). The integrated ANOVA of the RILs across nine environments also revealed significant variations for all traits among the RILs (Supplementary Table S1).
Most of the traits exhibited medium-high heritability across nine environments (Supplementary Table S2). Correlation analysis showed that significant or very significant positive correlations were observed between the trait pairs of FL-FS, FL-SI, FS-SI, FM-LP, FM-BW, and SI-BW; and significant negative correlations were observed between the pairs of FL-FM, FL-LP, FS-FM, FS-LP, BW-LP, and SI-LP. In addition, FL-BW showed a significant or very significant positive correlation in three environments, whereas no significant correlation was observed in the remaining six environments ( Table 2).

Genetic Map Construction
The genetic linkage map totally covered 2477.99 cM of the upland cotton genome with an average adjacent marker interval of 0.51 cM (Figure 1 and Table 3). It contained 4851 markers, including 4729 SNP and 122 SSR loci, with uneven distributions in the A t and D t subgenomes as well as on 26 chromosomes. A total of 3300 markers were mapped in the A t subgenome, covering a genetic distance of 1474.63 cM with an average adjacent marker interval of 0.45 cM. On the other hand, a total of 1551 markers were mapped in the D t subgenome, covering a genetic distance of 1003.36 cM with an average adjacent marker interval of 0.65 cM. At the chromosome level, chr08 contained the maximum number of markers (481 markers), spanning a genetic distance of 142.55 cM with an average adjacent marker interval of 0.32 cM. chr17 contained the minimum number of markers (19 markers), spanning a total genetic distance of 60.60 cM with an average adjacent marker interval of 3.56 cM. Gap analysis revealed that there were 33 gaps (≥10 cM), of which 19 were in the A t subgenome with the largest of 22.68 cM on chr07, whereas 14 were in the D t subgenome with the largest of 42.23 cM on chr17. chr11, chr16, chr19, chr20 and chr24 had no gap larger than 10 cM.

Segregation Distortion
There were a total of 1,563 SDMs (32.22%) (P < 0.05), which were unevenly distributed at both subgenome and chromosome levels ( Tables 3 and Supplementary Table S3). One thousand and sixty-one SDMs were found in the A t subgenome, whereas 502 in the D t subgenome. chr08 had the maximum number of SDMs of 237 (15.16% of total SDMs). The SDMs formed 110 SDRs, of which 66 were in the A t subgenome whereas 44 in the D t subgenome. chr05 contained the maximum number of SDRs of 10. There was no SDR in chr03 and chr17.

Collinearity Analysis
The reliability of the genetic map was usually assessed by comparing it with the physical maps of the upland cotton (TM-1) reference genome . The results of the collinear analysis are shown in Figure 2. The results revealed an overall good congruency between the linkage map and its physical one, while there also existed some discrepancies between the two on chr03, chr06, chr08, and chr13 in the A t subgenome and on chr15, chr16, chr17, chr19, chr22, chr23, and chr26 in the D t subgenome. The collinearity in subgenomes revealed that the A t subgenome showed a better compatibility between the linkage and the physical maps than the D t subgenome did.

QTL Mapping for Fiber Quality Traits and Yield Components
A total of 256 QTLs (Supplementary Table S4), 134 for fiber quality traits, and 122 for yield components, were identified across nine environments using the CIM algorithm, with 1.68-28.27% proportions of the phenotypic variance (PV) explained by each QTL. Fifty-seven stable QTLs (Figure 3 and Supplementary  Table S4) were identified in at least two environments, of which 32 were for fiber quality traits and 25 for yield components.

GWAS for Fiber Quality Traits and Yield Components
A total of 209 and 139 QTNs were identified by four multilocus GWAS methods to be associated with fiber quality and yield component traits, respectively, in the current study (Supplementary Table S6). Among these QTNs, 74 were simultaneously found by at least two algorithms or in two environments (Supplementary Table S6), each with 0.15-47.17% proportions of the observed PVs explained, and a total of 104 candidate genes were mined.

Fiber Quality Traits
A total of 68, 65, and 76 QTNs were found to be associated with FL, FS, and FM, respectively, and the corresponding 110, 99, and 126 candidate genes were identified. In these QTNs, 11 for FL, 17 for FS, and 22 for FM were simultaneously associated by at least two algorithms or in two environments, and each could explain 0.15-29.10, 1.43-47.17, and 2.54-41.39% proportions of the observed PVs, respectively.

Yield Components
A total of 51, 50, and 38 QTNs were found to be associated with BW, LP, and SI, respectively, and the corresponding 82, 83, and 65 candidate genes were identified. In these QTNs, 9 for BW, 5 for LP, and 10 for SI were simultaneously associated by at least two algorithms or in two environments, and each could explain 3.41-28.76, 3.00-22.49, and 1.21-38.73% proportions of the observed PVs, respectively.  Table S6). Annotation analysis of the 35 common genes from these two candidate gene pools revealed that 33 of them had annotation information, whereas 8 had unknown function (Supplementary Table S7). In the gene ontology (GO) analysis of the candidate gene for fiber quality (Supplementary Table S8), 24, 17, and 29 candidate genes were identified in the cellular component, molecular function, and biological process category, respectively. In the cellular component category, three main brackets of cell (six genes), cell part (six genes), and organelle (five genes) were enriched, whereas in the molecular function category, two main brackets of binding (eight genes) and catalytic activity (six genes), and in biological process category, four main brackets of metabolic process (seven genes), single-organism process (seven genes), cellular process (five genes), and response to stimulus (five genes) were, respectively, enriched ( Figure 4A). In gene ontology (GO) analysis of the candidate gene for yield components (Supplementary Table S10), 19, 13, and 27 candidate genes were identified in the cellular component, molecular function, and biological process category, respectively. In the cellular component category, three main brackets of cell (five genes), organelle (five genes), and cell part (five genes) were enriched, whereas in the molecular function category, two main brackets of binding (six genes) and catalytic activity (five genes), and in the biological process category, four main brackets of singleorganism process (seven genes), metabolic process (five genes), cellular process (five genes), and localization (four genes) were, respectively, enriched ( Figure 4B). Kyoto encyclopedia of genes and genomes (KEGG) analysis indicated that six candidate genes for fiber quality were involved in 10 pathways and two candidate genes for yield were involved in six pathways (Supplementary Tables S9, S11).

DISCUSSION
The High-Density Genetic Map Construction and Its Reliability The development of high-throughput sequencing technology enabled its applications in genotyping the accessions of both natural populations for GWAS and segregating ones for map Frontiers in Plant Science | www.frontiersin.org construction and QTL identification to be accumulated to agricultural important crops Kump et al., 2011;Dhanapal et al., 2015;Zeng et al., 2017). SNPs provided abundant genetic variation loci at the genome level and much improved the genome coverage and marker saturation when they were applied to genetic map construction (Agarwal et al., 2008;Hulse-Kemp et al., 2015;Cai et al., 2017). At present, two sets of SNP arrays were developed for Gossypium (Hulse-Kemp et al., 2015;Cai et al., 2017). Different from the first set of CottonSNP63K arrays (Hulse-Kemp et al., 2015;Zhang Z. et al., 2017), which was developed by international consortium of several different studies (Hulse-Kemp et al., 2015), the CottonSNP80K array (Cai et al., 2017) was developed from the re-sequencing of 100 upland cotton cultivars and the TM-1 genome database . Even though both sets were successfully applied in upland cotton linkage map construction and QTL identifications (Hulse-Kemp et al., 2015;Cai et al., 2017;Zhang Z. et al., 2017;Tan et al., 2018), the second set could have a higher genotyping accuracy, better coverage, and representative of hirsutum genome (Cai et al., 2017;Tan et al., 2018). In the current study, a linkage map was constructed mainly using SNP markers from the CottonSNP80K array in combination with SSR ones. The map spanned a total genetic length of 2477.99 cM, containing 122 SSR and 4729 SNP markers, with an average marker interval of 0.51 cM between adjacent markers. Compared with previous SSR maps (Shappley et al., 1998;Shen et al., 2005;Sun F.D. et al., 2012;, the current map contained more markers and were more effective in map construction Li et al., 2016;Zhang et al., 2016;Zhang Z. et al., 2017;Tan et al., 2018), and exhibited a high consistency with the genomic distribution of the SNP array, which demonstrated its representativeness in map construction (Figure 2; Cai et al., 2017). The reliability of the genetic map is also estimated by gap size, collinearity, and segregation distortion analyses (Figure 2 and Table 3). Although the development of SNP markers was based on the CottonSNP80K array, a few chromosomes still had a large gap or uneven distribution of makers Zhang Z. et al., 2017). Totally, there were 33 gaps larger than 10 cM, of which the largest one was of 42.23 cM on chr17 and there were only 19 markers mapped on it. The result of collinearity between the genetic map and the G. hirsutum (TM-1) reference genome indicated accuracy and quality of the map.
The segregation distortion is recognized as strong evolutionary force in the process of biological evolution (Taylor and Ingvarsson, 2003), which was also a common phenomenon in the study of genetic mapping (Shappley et al., 1998;Ulloa et al., 2002;Jamshed et al., 2016;Zhang et al., 2016;Tan et al., 2018). The current study observed that 32.22% of the total mapping markers were SDMs (P < 0.05). The maximum SDMs were on chr08, where there were 237 SDMs of the total 481 markers, forming five SDRs (Figure 1). This was in consistency with the SSR map constructed from the F 2 population of the same parents of the current study (Kong et al., 2011). However, some studies observed an increase of the SDM ratio from F 2 generation to the completion of RILs (Tan et al., 2018). This phenomenon was influenced by plenty of factors, including genetic drift (Shen et al., 2007) of mapping population, pollen tube competition, preferential fertilization of particular gametic genotypes, and others Zhang Z. et al., 2017;Tan et al., 2018). In the current study, some chromosomal uneven distribution of QTLs in SDR versus normal regions was also observed in chr01, chr06, chr07, chr10, chr16, chr19, and chr20. These facts implied an impact of the selections being imposed during the construction of the RIL population.
In addition, four multilocus GWAS algorithms were applied to the association of QTNs with the target traits, and their results were compared with the previous identified QTLs (Said et al., 2015). The results confirmed that quite a ratio of QTNs were coincided in the physical regions of the confidence intervals of reported QTLs in the database, namely 43 QTNs for FL, 44 QTNs for FS, 51 QTNs for FM, 40 QTNs for BW, 34 QTNs for LP, and 25 QTNs for SI (indicated with asterisks in Supplementary  Table S6). The remaining QTNs could possibly be novel QTNs, of which 27 were associated by at least two algorithms or in two environments. These loci could be of great significance for cotton molecular-assisted breeding, particularly the loci of TM9941 and TM54893, which were identified both by multiple algorithms and in multiple environments for more than one target trait.
Based on linkage disequilibrium, GWAS is an effective genetic analysis method to dissect the genetic foundation of complex traits in plants in natural populations. The four multilocus GWAS algorithms provided promising alternatives in GWAS. Usually, GWAS needed a large panel size with sufficient marker polymorphism (Bodmer and Bonilla, 2008;Manolio et al., 2009), and was effective to identify major loci while ineffective to rare or polygenes (Asimit and Zeggini, 2010;Gibson, 2012) in the population. Linkage analysis in segregating populations could effectively eliminate the false-positive results, which was a builtin defect of GWAS in natural populations. But linkage analysis usually identified large DNA fragments, which made it difficult to further study the initial identification results. In the current study, both GWAS and linkage analysis were applied in the segregating RILs to study the correlations between genotypes and phenotypes. When comparing the results of GWAS to the QTLs of both previous studies (Said et al., 2015) and current study, common loci (genes) (Supplementary Table S7) demonstrated the effectiveness and feasibility of multilocus GWAS methods to address the correlation between genotypes and phenotypes in segregating RILs. Especially under the condition of increased marker density and improved genome coverage, the accuracy of QTN identification in GWAS would also increase. The increased accuracy probably rendered the application of GWAS in segregating population to have a higher effect on the observed PVs, sometimes even higher than that of QTL on the PVs in linkage analysis, which was usually low in natural populations.

Congruency and Function Analysis of Candidate Genes
In this study, candidate genes were identified independently both from the physical region in the marker intervals of the QTLs, which were identified by CIM (Zeng, 1994) in WinQTL Cartographer 2.5 , and from the physical position of the QTNs, which were associated by multilocus GWAS algorithms. As the CIM algorithm gave not only the QTL position where the highest LOD value located, but also a marker interval of that QTL, the physical regions where the marker interval resided by QTL/QTN were used to search the candidate genes around the QTLs. To avoid redundant genes, the markers, which resided far away from the physical positions of the rest in the same confidence interval, were discarded for consideration of candidate gene searching. This increased the accuracy of the functional analysis of the candidate genes harbored in the confidence intervals of QTLs.
When comparing both candidate gene lists, even if they were not completely consistent, they still revealed a good congruency of candidate gene identification from both algorithms of QTL/QTN; namely, three congruent candidate genes for FL, seven for FS, nine for FM, five for BW, eight for LP, and nine for SI were identified (Supplementary Table S7). Further analysis of these candidate genes indicated that 1 for FL, 17 for FS, and 2 for FM (indicated with asterisks in Supplementary  Table S6) were congruent with some previous reports Sun et al., 2017). Two candidate genes, Gh_D102255 (a protein kinase superfamily gene) and Gh_A13G0187 (actin 1 gene), which were for fiber quality, were also reported to participate in fiber elongation Huang et al., 2008). Gh_A07G1730 and Gh_D03G0236 belonged to a WD40 protein superfamily were mainly involved in yield formation in the current study, and might be related to a series of functions (Sun Q. et al., 2012;Gachomo et al., 2014). Gh_D11G1653 (myb domain protein 6) functioned in BW formation, whereas reports indicated that several members of MYB family were involved in fiber development (Suo et al., 2003;Machado et al., 2009;Sun et al., 2015;Huang et al., 2016). Findings in the current study also indicated that some candidate genes could possibly be "pleiotropic, " namely Gh_A07G1744 for FS, FM, and SI; Gh_A07G1745 for FS and FM; Gh_A07G1743 for BW and SI; and Gh_D08G0430 for FM and BW. These candidate genes could be of great significance for further studies including functional gene cloning as well as cultivar development.

CONCLUSION
The enriched high-density genetic map, which contained 4729 SNP and 122 SSR markers, spanned 2477.99 cM with a marker density of 0.51 cM between adjacent markers. A total of 134 QTLs for fiber quality traits and 122 for yield components were identified by the CIM, of which 57 are stable. A total of 209 and 139 QTNs for fiber quality traits and yield components were, respectively, associated by four multilocus GWAS algorithms, of which 74 QTNs were detected by at least two algorithms or in two environments. Comparing the candidate genes harbored in 57 stable QTLs with those associated with the QTN, 35 were found to be congruent, 4 of which were possibly "pleiotropic." Results in the study could be promising for future breeding practices through MAS and candidate gene functional studies.

AUTHOR CONTRIBUTIONS
WG and YY initiated the research. WG, RL, and QC designed the experiments. RL, XX, and ZZ performed the molecular experiments. JG, JL, AL, HS, YS, QG, QL, MI, XD, SL, JP, LD, QZ, XJ, XZ, and AH conducted the phenotypic evaluations and collected the data from the field. RL, WG, YY, and HG performed the analysis. RL drafted the manuscript. YY and WG finalized the manuscript. All authors contributed in the interpretation of results and approved the final manuscript.