GWAS Mediated Elucidation of Heterosis for Metric Traits in Cotton (Gossypium hirsutum L.) Across Multiple Environments

For about a century, plant breeding has widely exploited the heterosis phenomenon–often considered as hybrid vigor–to increase agricultural productivity. The ensuing F1 hybrids can substantially outperform their progenitors due to heterozygous combinations that mitigate deleterious mutations occurring in each genome. However, only fragmented knowledge is available concerning the underlying genes and processes that foster heterosis. Although cotton is among the highly valued crops, its improvement programs that involve the exploitation of heterosis are still limited in terms of significant accomplishments to make it broadly applicable in different agro-ecological zones. Here, F1 hybrids were derived from mating a diverse Upland Cotton germplasm with commercially valuable cultivars in the Line × Tester fashion and evaluated across multiple environments for 10 measurable traits. These traits were dissected into five different heterosis types and specific combining ability (SCA). Subsequent genome-wide predictions along-with association analyses uncovered a set of 298 highly significant key single nucleotide polymorphisms (SNPs)/Quantitative Trait Nucleotides (QTNs) and 271 heterotic Quantitative Trait Nucleotides (hQTNs) related to agronomic and fiber quality traits. The integration of a genome wide association study with RNA-sequence analysis yielded 275 candidate genes in the vicinity of key SNPs/QTNs. Fiber micronaire (MIC) and lint percentage (LP) had the maximum number of associated genes, i.e., each with 45 related to QTNs/hQTNs. A total of 54 putative candidate genes were identified in association with HETEROSIS of quoted traits. The novel players in the heterosis mechanism highlighted in this study may prove to be scientifically and biologically important for cotton biologists, and for those breeders engaged in cotton fiber and yield improvement programs.

For about a century, plant breeding has widely exploited the heterosis phenomenonoften considered as hybrid vigor-to increase agricultural productivity. The ensuing F 1 hybrids can substantially outperform their progenitors due to heterozygous combinations that mitigate deleterious mutations occurring in each genome. However, only fragmented knowledge is available concerning the underlying genes and processes that foster heterosis. Although cotton is among the highly valued crops, its improvement programs that involve the exploitation of heterosis are still limited in terms of significant accomplishments to make it broadly applicable in different agro-ecological zones. Here, F 1 hybrids were derived from mating a diverse Upland Cotton germplasm with commercially valuable cultivars in the Line × Tester fashion and evaluated across multiple environments for 10 measurable traits. These traits were dissected into five different heterosis types and specific combining ability (SCA). Subsequent genome-wide predictions along-with association analyses uncovered a set of 298 highly significant key single nucleotide polymorphisms (SNPs)/Quantitative Trait Nucleotides (QTNs) and 271 heterotic Quantitative Trait Nucleotides (hQTNs) related to agronomic and fiber quality traits. The integration of a genome wide association study with RNA-sequence analysis yielded 275 candidate genes in the vicinity of key SNPs/QTNs. Fiber micronaire (MIC) and lint percentage (LP) had the maximum number of associated genes, i.e., each with 45 related to QTNs/hQTNs. A total of 54 putative candidate genes were identified

INTRODUCTION
The phenomenon of biological progeny outperforming either of their parents is defined as heterosis (Shull, 1922). The concept of heterosis dates back to early experiments on inbreeding and its complementing hybrid vigor (Shull, 1908(Shull, , 1909. Generally, heterosis is assumed to be highly characteristic of allogamous crops but less common in purely autogamous crops for improvements in their total growth rate, fitness, and biomass production, as well as yield (Lippman and Zamir, 2007;Chen, 2013;Schnable and Springer, 2013).
Highly conceptual quantitative genetic models attributed to heterosis, known as dominance (Xiao et al., 1995), over dominance (Li et al., 2001(Li et al., , 2008 and epistasis (Yu et al., 1997), are considered insufficient for explaining its basic molecular mechanism. Currently, many omics studies are trying to describe changes in gene expression across genome and histone modifications, deoxyribonucleic acid (DNA) methylation, and micro RNAs. These aspects are being studied in hybrids and their parents as well, but nevertheless the genetic mechanism underlying this phenomenon remains elusive (He et al., 2010(He et al., , 2013Fujimoto et al., 2012;Groszmann et al., 2015;Miller et al., 2015). With the revolution in computational methods and extensive advancements in genome sequencing methods, deployment of genome-wide association studies (GWAS) has proven to be a tremendously powerful tool. It has been applied especially for exploring the specific genetic loci potentially accountable for heterotic traits in crop plants (Atwell et al., 2010;Kump et al., 2011;Huang et al., 2012;Meijón et al., 2014).
Cotton is widely cultivated across the globe as a natural fiber crop on a commercial basis. In this respect, Gossypium hirsutum is responsible for approximately 95% of cotton production worldwide (Grover et al., 2015). China is considered a top cotton-growing territory given the vast number of genotypic diversity and agro-ecological zones for cotton that exist in the country. Although both wider adaptability and increased productivity attributes are associated with Upland Cotton crop, the low quality of its fiber product requires novel improvements and advances in spinning technology. According to previous investigation, we know now that a substantial amount of heterosis  (Sarfraz et al., 2018). Both India and China are enjoying substantial benefits of hybrid cotton via a cryptic process of heterosis since the last century (Basunanda et al., 2010). Nowadays, the focus of most studies in top edible crops like maize (Frascaroli et al., 2007), rapeseed (Radoev et al., 2008), and rice (Xiao et al., 1995) is on the heterosis mechanism. Linkage mapping studies that utilized segregating populations of these crops have found more than a single gene involved and related mechanisms for hybrid vigor existing among them (Schnable and Springer, 2013). Furthermore, the underlying genetic basis of heterosis in maize and rice is somewhat different. The prominent peculiarity related to the fact seems to be highly correlated to their self-pollinated or open-pollinated nature (Garcia et al., 2008). There is now a need to also study this confounding process systemically in Upland Cotton.
The use of bi-parental crossing scheme yields little additional information, but GWAS or linkage disequilibrium (LD) mapping have emerged as extensively utilized, powerful techniques for the genetic dissection of complex mechanisms via high density molecular markers (Zhu et al., 2008). Further, single nucleotide polymorphism (SNP) assays have empowered GWAS for such studies, especially as related to intricate cotton traits. Yet only a few reports of GWAS mapping have used F 1 populations based on SNP markers in cotton.
Accordingly, this study was planned and executed in which GWAS was used to detect allelic variation in the genome of cotton and identify candidate SNPs strongly associated with economic quantitative traits. Restriction-site associated DNA (RAD) sequencing was applied to 1136 F 1 individuals of upland cotton; these were then evaluated phenotypically in 10 environments over 2 years. The current study aimed at the (i) detection of SNP loci associated with trait performance of F 1 hybrids and of Quantitative Trait Nucleotides (QTNs) related to the heterosis of these traits and (ii) the identification and validation of potential candidate genes, especially heterotic Quantitative Trait Nucleotide (hQTNs), related to the investigated traits. Line × Tester mating design was proposed for the current study to obtain different variables related to heterosis and combining abilities, for the purpose of utilizing them as variables in association studies. For this purpose, 284 female parents (lines) were crossed with four testers as males, namely 7886, A971Bt, 4133Bt, SGK9708 coded, respectively as tester A, C, D, and E to produce the F 1 hybrid populations. The ensuing F 1 hybrids were divided into four groups (SetA, SetC, SetD, and SetE) according to their respective parents.

MATERIALS AND METHODS
Each set of F 1 hybrids along with their respective parents were subjected to a field evaluation for plant phenotyping. Field plantations of the experimental material were established during the crop seasons of 2012 and 2013 at 12 locations in cotton-growing belts of China, these mainly spanning parts of the Yangtze River region (Changsha, Changde, Jiujiang, Hefei, Wuhan, and Jingzhou) and the Yellow River region (Anyang [ICR], Anyang [Beibi], Hejian, Dongying, Baoding, and Xinxiang). These locations were selected on the basis of significant differences in agro-ecological features including climate, amount of precipitation, temperature, soil fertility, growing period, and cultural practices.
Field experiments were carried out that used a triplicated randomized complete blocked design (RCBD) at all 12 locations. A single row of genotypes was 8.0 m, row × row distance was maintained at 0.8 m the plant × plant distance was kept at 0.3 m in the Yellow River region and at 0.5 m in the Yangtze River region. The distance between replications was 1.0 m. Local cotton growing practices were followed for sowing; i.e., direct sowing of delinted seeds/seeds with lint or transplanting of seedlings using the method predominantly used in that given region. All recommended agronomic practices-fertilizer application, seed treatment, seed rate, sowing methods, thinning, cultural practices, irrigation, insect pest control and weed managementwere followed in similar manner to establish and maintain a good crop stand on all 12 experimental locations.
Data collection for all traits under study was carried out for all experimental units, by following the unanimous standard Descriptors for Data collection used for Cotton Germplasm, which was developed based on guidelines issued by the International Plant Genetic Resources Institute (IPGR). Ten individual guarded plants were randomly selected and tagged for data collection related to agronomic traits as well as qualityrelated characters. When the crop had about 70% open bolls, 30 bolls from each tagged individual plant (middle branches) per plot were harvested and examined for agronomic and fiber quality traits. The ginning of collected seed cotton samples was done using a roller-gin. About 150 g of ginned and clean lint samples were taken and sent to the Laboratory of Quality and Safety Risk Assessment for Cotton Products, Anyang, Henan, China, to examine fiber quality-related traits. Fiber quality analysis was carried out using high-volume instrument (HVI). Ten phenotypic traits-boll weight (BW), lint percentage (LP), fiber fineness or micronaire (MIC), fiber strength (FS), fiber length (FL), fiber elongation (FE), fiber uniformity (FU), fiber uniformity index (FUI) number of bolls per plant (BN) and plant height (PH)-were recorded (Supplementary Table 2) from 284 individuals of the above-mentioned F 1 populations, as well as their parents, planted at different locations as experimental units during the two different years.

Sample Preparation for RAD Sequencing
Young fresh leaves were collected from each genotype and immediately frozen and then stored at −80 • C. Genomic DNA was extracted following the CTAB method (Paterson et al., 1993) albeit with some modifications. The purified DNA was digested with FastDigest TaqI (Thermo Scientific Fermentas, United States), at 65 • C for 10 min. Bar-coded adapters were ligated to the digested DNA fragments with T4 DNA ligase (Enzymatics, United States), during 1 h incubation at 22 • C. The samples were then heated at 65 • C for 20 min, after which 24 samples were pooled. The DNA fragments (400-600 bp) were purified from 2% agarose gel electrophoresis with the help of the QIA-quick Gel Extraction kit (QIA, Qiagen, Valencia, CA, United States). Adapter-ligated DNA fragments were further amplified by polymerase chain reaction (PCR), using the Phusion-High-fidelity DNA-polymerase (Finnzymes, Thermo Scientific, United States). Next, these amplified fragments were separated via agarose gel electrophoresis, and the ensuing DNA fragments (400-600 bp) were purified with the QIA-quick PCR Purification kit (Qiagen, Germany). Finally, the purified libraries were quantified on a 2100 Bioanalyzer (Agilent, United States) and each library sequenced by the Hi-Seq 2000 system (Illumina, United States). The raw reads were then aligned with the G. hirsutum L. TM-1 reference genome (v.1.1) 1 , using the "mem -t 8" parameter of the BWA program . The GATK and SAMTools packages were used for SNP calling, after which any SNPs with a high missing-data rate (>40%) and a low minor allele frequency (MAF) (<5%) were eliminated (Li et al., 2009;McKenna et al., 2010). The generated sequencing data have been deposited into the NCBI database (accession number: PRJNA353524).

Phenotypic Data Analysis
The collected data for 10 agronomic and fiber quality traits (BW, LP, MIC, FS, FL, FE, FU, FUI, BN, and PH) recorded from 284 F 1 crosses were subjected to univariate analysis for determining variability among the studied traits (Gomez et al., 1984). The relative increase or decrease (in percentage) for F 1 hybrids over their respective parental values was determined for the estimation of possible heterotic effects for the agronomic and quality traits, by using these formulas (Fehr, 1987): Heterobeltosis was calculated this way: Mid-parent heterosis (MP) was calculated this way: Competitive heterosis (K) over local cultivar(s)/Check(s) (CK) was calculated as: x 100 In the current study, two standard heterosis measures, K3 and K4, were calculated by using two commercial Chinese cotton cultivars, i.e., Rui za 816 and Eza mian 10 hao (Tai D5), respectively. The heterosis index (HI) was calculated as follows: The specific combining ability (SCA) variance was calculated by using Line × Tester variance analysis (Singh and Chaudhary, 1977).

Genotypic Data Analysis
To explore different genetic factors presumably associated with heterosis in cotton, GWAS was performed by considering familial relatedness as well as population structure (Yu et al., 2006), utilizing the data for all traits under investigation for 2 years at 10 locations. The experimental genotypes were examined using "Restriction-site associated DNA (RAD) sequencing." The BWA v.0.7.12 software was utilized to analyze all the SNP data. Only the reads mapped uniquely to the reference genome and the SNPs with high missing rate (>40%) and MAF (<5%), were considered for elimination for conducting GWAS.
The paired-end reads of each individual were identified by its barcode and aligned against the reference genome, using the BWA v.0.7.12 (Staples et al., 2014). The program SAMTools v.0.1.18 (Price et al., 2006) was used to generate the consensus sequences for every individual under study and further preparation of input data for SNP calling; the latter was carried out by realSFS v0.983 2 , using Bayesian-based estimation. The data obtained from the four sets of F 1 genotypes along with parents were considered for calculations, based on principle that the 284 female parents and 4 male parents must be homozygous; otherwise, they were removed. Moreover, only those female and male parents having a different genotype (e.g., AA, BB) were considered for analysis, to ensure a heterozygous F 1 genotype. The expected F 1 genotype was calculated, by focusing on the genotype of respective male and female parent and heterozygous SNPs in either of the parents have been scored as missing.
Single nucleotide polymorphisms that met the following criteria were removed: (1) Length (distance) between two adjacent SNP loci was less than 5 bp.
The GWAS analysis was performed on filtered high-quality SNPs, using EMMAX software, by following an efficient mixedmodel association-expedited model designation, as described by Kang et al. (2010), for which a threshold of p = 1.0 × 10 −5 was used throughout. For the visualization of results, Manhattan and quantile-quantile plots were constructed in R using the package 2 http://128.32.118.212/thorfinn/realSFS/ FIGURE 1 | Violin plots based on phenotypic variation of ten agronomic and fiber quality traits of F 1 hybrids (Y axis) from four sets (SetA, SetC, SetD, and SetE) across multiple environments for two years; 2012 and 2013 (Xaxis). Legends on top right in different colors are representing ten evaluated phenotypic traits. "qqman." The peak SNPs with the highest p-value as well as their detection across multiple environments, were considered as key SNPs. For further confirmation, the favorable allelic variations of the key SNPs were identified for each trait variable (trait phenotype, SCA, and heterosis types). Box plots for the relative phenotypic values were drawn in R software. The HAPLOVIEW 4.2 software (Zhang and Endrizzi, 2015) was used to carry out the haploblock analysis. All genomic positions provided here are based on the G. hirsutum L. reference genome (v.1.1) .
Gene ontology (GO) analysis was performed using the cotton functional genomics database 3 , to propose annotated putative candidate genes for each locus. For the transcriptome-based predictions, the gene expression database (TM-1)  was used for the assessment of specific expression patterns of these nearby genes across various tissues: an organ or perhaps different growth and development stages of cotton viz. root, leaf, stem, torus, petal, stamen, pistil, and fiber (5 DPA to 25 DPA) and ovule (−3 DPA to 35 DPA). By applying the abovementioned criteria within a 100-kb flanking window, candidate genes were thus selected. The differential expression patterns of these genes (i.e., those with expression level >1) were plotted in a heatmap.

Phenotypic Characteristics Evaluation
The results shown in Figure 1 revealed variation among the different agronomic and fiber quality traits performance for F 1 s and parents. Upper and lower ends as shown in the vertical projections are assumed to represent highest to lowest data points (further details in Supplementary Table 3). Figure 2 shows the five types of studied heterosis related to fiber quality and agronomic traits performance, along with SCA among four sets of crosses (284 each). The averaged MP heterosis of each trait showed a positive trend with highest range occurring for BN (251.8) and the lowest for FU (11.3). A considerable range of variation was observed regarding these variables, thus providing sufficient ground for their further GWAS (Supplementary Table 3).

Population Structure
Based on the fact that population structure increases the authenticity of identified SNPs, the number of subgroups that existed in the experimental accessions was critically estimated. The experimental accessions encompassed subgroups on the basis of their different geographic origins. The results from ADMXTURE software analysis of the experimental accessions could be divided into three divergent groups: Group I, II, and III with 86, 64, and 134 individuals, respectively (Figure 3). A genotypic principal component analysis (G-PCA) was performed in EIGENSOFT v. 6.0.1 software; this clearly displayed the top three eigen vectors: PC1, PC2, and PC3 (Figure 3). Both analyses clearly distinguished the accessions into three groups on the basis of which further GWA studies were implemented, with Q = 3.

Genome Variation Based on the SNPs
Evidently these SNPs, which totaled 252,110 in number, were not evenly distributed across entire cotton genome (A t : 151,104 and D t : 101,006). The A t sub-genome housed a greater number of SNPs associated with the fiber quality-related traits, while the D t sub-genome harbored more SNPs for the agronomic traits. The A t 08 chromosome had the most SNPs (20,960), whereas the A t 04 chromosome had the least (4,726) (Supplementary Table 4). All these SNPs were utilized for the GWAS of female parents, amounting to 35,769 high quality SNPs for the four sets of F 1 s, as follows: 18,391 SNPs for F 1 _A, 7458 SNPs for F 1 _C, 23,128 SNPs for F 1 _D and 17,692 SNPs for F 1 _E (Figures 4A-E). On chromosome A t 08, maximum number of associated SNPs were found i.e., 113, while the minimum number of associated SNPs was estimated to be 16, on chromosome D t 04 (Supplementary Table 5).

SNPs' Associations in F 1 Sets and Heterosis Types
A total of 1,192 significant SNPs revealed 2,847 significant associations (−log 10 (p) ≥ 4) with the 10 studied traits of the cotton parents and 4 F 1 sets (Figure 5). The maximum number of associations was discovered for BW (441) and the minimum for FU (185). However, FE ranked highest in terms of number of associated SNPs, with 181, this being lowest for FU with 92 ( Figure 6A and Supplementary Table 6). Collectively, 236, 264, 368 and 268 SNPs were revealed by F 1 _A, F 1 _C, F 1 _D, and F 1 _E sets, respectively. Furthermore, we discovered six SNPs shared in common by these three sets: F 1 _C, F 1 _D, and F 1 _E. Seven common SNPs were found between the F 1 _A and F 1 _C sets, 14 SNPs were commonly shared by F 1 _A and F 1 _D sets, and 10 SNPs were commonly shared between the F 1 _A and F 1 _E sets. Similarly, 10 SNPs were observed in  common between the F 1 _C and F 1 _D sets, four in the F 1 _C and F 1 _E sets, and five in the F 1 _D and F 1 _E sets ( Figure 6B and Supplementary Table 7). However, 476 SNPs/hQTNs were found associated with the five evaluated types of heterosis. Of those, 199, 148, and 95 SNPs/hQTNs of heterosis types were also commonly shared by SCA, F 1 sets and both SCA and F 1 sets, respectively ( Figure 6C and Supplementary Table 8). Likewise, the numbers of significant pleiotropic SNPs related to agronomic and fiber quality traits were tallied to gain insight into pleiotropy. Those details are presented in Figure 7 and Supplementary Table 9.

Mining of Associated Key SNPs
A total of 298 significant (−log 10 (p) > 4) key SNPs/QTNs were identified, based on the highest p-value, presence in multiple environments and function i.e., boxplots and haploblock analysis (Supplementary Table 10). Figure 8 summarizes the results for the simultaneous identification of key SNPs/QTNs in different F 1 sets. Of 298 significant key SNPs/QTNs, 271 heterotic SNPs/hQTNs were related specifically to the heterosis evaluated in this study (Supplementary Table 11). The F 1 _D set contributed the highest number of key SNPs/hQTNs, with 87, followed by F 1 _E set with 77 SNPs/hQTNs, the F 1 _A with 59 SNPs/hQTNs and the F 1 _C with 56 SNPs/hQTNs (Supplementary Table 12). A total of 19 highly stable hQTNs were detected on the basis of their simultaneous contribution by multiple paternal sources and their detection of association signals in multiple environments (Supplementary Table 13). Further investigations revealed that 8, 4, 4, 2, and 1 stable hQTNs were FIGURE 7 | Depicted here are results from the multivariate analysis of pleiotropy. For each associated SNP, the method returns the best-fitting solution of which phenotypes were associated with that SNP. All SNPs with one or more associated phenotypes are shown here. For example, every SNP associated with FE was found to be pleiotropic for other phenotypes. The total number of pleiotropic as well as unique associated SNPs for each trait from these analyses were 181 ( associated with LP, BW, FS, FL, and MIC, respectively. These hQTNs were further validated by functional analysis using genotype-phenotype interaction, SNP-SNP interaction, and gene expression.

Identification of Candidate Genes and Their Annotation
We conducted an exploration of nearby genes (i.e., 100kb flanking window) of 298 QTNs on the basis of genes' annotation with reference to TM-1 genome of G. hirsutum . Overall, 275 genes (A t : 128, D t : 147) were identified for further scrutiny (Supplementary Table 11). Based on the transcriptome analysis, a heatmap of the differential expression of genes in various tissues and growth stages was plotted (Figure 9). These genes were assumed exert effects on related traits; for instance, a gene differentially expressed across fiber during the different DPA would be involved in determining agronomic quality as well as fiber quality. The GO analysis was performed using cotton functional genomics database (see text footnote 3) to annotate the putative candidate genes with biological processes, cellular components, and molecular functions ( Table 1). The GO analysis revealed that those candidate genes with known functions were involved in different catalytic activities, metabolic pathways, and transcription factors. In all, 271 hQTNs were found in close vicinity of 275 candidate genes, including Gh_D02G0165 which had two hQTNs associated with BN and PH; Gh_D12G1396 and Gh_A021302 each harboring two hQTNs and all four of them associated with LP; the rest of the traits were found associated with one hQTN each. The maximum number of associations between genes and traits were detected for LP, at 16, followed by MIC and FUI with 12 and 10 associations, respectively (Table 1). Of 64 putative candidate genes, 54 were considered as potential candidate genes related to the heterosis of the studied traits. Figure 10 shows the GWAS summary of the MIC-associated hQTN, hqMICD09_43629201_C, found on chromosome D09 that was contributed by the male parent A971Bt. This hQTN was found in association with all the types of heterosis as well as trait phenotypes, and it was expressed in cotton's fiber, ovule, and different plant tissues.

DISCUSSION
In conventional plant breeding, a huge number of hybrid crosses are screened to glean genotypes exhibiting ideal performance traits. However, only a few tested hybrid crosses are considered worthwhile for use as hybrid varieties. Once the heterotic loci or actual causative heterotic genes are identified with certainty, the genotypes are more likely to get scrutinized. The genotypes harboring key loci can be identified through whole-genome assembly of parental lines, by narrowing down directly those potential combinations conferring robust performances. This study is a perfect integration of both conventional and modern techniques for hybrid crosses generation which can be done quickly and with greater predictive ability. Globally, an enormous body of systematic surveys on heterosis has since accumulated. Phenological attributes have been investigated for hybrid vigor in many crops, such as grain amaranths (Amaranthus cruentus, Amaranthus hypochondriacus) (Lehmann et al., 1991), maize (Parentoni et al., 2001;Betrán et al., 2003), tomato (Lycopersicon esculentum) (Makesh, 2002), and rice (Oryza sativa) (Verma et al., 2002). The number of genotypes analyzed in those works is comparable to that accessed in our survey on cotton. In our study, we analyzed 284 female lines, four testers and their subsequent hybrid progenies across a wide spectra of environments. Hence, heterosis also ranged widely, with higher values for agronomic traits but lower values for fiber quality traits, due to the presence of many individuals with varying higher and lower phenotypic values than their parents. Because the genotype of a hybrid is obtained after the combination of both its parental genotypes, the overdominance hypothesis postulates the heterozygosity of individual loci is FIGURE 9 | Heat map for expression patterns of the 275 genes nearby significant keys SNPs/QTNs associated with studied agronomic and fiber quality traits. Shaded portion is representing expression >1 while white portion is representing <1. consequential for the superior performance. Our finding of a higher heterosis trend in agronomic traits than fiber quality traits is consistent with previous findings and can prove beneficial for cotton breeding.
Due to the scarcity of available genetic divergence in the founder parents of Cotton World stock, global climatic changes are continuously posing threats to Upland Cotton crops with respect to progress in breeding and their survival. Thus, it is imperative that we explore potential genetic diversity that might have been eroded from cultivated cotton collections. Population structure within the collection of accessions is considered crucial for explaining heterogeneity. Chinese cotton production as well as cotton breeding programs are largely based on the introduction of germplasms since long (Chen and Du, 2006). However, improved cultivated species in the last two decades have population structures with a reasonable extent of heritability. Accessions used as parents were clustered into three distinct groups on the basis of genotypic data. We identified three major subpopulations in our experimental stock of F 1 sets, which formed at earlier stages of the cotton breeding period and were not affected by geographical influences of China.
In the last two decades, GWAS has been extensively utilized by researchers to map different quantitative traits in plants and this achievement is considered a complex milestone (Ingvarsson and Street, 2011). It is thought the power of GWAS usually depends on four distinct factors: availability of rich genetic diversity, credibility in acquisition of phenotypic data, density of markers and use of adequate statistical methods. The current collection of G. hirsutum accessions used as parents exhibited reasonable amounts of phenotypic and genotypic diversity. It offers a highly efficient way to mine heterosis-related loci by high-resolution GWAS in plants. Through GWAS, the relationships among significantly associated hQTNs to fiber quality as well as other agronomic traits, and the annotation of putative genes containing these hQTNs, were examined here in depth.
The identification of hQTNs using five different types of heterosis, trait performance, SCA and four F 1 sets is another noteworthy feature of this study. In this way, the loci controlling heterosis of different traits could be separated from those concerned with trait performance in earlier studies. We distinguished 19 highly stable hQTNs for LP, BW, FL, FS and MIC traits based on their detection from five heterosis types and/or four F 1 sets across multiple differing environments. These stable heterotic loci could be used in the future to assist in Upland Cotton breeding via MAS applications. The remaining significant hQTNs that were found related to other traits could also prove useful in cotton breeding programs. Moreover, a reasonable number of identified SNPs from the F 1 sets and trait phenotypes overlapped with those detected from the heterosis types. These findings revealed that both heterosis and trait performance were not independently controlled by different loci, which agrees with a recent study on upland cotton (Li et al., 2018). Conversely, in rice, (Hua et al., 2003) reported them as being independently controlled by different sets of loci.
Boll weight was identified in relation with Gh_D08G0894, which encodes an ethylene-responsive transcription factor   detected earlier in Arabidopsis (Nole-Wilson et al., 2005) and later in cotton (Qin et al., 2019). Ethylene is considered as a key factor in the growth of cotton fiber and in its elongation (Qin et al., 2007;Qin and Zhu, 2011); its crucial role is evident from the findings that when it occurs in excessive or insufficient amounts this negatively affects FE (Li et al., 2015). Two candidate genes, Gh_D02G1201 and Gh_A10G1233, showed an association with FE-related hQTNs. Both Gh_D02G1201 and Gh_A10G1233 encode FAR-RED elongated proteins which are involved in light responses and FE development (Ju et al., 2019) along with the positive regulation of chlorophyll biosynthesis (Tang et al., 2012;Miao et al., 2017). Gh_A11G1858, which displayed an association with the hQTN related to FL, encodes a serine-threonine kinase SAPK1 protein that may play a role in the signal transduction of a hyperosmotic response markedly influencing the fiber development process (Kobayashi et al., 2004;Magwanga et al., 2018;Li et al., 2019). FU was identified with hQTNs associated with Gh_A01G0481 and Gh_A11G0475; the former encodes a cytochrome P450 protein which may have a role in the maturation or aging of tissues (Ju et al., 2019), while the latter encodes the Ras-related protein RAB11A, it detected previously in fiber development and elongation (Qin et al., 2017). The FUI trait was associated with a hQTN related to Gh_A03G0366, which encodes a scarecrow-like protein that acts as transcription factor and which regulates the development of vegetative and reproductive plant parts (Cenci and Rouard, 2017;Zhang et al., 2018). The LP-associated hQTN was related to Gh_A05G0285, a gene earlier detected in cotton for coding nucleotide binding protein responsible for fiber development (Ju et al., 2019). The MIC-directed association toward hQTNs related to Gh_A03G1505, Gh_A03G0332, Gh_A02G0495, and Gh_D06G1287; these reportedly encode in cotton the production of calcium-dependent proteins related respectively to the kinase family (Si et al., 2020), histones H2B (Qin et al., 2019), phosphatase 2C (Ju et al., 2019;Song et al., 2019;Shahzad et al., 2020), and cytochrome C oxidase . Finally, FS was associated with a hQTN related to the Gh_A01G1348 gene known for encoding axial regulator proteins controlling the biomass vigor of hybrid cotton (Shahzad et al., 2020).
The current study is based on the concept of genomic hybrid breeding, previously utilized in rice (Xu et al., 2014), which exploited the strategy of genome sequencing. The sequence data was then deployed to evaluate F 1 progenies' performance in hybrid breeding. An earlier study on rice revealed the power of SNP-directed yield estimation of F 1 hybrids. In the current study, 298 QTNs were uncovered in association with fiber quality as well as agronomic traits. A set of 271 hQTNs were detected with 19 highly stable heterotic loci in relation with LP, BW, FL, FS, and MIC based on their detection from five evaluated types of heterosis and/or four F 1 hybrid sets across a wide spectrum of environments. These discovered hQTNs and putative candidate genes related to HETEROSIS of quoted traits could be used further deliberately in marker-assisted breeding of forthcoming cotton hybrid breeding programs. Once the genotype-based predictions achieve relatively high levels of accuracy, the labor and time costs of hybrid breeding are greatly reduced. The reported information derived in this study is of practical and scientific significance for both cotton breeders and biologists engaged in elucidating the heterosis mechanism of fiber as it could assist in successful accomplishments in both domains.

DATA AVAILABILITY STATEMENT
The data related to RAD-sequencing have been submitted to NCBI under reference No. PRJNA353524. Other used data are provided in the form of Supplementary Material/Tables. Any other information supporting the conclusions of this article, if needed, will be made available by the authors, at appropriate request without undue reservation.