Identification of novel candidate loci and genes for seed vigor-related traits in upland cotton (Gossypium hirsutum L.) via GWAS

Seed vigor (SV) is a crucial trait determining the quality of crop seeds. Currently, over 80% of China’s cotton-planting area is in Xinjiang Province, where a fully mechanized planting model is adopted, accounting for more than 90% of the total fiber production. Therefore, identifying SV-related loci and genes is crucial for improving cotton yield in Xinjiang. In this study, three seed vigor-related traits, including germination potential, germination rate, and germination index, were investigated across three environments in a panel of 355 diverse accessions based on 2,262,367 high-quality single-nucleotide polymorphisms (SNPs). A total of 26 significant SNPs were detected and divided into six quantitative trait locus regions, including 121 predicted candidate genes. By combining gene expression, gene annotation, and haplotype analysis, two novel candidate genes (Ghir_A09G002730 and Ghir_D03G009280) within qGR-A09-1 and qGI/GP/GR-D03-3 were associated with vigor-related traits, and Ghir_A09G002730 was found to be involved in artificial selection during cotton breeding by population genetic analysis. Thus, understanding the genetic mechanisms underlying seed vigor-related traits in cotton could help increase the efficiency of direct seeding by molecular marker-assisted selection breeding.


Introduction
Upland cotton (Gossypium hirsutum L.) is one of the world's most important cash crops and a major source of natural fibers, accounting for more than 95% of global cotton production (Chen et al., 2007). Lint yield depends largely on the quality of cotton seeds, and seed vigor (SV) is crucial for evaluating seed quality (Sawan, 2016). SV also determines the growth of crops and food safety; for example, rapidly and uniformly germinating seeds can significantly increase the emergence rate in the field and suppress weed growth (He et al., 2019a). In addition, with the widespread application of mechanized direct seeding (DS) in cotton production, seeds with low vigor make it difficult to establish a full stand from a single sowing, leading to problems such as uneven seedling age and weak seedling vigor (Qun et al., 2007; Liu et al., 2015). Therefore, the identification of loci and genes related to SV is urgently needed for DS of cotton.
Seed germination is a key factor affecting SV traits in plants. Phytohormones such as gibberellin (GA) and abscisic acid (ABA) have been reported to be essential for the regulation of seed germination (Yamaguchi, 2008; Ryu and Cho, 2015). For example, GA and ABA synthesis pathway-related genes (GA20ox3, GA3ox1, GA2ox5, ABI3, and ABI5) have a strong effect on seed germination (Yamauchi et al., 2004; Yamaguchi, 2008; Iglesias-Fernandez and Matilla, 2009). When plants are under abiotic stress, ABA levels increase rapidly, and high levels of ABA close the stomata and activate complex signaling pathways mediated by kinase/phosphatase regulation (Kim et al., 2010). Low levels of reactive oxygen species (ROS) act as signaling molecules to promote dormancy release and trigger seed germination (Li et al., 2022). For example, OsCDP3.10 promotes the accumulation of H2O2 during the early stage of seed germination by increasing the amino acid content (Peng et al., 2022). The relationship between seed germination and the ROS scavenging system has been validated in many crops and other plants, such as Arabidopsis (Leymarie et al., 2012), wheat (Ishibashi et al., 2008), and rice (Ye et al., 2012). Furthermore, crosstalk between the ABA and ROS signaling pathways has also been reported in plants. In rice, qSE3 significantly increased ABA biosynthesis and activated ABA signaling responses, resulting in decreased H2O2 levels in germinating seeds under salinity stress (He et al., 2019b).
SV-related traits are quantitative traits controlled by both genetic and environmental factors (Li W. et al., 2021). These traits include germination rate (GR), germination potential (GP), germination index (GI), vigor index (VI), seedling shoot length (SL), and shoot fresh weight (FW) (Dai et al., 2022; Si et al., 2022). In recent years, linkage mapping has been widely used to identify SV-related quantitative trait loci (QTLs) in crops, and multiple QTLs have been cloned (Fujino et al., 2004; Fujino et al., 2008; He et al., 2019b; Jiang et al., 2020; Veisi et al., 2022). By using BC1F5 populations derived from a rice intraspecific cross ('WTR-1' × 'Y134'), 28 SV-related QTLs were identified with a SNP genotyping array, and one major QTL (q1stGC11.2), explaining 19.9% of the phenotypic variation (PV), was flanked by SNP_11_27994133 on chromosome 11 (Dimaano et al., 2020). In wheat, a total of 49 QTLs were detected on 12 chromosomes, including seven SV candidate genes involved in cell division during germination of aged seeds, carbohydrate and lipid metabolism, and transcription (Shi et al., 2020). Wang L. et al. (2022) constructed a linkage map based on specific-locus-amplified fragment sequencing (SLAF-seq) SNP markers in melon; 2020/2021-qsg5.1 was significant in both environments, and MELO3C031219.2, in this region, exhibited a significant expression difference between the parental lines during multiple germination stages (Wang L. et al., 2022). Under low-temperature conditions, three QTLs (qLTG-3-1, qLTG-3-2, and qLTG-4) related to GR were identified using 122 backcross inbred lines, and the phenotypic variation explained (PVE) by qLTG-3-1 was 35.0% (Fujino et al., 2004). Subsequently, qLTG-3-1 was cloned and shown to be closely associated with vacuolation of the tissues covering the embryo (Fujino et al., 2008). Furthermore, the genome-wide association study (GWAS) approach uses germplasm resources to study the genetic architecture of target traits. Compared to traditional QTL mapping, GWAS can provide higher resolution by exploiting ancestral recombination events and has been successfully applied to identify significant SNP loci and potential candidate genes associated with important agronomic traits in major crops (Zhu et al., 2008; Shikha et al., 2021). For example, SV-related QTLs were identified in 346 rice accessions using GWAS, and 51 significant SNPs were detected for SL, GR, and FW (Dai et al., 2022). In addition, a previous study involving 187 rice accessions identified the candidate gene OsSAP16; the loss of OsSAP16 function reduced the rice seed germination rate (Wang et al., 2018). Recently, a candidate gene (Gh_A09G1509) responsible for seed germination was detected through a GWAS panel in upland cotton by using whole-genome resequencing (Si et al., 2022). These results suggest that genome-wide association analysis is an effective method for identifying genes associated with seed germination.
To date, many quantitative traits have been studied in cotton, such as fiber quality traits (Su et al., 2016b; Zhang et al., 2019), early maturity traits (Li et al., 2017; Li L. et al., 2021), and yield component traits (Su et al., 2016a; Feng et al., 2022). However, SV-related traits in cotton have received little attention, and most research has focused on seed germination in relation to stress tolerance (Yuan et al., 2019; Chen L. et al., 2020; Gu et al., 2021; Guo et al., 2022). Few candidate genes for cotton SV-related traits have been identified (Si et al., 2022), and the mechanism of seed germination needs further study. In this study, GR, GP, and GI were determined in a natural population of upland cotton in three environments, and whole-genome resequencing was used to achieve deep coverage and obtain high-quality SNP markers. In addition, six stable QTLs and two novel candidate genes (Ghir_A09G002730 and Ghir_D03G009280) for SV-related traits were further identified by a GWAS panel, laying the foundation for understanding the genetic mechanism underlying SV and providing information for applying these potential elite loci in marker-assisted selection (MAS) in cotton breeding.

GWAS population and field experiments
The 355 upland cotton germplasm resources collected by laboratories worldwide represent a natural population. Previous studies focused on early maturity (Li L. et al., 2021), fiber quality (Su et al., 2016b), fiber yield (Su et al., 2016a; Feng et al., 2022), and plant architecture component traits based on abundant phenotypic variations in this population (Su et al., 2018). These upland cotton varieties are from different countries and represent accessions resulting from more than 100 years of global upland cotton breeding. Seeds of the GWAS population used for phenotyping SV-related traits were collected from three environments, including Huanggang in Hubei Province (30°57′ N, 114°92′ E) in 2021 (E1: Huanggang-2021) and Sanya in Hainan Province (18°36′ N, 109°17′ E) in two consecutive years (2021 and 2022) (E2: Sanya-2021 and E3: Sanya-2022). The field experiments in Sanya and Huanggang were conducted following a randomized complete block design with two and three replications, respectively.

Phenotyping for SV-related traits and statistical analysis
The phenotyping of SV-related traits was carried out by the sandponic method based on previously described methods (Si et al., 2022). Cotton seeds collected from the field were ginned, and cotton fuzz was removed with concentrated sulfuric acid. Then, all seeds were sun-dried for 2 days to break dormancy uniformly. A total of 150 plump, uniformly sized seeds were selected, disinfected with 15% sodium hypochlorite for 10 min, and then washed clean with distilled water. Then, each line was evenly planted in a plastic sand box (13 cm × 19 cm × 12 cm) containing 800 g of dry quartz sand. Subsequently, the seeds were covered with 250 g of dry quartz sand, and 200 mL of distilled water was added. The number of germinated seeds was counted each day until the seventh day. All experiments were conducted in a phytotron with 16 h of light (25°C) and 8 h of darkness (18°C). Three biological replicates were included for each accession, and 50 seeds were used for each replicate. Three SV-related traits (GR, GP, and GI) were selected for measurement. The full name, abbreviation, and measurement method of each trait are listed in Table 1 as described by Yuan et al. (2019). Descriptive statistics (maximum, minimum, mean, etc.) were calculated using R software (version: 4.2.2).
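The trait definitions in Table 1 can be illustrated with a minimal Python sketch. It assumes that Gt is the number of seeds that newly germinated on day t (one reading of "germinated seeds per day"), and the function name and the example counts are hypothetical, not data from this study.

```python
from typing import Dict, Sequence


def germination_traits(daily_new: Sequence[int], seeds_tested: int) -> Dict[str, float]:
    """Compute GP, GR and GI for one replicate.

    daily_new[t - 1] is the number of seeds that newly germinated on day t
    (t = 1..7); this interpretation of Gt is an assumption.
    """
    if len(daily_new) != 7:
        raise ValueError("expected one count per day for 7 days")
    if seeds_tested <= 0 or sum(daily_new) > seeds_tested:
        raise ValueError("counts are inconsistent with the number of seeds tested")

    gp = sum(daily_new[:3]) / seeds_tested  # germinated within the first 3 days
    gr = sum(daily_new) / seeds_tested      # germinated by the 7th day
    gi = sum(g / day for day, g in enumerate(daily_new, start=1))  # sum of Gt / Dt
    return {"GP": gp, "GR": gr, "GI": gi}


# Hypothetical replicate of 50 seeds (illustrative counts only).
traits = germination_traits([0, 10, 20, 10, 5, 0, 0], seeds_tested=50)
print(traits)
```

Under this reading, early germination contributes more to GI than late germination because each daily count is divided by its day number.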

GWAS and genetic diversity analysis
Genome-wide association analysis was performed by combining 2,262,367 high-quality SNPs with the phenotype data of 355 upland cotton accessions collected in three environments for SV-related traits using linear mixed models in GEMMA (version: 0.98.3), executed through vcf2gwas software (version: 0.8.7) (Zhou and Stephens, 2012; Vogt et al., 2022). P < 1 × 10⁻⁶ was used as the threshold to detect significant SNP loci. Additionally, the PVE by each marker was calculated as previously reported (Feng et al., 2022). The nucleotide diversity (π) value was calculated using VCFtools based on the release years (before the 1950s, 1950s-1970s, 1980s-1990s, and 2000s-2020s) and geographical distribution (Northern Specific Early-Maturity region: NSER, Yellow River region: YRR, Yangtze River region: YZRR, and Northwest Inland region: NIR) of the 355 accessions. The packages 'CMplot' (https://github.com/YinLiLin/CMplot), 'LDheatmap' (Shin et al., 2006), and 'ggplot2' (Wickham, 2011) in R software were used to generate Manhattan plots and for linkage disequilibrium (LD) block analysis and haplotype analysis.
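To make the nucleotide diversity statistic concrete, the following Python sketch computes a per-site value for a biallelic SNP as the probability that two chromosomes drawn without replacement carry different alleles, and averages it over a window. This is a commonly used estimator; whether it matches the exact computation performed by VCFtools in this study is an assumption, and the function names and counts are hypothetical.

```python
from typing import Iterable


def site_pi(alt_count: int, n_chromosomes: int) -> float:
    """Per-site nucleotide diversity for a biallelic SNP.

    Probability that two chromosomes sampled without replacement differ:
    2 * j * (n - j) / (n * (n - 1)), with j alternate alleles among n chromosomes.
    """
    n, j = n_chromosomes, alt_count
    if n < 2 or not 0 <= j <= n:
        raise ValueError("need n >= 2 and 0 <= alt_count <= n")
    return 2 * j * (n - j) / (n * (n - 1))


def window_pi(alt_counts: Iterable[int], n_chromosomes: int, window_bp: int) -> float:
    """Average diversity per base pair: sum of per-site values / window length."""
    if window_bp <= 0:
        raise ValueError("window length must be positive")
    return sum(site_pi(j, n_chromosomes) for j in alt_counts) / window_bp


# Hypothetical group of 5 diploid accessions (10 chromosomes) and a 1-kb window.
pi_balanced = site_pi(5, 10)             # alleles at equal frequency
pi_window = window_pi([5, 0], 10, 1000)  # one variable site, one fixed site
print(pi_balanced, pi_window)
```

Comparing such values between groups of accessions (for example, by release period) is the kind of contrast reported for Ghir_A09G002730 in the Results.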

Candidate gene identification and expression analysis
Based on the 'TM-1' reference genome (HAU_v1.1) (Wang et al., 2019), the genes in the interval located 200 kb upstream and downstream of the significant SNPs were defined as candidate genes. The protein sequences of the candidate genes were obtained from CottonGen (https://www.cottongen.org/). Then, local BLAST software was used to compare the protein sequences of the candidate genes with the Arabidopsis protein database (https://www.arabidopsis.org) to obtain homologous sequences, with an E-value threshold of less than 1e-60 (Johnson et al., 2008). The expression patterns of SV candidate genes in upland cotton were determined by RNA-seq and quantitative reverse-transcription PCR (qRT-PCR) analysis. RNA isolation was performed as described by Feng et al. (2022). GhUBQ7 was used as an internal control. Quantitative analysis was performed using a Roche real-time qPCR system (LightCycler 480 II) and SYBR with three biological replicates. The public RNA-seq data (PRJNA248163), including SRR1695160, SRR1695161, and SRR1695162, were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/bioproject/). The Illumina HiSeq 2000 platform was used to perform RNA sequencing on 'TM-1' seeds soaked in water for 0, 5, and 10 h, and the paired-end clean read length was more than 100 bp. The gene expression values were normalized by the average expression levels (log2) based on transcripts per million values. The clustered heat map was drawn with the R package 'pheatmap' (Kolde, 2012).
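The candidate-gene definition above (genes within 200 kb upstream and downstream of a significant SNP) can be sketched in Python as a simple interval query. The overlap rule (any gene that overlaps the window counts) is an assumption, since the text does not state whether genes must lie entirely inside the interval; the gene identifiers and coordinates below are hypothetical placeholders, not annotations from HAU_v1.1.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snp(genes: List[Gene], chrom: str, snp_pos: int,
                   flank_bp: int = 200_000) -> List[str]:
    """Return IDs of genes overlapping [snp_pos - flank_bp, snp_pos + flank_bp]."""
    lo = max(1, snp_pos - flank_bp)
    hi = snp_pos + flank_bp
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.end >= lo and g.start <= hi]


# Hypothetical annotation records; only the SNP position echoes rsA09_7962794.
annotation = [
    Gene("gene_a", "A09", 7_940_000, 7_941_000),  # inside the window
    Gene("gene_b", "A09", 8_300_000, 8_305_000),  # beyond +200 kb
    Gene("gene_c", "D03", 7_950_000, 7_955_000),  # different chromosome
]
candidates = genes_near_snp(annotation, "A09", 7_962_794)
print(candidates)
```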

Characterization and distribution of SNPs in the upland cotton genome
The natural population libraries were resequenced on the Illumina HiSeq 4000 platform with 150-bp paired-end reads, as described in previous reports (Li L. et al., 2021), yielding approximately 65,013 million reads in total for the 355 cotton genotypes. Approximately 88.3% of the total bases were successfully mapped to the cotton reference genome, and the average sequencing depth was 11.7-fold across the 355 upland cotton accessions. A total of 2,262,367 SNPs distributed across the cotton genome, with a minor allele frequency (MAF) > 0.05 and a missing rate of less than 20%, were used for the GWAS of the 355 cotton germplasm accessions. Of these, the At and Dt subgenomes contained 1,404,637 and 857,730 SNPs, resulting in average SNP densities of 993.44 and 1045.91 SNP/Mb, respectively (Table 2; Figure 1). The percentage of SNPs on each chromosome varied from 1.4% on chromosome D04 to 11.4% on chromosome A08 (Figure 1). Most of the SNPs were located in intergenic regions (84.38%), whereas the exonic and intronic regions contained only 0.89% and 3.03% of SNPs, respectively (Supplementary Table S1). In addition, SNPs in the coding regions (coding sequences, CDSs) included 33.26% synonymous mutations and 64.13% nonsynonymous mutations.
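As a quick consistency check on the figures in this paragraph, the short Python sketch below sums the subgenome SNP counts and back-calculates the subgenome lengths implied by the reported densities. The implied lengths are derived here for illustration only and are not values stated in the text.

```python
# SNP counts and densities as reported in the paragraph above.
snp_counts = {"At": 1_404_637, "Dt": 857_730}
snp_density_per_mb = {"At": 993.44, "Dt": 1045.91}

# The two subgenome counts should add up to the total SNP set used for GWAS.
total_snps = sum(snp_counts.values())

# Density = count / length, so the implied length (Mb) is count / density.
implied_length_mb = {
    sub: snp_counts[sub] / snp_density_per_mb[sub] for sub in snp_counts
}

print(total_snps)
for sub, length in implied_length_mb.items():
    print(f"{sub}: {length:.1f} Mb implied")
```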

PV of SV-related traits
The three SV-related traits (GI, GP, and GR) of natural population accessions were measured in three environments. The values followed a normal distribution for the GI and GP but showed a skewed distribution for GR based on Shapiro-Wilk tests (Supplementary Table S2). The frequency histograms of SV-related traits are shown in Figures 2A-I. The lowest average GI was 55.23 in the E1 environment, and the highest average GI was 58.92 in the E2 environment, with a coefficient of variation (CV) ranging from 6.52% to 12.02% (Supplementary Table S2). For GP, the E1 environment had the lowest average value of 70.92%, while the E2 environment had the highest average value of 79.61%; the CV in the E1 environment (11.67%) was higher than that in the E2 environment (9.21%) and the E3 environment (9.37%) (Supplementary Table S2). For GR, the lowest average value was 87.37% in the E1 environment, and the highest average value was 93.25% in the E2 environment, with a CV ranging from 3.71% to 10.48% (Supplementary Table S2). Two-way analysis of variance (ANOVA) showed that genotype (G) and the genotype-by-environment interaction (G × E) had significant effects on the GI, GP, and GR (P < 0.001) (Supplementary Table S3). Furthermore, the heritability of these three SV-related traits ranged from 74.23% (GR) to 81.75% (GP), whereas that of GI was 76.03% (Supplementary Table S3). These results suggested that SV-related traits have extensive PV in the GWAS panel, which is suitable for further GWAS.
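The coefficient of variation used to compare environments is the standard deviation expressed as a percentage of the mean. A minimal Python sketch follows; it assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the trait values are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Return the CV in percent: sample standard deviation / mean * 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical GI values for three accessions in one environment.
cv_example = coefficient_of_variation([50.0, 55.0, 60.0])
print(f"CV = {cv_example:.2f}%")
```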

Identification of a candidate gene for GR on chromosome A09
In this study, a novel QTL, qGR-A09-1, exhibited a significant SNP cluster (rsA09_7745467, rsA09_7791621, rsA09_7878527, rsA09_7908017, rsA09_7954329, rsA09_7954353, and rsA09_7962794) occupying a physical region of 0.2 Mb on chromosome A09 (Figure 4A). Meanwhile, 22 genes were annotated in this QTL region based on the G. hirsutum reference genome (Wang et al., 2019), except for Ghir_A09G002720 and Ghir_A09G002760, which did not have annotation information (Supplementary Table S5). We further conducted LD analysis on the significant SNP rsA09_7962794, and LD blocks were found in this region (Figure 4A). In this QTL interval, rsA09_7962794, located downstream of Ghir_A09G002730, showed a strong association with GR and explained 7.95% of the phenotypic variation (Table 3). rsA09_7962794 had two haplotypes, GG and AA, and accessions carrying the AA haplotype had a significantly higher GR than those carrying the GG haplotype in three environments (P < 0.01) (Figure 4B). In addition, to gain a further understanding of the genetic characteristics of rsA09_7962794 in relation to geographic distribution, the 355 upland cotton accessions were divided into four groups: NIR, YZRR, YRR, and NSER. Interestingly, YRR and NSER showed an extraordinarily low frequency of the nonfavorable haplotype (GG), while the accessions obtained from YZRR and NIR had a relatively high frequency of the favorable haplotype (AA) (>75%) (Figure 4C). Furthermore, the genetic diversity of Ghir_A09G002730 decreased over successive breeding periods. Cotton accessions released before the 1980s showed greater diversity than accessions bred from the 1980s to the 2000s, while accessions bred after the 2000s showed the lowest diversity (Figure 4D). Specifically, Ghir_A09G002730 belongs to the pentatricopeptide repeat (PPR) superfamily and has higher expression levels during the seed germination stage from 0 to 10 h than other genes (Figure 4E). The qRT-PCR analysis also showed that Ghir_A09G002730 had higher expression levels in the accessions ('Liaomian27' and 'Xinluzhong35') carrying the AA genotype than in accessions ('PB12-1-8' and 'Xiazao2') with the GG genotype during the seed germination stage (Supplementary Figure S4).
FIGURE 3 Genome-wide association study results for seed vigor-related traits. (A-C) Manhattan plots of GI-BLUP, GP-BLUP, and GR-BLUP for single-nucleotide polymorphism (SNP) markers, respectively. Significant SNP markers are distinguished by black lines. (D-F) QQ plots for GI-BLUP, GP-BLUP, and GR-BLUP, respectively.
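The geographic comparison of the rsA09_7962794 haplotypes amounts to tabulating, for each region, the share of accessions carrying the favorable AA haplotype. A minimal Python sketch of that tabulation is given below; the function name and the records are hypothetical and do not reproduce the study's genotype data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def favorable_frequency(records: Iterable[Tuple[str, str]],
                        favorable: str = "AA") -> Dict[str, float]:
    """Return {region: fraction of accessions carrying the favorable haplotype}.

    records is an iterable of (region, haplotype) pairs, one per accession.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region, haplotype in records:
        counts[region][haplotype] += 1
    return {region: c[favorable] / sum(c.values()) for region, c in counts.items()}


# Hypothetical accessions (illustrative only).
example_records = [
    ("NIR", "AA"), ("NIR", "AA"), ("NIR", "AA"), ("NIR", "GG"),
    ("YRR", "AA"), ("YRR", "GG"),
]
freqs = favorable_frequency(example_records)
print(freqs)
```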

Identification of a candidate gene for GR on chromosome D03
As mentioned above, another distinct SNP enrichment QTL region, qGI/GP/GR-D03-3, was detected for the GI, GP, and GR across multiple environments and explained a relatively high proportion of the phenotypic variation (PVE of 6.65-8.43%), indicating that a major gene in this genomic interval may improve seed germination (Table 3). Interestingly, 12 associated SNPs were located within the most significant haplotype block, which was almost 920 kb long and contained five haplotypes (Figures 5A, B). A haplotype analysis revealed that qGI/GP/GR-D03-3 had two major haplotypes according to SNP location. Comparatively, Hap1 had a higher GP than Hap2 (Figures 5C, D). In total, 46 candidate genes contained in the qGI/GP/GR-D03-3 region on chromosome D03 were identified. Among them, Ghir_D03G009280 was annotated as auxin response factor 9 (ARF9) in Arabidopsis (Supplementary Table S6), and its homologs played a crucial role in seed dormancy. The RNA-seq and qRT-PCR assays also showed that Ghir_D03G009280 had higher expression levels during the seed germination stage, suggesting a positive regulatory effect (Figure 5E; Supplementary Figure S5).

Discussion
The importance of seed vigor for field production
SV is an indispensable indicator of seed quality, which directly affects the rapid and uniform germination of seeds, the robust growth of seedlings, and the tolerance of plants to abiotic stress in the early stage of seedling growth (Qun et al., 2007; Fujino et al., 2008). In recent years, mechanical DS of cotton has been widely used due to its cost-saving and labor-saving advantages, making rapid and uniform seed germination a necessary condition for high yield and mechanization in the cotton industry. However, seeds with low SV make it difficult for mechanical DS to achieve full seeding, which leads to problems such as subsequent filling of the gaps with seedlings and final singling of seedlings (Xie et al., 2014). For example, Xinjiang Province is the major cotton-growing area in China and experiences serious saline-alkali stress (He et al., 2023). A high SV of cotton varieties will improve seed germination in the field and thus increase the yield. In addition, cotton breeding without plastic film in Xinjiang Province to eliminate "white pollution" has become popular. Cotton without plastic film (CWPF) places higher demands on the germination rate and seedling emergence rate of seeds. CWPF needs to quickly establish robust seedlings after seed germination to resist diseases, insect pests, adverse environments, and other factors in the field. Importantly, SV is the result of genetic and environmental factors and is thus often difficult to select effectively in conventional breeding (Dai et al., 2022). This study utilized high-throughput sequencing to generate widely distributed SNP markers that cover the whole genome (Figure 1), and over 2,000,000 high-quality SNPs were detected in a diverse set of 355 cotton accessions. Combining phenotype data from multiple environments for GWAS analysis can be used to effectively identify genetic loci and candidate genes that improve SV in upland cotton, providing an effective way to improve cotton yield in Xinjiang when using the MAS method.
Li et al. 10.3389/fpls.2023.1254365 Frontiers in Plant Science frontiersin.org

Comprehensive analysis of SV-related traits across multiple environments
To ensure the accuracy of the GWAS results, phenotypic identification in multiple environments was conducted with at least three replicates per environment. The three SV-related traits (GI, GP, and GR) were measured for seeds collected from three environments: E1, E2, and E3. Among them, GR and GP did not show a strictly normal distribution, which was also found in previous studies (Dai et al., 2022; Si et al., 2022), indicating a complex genetic basis for these SV-related traits. Phenotypic correlation analysis revealed significant positive correlations between the three traits. The GI showed a strong correlation with GR and GP (0.71 and 0.76, respectively) (Figure 2J). The highest GI was accompanied by the highest GP and GR, which is consistent with previous findings (Si et al., 2022). Furthermore, according to the measurement results for each trait, the CV of SV-related traits in upland cotton is affected by the environment (Supplementary Table S2), resulting in different variations in the seeds of each accession harvested in different planting locations and years. For example, the CV of the GI and GR in E1 showed a larger range of variation than that in E2 and E3. Previous studies have shown that the environment in the planting area has a great influence on the growth and development of seeds (Fenner, 1992). It is speculated that the E2 and E3 (Sanya City, Hainan Province) environments with tropical climates are more suitable environments for seed growth, and the performance of the seeds may be relatively stable. In contrast, the E1 environment (Huanggang City, Hubei Province) has high precipitation and temperature during the seed maturation period, which can affect the success of pollination.
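The phenotypic correlations reported above can be illustrated with a small Python sketch of the Pearson correlation coefficient. Whether the study used Pearson or a rank-based coefficient is not stated, so the choice here is an assumption, and the trait values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-accession means for GI and GP (illustrative only).
gi_values = [50.0, 55.0, 60.0, 58.0]
gp_values = [70.0, 75.0, 82.0, 78.0]
r_gi_gp = pearson_r(gi_values, gp_values)
print(f"r(GI, GP) = {r_gi_gp:.2f}")
```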

Candidate genes related to SV
In the past two decades, GWAS has become a powerful and widely used tool for analyzing the genetic mechanisms underlying complex quantitative traits in crops (Tibbs Cortes et al., 2021). At present, most research on SV in upland cotton focuses on the mechanism under stress (Sun et al., 2018; Yuan et al., 2019; Zheng et al., 2021), while genetic analysis of SV-related traits associated with normal seed germination is less common (Si et al., 2022). In this study, a GWAS panel was used to measure three SV-related traits of seeds harvested in three environments. In total, six significant QTLs were stably identified on three different cotton chromosomes (Table 3), including 26 SNPs. Numerous studies have reported that several pathways are involved in regulating SV in plants, such as phytohormone signaling (GA, ABA, and auxin), amino acid metabolism, and the reactive oxygen pathway, which play a crucial role in the seed germination process and have a significant effect on the molecular mechanisms related to SV (Reed et al., 2022). It has been reported that high concentrations of ABA promote dormancy and inhibit seed germination, while high concentrations of GA promote seed germination by reversing dormancy, indicating that germination depends on the endogenous balance of the ABA/GA ratio rather than on the absolute hormone contents (Finch-Savage and Leubner-Metzger, 2006; Chen H. et al., 2020). Ghir_A09G002650 was annotated on chromosome A09, belonging to the GA-regulated family of proteins and encoding a protein containing the GASA domain, which is most closely related to the known homolog GASA14 in Arabidopsis. GASA14 promotes plant growth through GA induction and DELLA-dependent signal transduction and could increase resistance to abiotic stress by reducing the accumulation of ROS (Sun et al., 2013). Thus, it is speculated that Ghir_A09G002650 has the potential to improve the SV of cotton under stress. MYB-type and bHLH-type transcription factors have been reported to be involved in the regulation of seed germination signaling in plants (Penfield et al., 2005; Reyes and Chua, 2007; Kim et al., 2015; Wang X. et al., 2022; Xu et al., 2022). Specifically, Ghir_D03G006550 is in the qGI/GR-D03-2 region and is homologous to MYB52. MYB52 has previously been shown to share common targets with ERF4 in regulating the development of the seed coat in Arabidopsis (Ding et al., 2021). Ghir_D03G010510, located in the QTL region of qGI/GP/GR-D03-4, encodes a bHLH-type family protein sharing 35.52% sequence identity with the PIF8 protein in Arabidopsis, which binds to promoter regions of AtPIF6. The expression level of AtPIF6 during seed development plays a crucial role in establishing primary seed dormancy levels (Peters et al., 2010).
Notably, Ghir_A09G002730 and Ghir_D03G009280 were detected in two distinct enriched regions located on chromosome A09 (qGR-A09-1) and chromosome D03 (qGI/GP/GR-D03-3) (Figure 3). Interestingly, Ghir_A09G002730, which lies within the strong-LD region 21.9 kb upstream of rsA09_7962794 and is highly expressed during seed germination (Figures 4A, E), is homologous to a PPR superfamily protein in Arabidopsis. SOAR1 belongs to the PPR protein family and acts as a core negative regulator downstream of ABAR and upstream of ABI5, participating in ABA signaling regulation of seed germination and seedling growth processes (Ma et al., 2020). We also discovered that rsA09_7962794-A, which is associated with a higher GR, had a much higher allele frequency in YZRR and NIR than in YRR and NSER (Figures 4B, C). It is possible that seedling raising and transplanting in YZRR and mechanized planting in the NIR both employ single-seed sowing, which increased the selection frequency of rsA09_7962794-A. In addition, we compared the genetic diversity of the region on chromosome A09 containing Ghir_A09G002730 in different breeding periods and found that cultivars bred after the 2000s had lower genetic diversity than cultivars from other stages, implying that this gene has been subject to artificial selection as cotton SV has continuously increased during breeding (Figure 4D). Therefore, it is reasonable to postulate that Ghir_A09G002730 is a new candidate gene influencing SV in cotton. Ghir_D03G009280 caught our attention based on the gene annotation of cotton. This gene encodes an auxin response factor. Recent studies have shown that ARF16 interacts with ABI5 and positively regulates the ABA response during seed germination (Mei et al., 2023). Furthermore, Ghir_D03G009280, tightly linked with haplotype Hap1, showed a significant association with GP (Figure 5C), and materials carrying the Hap1 haplotype had longer roots (Figure 5D). The RNA-seq analysis showed a high expression level of this gene during seed germination (Figure 5E). From the above-mentioned results, we inferred that Ghir_A09G002730 and Ghir_D03G009280 were two major candidate genes that may play an important role in cotton SV.

FIGURE 1 Single-nucleotide polymorphism distributions in the upland cotton genome. The number of SNPs within a 10-Mb window. A01-A13 and D01-D13 on the Y axis are the numbers of the 26 chromosomes. The X axis represents chromosome length (Mb).
FIGURE 4 Variation analysis of the germination rate (GR)-associated gene Ghir_A09G002730 in the candidate region. (A) Local Manhattan plots for GR-related genes on chromosome A09 and linkage disequilibrium heat map for the candidate region within 21.9 kb. (B) Box plots for GR of the two haplotypes mentioned above (**P < 0.01). (C) Differentiation of the genetic diversity distribution of Ghir_A09G002730 in four geographic areas (NIR, Northwest Inland region; YZRR, Yangtze River region; YRR, Yellow River region; and NSER, Northern Specific Early-Maturity region). (D) Gene structure diversity of Ghir_A09G002730 across three breeding stages. (E) Heat map of candidate gene expression patterns in the seed germination stage (0, 5, and 10 h) on chromosome A09.
FIGURE 5 Variation analysis of seed vigor (SV)-related traits associated with qGI/GP/GR-D03-3 in the candidate region. (A) Local Manhattan plots for SV-related genes on chromosome D03 from 30 to 33 Mb. (B) Top two haplotypes of qGI/GP/GR-D03-3 in 355 upland cotton accessions. (C) Comparison of germination potential between accessions containing Hap1, Hap2, Hap3, Hap4, and Hap5. Letters on the violin plot indicate significant differences according to one-way ANOVA (LSD test; P < 0.05). (D) Comparison of seed germination status for 3 days between Hap1 and Hap2. (E) Heat map of candidate gene expression patterns in the seed germination stage (0, 5, and 10 h) on chromosome D03.

TABLE 1
Method of measurement for seed vigor-related traits.
Full name | Abbreviation | Measurement method
Germination potential | GP | The number of germinated seeds in the early stage of germination (3 days)/the number of seeds tested
Germination rate | GR | The number of germinated seeds on the 7th day after planting/the number of tested seeds
Germination index | GI | GI = ∑(Gt/Dt), where Gt represents the number of germinated seeds per day and Dt represents the number of days corresponding to Gt

TABLE 2
Distribution and frequency of single-nucleotide polymorphisms (SNPs) identified using the resequencing approach in upland cotton.

TABLE 3
Significant quantitative trait loci (QTLs) associated with seed vigor-related traits.