Inheritance, QTLs, and Candidate Genes of Lint Percentage in Upland Cotton

Cotton (Gossypium spp.) is an important natural fiber plant. Lint percentage (LP) is one of the most important determinants of cotton yield and is a typical quantitative trait with high variation and heritability. Many cotton LP genetic linkages and association maps have been reported. This work summarizes the inheritance, quantitative trait loci (QTLs), and candidate genes of LP to facilitate LP genetic study and molecular breeding. More than 1439 QTLs controlling LP have been reported. Excluding replicate QTLs, 417 unique QTLs have been identified on 26 chromosomes, including 243 QTLs identified at LOD >3. More than 60 are stable, major effective QTLs that can be used in marker-assisted selection (MAS). More than 90 candidate genes for LP have been reported. These genes encode MYB, HOX, NET, and other proteins, and most are preferentially expressed during fiber initiation and elongation. A putative molecular regulatory model of LP was constructed and provides the foundation for the genetic study and molecular breeding of LP.

Cotton breeding is performed to achieve simultaneous improvement of cotton yield and fiber quality . However, LP is positively or negatively correlated with other yield and fiber quality components Ulloa and Meredith, 2000;Wang et al., 2007;Imran et al., 2012;Tang and Xiao, 2013;Wang et al., 2014;Carvalho et al., 2015;Chen et al., 2019;Zhu et al., 2021). Selecting for LP alone to increase lint yield reduced boll and seed mass, but also fiber length and fiber strength . The negative correlation between these traits and the reduction in a cotton cultivar due to the introduction of deleterious genes along with the beneficial ones make it challenging to synchronously improve fiber quality and lint yield. Thus, effective cotton breeding requires characterization of the inheritance, quantitative trait loci (QTLs), candidate genes, molecular regulatory networks of LP, and other yield and fiber quality components. By using big segregating populations and whole genome selection techniques, it may be possible to break tight linkages of QTLs with the opposite genetic effects.
Genetic study, germplasm resource enhancement, and breeding of upland cotton are very important for efforts to improve cotton production. Abundant cotton germplasm resources with high genetic diversity can be used to improve upland cotton, and LP can be used as an indicator of genetic diversity. There have been recent advances in the molecular and genetic basis of cotton LP, however, the genetic basis of cotton LP remains incompletely understood. To provide detailed genetic information related to cotton LP for future cotton lint yield studies and breeding, we comprehensively summarized the inheritance, QTLs, and related candidate genes of cotton LP. We reanalyzed available data and mined the QTLs and candidate LP genes, screened the stable and most effective QTLs for use in marker-assisted selection (MAS), identified the key biological pathways regulating LP, and outlined a molecular regulatory network for cotton LP development. These results can guide cotton yield improvement.

INHERITANCE OF COTTON LINT PERCENTAGE LP is a Typical Quantitative Trait Controlled by Major Genes Plus Polygenes
Cotton yield per unit area includes four major components, plant number per unit area, boll number per plant, single boll weight, and LP . LP (also known as ginning out-turn) is the ratio of lint weight to seed cotton weight [or lint index/(lint index + seed index)] as a percentage. LP is determined by fiber number, fiber length, fiber diameter, and the weight of seed on which the fibers are grown. LP is a critical criterion for cotton cultivars, and it influences the purchase price of the seed cotton. Historic data of Huang-Huai Regional Trials in China from 1973to 1996 reported an increase in yield from 1200 kg·hm −2 -1425 kg·hm −2 that was explained by improvements of LP, which increased 5.5% . Obviously, LP is a key selection trait for cotton yield breeding.
The phenotypes of complex traits are often determined by the combined actions of multiple genes and environmental factors (Mackay et al., 2009). Cotton yield and its components, BN, BW, LP, SI, and LI, are inherited quantitatively and are genetically related to each other (Wang et al., 2007;Qin et al., 2009). Of these components, the heritability of LP is the highest (49%; Badigannavar and Myers, 2015), and LP correlates with seed cotton yield, lint yield and other yield components at different strengths Chen et al., 2019;Zhu et al., 2021).
LP is a typical quantitative trait of cotton cultivars, except in fuzzless and lint-less mutants where it is considered a qualitative trait .
We can choose appropriate breeding methods to improve cotton yield when there are good genetic models of yield-related traits. Studies of the inheritance of cotton LP began in the 1970s in China. First, simple crosses were performed between high and low LP cultivars, and then the LP of the offspring was evaluated. Overall, the inheritance of LP was not clear. In the 1990s, the model of major genes plus polygenes became more widely studied (Mo, 1993), allowing the later establishment of a mixed genetic model of major genes plus polygenes. The model assumed that the studied traits were diploid inherited, and lacked female effects, interaction, and linkage among major genes and polygenes (Gai et al., 2003). A genetic model to simultaneously analyze genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) and genotype by environment (GE) interactions for quantitative traits was subsequently developed (Han et al., 2007). These methods have been applied in genetic studies on cotton yield related traits, including LP (Du et al., 1999;Yuan et al., 2002;Zeng and Wu, 2012;Xia et al., 2014;Li B et al., 2016;Gong et al., 2019).
Lint yield and yield components are mainly controlled by genetic effects, i.e., additive and dominant variance account for ≥66% of the total phenotypic variance (Zeng and Wu, 2012). LP has the highest additive variance as a proportion of the total phenotypic variance (Zeng and Wu, 2012). LP is controlled mainly by genetic effects and is less affected by GE interaction effects (Yuan et al., 2002;Han et al., 2007). The variances of additive effect (VA), cytoplasmic effect (VC), and the dominant nuclear-cytoplasmic interaction effect (DC) interaction variance (VDC) are significant for LP (Han et al., 2007). LP shows great genetic variability. Other quality parameters have high heritability values (up to 90%), while LP exhibits moderate heritability (49%) (Badigannavar and Myers, 2015).
Segregation analysis of the mixed genetic model of major genes plus polygenes was used to identify the major genes for cotton yield-related traits using six generations of P 1 , P 2 , F 1 , B 1 , B 2 , and F 2 generated from a cross of Baimian1 × TM-1 (TexasMarker-1) . One major gene of LP was detected, with positive complete dominance, and an additive effect value equal to the dominant effect value (0.512) (Xia et al., 2014). A major QTL, qLP-3b(F 2 )/qLP-3(F 2:3 ), of LP was detected in both F 2 and F 2:3 using QTL mapping, with additive effect values (1.86 and 1.49, respectively) slightly smaller than the dominant effect values (1.95 and 1.51, respectively), thus displaying a dominant effect (Xia et al., 2014).
We studied LP using a female parent sGK Zhong156, male parent 901-001, and their 250 introgression lines (ILs) and found that LP is controlled by two to four pairs of major genes or two pairs of major genes plus polygenes. The heritability of major genes ranged from 1.26 to 83.13%, the heritability of polygenes ranged from 27.35 to 90.83%, and the heritability of major genes plus polygenes ranged from 92.00 to 99.35% .

LP Correlates With Other Components of Yield and Fiber Quality
Many agronomic traits are correlated due to linkage, pleiotropy, or the correlated traits are components of a more complex variable (Ulloa et al., 2005). LP is positively or negatively correlated with other components. Generally, LP has significantly negative correlation with seeds per boll, seed index, seed cotton yield, fiber length (FL), and fiber strength; and is positively correlated with boll number per plant, lint per seed, lint index, and lint yield Ulloa and Meredith, 2000;Wang et al., 2007;Imran et al., 2012;Tang and Xiao, 2013;Carvalho et al., 2015;Zhu et al., 2021).

Germplasm Resources and Enhancement of High LP Cotton
Excellent germplasm resources are required for the successful breeding of cotton cultivars (Zeng and Wu, 2012). Based on introducing foreign cotton cultivars, three cultivar substitutions happened in China from the 1920s, first, Jinzimian, Tuozimian, and Longzimian from the United States substituted some Asiatic cotton cultivars planted in north China in 1920s; second, Sizimian4, Dezimian531, and Daizimian14 substituted the Asiatic cotton cultivars planted in a large area of China in the 1930s-1940s; third, Daizimian15, Sizimian2B, and Sizimian5A substituted the Asiatic cotton cultivars planted for a long period, and the degenerated foreign upland cultivars in China in the 1950s (Yin et al., 2002).
Chinese researchers traveled to Mexico, the United States, France, Australia, and other countries to explore and collect cotton germplasm resources from the 1970s to 1980s, with more than 4800 cultivars, lines, wild species, semi-wild species, and special lines with genetic markers collected by 1984 (Gao et al., 2021). A total of 8868 accessions were collected and preserved by 2020, including 7362 G. hirsutum, 350 G. hirsutum wild races, 633 G. barbadence, 433 G. arbareum, 18 G. herbaceous, and 32 wild species, making the cotton collection in China the fourth largest in the world (Du et al., 2012). Many of these cotton resources have been characterized. For example, 28 cotton lines with high LP (43-47%) were screened by Li and Wang (1992), and 8 high LP (> 42%) lines were screened from former Soviet Union germplasm resources by Liu et al. (1998). Cultivars introduced from the former Soviet Union (108V, C1470, 611B, and KK1543), the United States, and Africa (DPL15, STV2B, and UGDM) were applied for modern cultivar improvement in Xinjiang, China (Han et al., 2020). Using foreign germplasm resources, many high LP (> 43%) cultivars/ lines have been bred in China.
In addition to traditional methods of plant cultivar improvement, such as intraspecific crosses and physical and chemical mutagenesis (Gai, 1997;Hulse-Kemp et al., 2015), the introduction of elite alien genes or chromosome fragments by interspecific crosses is a main approach for cotton germplasm enhancement, especially for LP. Upland cotton has a high lint yield but undesirable fiber quality, sea island cotton has superior fiber quality but lower lint yield. Therefore, significant efforts have been made worldwide to combine the desirable lint yield of G. hirsutum with the superior fiber quality of G. barbadense by interspecific hybridization. Many germplasm resources with improved fiber quality containing chromosomal introgressions from G. barbadense have been constructed (Saha et al., 2006;Saha et al., 2010;Zhu et al., 2010;Wang et al., 2011;Zhu et al., 2017;Chen et al., 2018;Li et al., 2019) (Supplementary Table S1). Compared to G. barbadense, the superior introgression lines (IL) have fiber quality more similar to that of G. barbadense and are usually more convenient to use for upland cotton improvement .
Artificial selection efforts have reduced genetic diversity but significantly increased LP. A cotton variation map of an intact breeding pedigree including 7 elite and 19 backbone parents was constructed . The 26 pedigree accessions were subjected to strong artificial selection during domestication and have reduced genetic diversity but stronger linkage disequilibrium and higher extents of selective sweeps. The elite parents acquired significantly improved agronomic traits, with an especially pronounced increase in LP. Although there is a negative correlation between LP and FL in long fiber materials (Carvalho et al., 2015), two cultivars and one advanced line have the highest LP and FL as well as high productivity: FM 993, FM 910, and CNPA MT 04 2088, suggesting these materials can be used to increase LP and FL values (Santos et al., 2017).
A set of chromosome segment introgression lines (CSILs) were developed by crossing the recipient parent TM-1 and the donor parent Hai7124. TM-1 is a genetic G. hirsutum standard accession developed in the United States, and Hai7124 is an early, high-yielding G. barbadense accession developed in China (Song and Zhang, 2009). This set of CSILs has been used to explore the genetic basis of heterosis for interspecific hybrids (Guo et al., 2013). A set of 14 chromosome substitutions (CS-B) with specific chromosomes or chromosome arms from G. barbadense substituted into G. hirsutum and chromosomespecific F 2 families has been reported, and additive and dominant effects are significant for LP (Saha et al., 2006). Another set of 17 CS-B lines (2n = 52) were reported that contain G. barbadense doubled-haploid line "3-79" germplasm systematically introgressed into "TM-1". TM-1 yield is much greater than 3-79, but cotton from 3-79 has superior fiber properties. CS-B14sh, 17, 22Lo, and 25 produce positive homozygous dominant effects on lint yield (Saha et al., 2010). A CSSL population was developed via crossing and backcrossing of G. hirsutum and G. barbadense by PCR-based MAS (Zhu D et al., 2020). By whole genome re-sequencing, 11,653,661 highquality SNPs were identified with 1211 recombination chromosome introgression segments from G. barbadense. Six QTLs for LP were identified that have negative alleles from G. hirsutum background.
Extensive efforts have generated successful intraspecific crosses of G. hirsutum, and interspecific crosses between G. hirsutum and G. barbadense. Many interspecific ILs or CSILs have been developed and studied Li et al., 2019), and some ILs have been identified with positive effects that increase LP, such as the IL008 lines (Zhu et al., 2017). Many advanced new cotton germplasm resources and cultivars have been bred, such as C 24 (LP = 48-50%; Wang and Xiao, 1996), IL-10-1 (LP > 37%; Ma et al., 2012) (Li and Wang, 1992), Lumianyan22 , and Sumian16 and Simian3 (both LP > 43%; . These enhanced cotton germplasm resources can be used for LP breeding in upland cotton.

The Heterosis of LP
Utilization of heterosis has greatly improved the productivity of many crops worldwide, including cotton. Hybrid cotton varieties exhibit strong heterosis that confers high fiber yields . Understanding the potential molecular mechanism underlying how hybridization produces superior yield in upland cotton is critical for effective breeding (Shahzad et al., 2020). Although the role of loci with over-dominant effects continues to be discussed by biologists, the genetic basis of heterosis has not yet been determined. Genetic models are important, as heterosis is a nonlinear effect from multiple heterozygous gene combinations (Gai et al., 2007). Trait phenotype and heterosis may be controlled by different sets of loci and many genetic factors likely function together to yield heterotic output (Shahzad et al., 2020). All three forms of genetic effects, additive, over-dominant, and epistasis, and environmental interaction contribute to the heterosis of yield and its components in upland cotton, with over-dominance and epistasis the most important (Guo et al., 2013;Li et al., 2018).
A study with 286 upland cotton cultivars found that female parent groups based on genetic distance (GD) and F 1 performance significantly differ in LP. The correlations between GD and F 1 performance, mid-parent heterosis (MPH), and best parent heterosis (BPH) are significant for LP . Fourteen putative candidate genes were identified as associated with heterosis of LP (Sarfraz et al., 2021). A study with CP-15/2, NIAB Krishma, CIM-482, MS-39, and S-12 found greater specific combining ability (SCA) variance than general combining ability (GCA) variance for LP (0.470), indicating the predominance of non-additive genes (Imran et al., 2012). Dominance plays an important role in the genetic basis of heterosis in Xiangzamian2, and non-additive inheritance is also an important genetic mode for LP (Liu et al., 2012).
Studies on the interspecific heterosis of G. hirsutum × G. barbadense were performed starting in the 1960s in China (Diao and Huang, 1961;Hua et al., 1963). Although many traits exhibit heterosis such as boll setting, fiber quality, photosynthesis, and drought and disease resistance, some hybrid cultivars suffer from severe defects that preclude planting in many areas, with characteristics such as small bolls, low LP, and low yield (Chen et al., 2002). Breeding chromosome segment substitution lines (CSSLs) is a main strategy to utilize excellent genes from G. barbadense and partial interspecific heterosis in upland cotton.
Studies with CSSLs and their 50 F 1 hybrids found that yield and yield components are controlled by combined additive and dominant effects, and LP is mainly controlled by additive effects. Significant positive MPD was detected for all yield traits, and significant positive BPH was detected for all yield traits except LP (Shen et al., 2006). A proposed model was studied in G. hirsutum introgression lines (ILs) harboring a segment of G. barbadense. These ILs contained 396 QTLs for 11 yield and other traits. The yield group has significantly higher over-dominant QTL ratios, with 16 over-dominant QTLs for 5 yield-related traits consistently detected during 5 cotton growing seasons, one of which is LP. Overdominance is the major genetic basis of lint yield heterosis in interspecific hybrids between G. barbadense and G. hirsutum (Tian et al., 2019).
Over-dominance is a major factor for yield heterotic output, so genetic diversity is especially important. Genitors with short GD share many alleles, and there are fewer complementary genes in heterozygotes. The genetic diversity among 16 cotton cultivars was studied. Variables with low magnitude, redundancy, and non-stability are not ideal for genetic diversity study, but LP is the trait that most contributes to genetic diversity at 31.3% (Santos et al., 2017).
Although the genetic basis of heterosis has not been resolved in cotton, the available data indicate that LP is a significant contributor to genetic diversity (Santos et al., 2017). SCA variance is greater than GCA variance for LP, showing the predominant contribution of non-additive genes (Imran et al., 2012). Overdominance is the major genetic basis of lint yield heterosis in interspecific hybrids of G. barbadense and G. hirsutum, including LP (Tian et al., 2019), and LP has significant negative heterosis value (Yuan et al., 2002;Shen et al., 2007). Additive and over-dominant effects have been found for QTLs of LP . At the singlelocus level, the genetic bases of heterosis vary in different populations. In conclusion, additive effects, over-dominant effects, epistasis, and environmental interactions all contribute to the heterosis of yield and its components in upland cotton, with over-dominance and epistasis the most important .

High-Density Genetic Maps of Cotton LP
A QTL is a "gene" controlling a certain quantitative trait. Although where it is and what it encodes may be unknown, the term QTL theoretically links a gene and a trait. QTLs can be detected and located by genetic linkage analysis and QTL mapping relying on a similar set of genetic assumptions. Genetic linkage analysis can detect both major gene and polygene effects, does not need a linkage map and provides no information about the location of the QTL. QTL mapping can detect and locate a possible QTL if a linkage map is available (Gai et al., 2007). Significant advances in QTL mapping occurred in the 1990s, enabling the study of individual QTLs and interaction between QTLs for heterosis, providing a powerful tool to identify the genetic bases of yield and yield components (Liu et al., 2012;Li et al., 2018). QTL mapping using populations segregating for mutants can increase the sensitivity of this method due to the enlarged quantitative variation scale by the expression of qualitative mutant genes . In the discovery of major genes or major QTLs, segregation analysis and QTL mapping are mutually complementary methods that can verify results.
The first cotton linkage map was reported in 1985, with 11 linked groups located on chromosomes, and mapping of some mutant genes, including two LP genes (Endrizzi et al., 1985). Currently, more than 31 genetic linkage maps have been reported in upland cotton (Supplementary Table S1). Crosses including intra-species of G. hirsutum, inter-species of G. hirsutum and G. barbadense, populations of immortalized F 2 and F 2:3 , RIL, backcross inbred lines (BIL), and immortalized backcross populations (DHBCF1s and JMBCF1s). Various molecular markers such as simple sequence repeats (SSR), expressed sequence tag-SSR (EST-SSR), fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), chip-SNP, and whole genome re-sequencing have been employed to make genetic linkage maps of cotton. The density of the genetic maps has increased, and the marker span has reached 5115.16 cM, with an average marker interval that is less than 1 cM 0.92, 0.72, and 0.5 cM (Blenda et al., 2012;Shi et al., 2015;Diouf et al., 2018;Gu et al., 2020). We have constructed a consensus genetic linkage map covering the whole genome with 8295 markers (458 SSR, 5521 SLAF-SNP, and 2316 chip-SNP), spanning 5197.17 cM, with 2384.94 cM for the A t subgenome and 2812.23 cM for the D t subgenome. The average interval between adjacent markers is 0.88 cM (Zhang K et al., 2019).
With the development of techniques related to genetic mapping, the coverage and accuracy of the maps are constantly increasing, enabling discovery of more QTLs. High density maps make it possible to clone candidate genes of major effective QTLs, including LP. As more QTLs are discovered, the number of high-quality QTLs for LP with closely linked markers has increased, leading to the discovery of more effective candidate genes. Thus, these improved techniques have greatly facilitated molecular breeding and cotton LP improvement efforts. Some genetic and association maps reported in cotton, including the numbers of the identified QTLs for LP, are listed in Supplementary Table S1.

High-Density Genetic Maps of G. hirsutum × G. barbadense
Cotton genetic maps have been constructed with interspecific crosses of G. hirsutum × G. barbadense (Supplementary Table  S1) (Li et al., 2009;Blenda et al., 2012;Yu et al., 2013;Shi et al., 2015). For example, a high-density SSR genetic linkage map has been drawn using a BC 1 F 1 population developed from a cross of G. hirsutum × G. barbadense. The map comprises 2292 loci and covers 5115.16 cM of the cotton AD genome, with an average marker interval of 2.23 cM. For 1577 common loci on this map, 90.36% agree well with the marker order on the D genome sequence genetic map. Twenty-six QTLs for LP were identified on 9 chromosomes, and 50% of the QTLs are from G. barbadense and increase LP by 1.07-2.41% .

High-Density Genetic Maps of QTLs From G. barbadense
Upland cotton CSILs with chromosome fragments from G. barbadense are ideal materials to identify new agronomic genes from sea island cotton, which can be identified using genetic maps (Supplementary Table S1) (Zhu et al., 2010;Wang et al., 2011;Guo et al., 2013;Said et al., 2015;Zhai et al., 2016;Zhu et al., 2017;Chen et al., 2018;Kong et al., 2018). For example, F 2 and F 2:3 populations derived from the cross of CSILs-IL-15-5 × IL-15-5-1 were used to map QTLs for LP and seed index with SSR markers. The SI and LP characteristics of 774 F 2 individuals and F 2:3 families were analyzed by composite interval mapping (CIM), and two QTLs were detected for LP: qLP-15-1 in the F 2 and F 2:3 populations located between NAU3040 and JESPR152, with confident genetic distances of 5.40 and 3.20 cM, respectively, and qLP-15-2, mapped between NAU5302 and NAU2901 with confident genetic distance of 0.08 cM (Zhu et al., 2010). In this way, more than 107 QTLs for LP have been reported (Said et al., 2015) (Supplementary  Table S1).

QTLs Controlling Cotton LP
LP is controlled by many QTLs. Study of the F 2:3 population derived from the cross of HS46 × MARCABUCAG8US-1-88 identified a single QTL that explains 15.45% of phenotypic variance of LP, with other QTLs that explain 10.03-15.46% (Diouf et al., 2018). Mapping where these genes reside on the chromosomes will facilitate breeding, especially if easily measured molecular markers are closely linked with the specific QTLs so they can be used in MAS (Shappley et al., 1998;Liu et al., 2014). The classical genetic method of studying quantitative traits is based on a linkage analysis of both parents, enabling the mapping of multiple QTLs responsible for LP using segregating populations from different parents . To date, many QTLs controlling LP have been reported (Table 1;  Supplementary Table S2).

QTLs for LP in Upland Cotton With Lintless Mutants
Using mapping populations developed from a G. barbadense line (fuzzy-linted) and a single fuzzless locus mutant (G. hirsutum, N 1 or n 2 ), considerable quantitative variation in lint fiber production superimposed on the discrete effects of fuzz mutants was observed and both loci mapped within the intervals of QTL for LP and lint index (Rong et al., 2005). These results were confirmed by QTL mapping for LP and fiber quality traits (micronaire, uniformity, and fiber elongation) using an n 2 mutant derived population (Rong et al., 2007). Two F 2 populations were studied from crosses of MD17, a fuzzless-lintless line containing three fuzzless loci, N 1 , n 2 and a postulated n 3 , with line 181, fuzzless-linted and with FM966, a fuzzy-linted cultivar. QTLs for LP and LI explain a high percentage of the phenotypic variation (62.8-87.1%) and are probably on the same location on chromosome 26 with the common marker BNL3482-138 in both populations. The MD17 BNL3482-138 allele at this locus decreases both LP and LI   (Supplementary Table S1).

QTLs for LP From G. barbadense
Many QTLs for LP from G. barbadense have been identified. Eight QTLs for LP were discovered in an RIL population of G. barbadense (Fan et al., 2018). The F 2 populations from two CSSLs MBI7455 and MBI7358 were quantified using SSR, and five LP QTLs were identified (Guo et al., 2015). Four CSILs, MBI7115, MBI7412, MBI7153, and MBI7346, were obtained by advanced backcrossing and continuous inbreeding of upland cotton variety CCRI45 and sea island cotton variety Hai1 and were used to construct double-cross segregating populations F 1 and F 1:2 through the following crosses [(MBI7115 × MBI7412) × (MBI7153 × MBI7346)]. The analysis revealed that three QTLs control LP (Kong et al., 2018). Four BC 1 F 1 populations, E(E3), E(3E), (E3)E, and (3E)E, were developed by crossing sea island cotton 3-79 ("3") and upland cotton "Emian22" ("E"), using "Emian22" as the recurrent parent. Using composite interval mapping, two LP QTLs were detected on a single chromosome (Dai et al., 2018). We previously developed a set of CSSLs using CCRI8 of G. hirsutum as the recipient parent and Pima90-53 of G. barbadense as the donor parent. We genotyped the BC 3 F 5 generation of CSSLs with SSR markers, constructed a QTL map for fiber quality and yield traits, and identified the stable QTLs in three different environments (Baoding, Qingxian, and Luntai). Seven QTLs for LP were detected, and each QTL explained 2.03-19.38% of the phenotypic variation .

High-Density Association Maps and QTNs of Cotton LP
A QTN is a marker associated with a QTL/gene and is likely in a candidate gene of the associated QTL if it is in a functional gene region. A QTN links a QTL with a nucleotide mutation. Different genome-wide association study (GWAS), single locus-GWAS (SL-GWAS), multi-locus GWAS (ML-GWAS), and restricted two-stage, multi-locus, multi-allele GWAS (RTM-GWAS) approaches have been used to search QTNs for LP in a large number of cotton accessions (Supplementary Table S1). For example, using RTM-GWAS, 86 single-nucleotide polymorphism linkage disequilibrium block (SNPLDB) loci for LP were identified from 315 cotton accessions . A total of 719 upland cotton accessions were screened by GWAS combined with cottonSNP63K array. High density SNP maps have been constructed (Sun et al., 2018), including more than 16 association maps (Supplementary Table S1). Many candidate genes for agronomic traits have also been reported (Fang et al., 2017;Huang et al., 2017;Sun et al., 2018;Shen et al., 2019;Su et al., 2019;Zhu G et al., 2020;Yu et al., 2021).
Using GWAS, six significant SNPs associated with LP were identified on chromosomes A02, A04, D03, and D12, together explaining approximately 43.41% of the total phenotypic variation. Three loci, i03389Gh, i33922Gh, and i03391Gh, are near each other on chromosome D03 (Sun et al., 2018). A comprehensive genomic assessment of modern improved upland cotton was performed based on the genome-wide resequencing of 318 landraces and modern improved cultivars or lines. Two major haplotypes with two nonsynonymous SNPs are associated with higher LP and more bolls per plant (GhLYI-A02 HLB ), and lower LP and fewer bolls per plant (GhLYI-A02 LLB ) (Fang et al., 2017). A diversity panel consisting of 355 upland cotton accessions was subjected to specific-locus amplified fragment sequencing (SLAF-seq). Twelve SNPs associated with LP were detected via GWAS, with five SNP loci distributed on chromosomes A t 3 (A02) and A t 4 (A08) that contain two major effective QTLs that were detected in best linear unbiased predictions (BLUPs) and in more than three environments simultaneously (Su et al., 2016). To date, more than 1396 QTNs for LP have been reported (Supplementary Table S1).

A Consensus Physical Map of QTLs for LP in Upland Cotton
QTLs of LP represent genes controlling the phenotype of LP. QTNs are markers of the QTLs. Referring to the cotton genome sequences of G. hirsutum and G. barbadense    Yuan et al., 2015;Zhang et al., 2015;Huang et al., 2020), the QTLs of LP can be mapped using the locations of the associated QTNs or markers. In this way, more than 1439 QTLs for LP have been reported via genetic linkage maps. Excluding the replicate QTLs, there are more than 417 unique QTLs for LP. QTLs (183) with available tightly linked marker sequences were mapped to a consensus physical map of upland cotton (Figure 1). Obviously, many QTLs are clustered.

CANDIDATE GENES AND REGULATORY NETWORK OF LINT PERCENTAGE Candidate Genes of Cotton LP QTLs
Genes control phenotypes and isolation of the genes regulating LP is essential for functional gene marker development for MAS and gene engineering breeding. Fine mapping of the major effective QTLs for LP makes it possible to clone the candidate genes by map-based cloning. The availability of whole-genome sequences also paves a way to identify and clone functional genes, e.g., those relating to fiber development and increased LP (Pan et al., 2020;Sarfraz et al., 2021). A diversity panel of 355 upland cotton accessions was subjected to specific-locus amplified fragment sequencing (SLAF-seq). Twelve SNPs associated with LP were detected via GWAS, in which five SNP loci distribute on chromosomes A t 3 (A02) and A t 4 (A08). G. hirsutum_A02G1268 may determine LP (Su et al., 2016). A GWAS uncovered 23 polymorphic SNPs and 15 QTLs significantly associated with LP and identified two candidate genes, G. hirsutum_D05G0313 and G. hirsutum_D05G1124, as promising regulators of LP . Two ethylene-pathway-related genes are associated with increasing lint yield in improved cultivars, with two QTLs for LP, qLP-5-17.8 and Cs9_PF_21, located within the two association loci, respectively. G. hirsutum_A02G1392 is a homolog of the AP2/ethylene response factor (ERF)-type transcription-factor-encoding gene AINTEGUMENTA-like 6 FIGURE 1 | A consensus physical map of QTLs for LP in upland cotton. The QTLs were mapped to G. hirsutum genome by the locations of their linked markers. Left: the marker unit is Mb.
Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 13 | Article 855574 (AIL6) in Arabidopsis thaliana (Fang et al., 2017). Another candidate gene, G. hirsutum_D08G2376, may control LP (Huang et al., 2017). Two genes, G. hirsutum_D03G1064 and G. hirsutum_D12G2354, increase lint yield and with G. hirsutum_D03G1067 are candidate genes of LP (Sun et al., 2018). Cotton PROTODERMAL FACTOR1 (GbPDF1) gene is related with LP . A non-synonymous SNP (A-to-G) site in a gene encoding the cell wall-associated receptor-like kinase 3 (GhWAKL3) protein is highly correlated with increased LP . A class I TCP transcription factor (designated GbTCP), from G. barbadense affects LP (Hao et al., 2012). A cotton GhH 2 A 12 gene encoding a typical SPKK motif in the carboxyterminal and a plant-unique peptide-binding A/T-rich DNA region is involved in fiber differentiation . Candidate gene Ghir_A03G020290 of a stable QTL qLPA03-1 for LP encodes a thioredoxin domain-containing protein 9 homolog (Gu et al., 2020). G. hirsutum_D03G0919 (GhCOBL4), G. hirsutum_D09G1659 (GhMYB4), and G. hirsutum_D09G1690 (GhMYB85) of LP QTLs were identified from RILs with G. barbadense introgressions . A MYB gene (GhMYB103), with two SNP variations in cis-regulatory and coding regions, is significantly correlated with LP, implying a crucial role in lint yield (Zhu G et al., 2020). A gene orthologous to Arabidopsis receptor-like protein kinase HERK 1 (GB_A07G1034) is predicated as a candidate gene for LP improvement (Yu et al., 2021). Two NET genes, GH_A08G0716 and GH_A08G0783, in novel QTL hotspots (qtl24 and qtl25), are mainly expressed during early fiber development stages and exhibit significant correlation with LP . More than 90 candidate genes of QTLs have been reported (Supplementary Table S5).
None of the genes of the major QTLs for LP have been isolated via map-based cloning, however, high density genetic linkage maps and multi-genome sequence data identified more than 90 candidate genes of QTLs for LP (Supplementary Table S5). According to the putative protein functions, most candidate genes of QTLs for LP are transcription factor (TF) and transcription related genes, with many candidate genes belonging to the MYB family of TF. According to the predicted gene functions of the candidate genes, transcription; signaling pathways including hormone, calcium (Ca 2+ ), and MAPK; energy metabolism; substance transport; cell division; and cell wall development are important biological processes for LP formation (Figure 2; Table 3; Supplementary Table S5).
To identify the major metabolic pathways of the candidate genes, pathway enrichment analysis was carried out using the Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.kegg. jp/kegg/) database. KEGG annotation analysis indicated that the 64 candidate genes of the QTLs for LP belonged to 44 KEGG pathways in G. hirsutum (Supplementary Tables S6 and S7). Among them, metabolic pathways, biosynthesis of secondary metabolites, MAPK signaling pathway, plant hormone signal transduction, plantpathogen interaction, starch and sucrose metabolism, spliceosome, and two-component systems were most enriched (Table 4; Figure 3). The above results indicated that most of the candidate genes that affect LP in upland cotton do so through various metabolic pathways.

Molecular Regulatory Network of Cotton Fiber Development
The coat of normal cottonseed is covered with lint and fuzz. The spinnable long fibers are called lint (25-35 mm) and initiate around flowering and then elongate rapidly afterward; the short fibers are called fuzz (~5 mm), which develops at a later stage (Stewart, 1975;Du et al., 2001). Anatomically, cotton fibers are single-celled trichomes that differentiate from the outermost cell layer (protoderm) of the ovule. Cotton fiber is a highly elongated single cell, and its development synchronizes with and depends on seed development (Liu D et al., 2015). Cotton fiber development is a highly programmed and regulated process that can be divided into four discrete, yet overlapping stages on the basis of morphological characteristics during a 50-d period: initiation (0-3 days post anthesis, DPA), elongation (3-20 DPA), secondary cell wall thickening (16-40 DPA), and maturation (40-50 DPA) (Rong et al., 2005;Liu D et al., 2015;Qin et al., 2019). LP is a representative phenotypic trait of fiber development and the candidate genes of QTLs for LP (Table 3; Supplementary Tables  S5 and S6) suggest relationships between LP and fiber development.
Many candidate genes of QTLs for LP participate in the regulation of fiber cell development. One candidate gene of QTLs for LP, G. hirsutum_A02G1268 (MIPS), is highly expressed during the early fiber development stage and may determine LP by regulating seed and fiber development (Su et al., 2016). Two candidate genes of QTLs for LP, G. hirsutum_D05G0313 and G. hirsutum_D05G1124, are highly expressed during ovule and fiber development stages . A core genomic segment has a non-synonymous SNP (A-to-G) site in a gene encoding the cell wall-associated receptor-like kinase 3 (GhWAKL3), which may be involved in signaling and fiber cell wall-development to increase LP . Analysis of CSILs of SL15 (low-LP) and LMY22 (high-LP) indicate that delayed secondary-cell-wall thickening results in altered LP (Gai et al., 2007). In selective sweeps, an apoptosis-antagonizing transcription factor gene (GhAATF1) and mitochondrial transcription termination factor family protein gene (GhmTERF1) were found to be highly involved in the determination of LP (Han et al., 2020). Most of these genes are preferentially expressed at the initiation of cotton fiber and early elongation stages (Table 3; Supplementary Table S5). Transgenic and functional gene assays have demonstrated that some genes that regulate fiber development can increase LP. GbPDF1 is expressed preferentially during fiber initiation and early elongation. GbPDF1 plays a critical role together with interaction partners in hydrogen peroxide homeostasis and steady biosynthesis of ethylene and pectin during fiber development via the core cis-element HDZIP2ATATHB2 . GbTCP is preferentially expressed in the elongating cotton fiber from 5 to 15 DPA and GbTCP positively regulates the level of jasmonic acid (JA) to activate down-stream genes necessary for fiber elongation (reactive oxygen species, calcium signaling, ethylene biosynthesis and response, and several NAC [for NAM, ATAF1/2, and CUC2] and WRKY transcription factors) (Hao et al., 2012). GhH 2 A 12 is preferentially expressed during cotton fiber initiation and early elongation stages, suggesting a role in fiber differentiation . GhFSN1 is a cotton NAC transcription factor that acts as a positive regulator to control secondary cell wall (SCW) formation of cotton fibers by activating downstream SCW-related genes, including GhDUF231L1, GhKNL1, GhMYBL1, GhGUT1, and GhIRX12 . Overexpression of GhCIPK6a enhances expression levels of co-expressed genes induced by salt stress that scavenge reactive oxygen species (ROS), are involved in MAPK signaling pathways, and increase LP under salt stress (Su Y et al., 2020). MiR319, GhTCP4, and GhHOX3 dynamically coordinate the regulation of fiber elongation . The glucosyltransferases, Rab-like GTPase activators, and myotubularins (GRAM) domain gene GhGRAM31 (Ghir_D02G018120) regulate fiber elongation and affect LP. GhGRAM31 directly interacts with GhGRAM5 and GhGRAM35. GhGRAM5 also interacts with the transcription factor GhTTG1, while GhGRAM35 interacts with the transcription factors GhHOX1 and GhHD1 (Ye et al., 2021). GhMYB109 is required for cotton fiber development (Pu et al., 2008). Overexpression of GhMYB3 in cotton leads to a slight increase in LP (Jiang et al., 2021). Over-expressing TF genes, GhHD-1, GhMYB25, and GhMYB25-like, in cotton can increase the density of fiber and fiber yield under field conditions. There are positive relationships between lint yield and LP and lint yield and fiber density in transgenic lines . Currently, more than 50 genes related to fiber development have been reported, including the miR319 gene (Supplementary Table  S8). The MYB-bHLH-WD40 (including MYB-DEL-TTG; CPC-MYC-TTG) (Liu D et al., 2015;Shangguan et al., 2021) and TCP-HOX-HD Ye et al., 2021) regulatory complexes play key roles in cotton fiber development. The NAC-MYB-CESAs module is a central pathway of SCW cellulose biosynthesis . Phytohormone balance, Ca 2+ signaling, and ROS also play key roles in the regulation of fiber development to affect cotton LP (Tang et al., 2014;Cheng et al., 2016) (Figure 4).
Transcriptomics studies have discovered many candidate genes and noncoding RNAs regulating LP. Cotton fibers initiate from the ovule epidermis on the day of anthesis. After fiber initiation, fiber cells expand rapidly as a result of increased intracellular swelling pressure and cell wall relaxation Qin et al., 2019). Comparative transcriptomics studies exploring differences in gene expression between long-and short-fiber cotton lines at five developmental stages identified 22 consistently down-regulated differentially expressed genes  (DEGs) involved in fiber initiation and 31 consistently upregulated genes involved in fiber elongation. These may be good candidate genes for improving LP. Of the candidate genes, ERF1 is involved in ethylene metabolism and is expressed at a higher level in long-fiber cotton lines at fiber initiation; TUA2 and TUB1 are involved in microtubule synthesis and expressed at a higher level in long-fiber cotton lines at fiber elongation; and PER64 encodes a peroxidase (POD) and is expressed at a higher level in short-fiber cotton lines (Qin et al., 2019). Three lines with different LP were developed by crossing Xu142 with its fiberless mutant, Xu142 fl, and three long noncoding RNAs (lncRNAs) were identified as involved in fiber development. Silencing XLOC_545639 and XLOC_039050 in Xu142 fl increases the number of initiating fibers on the ovules, but silencing XLOC_079089 in Xu142 results in a short fiber phenotype . This demonstrated that lncRNAs manipulated the initiation of lint and fuzz fibers and affected LP. There is good evidence for the involvement of at least 62 genes, 45 miRANs, and 3 lncRNAs, which has been reported (Supplementary Table S9). Because the numbers of involved genes and potentially involved genes are exceptionally large, further work is required to determine the exact functions of these genes or miRNAs and the importance for LP development.

Molecular Regulatory Network of Cotton LP
Although there have been many studies about cotton LP, the molecular mechanism of LP development remains largely unknown. The number of initial fibers determines the LP. The isolated candidate genes of QTLs for LP and the functionally analyzed genes suggest that key genes involved in fiber initiation and early elongation are the more important regulators, particularly those involved in hormone signaling, transcription, cell wall development, and signal transduction, as described above (Figure 4; Tables 3 and 4; Supplementary Tables S5, S6, S7, and S8). Based on the above summarized data, we proposed a molecular regulatory network for LP ( Figure 5). Calcium signaling regulates hormone signaling, such as ABA signaling in leaf guard cells, and regulates gene expression, energy, and substance transport FIGURE 3 | Candidate genes (gene number) that regulate LP are involved in major metabolic pathways.
Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 13 | Article 855574 FIGURE 5 | Proposed molecular regulatory network for LP. The gray rectangle indicates the four overlapped fiber developmental stages. The arrows indicate the regulatory relationships. The yellow rectangles indicate biological processes. The circles indicate different regulators. All genes and favorable alleles are preferentially and highly expressed during the initiation and early elongation stages; however, BR and JA pathways are repressed. The red rectangle indicates the active biological processes that help increase LP.
FIGURE 4 | A molecular regulatory network of cotton fiber development. In the schematic, arrows indicate positive regulation; symbols of 'T' indicate negative regulation; lines indicate receptor or interaction; three tightly connected rectangles indicate tri-molecular complexes; red letters in clouds indicate phytohormones. Et, ethylene; ABA, abscisic acid; GA, gibberellin; JA, jasmonic acid; ROS, reactive oxygen species. "Gh" before gene symbols was omitted for space. The detailed information of the genes and their functions in cotton fiber development is listed in Supplementary Table S8.
Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 13 | Article 855574 (McAinsh et al., 1997). TFs modulate transcription and hormone signaling and can affect LP. For example, MYB52 is involved in ABA signaling  and AIL6, a AP2-like ethylene-responsive TF, involved in ethylene (ET) signaling . Most candidate genes of QTLs for LP (27/90) are TFs or otherwise related to transcription, suggesting that gene expression is more active during fiber initiation and elongation. The second most genes (12/90) are related to hormone signaling, indicating key roles in regulating fiber development ( Table 3). The biological processes of signaling, cell division, cell wall development, energy metabolism, and substance transport are important for LP formation Ma et al., 2019;Qin et al., 2019). In addition, some lncRNAs are involved in fiber development, and increase LP . According to the currently available data, longer fiber initiation and elongation stages should result in higher LP.

CONCLUSION AND PERSPECTIVE
The current understanding of inheritance, QTLs, and candidate genes of LP is systemically summarized here. A large amount of genetic research results has been reported, including more than 417 unique QTLs for LP, 60 major effective QTLs, and more than 90 candidate genes of the QTLs. Additionally, more than 50 genes related to fiber development and microRNAs and long noncoding RNAs affecting LP have been reported. Most LP QTL candidate genes and genes related to fiber development affecting LP are preferentially expressed during fiber initiation and elongation. The biological processes of signaling, transcription, cell division, cell wall development, energy metabolism, and substance transport are important for LP development. By summarizing and analyzing the available data, this work reports the current state of the field and reveals what is known about cotton LP formation.
Although great genetic achievements have been made in cotton LP studies, it is difficult to improve LP without altering other yield components. The main reasons are as follows: (1) the inheritance of LP is complex; (2) LP is significantly negatively correlated with other key yield and quality components; (3) most QTLs for LP are unstable in different populations and environments, and may explain too little phenotype variance to trace with MAS; (4) LP has a significant negative heterosis value; (5) most genes of QTLs for LP are unknown and the molecular regulatory mechanism of LP is unclear. To address the above problems, future studies should: (1) Fully characterize the orchestrated inheritance model of LP and the relationship of LP with other yield and quality components; (2) identify more stable and major effective QTLs for LP; (3) improve elite germplasm resources of LP by breaking the linkage between LP and other deleterious yield and quality component QTLs/alleles; (4) study the molecular mechanisms of fiber development and increasing LP using omics and gene functional validation; and (5) develop effective techniques to pyramid favorable genes or alleles for high yield and high LP cotton breeding.

AUTHOR CONTRIBUTIONS
HN searched and analyzed the data and drafted the manuscript. QG helped with analysis of the data. HS and YY designed the whole study, revised the manuscript, and gave the final approval to the version of the manuscript that is being sent for consideration for publication.