Genome-Wide Identification and Evolutionary Analysis of Gossypium Tubby-Like Protein (TLP) Gene Family and Expression Analyses During Salt and Drought Stress

Tubby-like proteins (TLPs) possess a highly conserved closed β barrel tubby domain at C-terminal and N-terminal F-box. The role of TLP gene family members has been widely discussed in numerous organisms; however, the detailed genome-wide study of this gene family in Gossypium species has not been reported till date. Here, we systematically identified 105 TLP gene family members in cotton (Gossypium arboreum, Gossypium raimondii, Gossypium hirsutum, and Gossypium barbadense) genomes and classified them into eight phylogenetic groups. Cotton TLP12 gene family members clustered into two groups, 4 and 8. They experienced higher evolutionary pressure in comparison to others, indicating the faster evolution in both diploid as well as in tetraploid cotton. Cotton TLP gene family members expanded mainly due to segmental duplication, while only one pair of tandem duplication was found in cotton TLPs paralogous gene pairs. Subsequent qRT-PCR validation of seven putative key candidate genes of GhTLPs indicated that GhTLP11A and GhTLP12A.1 genes were highly sensitive to salt and drought stress. The co-expression network, pathways, and cis-regulatory elements of GhTLP11A and GhTLP12A.1 genes confirmed their functional importance in salt and drought stress responses. This study proposes the significance of GhTLP11A and GhTLP12A.1 genes in exerting control over salt and drought stress responses in G. hirsutum and also provides a reference for future research, elaborating the biological roles of G. hirsutum TLPs in both stress responses.


INTRODUCTION
Tubby-like proteins are a family of bipartite transcription factors that were first studied in animals (Boggon et al., 1999;Santagata et al., 2001;Carroll et al., 2004) but have subsequently been identified from single-celled to multicellular organisms (Liu, 2008). Tubby-like proteins (TLPs) are characterized by the presence of the conserved C-terminal tubby domain, comprising of 12 antiparallel closed β-barrel strands with a central hydrophobic α-helix (Boggon et al., 1999). In animals, TLPs are present in fewer numbers (ranging up to five) but have been ascribed to a wide range of cellular functions, including involvement in neuronal functions and development (Kleyn et al., 1996). In contrast to animals, the TLP family in plants is much larger with more than 10 members. Moreover, unlike animal TLPs, which possess a variable N-terminal region, plant TLP proteins possess a conserved N-terminal F-box domain besides the C-terminal tubby domain (Noben-Trauth et al., 1996;Gagne et al., 2002;Lai et al., 2004;Xu et al., 2016). Fbox-comprising proteins are involved in the ubiquitin-mediated degradation of proteins (Kile et al., 2002), suggesting a role for TLPs in such processes.
In plants, tubby-like proteins have been studied in some dicot and monocots, such as Arabidopsis thaliana, poplar, rice (Yang Z. et al., 2008), apple , and maize . In A. thaliana (At), 11 TLP genes have been identified while 14, 15, and 10 genes have been identified in rice, maize, and apple (Yang Z. et al., 2008;Chen et al., 2016;Xu et al., 2016). Studies of the TLP families in these plants reveal expression in different tissues and in response to different hormone treatments or under abiotic stress conditions (Lai et al., 2004;Liu, 2008;Xu et al., 2016). In Arabidopsis, AtTLP3, and AtTLP9 were found to function redundantly in response to abscisic acid (ABA) and osmotic stress treatments (Lai et al., 2004), while AtTLP9 was also demonstrated to respond in drought and salt stress (Lai et al., 2004;Bao et al., 2014). In apple, several TLP genes were upregulated in response to abiotic stress treatments, suggesting an important role for TLP genes in stress responses . In Cicer arietinum, CaTLP1 was expressed in response to dehydration stress, and its expression in tobacco led to enhanced tolerance to drought, oxidative, and salt stress (Wardhan et al., 2012). Collectively, these studies suggest that TLPs might have a significant function in the stress response of diverse plant species . However, the role of plants TLPs and their mode of action remain largely unknown .
Cotton (Gossypium spp.) is the most important natural fiber producing crop worldwide (Yang et al., 2020a). A lot of diversity exists in the Gossypium genus that includes six tetraploid species (2n = 52) and 45 diploids (2n = 26) (Hawkins et al., 2006;Grover et al., 2015). Interspecific hybridization events among the G. herbaceum (A1) or G. arboreum (A2) (A-genome ancestral African species) and G. raimondii or G. gossypioides (D6) (D-genome native species) have resulted in allotetraploid G. hirsutum (upland cotton) and G. barbadense (Senchina et al., 2003;Zhu and Li, 2013), which are possibly the oldest main allopolyploid crops (Paterson et al., 2012;Chalhoub et al., 2014;Marcussen et al., 2014). Diversity within the Gossypium species provides a perfect model for examining the evolution and polyploid domestication (Yang et al., 2020a) and has been facilitated further with the completion of the whole genome sequences of G. raimondii, G. arboreum, G. barbadense, and G. hirsutum in the last few years (Wang K. B. et al., 2012;Li et al., 2014;Liu X. et al., 2015;Zhang et al., 2015). The large evolutionary diversity in cotton allows it to adapt to several different types of regions with differing environmental conditions, although the molecular basis of this adaptation is not yet well-understood.
We are interested in the evolution of the cotton TLP gene family and their possible roles in abiotic stress responses. So far, only a single member of the family GhTULP34 has been characterized . In this study, we have carried out comprehensive genomic exploration of TLP protein family in G. raimondii, G. arboreum, G. barbadense, and G. hirsutum and studied the expression profiles of G. hirsutum TLPs (GhTLPs) in salt and drought stress responses. The aim of this study was to provide a comprehensive understanding of cotton TLP genes for future breeding programs for the improvement of plant quality, production, and response to abiotic stresses.

Identification of the TLP Gene Family in Gossypium Species
The protein, cDNA, gene annotation, and genome files (gff) of G. arboreum (BGI_A2 v1), G. raimondii (JGI v2), G. barbadense (HAU v2), and G. hirsutum (HAU v1) were retrieved from the CottonGen resources (Yu et al., 2014) while the protein sequence of A. thaliana was procured from the TAIR database (Lamesch et al., 2012). The Hidden Markov Model (HMM) profiles of the TLP domain (PF01167) were taken from the Pfam database (El-Gebali et al., 2019). The four cotton genomes were employed as queries, and the Pfam database was used as a reference to identify the TLP protein in the cotton dataset, using the HMMER search program (http://hmmer.wustl.edu/; Eddy, 2011). The identified TLP genes were further confirmed by BLASTP search and NCBI Batch-CD search (Lu et al., 2020).

Physiochemical Properties and Characterization of Cotton TLP Genes
The physiochemical properties, such as charge, molecular weight (Da), grand average of hydropathy (GRAVY), instability index, isoelectric points (pI) of G. arboreum (Ga), G. raimondii (Gr), G. barbadense (Gb), and G. hirsutum (Gh) TLPs, were determined through the ProtParam tool in the ExPASy web server (Gasteiger et al., 2003). The subcellular localization of cotton TLP proteins was predicted by using the software CELLO v.2.5 (Yu et al., 2006).

Analysis of the Encoded Protein Motif, Gene Structure, and miRNA Target Sites of Cotton TLP Proteins
The conserved protein motifs of cotton TLP genes were identified through the MEME (Multiple Em for Motif Elicitation) version 5.0.1 (Bailey et al., 2009) by employing the full-length proteins encoded by TLP genes in cotton. The exon-intron structure analysis was carried out with Gene Structure Display Server 2.0 , using the genomic and coding sequences of the identified cotton TLPs. To decipher the miRNA target sites in the cotton TLP transcripts, the complete sequences of all known and reported miRNAs of the four cotton species were fetched from the miRBase database (http://www.mirbase.org/; Kozomara et al., 2019). miRNA target site prediction analysis was performed through a plant small RNA target analysis server (psRNATarget 2017 release) (Dai et al., 2018), using the 376 cotton miRNA sequences.

Multiple Sequence Alignments (MSA) and Phylogenetic Analysis
To identify conserved regions of predicted cotton TLP proteins, multiple sequence alignments (MSA) were performed with the ClustalX2.1 program (Larkin et al., 2007), using default criteria. Phylogenetic tree construction was carried out through MEGA7 software (Kumar et al., 2016), with the maximum likelihood (ML) method, using the 1,000 bootstrap and the Jones-Taylor-Thornton (JTT) model. Visualization of the tree was carried out through Interactive Tree Of Life (iTOL; Letunic and Bork, 2019).

Expression Profile of Cotton TLP Gene Family Members Under Salt and Drought Stress Conditions
To gain insight into the expression profile of cotton TLP gene family members under salt and drought stress conditions, the Illumina RNA-Seq data of G. hirsutum (accession number: PRJNA532694) were retrieved from the NCBI database. The poor-quality reads were filtered by Fastx-toolkit (Schmieder and Edwards, 2011) and mapped to the G. hirsutum genome, using the TopHat2 (Kim et al., 2013). The estimation of transcript abundance was carried out with fragments per kilobase per million (FPKM) through Cufflinks software (Trapnell et al., 2012). The hierarchical clustered heatmap generation and visualization were done in the R program, using the pheatmap package.
RNA Isolation, cDNA Preparation, and Quantitative Real-Time PCR (qRT-PCR) Validation The selected putative TLP genes were validated in 2-month-old drought and salt-stressed plants of G. hirsutum, grown in the field under normal photoperiodic conditions. Plants were grown in triplicate, and a single treatment of 300-mM NaCl was used to stimulate salt stress (Wei et al., 2017) and 20% PEG8000 solution to decrease the osmotic potential of the root, inducing drought stress (Shafiq et al., 2015). The non-treated plants were taken as control. Leaf tissues were collected at 6, 12, 24, 48, and 72 h after treatment, RNA isolated as per the protocol (Sigma USA), followed by cDNA (1 µg/µl) synthesis with the verso cDNA synthesis kit (Thermo scientific) as per the provided protocol. Expression of seven genes was checked by qRT-PCR fluorescent quantitative detection system (HiMedia Insta Q 48 M4), using fast the SYBER TM green master mix (Applied Biosystem) with primers designed with the help of primer express 3.0. Ubiquitin was taken as the internal control. The reaction conditions of qRT-PCR were 95 • C for 10 min, followed by cycling for 40 cycles of denaturation at 95 • C for 10 s, annealing at 56 • C for 15 s and extension at 72 • C for 30 s. Relative expression of the employed genes was calculated with mean ± SD of biological triplicate samples by the 2-Ct method (Livak and Schmittgen, 2001).

Co-expression Network and Metabolic Pathway Analysis of Negatively and Positively Co-expressed Genes With
GhTLP11A and GhTLP12A.1 The co-expression network of the GhTLP11A and GhTLP12A.1 genes in salt and drought stress conditions was constituted by the FPKM values with the "expression correlation networks" module in Cytoscape version 3.8.0 (Smoot et al., 2011). The module calculated positive Pearson correlations (r ≥ 0.95) and negative correlations (r ≤ −0.95), with interacting members of the network. Network visualization and co-expression of genes were shown in Cytoscape by applying the force-directed layout. The important metabolic pathways and functional categories of positively and negatively co-expressed genes (PCoEGs and NCoEGs) with GhTLP11A and GhTLP12A.1 were estimated, using the MapMan software 3.5.1 version (Thimm et al., 2004). The average statistical test accompanied by the Benjamini Hochberg (multiple correction tests) was employed to know the functional categories.

Identification of cis-Regulatory Elements of GhTLP Genes and Homology Modeling of the Highly Expressed GhTLPs
The 2-Kb sequences upstream of GhTLP genes was analyzed for cis-regulatory elements by using the PlantCARE database (Lescot et al., 2002) by the "Signal Scan Search" program. The three-dimensional structure of GhTLPs was obtained through homology modeling, using the Phyre2 (Kelley et al., 2015) server. Structure visualization of GhTLPs was carried out with Chimera 1.11.1 version (Pettersen et al., 2004).

Statistical Analysis
The statistical analysis of qRT-PCR was carried out, using GraphPad Prism version 8.4.3 software, with two-tailed Student's T-tests in triplicate sample repeats.

Multiple Sequence Alignment (MSA) and Evolutionary Analysis
The multiple sequence alignment (MSA) of all cotton TLP genes showed a highly conserved C-terminal tubby domain and F-box at the C-terminal (Supplementary Figure 2). To determine the evolutionary relationship between TLP proteins of cotton and A. thaliana, MSA of 105 identified cotton TLPs with 11 A. thaliana TLPs was carried out. Furthermore, a phylogenetic tree was constructed, using the maximum likelihood tree (ML) method. On the basis of phylogenetic relationships, cotton TLP genes were clustered into eight major groups (Groups 1-8), each containing 21, 18, 17, 17, 16, 6, 6, and 4 TLPs, respectively (Figure 1).
The majority of the cotton TLPs were found to be clustered with A. thaliana TLPs, the exception being groups 4 and 8 cotton TLPs (GaTLPs12, GrTLPs12, GhTLPs12A, GhTLPs12D, GbTLPs12A, and GbTLPs12D). To gain insight into the groups 4 and 8 TLPs, a phylogenetic tree of these TLPs with other eudicots (Ranunculaceae, Brassicaceae, Caricaceae, Cucurbitaceae, Rutaceae, Myrtaceae, Rosaceae, Fabaceae, Euphorbiaceae, Scrophulariaceae, Salicaceae, Solanaceae, Malvaceae, and Vitaceae) was generated. Also, synteny analysis of cotton TLPs with those from A. thaliana (Brassicaceae) and T. cacao (Malvaceae) was carried out. This revealed that groups 4 and 8 TLPs were greatly conserved among G. arboreum (A genome), G. raimondii (D genome), G. hirsutum (At and Dt sub-genomes), G. barbadense (At and Dt subgenomes), and T. cacao genomes. The groups 4 and 8 cotton TLP genes were conserved among closely related species with Gossypium (Byng et al., 2016). On the other hand, no conserved homologs were found in Brassicaceae (  Table 2). The phylogenetic tree of groups 4 and 8 cotton TLP gene family members (GaTLPs12, GrTLPs12, GhTLPs12A, GhTLPs12D, GbTLPs12A, and GbTLPs12D) also revealed that groups 4 and 8 cotton TLP genes were not clustered with Brassicaceae family members (Supplementary Figure 3). This finding showed that, after the divergence from the common ancestor, the groups 4 and 8 cotton TLP homologs might have been evicted from the Brassicaceae family. The evolutionary associations among the Gossypium TLPs were deduced by building a separate phylogenetic tree, using the ML method with 1,000 bootstraps value. On the basis of the topology of the tree, paralogous nodes, organization of exon-intron, and conservation of motifs, the cotton TLP genes were categorized into seven groups with higher bootstrap value. The proteins in each group had a high identity (>70%) among orthologous members but differed considerably from the members of the other groups, suggesting a divergent evolution from a common ancestor or origin from gene duplication events ( Figure 3A).
To determine the consistency of the exon-intron pattern in the phylogenetic groups, a gene structure comparison of the cotton TLPs was carried out. Intron number varied from 3 to 8 (GaTLPs), 0 to 8 (GrTLPs), 2 to 8 (GhTLPs), and 1 to 8 (GbTLPs) ( Figure 3B and Table 1). The majority of the cotton TLP genes within the same group showed a similar pattern of exon-intron distribution. To study the conserved motif organization in TLP proteins, the MEME tool was employed for the analysis followed by annotation through InterProScan. A total of 15 conserved motifs were identified in the cotton TLP proteins. Only seven of these, motifs 1-7 (with the exception of motif 3), were found to form parts of the tubby domains. Motif 3 was annotated as the F-box domain ( Figure 3D and Supplementary Table 3). Motif 1 was found in all cotton TLP proteins, except GaTLP12.4, GhTLP12D.4, and GhTLP7D.1. The majority of the cotton TLPs with close evolutionary relatives had similar motif composition and were assumed to have a similar function ( Figure 3C).

Chromosomal Location and Gene Duplication Events of Gossypium TLP Genes
To identify the chromosomal localization of GaTLP, GrTLP, GhTLP, and GbTLP genes in the cotton genome, the BLASTN search was performed. GaTLP genes were distributed on chromosomes 1-11 ( Figure 4A), GrTLP genes were localized across chromosomes 2 and 4-11 ( Figure 4B), GhTLP genes were located on At chromosomes 1, 3, 5, 6, 8-12 and on Dt chromosomes 1, 2, 3, 5, 6, 8-12 with 14 and 19 genes, respectively (Figures 4C,D), GbTLP genes were distributed on At chromosomes 1-6, 8-12, and Dt chromosomes 1, 2, 3, 5, 6, 8-12 with 16 and 17 genes, respectively (Figures 4E,F). Given the expansion of the number of cotton TLP genes, gene duplication events were next studied. High amino acid sequence similarities were detected among protein encoded by TLP genes, as five pairs of paralogous TLP genes were identified in diploid cotton (G. arboreum and G. raimondii) (Figures 4A,B), while seven pairs of the paralogous genes were determined in G. hirsutum (At and Dt sub-genomes) and 12 in G. barbadense (At and Dt sub-genomes) (Figures 4C-F). These paralogous TLP gene pairs existed in the same group, and most of them showed >70% sequence similarities between the proteins encoded by these TLP gene pairs. Except for the GaTLP2.1/GaTLP2.3 gene pair, which was tandemly arranged, all other paralogous gene pairs were placed on distinct chromosomes, providing evidence that the expansion of the cotton the TLP family was mainly due to segmental duplication, not tandem duplication. In G. arboreum, four segmental gene duplications (GaTLP5.   Table 2). Most of the paralogous gene pairs showed recent duplication events (13-20 MYA) . The non-synonymous and synonymous substitution ratios (Ka/Ks ratios) for the duplicated Gossypium TLP gene pairs were consistently <1 (Table 2). Therefore, duplicated cotton TLP genes experienced intense purifying selection, which contributes to conserving their functions and reveals that not much diversion had taken place during the course of evolution (Gabaldon and Koonin, 2013). The orthologous gene pair, having <90% sequence identity in cDNA, and amino acid sequence were analyzed further for evolutionary studies (Supplementary Table 4). The selection pressure and the potential function of Gossypium TLPs were examined by computing the Ka, Ks, and Ka/Ks ratios among orthologous (A vs. D, At vs. A, Dt vs. D, At vs. At, and Dt vs. Dt) and within the homeologs (At vs. Dt). Interestingly, the Ka value of cotton orthologous TLP2 (GaTLP2/GrTLP2), TLP5 (GaTLP5/GrTLP5), TLP6 (GaTLP6/GrTLP6), TLP7 (GaTLP7/GrTLP7), and TLP12 (groups 4 and 8 TLPs) (GaTLP12/GrTLP12, GhTLP12A/GhTLP12D, and GbTLP12A/GbTLP12D) genes were greater in inter-genomes (A vs. D and At vs. Dt) in comparison to other orthologous TLP gene pairs, suggesting that these pairs experienced faster evolution. Subsequently, during the course of evolution, orthologous TLP gene pairs often retain their corresponding function in different species (Gabaldon and Koonin, 2013). A total of 153 out of 172 orthologous TLP gene pairs have a Ka/Ks ratio <1, and the rest 16 have Ka/Ks >1 in both diploid and allotetraploid species, indicating a greater number of the TLP orthologous genes pairs experienced purifying selection pressure, and some of them experienced Darwinian selection pressure ( Figure 4G and Supplementary Table 5). The Ka/Ks values of TLP2, 5, 6, 7, 8, 11, and 12 were higher in A vs. D, At vs. Dt, and Dt vs. D. Therefore, these TLPs experienced greater evolutionary pressure in diploid as well as in allotetraploid cotton and might have evolved rapidly in D subgenome as compared with A subgenome.

Effects of Salt and Drought Stress on the Expression Profiles of GhTLP Genes
It has been previously reported that TLP gene family members are expressed and regulated by several abiotic stresses (Lai et al., 2004;Wardhan et al., 2012;Bao et al., 2014). In view of these reports, we studied the involvement of cotton TLP genes in drought and salt stress conditions by analyzing transcriptomic data of leaf transcriptomes in response to drought and salt stress conditions. Thirty GhTLP genes exhibited differential expression, and twelve (GhTLP5A.2, GhTLP5D.2, GhTLP6A.3, GhTLP7D.2, GhTLP7D.3, GhTLP8A, GhTLP11A, GhTLP12A.1, GhTLP12.2, GhTLP12D.1, GhTLP12D.2, and GhTLP12D.3) showed significant higher  Figure 5H). The studies show that the GhTLP gene family members respond to different abiotic stresses such as drought and salt and may have a role in regulating stress responses of cotton against salt and drought. Furthermore, to validate the transcriptome data of GhTLPs in response to drought and salt stresses, the qRT-PCR validation of seven putative genes (GhTLP5A.1, GhTLP5A.2, GhTLP5D.2, GhTLP7A.2, GhTLP7D.3, GhTLP11A, and GhTLP12A.1) was carried out. The corresponding primers are listed in Table 3. The expression of seven GhTLP genes was significantly upregulated in salt-stressed plants, where the majority of the GhTLPs showed responses at 12 and 48 h (Figures 5A-G), while GhTLP11A and GhTLP12A.1 showed highest responses in terms of fold change (FC) (>80 and >90) compared with control (Figures 5F,G). Similarly, six out of seven GhTLP  (Figure 5N), but no expression was detected in GhTLP5A.1 at any time scale (Figure 5I), while GhTLP12A.1 showed highest response in terms of fold change (>140) in comparison to the control ( Figure 5O). Altogether, the differential responses of GhTLP gene family members to salt and drought stresses suggest that GhTLP genes may function to combat abiotic stresses in cotton. Still, further studies are required to clone the significantly higher expressed genes to establish the role

Co-expression Network and Pathways Analysis of Highly Expressed GhTLP11A and GhTLP12A.1 Genes at Different Time Intervals Under Drought and Salt Stress Condition
The highly expressed genes (GhTLP11A and GhTLP12A.1) were further selected for co-expression network and pathways study. Using FPKM values, the co-expressed genes with GhTLP11A and GhTLP12A.1 were explored. We identified positively co-expressed genes (PCoEGs) (2 and 4) and negatively co-expressed genes (NCoEGs) (5 and 14) in salt and drought stress with GhTLP11A ( Supplementary Figure 4 and Supplementary Tables 6, 7). Similarly, PCoEGs (13 and 48) and NCoEGs (9 and 24) were determined with GhTLP12A.1 in salt and drought stress (Supplementary Figure 4 and Supplementary Tables 6, 7).
PCoEGs and NCoEGs of GhTLP11A and GhTLP12A.1 were subjected for PageMan pathways analysis to understand the molecular and functional role of these genes. The pathway study revealed that NCoEGs of GhTLP11A showed higher expression of calcium signaling, and PCoEGs of GhTLP11A displayed higher expression of AS2 (lateral organ boundaries DOMAIN family protein) (Ma et al., 2006) at all time points (0, 6, 12, 24, 48, and 72 h) under salt stress condition (Supplementary Figure 5A). Earlier studies demonstrated that the elevated calcium levels help to protect plants from salt stress via SOS (salt overly sensitive) with signal transduction (Seifikalhor et al., 2019). In Medicagotruncatula, lateral organ boundaries domain (LBD1), Sorghum bicolorLBD, and Vitis viniferaLBD genes were upregulated under salt stress condition (Ariel et al., 2010;Wang et al., 2010;Grimplet et al., 2017), suggesting its role in salt stress response. Moreover, PCoEGs of GhTLP12A.1 showed higher expression of β-ketoacyl-CoA synthase (KCS), a key enzyme for the fatty acid elongation (Yang et al., 2020b), and NCoEGs of GhTLP12A.1 also displayed upregulation of a PHD-type transcriptional regulator at all time points (0, 6, 12, 24, 48, and 72 h) in salt stress response  Figure 5B). Results revealed that β-ketoacyl-CoA synthase improves salt tolerance in A. thaliana (Yang et al., 2020b), and a PHD-type transcriptional regulator also improves salt tolerance in transgenic A. thaliana (Wei et al., 2009). Additionally, NCoEGs of GhTLP11A demonstrated little higher expression of jasmonate hormone metabolism, abiotic stresses for 12 h while the MYB domain and the G2-like transcriptional regulator highly expressed at all time points (0, 6, 12, 24, 48, and 72 h) in drought stress response (Supplementary Figure 5C). Methyl jasmonate (MeJA) has been reported to get enhanced during drought stress (Wu et al., 2012) and causes stomatal closure to save the water in wheat and enhance the antioxidant ability under the drought stress condition . Moreover, MYB has been reported to play a crucial role in providing tolerance under drought stress in A. thaliana and cotton Chen et al., 2015). These results suggested the important roles in drought stress tolerance. Alteration of its expression might improve tolerance under drought stress in cotton. Furthermore, NCoEGs of GhTLP12A.1 showed higher expression in cellulose synthase enzyme related to the cell wall; PCoEGs of GhTLP12A.1 displayed significantly highly expressed in lipid metabolism-related enzymes, the DOF zinc finger family, and MAP (mitogen-activated protein) kinases, signaling at all time points (0, 6, 12, 24, 48, and 72 h) in drought stress response (Supplementary Figure 5D). An earlier report revealed the importance of cellulose synthase in drought stress via induction of gene expression in A. thaliana (Chen et al., 2005), and lipid metabolism also showed an important role in drought stress response in A. thaliana by decreasing the lipid content progressively (Gigon et al., 2004). Moreover, overexpressed the DOF zinc finger family provides resistance under drought stress in Populustrichocarpa , and MAP kinases signaling also enables to enhance the tolerance under drought stress via the transmission of definite stimuli and regulating the antioxidant defense system (Sinha et al., 2011).

Potential miRNA Target Sites in Gossypium TLP Transcripts
A large number of transcripts are regulated by miRNAs in response to stresses, signal transduction, and in plant development (Witkos et al., 2011). To study whether the cotton TLP genes may be regulated by miRNAs, target sites for miRNA binding were analyzed in the identified cotton TLP genes through the plant small RNA target analysis server (psRNATarget). The miRBase database possesses 378 cotton miRNAs. For the identification of miRNA target sites, the cutoff threshold of 4 was set in the search parameter. We were able to identify target sites for 41 cotton miRNAs in 56 cotton TLP transcripts with an expectation score (e) varied from 0.5 to 4 (Supplementary Table 8). Only the miRNA/target site pairs with cut-off 3.5 were selected to reduce the false-positive identification. Later, 44 miRNAs from the 14 miRNA families, comprising the target sites among 28 cotton TLP genes, were considered reliable in terms of e ≤3.5 (Table 4). Although the majority of the cotton TLP genes contain target sites for a single miRNA, some genes, such as GbTLP2D.1, GbTLP7D.1, GhTLP2D.1, GhTLP7D.1, GrTLP2.1, and GrTLP7.1, have target sites for more than one miRNA. Target site accessibility was evaluated by estimating unpaired energy (UPE), an essential factor in the identification of targets. The UPE of the target sites varies from 7.274 (gra-miR7494b) to 24.749 (gra-miR7486d), where a lesser amount of energy illustrated the greater chance of interaction among a miRNA and target sites (Marin and Vanicek, 2011).

cis-Regulatory Element Analysis of GhTLPs
The cis-regulatory elements are crucial to controlling the regulation of transcription in several stress conditions (Nakashima et al., 2009). To identify cis-regulatory elements that may govern TLP expression in cotton, a 2-Kb region upstream from the translational start site of GhTLPs was scanned for various cis elements. A total of 1,182 proximal elements were identified in 33 GhTLPs that included 737 for abiotic stresses, 343 for hormonal responses, 30 for biotic stresses and 72 elements for the other cis-regulatory elements (Figure 6 and Table 5). A higher number of cis-regulatory elements were identified in GhTLP2D.2 (60), whereas the least number of cis-regulatory elements were detected in GhTLP12A.1 (18) (Supplementary Table 9). In abiotic stress-responsive cisregulatory elements, the majority of the elements were involved in light responses, followed by low temperature, flavonoid biosynthesis, defense, and stress (Supplementary Table 9). The cis-regulatory elements related to hormonal responses comprised auxin, salicylic acid (SA), abscisic acid, gibberellin, and methyl jasmonate-responsive elements (Supplementary Table 9). The SA-responsive TCA element was present in a higher number in GhTLP11A. The auxin-responsive AuxRR-core element, which has an important role in salt, as well as drought stress responses (Guo et al., 2018;Kang et al., 2018), was detected in GhTLP5A.2, GhTLP5D.3, and GhTLP6A.2. Ethylene-responsive elements (EREs), which provide defense against salt and drought stress conditions (Pei et al., 2017;Sharma et al., 2019), were also detected and present in a higher number in GhTLP5D.2. The presence of these cis-regulatory elements in GhTLPs hints at their potential roles in the regulation of gene expression in cotton growth and development as well as under various environmental conditions (Nawaz et al., 2014).

Three-Dimensional (3D) Structural Analysis of the Putative GhTLPs Tubby Domain
The 3D structures of the GhTLPs tubby domain were determined through homology modeling of a central alpha-helix surrounded by a beta-barrel (Figure 7). All the putative GhTLPs have a typical tubby structure with a closed beta-barrel formed by 12 anti-parallel strands and a central alpha helix. While most GhTLPs contain five alpha-helices, GhTLP7A.2 and GhTLP12A.1 comprise six alpha-helices. These three-dimensional structural differences might lead to the functional diversification of different GhTLPs and suggest a slightly altered role for GhTLP7A.2 and GhTLP12A.1 as compared with the other GhTLP proteins. The higher transcript level of GhTLP12A.1 during drought and salt stress conditions further supported this hypothesis.

DISCUSSION
The genus Gossypium includes ∼45 diploid (2n = 2x = 26) and six tetraploid (2n = 4x = 52) species (Hawkins et al., 2006;Grover et al., 2015). G. hirsutum and G. barbadense are allotetraploids that have been arisen in the new world from interspecific hybridization among A-genome-like ancestral African species and D-genome-like American species . The closest relatives of the tetraploid progenitors are the A-genome species G. herbaceum (A1) and G. arboreum (A2) and the D-genome species, G. raimondii (D5) or G. gossypioides (D6) (Brubaker et al., 1999;Senchina et al., 2003). Approximately, 1-2 million years ago, polyploidization occurred, giving rise to allotetraploid species (Wendel and Cronn, 2003). G. hirsutum and G. barbadense are possibly the oldest main allopolyploid crops (Paterson et al., 2012;Chalhoub et al., 2014;Marcussen et al., 2014). In our efforts to study the TLP family in cotton, we have identified a total of 105 cotton TLP genes in four Gossypium genomes (G. arboreum, G. ramondii, G. hirsutum, and G. barbadense), 19 GaTLPs, 18 GrTLPs, 33 GhTLPs, and 35 GbTLPs (Table 1 and Supplementary Table 1). The genome sizes of G. arboreum and G. raimondii are 1,746 and 885 Mb , respectively, and, expectedly, G. arboreum had higher numbers of TLP genes as compared with G.  barbadense had a greater number of TLP genes as compared with G. hirsutum. The higher number of the TLP gene family members in G. barbadense might be due to the whole genome duplication events Qiao et al., 2019) which facilitate diversification (Clark and Donoghue, 2018). The domain study revealed that all the conserved cotton TLP proteins comprised the tubby domain at the C-terminal and F-box domain at the N-terminal end while some of the cotton TLPs lack the F-box (Supplementary Figure 1). This was also observed in Arabidopsis thaliana, indicating the functional role of TLP proteins lacking the F box (Lai et al., 2004). The phylogenetic analysis of cotton TLPs and A. thaliana protein sequences grouped the cotton TLP proteins into eight major groups (Groups 1-8). However, groups 4 and 8 cotton TLP genes were not clustered with any of A. thaliana genes (Figure 1). Further analysis of groups 4 and 8 cotton TLP genes with other eudicots showed the loss of these genes from the brassicaceae family only. The orthologous gene pair analysis of the cotton TLP12 genes (Groups 4 and 8) with A. thaliana and T. cacao (closest relative of Gossypium). The outcomes of synteny analysis showed that cotton the TLP12 gene family members (Groups 4 and 8) have orthologous duplicated genes in T. cacoa (Figures 2A-D) while no orthologous duplicated genes were detected in A. thaliana (Figures 2E-H). Therefore, it may be hypothesized that groups four and eight cotton TLP genes (TLP12) were the consequence of recent speciesspecific duplication events that led to independent functional diversification. Groups four and eight TLPs orthologous pairs experienced faster evolution as compared with the other TLP gene family members, indicating their functional divergence in Gossypium, proposing that groups four and eight TLPs might have a specific function in cotton species. The identified orthologous TLP12 gene pairs in G. hirsutum and G. barbadense are approximately double in comparison to G. arboreum and G. raimondii, respectively, showing the effect of polyploidy. This leads to more orthologous gene pairs in GhTLP12 and GbTLP12 genes than GaTLP12 and GrTLP12 genes (Qanmber et al., 2019a).
The evolutionary analysis within the Gossypium TLP genes showed most of them were greatly conserved during evolution, showed introns of these genes were not lost during evolution, and, at the early expansion stages of evolution, these genes diverged, whereas, over evolutionary time, other genes lost their introns (Qanmber et al., 2019b), indicating that group specific genes may have similar functions. According to a previous report on gene structure, introns performed essential functions during the course of evolution in several plant species (Roy and Gilbert, 2006). During the early expansion period, there was more intron, which subsequently decreased over the passage of time (Roy and Penny, 2007). GrTLP5.2 comprises no intron in gene structure; lack of intron indicates that the TLP gene is advanced where introns were disappeared over the evolutionary time period (Qanmber et al., 2019a); this gene might have some conserved evolutionary function in cotton. These findings demonstrated more advanced species comprise fewer introns in their genomes (Roy and Gilbert, 2005). Higher number of introns led to new functions (Qanmber et al., 2019a). Moreover, several gene families comprise no intron or with fewer introns in their genes Qanmber et al., 2018). Insertions or deletions events participate in the structural differences of exon/intron that might be useful to calculate the evolutionary mechanisms (Lecharny et al., 2003). Introns are absent in some genes that might be due to a rapid evolution rate, whereas a greater number of introns comprising genes leads to a gain of function in evolution (Qanmber et al., 2019b). The loss or gain of genes through segmental duplication or incomplete sequencing of genomes is the major cause for TLP genes distribution in cotton (Qanmber et al., 2019b).
Chromosomal allocation studies demonstrated that cotton TLP genes expansion has arisen due to segmental duplication except for GaTLP2.1/GaTLP2.3 (Figures 4A-F). The purifying selection probably excludes the deleterious loss-of-function mutations, refining functional alleles at both duplicate loci and fixing recent duplicate genes (Tanaka et al., 2009). All the identified paralogous cotton TLP gene pairs indicated the purifying selection ( Table 2). The recent duplication events in Gossypium TLPs have had implicit ecological, morphological, and physiological diversification (Wendel and Cronn, 2003). The diploid genomes of G. arboreum and G. raimondii were diverged 2-13 MYA, and allotetraploid cotton (G. hirsutum and G. barbadense) was originated about 1-2 MYA Wang M. J. et al., 2017;Wang et al., 2019). The duplication time of  implied that duplication events in Gossypium TLP gene families were more ancient than that of both polyploid formation and divergence of diploid species. This duplication might facilitate the unique role of TLP genes in Gossypium, i.e., cotton stress responses. The average duplication time of GaTLPs and GrTLPs was around 24.78 and 16.31 MYA, which probably took place after their divergence from T. cacoa (33 MYA)  and A. thaliana (93 MYA) ; before the reunification of A and D diploid genomes that lead to allotetraploid cotton Wang et al., 2019) (Table 2). These observations suggested that TLP2.2 and TLP2.3 in G. arboreum, G. raimondii, and G. barbadense might have arisen from the same duplication event of cotton TLP2.1 genes. All paralogous cotton TLP gene pairs except GaTLP2.1/GaTLP2.3 experienced segmental duplication (Figures 4A-F). Here, both segmental and tandem duplication helped in the TLP gene family expansion, but segmental duplication might have some significant role in the expansion of the TLP gene family members Qanmber et al., 2019a;Ali et al., 2020).
The orthologous gene pairs had the sequence identity >90% in cDNA and also in amino acid compositions (Supplementary Table 4), which were carried out for further evolutionary study. Among orthologous-duplicated pairs, the Ka/Ks values of TLP2, TLP5, TLP6, TLP7, TLP8, TLP11, and TLP12 were higher in A vs. D, At vs. Dt, and Dt vs. D. The divergence analysis showed that cotton TLPs experienced greater evolutionary pressure in diploid as well as in allotetraploid cotton and might have evolved rapidly in D subgenome as compared with A subgenome (Supplementary Table 5). All the identified orthologous cotton TLP gene pairs show purifying selection ( Figure 4G and Supplementary Table 5).
The TLP genes are known to play important roles in stress responses in various plant species (Lai et al., 2004;Liu, 2008;Xu et al., 2016). The transcriptome analysis data of G. hirsutum showed the high expression of GhTLP genes in salt and drought stresses. The expression analysis showed that GhTLP5A.1 and GhTLP5D.2 genes have a significantly higher relative expression in salt stress response, but not in drought stress; therefore, these genes might have a major role in salt-stress tolerance. Moreover, GhTLP11A and GhTLP12A.1 showed higher expression in both salt and drought-stress responses. Therefore, these two genes FIGURE 7 | The homology 3D model of the GhTLPs tubby domain. The GhTLPs were selected for three-dimensional structure prediction and display with confidence level >99%. The alpha helix is shown with red and beta fold is shown with violet color.
might have an important role in salt and drought tolerance and could be appropriate targets for further manipulation to protect the cotton from salt and drought stress.
To further characterize the function of GhTLP11A and GhTLP12A.1 genes, the co-expression network of these two genes (Supplementary Figure 4) was studied. This study revealed that PCoEGs of GhTLP11A comprised the lateral organ boundaries (LOB) domain (LBD), which were upregulated via ABA treatment in Vitis vinifera under salt-stress response (Grimplet et al., 2017). NCoEGs of GhTLP11A contained ABC transporter-like protein, actively involved in salt-stress recovery in Populuseuphratica (Gu et al., 2004) and calcium protein, which was considered as one of the important molecules in response to salinity (Seifikalhor et al., 2019), and, in a seedling of rice, Ca 2+ induces antioxidant enzyme activity and retains cellular redox potential under salt stress (Rahman et al., 2016; Supplementary Tables 6, 7). In salt stress, PCoEGs of GhTLP12A.1 comprised the SANT/MYB domain and the sugarphosphate transporter domain. Sugarcane MYB18, containing the SANT/MYB DNA-binding domain, remarkably improved tolerance to salt stress (Shingote et al., 2015) and phosphate transporter PHT4;6 of A. thaliana function in cell wall biosynthesis and protein N-glycosylation, which are crucial to salt tolerance (Cubero et al., 2009). NCoEGs of GhTLP12A.1 contained a FYVE/PHD-type zinc finger and MADS-box in salt stress. A. thaliana RING/FYVE/PHD ZFP (AtRZFP) is found to bind with zinc and provides tolerance to salt stress (Zang et al., 2016), and MADS-box considered a positive regulator of saltstress response via regulating the maintenance of ABA signaling, primary metabolism, detoxification, and ROS homeostasis through antioxidant enzymatic activities (Castelán-Muñoz et al., 2019; Supplementary Tables 6, 7). Results demonstrated that PCoEGs and NCoEGs of GhTLP11A and GhTLP12A.1 genes might be crucial in salt-stress responses. In drought stress, PCoEGs of GhTLP11A comprised a protein kinase domain and haloacid dehalogenase-like hydrolase (HAD hydrolase). Calcium-dependent protein kinase may function as calcium sensors and have an important role in drought-stress response.
In A. thaliana, calcium-dependent protein kinase 10 (CPK10) provides tolerance under drought stress via ABA and Ca 2+mediated stomatal regulation (Zou et al., 2010). A. thaliana trehalose-6-phosphate phosphatases (AtTPPs) encodes a protein in the HAD hydrolase superfamily that is involved in the biosynthesis of trehalose. Overexpressed AtTPPF leads to the accumulation of trehalose in response to drought stress and can increase the tolerance under drought stress (Lin et al., 2019). NCoEGs of GhTLP11A contained a B3 domain, which improves drought-stress tolerance via reducing the stomatal density and changed the shape of stomata in Zea mays  and a late embryogenesis abundant (LEA) gene, whose higher expression provides tolerance under drought stress in upland cotton (Magwanga et al., 2018;Supplementary Table 7). PCoEGs of GhTLP12A.1 comprised expansin and a Dof-type zinc finger in drought-stress response. Transgenic wheat expansin 2 (TaEXPA2) positively regulates tolerance under drought stress (Yang et al., 2020), and Brassica rapa expansin-like B1 (BrEXLB1) also associated with drought stress tolerance (Muthusamy et al., 2020), while the overexpressed DOF zinc finger family provides resistance under drought stress in P. trichocarpa . NCoEGs of GhTLP12A.1 contained UBA-like superfamily and dirigent protein under drought stress response (Supplementary Table 7). In wheat, a UBA protein (TaUBA), a negative regulator of drought stress, might function via downregulating some stress responsive transcription factors , and, in Boeahygrometrica, dirigent proteins provide a protective role under drought-stress response via changing the physical characters of lignin, which further affects the flexibility and mechanical strength of the plant cell wall (Wu et al., 2009). Taken together, our results showed that PCoEGs and NCoEGs of GhTLP11A and GhTLP12A.1 genes might have a crucial role in drought-stress tolerance.
Moreover, we determined 41 miRNA target sites in 56 cotton TLP transcripts with an expectation score (E) varied from 0.5 to 4 (Supplementary Table 8). In this study, 15 miRNA families, comprising target sites in 28 cotton TLP genes, were detected ( Table 4). An earlier report showed that some of the miRNA families were conserved among the plants, which displayed their function in the adaptation of plants to various stress responses (Jones-Rhoades and Bartel, 2004;Yuan et al., 2013). In Vitis vinifera, miR7494 has a prominent role in plants under abiotic stresses, and, in Zea mays, the expression of miR399 gets induced during abiotic stress response Kumar, 2014;Pagliarani et al., 2017;Snyman et al., 2017;Inal et al., 2020). These miRNAs that have been detected in this study are with lower UPE value (7.27-15.77) ( Table 4). These outcomes suggested that cotton miRNAs might also be involved in abiotic stress responses to enhance drought-and salt-stress tolerance.
Moreover, cis-regulatory element analysis demonstrated that, among the selected putative genes for validation, only GhTLP12A.1 cis-regulatory elements comprised an MBSI cisregulatory element related to flavonoid biosynthetic regulatory genes, which are very crucial to provide drought tolerance in wheat (Ma D. Y. et al., 2014). Overexpressing many of the genes of flavonoid pathways also provides tolerance under salt stress (Ashraf, 2009;Yang et al., 2009;Matus et al., 2010;Le Martret et al., 2011;Bharti et al., 2015). GhTLP11A comprised the higher number of salicylic acid (SA)-responsive TCA elements. Salicylic acid was identified as a potential hormone to provide tolerance against salinity (Khan et al., 2012) and improves the drought tolerance in rice (Farooq et al., 2009).
GhTLP5A.2, GhTLP5D.2, GhTLP11A, and GhTLP12A.1 also showed significant relative expression in qRT-PCR. From these observations, it could be speculated that the proximal elements of GhTLP11A and GhTLP12A.1 might have an important role in controlling the regulation and improvement of salt-and droughtstress responses in cotton. The results of the metabolic pathway study of PCoEGs and NCoEGs of GhTLP11A and GhTLP12A.1 genes and cis-regulatory elements also provided evidence of the involvement of two of these genes in salt-and drought-stress responses. However, detailed molecular explorations are required to understand the structural-functional relationship of cotton TLP genes and the involvement of GhTLPs to enhance the tolerance against drought and salt stresses.

CONCLUSION
In this study, a total of 105 cotton TLP proteins with a highly conserved tubby domain at C-terminal and N-terminal F-box were identified in four cotton species (G. arboreum, G. raimondii, G. hirsutum, and G. barbadense). Their protein domains, conserved motifs, and gene structure within the same groups shared a notable similarity, which leads to some similar functions. Furthermore, the cotton TLP12 gene family members clustered into 4 and 8 groups and experienced higher evolutionary pressure in comparison to others, showing their functional divergence in Gossypium species. Several G. hirsutum TLP genes showed significantly high expression in both drought-and salt-stress conditions. Two genes GhTLP11A and GhTLP12A.1 demonstrated comparatively higher expression and provided strong evidence that these genes can play a predominant role during drought and salt stress. Our investigation enhances the understanding of TLP genes in cotton at the level of function, evolution, and structure, which further highlights the intriguing field of TLP genes that have immense prospects for future manipulation.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: NCBI (https://www.ncbi. nlm.nih.gov/geo/) under accession numbers PRJNA532694.

AUTHOR CONTRIBUTIONS
NB carried out the bioinformatics analysis and design and drafted the manuscript. SF performed quantitative expression analysis. CM and SB participated to supervise the study. All the authors have read and approved the final manuscript.