Genome-wide identi ﬁ cation and expression analysis of the HD2 protein family and its response to drought and salt stress in Gossypium species

Histone deacetylase 2 (HD2) proteins play an important role in the regulation of gene expression. This helps with the growth and development of plants and also plays a crucial role in responses to biotic and abiotic stress es. HD2s comprise a C 2 H 2 -type Zn 2+ ﬁ nger at their C-terminal and an HD2 label, deacetylation and phosphorylation sites, and NLS motifs at their N-terminal. In this study, a total of 27 HD2 members were identi ﬁ ed, using Hidden Markov model pro ﬁ les, in two diploid cotton genomes ( Gossypium raimondii and Gossypium arboretum ) and two tetraploid cotton genomes ( Gossypium hirsutum and Gossypium barbadense ). These cotton HD2 members were classi ﬁ ed into 10 major phylogenetic groups (I-X), of which group III was found to be the largest with 13 cotton HD2 members. An evolutionary investigation showed that the expansion of HD2 members primarily occurred as a result of segmental duplication in paralogous gene pairs. Further qRT-PCR validation of nine putative genes using RNA-Seq data suggested that GhHDT3D.2 exhibits signi ﬁ cantly higher levels of expression at 12h, 24h, 48h, and 72h of exposure to both drought and salt stress conditions compared to a control measure at 0h. Furthermore, gene ontology, pathways, and co-expression network study of GhHDT3D.2 gene af ﬁ rmed their signi ﬁ cance in drought and salt stress responses


Introduction
Plants are immobile organisms and encounter a variety of stresses throughout the course of their lives. To overcome these stressful conditions, plants usually adopt three mechanisms (Larcher et al., 1973). First, they avoid stress through temporal activity. Second, they develop greater resistance against stressors by increasing their tolerance, and reduce their sensitivity through greater plasticity. Third, they have recovery mechanisms that can be stimulated in case of damage, e.g., reconstruction of damaged tissues (Huey et al., 2002). Post-translational modifications (PTMs) in plants play a significant role in combating environmental stresses (Mazzucotelli et al., 2008;Hashiguchi and Komatsu, 2016;De Vega et al., 2018). These processes are governed by certain proteins and enzymes. Histone deacetylases (HDACs) are a group of proteins involved in post-translational histone modification (histone deacetylation). This group is essential in controlling gene expression, which in turn modifies biological processes by removing acetyl moieties from histone proteins, resulting in the restoration of a positive charge (Dangl et al., 2001;Han et al., 2016).
Globally, the most important crop for the production of natural textile fabric is cotton (Gossypium spp.). Gossypium is a good model for the study of the evolution, origins, and domestication of polyploid species (Hu et al., 2019). It contains 5 tetraploid and 45 diploid species (Li et al., 2014;Chen et al., 2020). Due to its broader adaptability, the cultivated species G. hirsutum produces high-yield cotton fiber with moderate fiber quality; this species alone is responsible for approximately 90% of annual world cotton output (Hu et al., 2019).
It has been shown that the identified HD2 genes play a crucial role in growth, development, and responses to biotic and abiotic stresses in plants. Previous investigation of the histone deacetylase (HDAC) gene family in diploid and allotetraploid cotton (Imran et al., 2020) has primarily emphasized the different fiber development stages (0-25 DPA). In the present study, we mainly focus on drought and salt stress conditions at different time scales.
Over the last several years, whole genome sequencing of cotton (G. hirsutum, G. barbadense, G. arboreum, and G. raimondii) has been completed (Wang et al., 2012a;Li et al., 2014;Liu et al., 2015b). The completion of genome sequencing has enabled the identification of HD2 members in these four cotton genomes, as well as enabling evolutionary, expression, and co-expression studies. In G. hirsutum, the patterns of expression of putative HD2 members under drought and salt stress conditions have also been examined. The present study aimed to establish a foundation for an understanding of the evolutionary history of these genes, their functional importance, and their significance in drought and salt stress responses. This study was expected to provide considerable novel findings on epigenetic regulation, representing insight into its functional and evolutionary roles in Gossypium species.

Methodology
Sequence retrieval and genome-wide identification of the HD2 gene family in Gossypium species The whole-genome protein sequence dataset for G. arboreum, G. barbaense, and G. hirsutum was downloaded from the CottonGen database (https://www.cottongen.org/), and the dataset for G. raimondii was obtained from Phytozome, version 12 (https:// phytozome.jgi.doe.gov/pz/portal.html) (Nordberg et al., 2014). Total protein sequences for other plant species from different taxonomic groups were downloaded from the website of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). A total of 58 HD2 protein sequences from 42 plant species were obtained from the NCBI and utilized to construct Hidden Markov model (HMM) profiles. This profile of the HD2 domains (the HD2 label and the catalytic, regulatory, and zinc-finger domains) was employed as a query to identify HD2 gene family members using HMMER (V3.0) (Finn et al., 2015). Protein sequences and CDSs for G. arboreum, G. raimondii, G. hirsutum, and G. barnadense were also downloaded from the CottonGen database (https://www.cottongen. org/) . All hits were queried in the Pfam (http://pfam. xfam.org/) and InterProScan (http://www.ebi.ac.uk/interpro/search/ sequence-search/) databases to verify the presence of conserved domains. The ProtParam (http://web.expasy.org/protparam/) tool offered by Expasy was used to estimate the physicochemical parameters of Gossypium HD2 proteins. The ProtParam tool was also used to estimate biophysical and biochemical properties, such as number of amino acids, molecular weight, grand average hydropathy (GRAVY), theoretical isoelectric point (pI), aliphatic index, and instability index. The cotton HD2 gene subfamilies were named as per the orthologous HD2 members in the A. thaliana genome.
Subcellular localization, HD2 protein domain structure, and nuclear localization signal prediction The CELLO server, v.2.5 (http://cello.life.nctu.edu.tw/), was used to predict the probable subcellular locations of all the cotton HD2 proteins identified. The Pfam database was used to analyze the protein domain structures, and the Illustrator for Biological Sequences software package (http://ibs.biocuckoo.org/) (Liu et al., 2015a) was subsequently utilized to construct a schematic diagram of protein functional domains. Nuclear localization signals were examined using Motif Scan (https://myhits.isb-sib.ch/cgi-bin/ motif_scan).
Multiple sequence alignment, classification, and phylogenetic tree construction for HD2 members HD2 protein sequences were extracted from the complete genome sequences of 39 plant species. Multiple sequence alignments (MSAs) of full-length HD2 protein sequences with identified cotton HD2 proteins were performed using the Clustal X program (http://www. clustal.org/) (Larkin et al., 2007) with default parameters. Subsequently, these aligned sequences were used to construct a phylogenetic tree. The MEGA 7 software package (http://www. megasoftware.net/) (Kumar et al., 2016) was utilized to build an unrooted phylogenetic tree using the maximum likelihood (ML) method with the following parameters: pairwise gap deletion, JTT matrix-based model, and 1000 bootstrap values (Felsenstein, 1985).

Chromosomal mapping and gene duplication analysis
The physical (genomic) locations on chromosomes of all Gossypium HD2 genes were obtained by cross-referencing the results of searching for HD32 CDS sequences using the BLASTN search program against the resources of the CottonGen cotton database (https://www.cottongen.org/) and the Phytozome database, version 12 (https://phytozome.jgi.doe.gov/pz/portal.html). All HD2 genes of Gossypium species were mapped on the chromosomes using the MapInspect software (http://mapinspect.software.informer. com/).
Paralogous HD2 genes were identified through reciprocal BLAST analysis with e-value <10 -5 in order to deduce the evolutionary mechanism of the HD2 gene family in Gossypium species (G. arboreum, G. raimondii, G. hirsutum, and G. barbadense). As per the reciprocal BLAST output, duplication events in Gossypium HD2 genes were determined using the MCScanX toolkit (Wang et al., 2012b). Additionally, Ka/Ks analysis of ortholog and paralog sequences was carried out using the PAL2NAL program. Subsequently, Ks values were used to compute the approximate dates of duplication and speciation events via the formula T = Ks/ 2l, where l is the assumed value of the clock-like rate. In addition, the Ka/Ks ratio was used as an indicator of selection pressure for the duplicated HD2 genes. Specifically, Ka/Ks ratios greater than 1, less than 1, and equal to 1 were taken to suggest positive, negative, and neutral (purifying selection) evolution, respectively (Zhang et al., 2006).

Gene structure and conserved motif analysis
In G. arboreum, G. raimondii, G. hirsutum, and G. barbadense species, the gene structures of identified HD2 genes were examined via comparison of the predicted HD2 coding sequences with their corresponding genomic sequences using the GSDS online tool (Gene Structure Display Server; http://gsds.cbi.pku.edu.cn/) (Hu et al., 2015).
A MEME (Multiple Expectation maximization for Motif Elicitation) tool (http://meme.nbcr.net/meme/) (Bailey et al., 2009) was used for the identification of conserved protein motifs in identified cotton HD2 proteins. The following parameters were used in this analysis: optimum width = 6 to 300; one per sequence; maximum number of motifs to find = 20. Furthermore, these motifs were annotated using the InterProScan program (Quevillon et al., 2005).

Patterns of expression of cotton HD2 family members under drought and salt stress
To investigate the expression pattern of cotton HD2 members under salt and drought stress conditions, raw Illumina RNA-Seq data for G. hirsutum (accession number: PRJNA532694) were obtained from the Sequence Read Archive of the National Center for Biotechnology Information (the NCBI-SRA database). The raw reads were filtered using the Trimmomatic tool (Bolger et al., 2014), and high-quality reads were subsequently mapped on the G. hirsutum genome  using the STAR aligner (Dobin et al., 2013) with default parameters. Transcript abundance was estimated using the StringTie software (Pertea et al., 2015), and the differential expression of genes was evaluated using the edgeR Bioconductor package (Robinson et al., 2010).
RNA isolation, cDNA synthesis, and quantitative real-time PCR validation of HD2 family members in G. hirsutum The regulatory mRNA sequences of HD2 family members were validated via qRT-PCR. Stress conditions were employed on eightweek-old G. hirsutum plants. The plants were raised in pots (30 cm diameter, 25 cm height, 12 liter capacity) at the CSIR-NBRI garden, with soil made up of 40% silt, 20% clay, and 40% sand. They were exposed to a minimum temperature of 27°C and a maximum of 34°C; there were 14 hours of light and 10 hours of darkness per day. To create conditions of drought and salt stress, plants (in triplicate) were administered a single treatment of 20% PEG8000 solution (Shafiq et al., 2015) and 300-mM NaCl (Wei et al., 2017). Untreated plants were taken as controls. During the experiment, leaves were taken from the topmost juvenile section of each plant at the 0h, 12h, 24h, 48h, and 72 h time points after administration of the stress treatment. RNA was extracted from these samples using a Spectrum ™ Plant Total RNA Isolation kit (Sigma, USA). cDNA was prepared with 1µg RNA using the Verso cDNA synthesis kit (Thermo Scientific) following the provided protocol. 2× diluted cDNA was utilized to examine the expression of nine selected putative genes via a fluorescent quantitative detection system using qRT-PCR (HiMedia Insta Q48 M4); a 20 µl reaction was prepared with 10 µl SYBER ™ green master mix (Applied Biosystems), along with 5 pmol of primer concentration. Primers were designed with the help of Primer-BLAST, and ubiquitin was taken as a normalizing control. The qRT-PCR program was set to apply a temperature of 95°C for 2 min, followed by 40 cycles of denaturation at 95°C for 15 s and annealing at 60°C for 1 min. Gene expression levels were calculated with reference to ubiquitin for normalization, and the relative expression levels of the selected genes at each time point were compared with their expression levels at 0h. CT values were calculated with mean ± SD by the2-ΔCT method (Livak and Schmittgen, 2001).
Co-expression network, metabolic pathway, and gene ontology analyses of genes positively and negatively co-expressed with GhHDT3D.2 Based on the higher expression of GhHDT3D.2 observed in RNA-Seq and in qRT-PCR values, the co-expression network of GhHDT3D.2 was constructed for both stress conditions (drought and salt). This co-expression network was built using FPKM values using the "expression correlation networks" module of Cytoscape version 3.8.2 (Otasek et al., 2019). Specifically, this module was used to identify positive and negative Pearson correlations (r ≥ 0.95 and r ≤ -0.95) between interacting members of the network. Co-expressed genes and networks were visualized using Cytoscape in a circular force-directed layout. The key metabolic pathways of genes positively and negatively co-expressed with GhHDT3D.2 (PCoEGs and NCoEGs) were evaluated using the PageMan software, version 3.5.1 (Thimm et al., 2004). To determine their functional categories, the average statistical test Benjamini Hochberg test were employed. Gene ontology (GO) analysis for three categories (biological processes, molecular functions, and cellular components) was carried out using agriGO, v2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/). Singular enrichment analysis (SEA), with statistical testing at a pvalue threshold of < 0.05, was used to retrieve GO annotations.

Identification of HD2 members in Gossypium species
Four cotton species, viz., G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, were used to identify HD2 members. HD2 domain sequences were obtained from the NCBI and used to construct Hidden Markov model (HMM) profiles. The HMMER search program was used to identify HD2 orthologs by querying against these four Gossypium species. The HD2 gene was later analyzed for similarity and conserved domains by comparing data from the Pfam and InterproScan databases. A total of 27 HD2s were identified (Table 1): 4 GaHD2s (G. arboreum), 5 GrHD2s (G. raimondii), 9 GbHD2s (G. barbadense), and 9 GhHD2s (G. hirsutum). Number of amino acids, instability index, grand average hydropathy (GRAVY), theoretical isoelectric point (pI), molecular weight, and aliphatic index are important physiological parameters in determining the primary structure of proteins. The properties of the identified cotton HD2s, including protein length (aa), gene name, locus ID, isoelectric point (pI), molecular weight (Da), chromosome location, number of introns, sub-cellular localization, aliphatic index, instability index, and GRAVY, were analyzed using the ProtParam tool. Additionally, the subcellular localizations of predicted HD2 members were identified using the CELLO v2.5. There was wide variation in the length of cotton HD2 proteins, ranging from 201 to 299 amino acids in G. arboreum, 272 to 304 amino acids in G. raimondii, 240 to 304 amino acids in G. hirsutum, and 285 to 376 amino acids in G. barbadense. Isoelectric points ranged from 4.52 to 6.02, 4.19 to 5.06, 4.50 to 4.86, and 4.86 to 5.29 in G. arboreum, G. raimondii, G. barbadense, and G. hirsutum, respectively. Polypeptide molecular weight ranged from 21.588 kDa to 32.484 kDa in G. arboreum, 29.604 kDa to 32.838 kDa in G. raimondii, 26.563 kDa to 32.875 kDa in G. hirsutum, and 30.424 kDa to 40.694 kDa in G. barbadense. Interestingly, Gossypium HD2s contained a large number of introns; this count varied from 5 to 8, 7 to 9, 5 to 9, and 7 to 11 in G. arboreum, G. raimondii, G. hirsutum, and G. barbadense, respectively. Cotton HD2 members were found to be located in the nucleus (Table 1).
Proteins with an instability index < 40 are considered stable, whereas those with a value > 40 are considered unstable. The volume occupied by aliphatic side chains relative to the overall volume of the protein is termed the "aliphatic index". In G. arboreum, G. raimondii, G. hirsutum, and G.

Domain structure analysis of cotton HD2s
In order to perform multiple sequence alignment (MSA) of cotton HD2 protein sequences, ClustalX 2.1 was used. Comparison of cotton HD2s via the Pfam and InterPro protein domain databases revealed that they contained highly conserved domains relative to the structure of a typical HD2 protein ( Figure 1). These results indicated that GaHDT1.1, GaHDT1.2, GaHDT3.1, and GaHDT3.2 contained the HD2 label, a deacetyl/catalytic domain, phosphorylation sites, mono-and bipartite NLS motifs, and a zinc finger domain from N-terminus to C-terminus. Similarly, .1, and GbHDT3D.2 contained all the conserved domains. In contrast, GrHDT3.1, GrHDT4, GhHDT2A, and GbHDT2A did not contain a bipartite NLS motif or zinc finger domain, and GhHDT1D.1 did not contain a zinc finger domain at the C-terminus end (Figures 2,  3). A monopartite NLS motif was predicted to be present in all the cotton HD2 proteins, suggesting that this might generate a signal that drives cotton HD2s to the nucleus. Most HD2 proteins were between 232 and 384 amino acids in length. A highly correlated relationship exists between amino acid sequence variation and the length of the highly variable regulatory domain, which resides in the center (Bourque et al., 2016). These results suggest a high degree of homology between the amino acids in these conserved cotton HD2s and AtHD2s (Figure 3).

Multiple sequence alignment and evolutionary analysis
To evaluate the classification of HD2 genes in cotton, the sequence features of 44 plant species were analyzed, and an unrooted phylogenetic tree of the HD2 genes was constructed ( Figure 4). Subsequently, an investigation of the evolutionary relationships between HD2 proteins in cotton and those in other plant species was carried out. For this purpose, MSA was conducted on 27 cotton HD2 proteins and 198 HD2 proteins from other plant species (basal angiosperm, bryophytes, lower plants, Lycopodiophyta, monocots, and eudicots) ( Table 3). The maximum likelihood (ML) method was used for phylogenetic tree construction. The nomenclature of cotton HD2s was done as per A. thaliana HD2s.
Based on the phylogenetic relationships plant HD2 genes were clustered into ten major groups (I to X), and cotton HD2 proteins were classified into four of these groups, with none belonging to Group II, IV, VI, VIII, or IX. Groups I, II, III, VII, and VIII were further divided into two, two, two, three, and two subgroups respectively (labeled a, b, and c). Group III contained the largest number of cotton HD2 genes (13 members) followed by Group I with 11 and Groups V, VIII, and IX, with one HD2 member each (Figure 4). To understand the evolutionary relationships and gain insight into the structural diversity and similarity of Gossypium HD2s, a separate maximum likelihood tree was constructed with 1000 bootstrap values using the HD2 protein sequences of Gossypium species. The topology of the tree, HD2 duplicating nodes, conserved motifs, and exon/intron distribution enabled the categorization of the cotton HD2s into six sub-families (I -VI, shown in different background colors in Figure 5). Genes within the same sub-family with the highest levels of identity (>80%), indicate divergent evolution from a common ancestor, in the orthologous gene pairs ( Figure 5A).
To understand the structural variations in cotton HD2s, coding sequences and the corresponding genome sequences were compared to determine their exon/intron arrangements. Interestingly, the majority of cotton HD2 gene structures within the same clade were similar, i.e., group-specific exon/intron patterns were observed. The majority of cotton HD2 members had introns ranging from 5 to 1. Specifically, the number of introns varied from 5 to 8 (GaHD2s), 7 to 9 (GrHD2s), 5 to 9 (GhHD2s), or 7 to 11 (GbHD2s) ( Figure 5B). The gene structure of all cotton HD2s was analyzed and compared in order to examine the stability of the exon/intron members with respect to the structure of the phylogenetic tree. It was found that the majority of cotton HD2s falling under the same clade shared similar arrangements of exons/introns. For example, in subfamily I, the HD2 gene comprised 9 introns with 10 CDS (with the exceptions of GrHDT1.1 and GaHDT1.1), while members of the V group comprised 7 introns with 8 CDS (with the exceptions of GaHDT3.2, GhHDT3A.2, and GrHDT4, which consisted of 5, 8, and 7 introns with 0, 8, and 7 CDS and 6, 0, and 0 exons, respectively).
A motif-based sequence analysis tool (MEME) was utilized to examine the conserved motifs in HD2 proteins. The InterProScan was used to annotate these motifs. A total of 20 conserved motifs were determined in the Gossypium HD2 members. Motifs 1 and 4 were found in the database and annotated as nucleoplasmin-like domains. These have been reported to be important players in governing the dynamic architecture of the chromatin (Singh et al., 2022) (Figure 5C  and Supplementary Table 1). Motif 4 (HD2 pentapeptide, shown in violet) was approximately present in all the cotton HD2s and showed a conserved N-terminus pentapeptide sequence. Motif 2, annotated as a zinc finger domain, was found in all the cotton HD2 proteins except GhHDT1D.1, GbHDT2A, GhHDT2A, GrHDT3.1, and GrHDT4 ( Figure 5C). Additionally, motifs 3 and 5 were annotated as histone deacetylase-like domains and were found to be present in all cotton HD2 proteins except GaHDT1.2, GbHDT2A, GhHDT3A.1, and GhHDT3D.1 members. Most of the cotton HD2s exhibited a similar motif composition to their close evolutionary relatives; therefore, it is speculated that they might have similar functions ( Figure 5C). Results of multiple sequence alignment (MSA) of cotton HD2 proteins with Arabidopsis HD2s. MSA of 27 putative Gossypium HD2 proteins and four Arabidopsis HD2 proteins performed by Clustal X. Orange, blue, brown, violet, and pink-colored shading indicates the conserved sites (HD2 label, phosphorylation sites, bipartite NLS, monopartite NLS, and zinc finger, respectively) of a typical HD2 protein. Green, yellow, and blue letters with an asterisk represent amino acid conservation in all the domains. Schematic depiction of GaHD2s, GrHD2s, GhHD2s, and GbHD2s. The domain structures of GaHD2s, GrHD2s, GhHD2s, and GbHD2s, drawn using Illustrator for Biological Sequences.
Analysis of gene duplication events identified two pairs of paralogous HD2 members in G. arboreum ( Figure 7A), while no paralogous genes were detected in G. raimondii ( Figure 7B). Furthermore, four pairs of paralogous HD2 members were identified in G. hirsutum and G. barbadense (Figures 7C, D). As shown in Figure 7, all paralogous gene pairs were found to be located on distinct chromosomes, providing a considerably strong indication that Gossypium HD2 members were expanded through segmental duplication. In G. arborium, two segmental duplications (GaHDT1.  Phylogenetic relationships of cotton HD2s with other plant species and Arabidopsis thaliana. GbHDT1D.1, and GbHDT3A.2/GbHDT3D.1) were detected before 1.76, 1.44, 2.09, and 1.60 MYA (Table 4). The ratio of non-synonymous to synonymous substitutions (Ka/ Ks) provides an important measure of the pressure of evolutionary assortment on the relevant amino acids (Hurst, 2002). This ratio was measured for the duplicated Gossypium HD2 gene pairs (paralogous). The result of this calculation showed that all the duplicated HD2s had a Ka/Ks ratio < 1 (Table 4). Therefore, duplicated gene pairs in cotton HD2 members were under powerful purifying selection pressure.
A pair of homologous genes that have diverged in dissimilar species during a speciation event is known as an ortholog. The orthologous associations among the HD2 members in the four cotton species were determined. Specifically, orthologous genes among the HD2 members were identified via sequence similarity. Orthologs having sequence similarity equal to or greater than 90% in both cDNA and amino acid (aa) composition were selected for further evolutionary study. The selection pressure and probable functional divergence of Gossypium HD2s genes were examined through the calculation of Ka, Ks, and Ka/Ks ratio among the orthologs (A vs. D, A T vs. A, D T vs. D, A T vs. A T , and D T vs. D T ) and within the homeologs (A T vs. D T ). Notably, the Ka values of Gossypium HDT3 orthologs (group IV and V HD2s; GaHDT3.2/ GrHDT3.2, GhHDT3A.1/GhHDT3D.1, GbHDT3D.1/GrHDT3.2, GhHDT3A.2/GbHDT3A.2, and GhHDT3D.2/GbHDT3D.1) and HDT1 orthologs (GbHDT1A.1/GaHDT1.2) were higher than those of other HD2 gene pairs, suggesting that these ortholog pairs underwent more rapid evolution of the relevant genes. With a Ka/ Ks ratio <1, this analysis revealed that negative selection had been exerted on HD2 orthologous genes (Supplementary Table 2), and some orthologous gene pairs had experienced directional selection.

The Ka/Ks ratio was higher for HDT3 orthologous gene pairs, in A vs D, A T vs D T , A T vs A, D T vs D, A T vs A T , and D T vs D T .
There were three HDT3 orthologous gene pairs (GaHDT3.2/GrHDT3.2, GbHDT3A.1/GaHDT3.1, and GhHDT3D.2/GrHDT3.2) that had a Ka/Ks ratio > 1, meaning that they might have undergone adaptation to certain advantageous alleles that might play an important role in cotton. These outcomes suggest that the diploid cotton HDT3 member was under greater evolutionary pressure, and that the evolution of the A genome might have been faster than that of the D sub-genome (Figure 8 and Supplementary Table 2). Phylogenetic tree, conserved protein motifs, and gene structure analysis of HD2 members found in G. arboreum, G. raimondii, G. barbadense, and G. hirsutum. (A) Phylogenetic tree of G. arboreum, G. raimondii, G. hirsutum, and G. barbadense constructed via the ML method using 1000 bootstrap values. Different dot colors represent different Gossypium species (black: G. arborium; yellow: G. raimondii; red: G. hirsutum; sky blue: G. barbadense. Sub-families I, II, III, IV, V, and VI are shown in green, red, yellow, sky blue, and orange, respectively. (B) Representation of the gene structure of cotton HD2 members. Cyan boxes and black lines represent exons and introns, respectively. (C) Conserved protein motifs of HD2s, as determined using the MEME tool. Each motif is denoted by a specific color.

Cisregulatory elements in the promoter region of GhHD2s
Promoters are regions of DNA that drive the initiation of transcription of a certain gene; these promoters are situated near the transcription start site (TSS) of the corresponding gene. In the present study, we searched for cis-regulatory elements in the 1.5 kb upstream promoter region of the identified GhHD2s. The outcomes of this analysis are listed in Supplementary Tables 3A, B. The results revealed that cis-regulatory elements like ABRE were most abundant, followed by TGACG and CGTCA motifs. Additionally, TCA-element was abundant in the promoter regions of identified GhHD2s (Figure 9). These outcomes indicate that GhHD2 genes are associated with plant resistance to environmental stress conditions, including drought and salt stress.
It is interesting that the GhHD2 genes, namely GhHDT1A.1, GhHDT1A.2, GhHDT1D.1, GhHDT1D.2, GhHDT2A, GhHDT3A.2, GhHDT3D.1, and GhHDT3D.2, were found to comprise several hormone-responsive cis-elements, such as TCA-element (salicylic acid responsiveness), TGA-element (auxin-responsive element), ABRE (abscisic acid responsiveness), CGTCA-motif (MeJAresponsiveness), P-box (gibberellin-responsive), TGACG-motif (MeJA-responsiveness), and GARE-motif (gibberellin-responsive), in their promoter region (Figure 9). MeJA is reported to stimulate the production of defensive compounds that may be used against pathogens, drought stress, heavy metal stress, low temperatures, and salt stress (Yu et al., 2019), while auxin plays a crucial role in plant responses to adverse biotic and abiotic conditions (Sharma et al., 2015). In addition to pathogenesis-related resistance, drought and heat stress also trigger the production of salicylic acid (SA), and this also reduces the concentrations of Na+ and Clˉions caused by salt stress conditions (Yuan and Lin, 2008;Emamverdian et al., 2020). Plant growth hormones, such as gibberellins (GAs) are important in increasing resistance under abiotic stresses (Emamverdian et al., 2020), and plants respond to downstream stress by integrating various stress signals via the abscisic acid (ABA) hormone (Tuteja, 2007). Overall, these promoter analyses show that these genes play an important role in providing tolerance to drought and salt stress conditions.

HD2 expression profiles under salt and drought stress
In order to determine the expression pattern of HD2 genes under drought and salt stress, HD2 genes were analyzed via the leaf RNA-Seq data under both types of stress. Nine identified GhHD2 genes showed differential expression, with three genes (GhHDT1A.2, GhHDT3D.1, and GhHDT3D.2) exhibiting higher expression during drought ( Figure 10B) and salt stress ( Figure 10D) compared to a control. Furthermore, these nine putative genes (GhHDT2A, GhHDT3A.1, GhHDT3A.2, GhHDT1D.2, GhHDT1A.1, GhHDT1A.2, GhHDT1D.1, GhHDT3D.1, and GhHDT3D.2) were validated through qRT-PCR. The list of primers is provided in Paralogous duplicated HD2 gene pairs in Gossypium genomes (G. arboreum, (G) raimondii, (G) barbadense, and G. hirsutum). Depiction of duplicated genes in (A) G. arborium, (B) G. raimondii, (C) G. hirsutum, and (D) G. barbadense. Duplicated HD2 genes are indicated by dark blue lines; collinear blocks are indicated by grey lines. Table 5. In dought, the majority of GhHD2s exhibited stronger responses at 12h ( Figure 10A), with the exception of GhHDT2A, while the strongest response was that of GhHDT3D.2 with respect to fold change (FC) compared with the control at 12 h (FC >29), 24h (FC >33), 48h (>16), and 72h (>15) ( Figure 10A). Under salt stress conditions, the entire set of GhHD2 members exhibited higher levels of expression at 24h compared with the control (Figure 10C), while GhHDT3D.2 exhibited the highest level of expression in comparison to the control at 12 h (FC >4), 24h (FC >9), 48h (>14), and 72h (> 6). Overall, this analysis shows that these HD2 members might play a crucial role in regulating responses to drought and salt stress conditions. Co-expression network, pathways, and gene ontology analysis of GhHDT3D.2, a gene strongly expressed under drought and salt stress conditions at different time scales The GhHDT3D.2 gene was selected for analysis of the coexpression network, pathways, and gene ontology due to its higher levels of expression under both types of stress. Co-expressed genes with GhHDT3D.2 were explored using log2FPKM expression values. A total of 15 genes positively and 14 negatively co-expressed with GhHDT3D.2 (PCoEGs and NCoEGs) were identified under drought stress conditions (Supplementary Figures 1A, B; Supplementary  Table 4).
In order to understand the functional pathways and molecular importance of the PCoEGs and NCoEGs of GhHDT3D.2, a PageMan pathway analysis was performed. For drought stress conditions, this analysis demonstrated that there was higher expression of fatty acid metabolism and auxin response factors (ARFs) in relation to NCoEGs of GhHDT3D.2 and vacuolar-sorting protein (SNF7) in relation to PCoEGs of GhHDT3D.2 (Supplementary Figure 1E). Moreover, for salt stress conditions, diacylglycerol kinase (Supplementary Figure 1F) exhibited higher expression at all time intervals, while the highest level of expression among the PCoEGs of GhHDT3D.2 at 12h was that of brassinosteroid hormone metabolism; among the NCoEGs of GhHDT3D.2, the highest level of expression was that of phenypropanoid and flavonoids (Supplementary Figure 1G). Transcription factors such as basic leucine zipper (bZIP), WRKY, plant homeo-domain (PHD), and heat shock factors (HSF) exhibited expression with PCoEGs of GhHDT3D.2, and abiotic stressresponsive genes exhibited expression with NCoEGs of GhHDT3D.2 under salt stress conditions (Supplementary Figure 1H).
A gene ontology (GO) analysis was carried out to examine all the identified co-expressed genes and identify their functional roles. GO divided into three categories: biological processes (BPs), cellular components (CCs), and molecular functions (MFs). The significant PCoEGs and NCoEGs of GhHDT3D.2 were selected for gene ontology Syntenic associations in orthologous HD2 gene pairs in four Gossypium species (green: G. arborium; blue: G. raimondii; pink: G. hirsutum; violet: G. barbadense). Bano et al. 10.3389/fpls.2023.1109031 Frontiers in Plant Science frontiersin.org analysis. Under drought stress, enriched MFs were hydrolase activity, carbohydrate derivation binding, and purine nucleotide binding; enriched BPs were establishment and localization; and enriched CCs were the membrane, endomembrane system, and cytoplasmic part (Supplementary Figure 1I and Supplementary Table 5). Likewise, under salt stress conditions, enriched BPs were response to water deprivation, abiotic stimulus, salt stress, ethylene, the auxin-activated signaling pathway, and gibberellins (Supplementary Figure 1J and Supplementary Table 5); enriched MFs were transcription factor activity, transporter activity, calmodulin binding, phosphor-transferase activity, and hydrolase activity; and enriched CCs were the vacuolar membrane, golgi apparatus, plasma membrane, cytoskeletal part, and cell part (Supplementary Figure 1J and Supplementary Table 5).

Discussion
Epigenetics is primarily concerned with changes in heritable gene expression without the involvement of variations in DNA sequences. Such changes are essential to the development and growth of plants in response to several environmental stimuli, and modification of histone is closely associated with the control of gene expression. HDACs are also called lysine deacetylases; they regulate gene expression, with core histones (H2A, H2B, H3, and H4) and acetyl groups being removed by HDACs to repress their transcription (Makarevitch et al., 2015). Earlier studies have demonstrated that HDACs are crucial in enabling plants to respond to a variety of environmental stresses (Liu et al., 2014), and for growth, development (Ma et al., 2013), and genome stability . There are three members of the set of HDACs, identified as RPD3/HDA1, SIR2, and HD2, in which HD2 proteins have been found to differ from those of the other two types of histone deacetylases.
In the present study, four, nine, nine, and five HD2 members were identified in the four sequenced species of Gossypium (G. arboreum, G. hirsutum, G. barbadense, and G. raimondii, respectively) ( Table 1). The larger numbers of HD2 members in G. hirsutum and G. barbadense might be due to their larger genomes (~2.30 Gb and~2.22 Gb, respectively; Hu et al., 2019) as compared to G. arborum (1,746 Mb) and G. raimondii (885 Mb) (Li et al., 2014). All cotton HD2 members are located in the nucleus, and many earlier reports have also revealed that HD2 members are nucleus-localized (Zhou et al., 2004;Sridha and Wu, 2006), while only a few members are present in the nucleolus (Lusser et al., 1997;Earley et al., 2006).
As per the domain analysis (Figure 1), all conserved cotton HD2 members contained the HD2 label, a deacetyl/catalytic domain, phosphorylation sites, mono-and bipartite NLS motifs, and a zinc finger domain from N-terminus to C-terminus. In contrast, GrHDT3.1, 4, GhHDT2A, and GbHDT2A did not contain a bipartite NLS motif or zinc finger domain, and GhHDT1D.1 did not contain a zinc finger at the C-terminus end (Figure 2). It has been previously reported that Arabidopsis thaliana and Medicago truncatula are also devoid of zinc finger at their C-terminal end (Bourque et al., 2016). On the basis of the presence or absence of Zn 2+ finger at the C-terminus, the HD2 family can be divided into two groups: the group with a conserved Zn 2+ -finger domain at the Cterminal end of HD2 is referred to as the Gr1 group, and the group of Results of investigation of the cis-regulatory elements of GhHD2 members. Micro-parts shown in distinct colors represent the sequence of the putative elements.
HD2s that lack a conserved Zn 2+ -finger domain at the C-terminal end is the Gr2 group (Bourque et al., 2016). This result indicates that Gr2 HD2 genes might contribute to the expansion of cotton HD2s. Overall, cotton HD2s have a similar domain organization to that reported previously (Demetriou et al., 2009). HD2s are a part of a small gene family (Grandperret et al., 2014) whose members number between one (in longan and potato) and four (in Arabidopsis and maize). The N-terminal domain of the protein sequence includes conserved octapeptide MEFWGVEV; previously, the MEFWG sequence has been reported to be an explicit structural feature of the HD2 gene family (Aravind and Koonin, 1998), and these starting five residues might play a significant role in gene regulation. The catalytic domain of the HD2 gene family is made up of 100 amino acid residues and contains the conserved regions (Bourque et al., 2016). These residues comprise two highly conserved acidic amino acids that have histidine residue at position 25, surrounded by the hydrophobic amino acids, and aspartic acid residue at position 69. The catalytic activity of HD2s depends on these two residues, which represent conserved motifs (Aravind and Koonin, 1998;Dangl et al., 2001;Ding et al., 2012). Residues such as leucine, present at position 26, are also highly conserved and play an important role in catalytic machinery or ligand binding (Bourque et al., 2016) (Figure 3). A central large acidic domain is highly variable in length as well as in sequence, also known as the regulatory domain The qRT-PCR expression profiles of putative HD2 genes in G. hirsutum. Bars represent expression levels at 12, 24, 48, and 72 h of (A) drought (20% PEG solutions PEG8000) and (C) salt (300 MM) stress treatment, as compared to control levels at 0 h. Ubiquitin was taken as a normalizing control. Three replicates were used for each experiment. Two-tailed Student's t-tests were used for statistical analysis; data are plotted in the form mean ± s.d, with error bars representing standard deviations. Significant differences are denoted with asterisks: *P < 0 0.1; **P < 0 0.01; ***P < 0 0.001. Right panels show expression profiles of HD2 genes under (B) drought and (D) salt conditions according to RNA-Seq data, using log2FPKM values.
of HD2s. Aspartic acid residue detected in the entire HD2 domain comprises Ser and/or Thr residues as phosphorylation sites for mitogen-activated protein kinases (MAPKs) and casein kinase 2a (CK2a) (Grandperret et al., 2014;Bourque et al., 2016) (Figure 3). CK2a is located in the nucleus and regulates many biological processes through phosphorylation of numerous distinct proteins. All HD2s comprise a conserved and well-characterized monopartite NLS domain along with the bipartite NLS sequence KK(K/R) that is found in ten to twelve residues before the monopartite NLS motif (Bourque et al., 2016).
The putative N-terminal HD2 label that are MEFWG sequence (well-conserved motif) amino acid regions require to control the gene expression. To see the domain conservation, cotton HD2 members were aligned with A. thaliana HD2s (Figure 3 and Supplementary Table 1). The outcomes of the motif analyses suggested that the amino acids at these conserved regions of cotton HD2s exhibit a very high degree of homology with AtHD2s. All the cotton HD2s have a conserved pentapeptide HD2 label at the N-terminus region and conserved deacetylase sites, as these are present in all AtHD2s. Subsequently, the zinc finger properties of cotton HD2s were examined; GbHD2A, GhHD2A, GrHD3.1, and GrHD4 were found to lack zinc fingers at their C-terminus region, indicating that these are more similar to HDT2 and HDT4 of Arabidopsis, each of which is devoid of zinc finger (Figure 3).
The phylogenetic tree indicated the arrangement of the cotton HD2 members. The set of HD2 genes in cotton seems to contain numerous members that have orthologs as distant as the common ancestor of Arabidopsis, T. cacao, and in some cases other plant species (Bourque et al., 2016). Previous reports have indicated that Gr2 HD2s evolved only in angiosperms. The absence of zinc fingers only in angiosperm indicates that Gr1 HD2s represent the ancestral form of the gene (Bourque et al., 2016). Group IIIa and X members, which contain mostly Gr2 HD2s, were thus the latest group to evolve. HD2s from lycophytes were found to cluster into Groups Ia and VIIIb; bryophytes clustered into Group VIIc; and lower plants fell into Groups IV, VIIIb, and X; none of the cotton HD2 members belonged to Group IV, VIIIb, or VIIc. Basal angiosperms clustered into Group Ia, while eleven cotton HD2 genes belonged to this group (Figure 4). Furthermore, only higher plants were represented in Group V, VI, and X and in five subgroups (Ia, IIa, IIb, IIIa, IIIb, VIIa, VIIb,VIIIa) of Groups I, II, III, VII, and VIII, while HD2s were found to exist in IIa, IIb, V, VIIb, and VIIIa, which were entirely monocot-specific (Figure 4). With the exception of Group VIIc, HD2 members were present in both monocot and eudicot plants.
The evolutionary paths of all these groups' evolutionary paths diverged before monocots and dicots, with a common ancestor.
The motif arrangements and gene patterns of cotton HD2 members were conserved in most of the groups, with some exceptions (Figure 5), suggesting their functional conservation. GaHDT3.2 comprised a smaller number of introns (5), while GbHDT2A contained the largest number (11) across all cotton HD2 genes. As per previous reports, more advanced species contained smaller numbers of introns in their genomes (Roy and Gilbert, 2005), while an increased number of introns results in new functions (Qanmber et al., 2019).

Drought
Forward primer (5' to 3') Reverse primer (5' to 3') The duplication of genes plays a crucial role in the functional divergence of genes (Takata and Taniguchi, 2015). Potential gene duplication events in the cotton HD2 gene family were evaluated to assess the probable relationships among members of the HD2 group (Table 4). The results with regard to gene duplication events among Gossypium species suggested the remarkable hypothesis that recent duplication has taken place in cotton HD2s after the divergence of G. riamondii and G. arboreum and led to the formation of G. barbadense and G. hirsutum (Li et al., 2014). In the present study, segmental duplication (as opposed to tandem duplication) was found to play a dominant role in the enlargement of Gossypium HD2 members (Figure 7), as these gene pairs were present in different chromosomes. Positive (directional or Darwinian) selection encourages the extent of advantageous alleles, and negative or purifying selection hampers the proliferation of deleterious alleles (Altenhoff and Dessimoz, 2009). In this study, a Ka/Ks ratio < 1 was observed for all paralogous HD2 gene pairs in cotton, implying strong purifying selection pressure, meaning that this contributes to maintaining their function in the Gossypium HD2 gene. Exploration of these duplicated genes showed that the functional role of HD2 genes in cotton has not greatly diverged in the course of subsequent evolution. However, some of the orthologous gene pairs in Gossypium exhibited signs of directional selection (Figure 8 and  Supplementary Table 2), which plays a crucial role in the expansion of beneficial alleles that might be playing important role in cotton. These results support further functional analysis of this gene family.
Analysis of expression profiles in G. hirsutum via qRT-PCR revealed higher levels of expression of GhHDT3D.2 at all time intervals under drought stress conditions ( Figure 10A). Under salt stress, higher levels of expression of GhHDT1A.1, GhHDT2A, GhHDT3A.1, GhHDT3A.2, GhHDT3D.1, and GhHDT3D.2 were observed at all time intervals, with the exception of GhHDT1A.2, GhHDT1D.1, and GhHDT1D.2 ( Figure 10B). GhHDT3D.2 was the common gene that exhibited significantly higher levels of expression under both types of stress. These outcomes of this analysis suggest that the GhHDT3D.2 gene might play a crucial role in providing resistance against drought and salt stress, and would probably be a suitable target to protect the cotton plant from both these abiotic stresses.
To further elucidate the function of the GhHDT3D.2 gene, the coexpression network of this gene was studied (Supplementary Table 4). This analysis showed the presence of SNF7 transcription factor (TF) in PCoEGs of GhHDT3D.2, whose transcript accumulates during drought stress in Populus davidiana (Mun et al., 2017). PCoEGs of GhHDT3D.2 under drought conditions also comprised zinc finger-RING-types, a tify domain, and a GTP binding domain. Zinc-finger proteins play a crucial role in responses to abiotic stimuli, such as drought, extreme temperatures, reactive oxygen species, salt, and toxic metals, in plants. They mostly operate as E3 ubiquitin ligases and contain a conserved RING domain (Han et al., 2022); tify and GTP binding domain functions have also been identified in drought stress responses in cotton and Chinese cabbage Chen et al., 2021) (Supplementary  Figure 1A and Supplementary Table 4). NCoEGs of GhHDT3D.2 contained a pentatricopeptide repeat, which plays an important role in drought, salt, and cold stress responses in Arabidopsis (Jiang et al., 2015) (Supplementary Figure 1B and Supplementary Table 4). Finally, SNF7 transcription factor and auxin response factor (ARF) were identified in the PageMan pathways of PCoEGs of GhHDT3D.2 (Supplementary Figure 1E); these play a crucial role in providing tolerance to drought stress conditions. Moreover, PCoEGs of GhHDT3D.2 under salt stress comprised WD40 repeats, a WRKY domain, glycoside hydrolase, a thioredoxin domain, and armadillo-like and leucine-rich repeats (Supplementary Figure 1C). O f these PCoEGs, WD40 protein is regulated via salt stress in rice, and suggests a crucial role in providing tolerance to salt stress (Huang et al., 2008). One of the largest TF families, the WRKY family is known to participate in a number of abiotic stress responses (Xu et al., 2018). The expression of thioredoxin improves salt tolerance in Brassica napus (Ji et al., 2020); additionally, in Solanum lycopersicum, glycoside hydrolase functions as a putative biomarker for salt stress tolerance (Reyes-Perez et al., 2019). PageMan pathways analysis of the PCoEGs of GhHDT3D.2 showed higher levels of expression of phospholipid synthesis, which plays a regulatory role in the salt stress response (Han and Yang, 2021) (Supplementary Figure 1F), while metabolism of brassinosteroid hormones reduces the negative consequences of salt stress  (Supplementary Figure 1G). Furthermore, the NCoEGs of GhHDT3D.2 under salt stress contained GRAS TF, lateral organ boundaries, and tubby-like protein. In transgenic Arabidopsis, over-expression of GRAS improves salt stress tolerance and plant growth . The expression of lateral organ boundaries (LOB) has been found to be upregulated in Vitis vinifera under salt stress via treatment with ABA (Grimplet et al., 2017), while tubby-like protein transcription factor (TLP TF) has been reported to play an important role in increasing tolerance to salt stress (Bano et al., 2021) (Supplementary Figure 1D). PageMan pathways analysis of the NCoEGs of GhHDT3D.2 revealed the presence of phenylpropanoid, resulting in elevated salt stress tolerance (Sharma et al., 2019). Additionally, the flavonoid pathway secondary metabolism enhances salt stress tolerance by scavenging free radicals (Jan et al., 2021) (Supplementary Figure 1G). NCoEGs of GhHDT3D.2 also exhibited higher expression during abiotic drought and salt stress conditions, as did GATA transcription factor (Supplementary Figure 1H), which exhibits higher levels of expression in rice under salt stress .
In terms of the gene ontology functions of significant PCoEGs and NCoEGs of GhHDT3D.2 under drought conditions, enriched molecular functions were hydrolase activity and carbohydrate derivation binding (Supplementary Figure 1I). In response to drought stress, hydrolase acts as a negative regulator in peach (He et al., 2022). By participating in NADPH oxidase-mediated ROS generation, carbohydrate-binding protein mediates drought-stress tolerance in rice (Jing et al., 2022). The biological processes involved in salt stress are enhanced by responses to water deprivation, abiotic stimulation, salt stress, ethylene, auxinactivated signaling, and gibberellin pathways (Supplementary Figure 1J), in which ethylene modulates salinity stress responses primarily by maintaining Na+/K+ homeostasis, reactive oxygen species (ROS), and nutrients, through stimulation of antioxidant defense (Riyazuddin et al., 2020). Additionally, enriched molecular functions were transporter activity, calmodulin binding, and phosphotransferase activity (Supplementary Figure 1J); in this regard, it has been reported that ion transporters are crucial for salt stress tolerance and help in the breeding of crop cultivars with high salt tolerance . Additionally, Ca 2+ sensor calmodulin1 (CaM1) negatively regulates salt stress tolerance through interaction with calmodulin binding transcription activator 4 (CAMTA4) in Hordeum vulgare (Shen et al., 2020). Overall, our research demonstrates the significance of the PCoEGs and NCoEGs of GhHDT3D.2 under salt and drought conditions. Thus, the GhHDT3D.2 gene can be used to improve the performance of cotton under drought and salt stress conditions.

Conclusion
The cotton HD2 gene family was comprehensively analyzed in the present study. A total of 27 cotton HD2 proteins with a conserved HD2 label, deacetyl/catalytic domain, phosphorylation sites, mono-and bipartite NLS motifs, and a zinc finger domain from N-terminus to Cterminus were identified in four cotton genomes (G. arboreum, G. raimondii, G. barbadense, and G. hirsutum). These proteins showed noticeable similarities in terms of their common conserved motifs, protein domains, and gene structures, with related functions. A total of ten pairs of duplicated genes were identified; these occurred in paralogous gene pairs in four cotton species, and all had been underwent purifying selection pressure. The expression profiles of GhHD2s showed significantly higher levels of expression of GhHDT3D.2 underwent drought and salt stress conditions at all time intervals (0, 12, 24, 48, and 72h). Finally, the PCoEGs and NCoEGs of GhHDT3D.2 revealed the important functions and pathways that play a crucial role in both of these stress conditions. The present study strengthens the field's understanding of HD2 genes in cotton at the levels of structure, functions, pathways, evolution, and expression. This study offers an improved understanding of the biological involvement of cotton HD2 members, and this will provide benefits in the form of improved cotton production in the presence of drought and salt stress conditions.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
NB carried out the bioinformatics analysis, designed the study, and drafted the manuscript. SF and RL performed qRT-PCR validation. CM and SB participated in supervision of the study. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by a grant (OLP-112) from Council of Scientific and Industrial Research (CSIR).