Genome-wide identification of bHLH transcription factors and their response to salt stress in Cyclocarya paliurus

As a highly valued, multifunctional tree species, Cyclocarya paliurus has leaves enriched in diverse bioactive substances with health-promoting functions. Because land resources in China are limited, salt-affected land is a potential resource for developing C. paliurus plantations to meet the demand for leaf production and medical use. The basic helix-loop-helix (bHLH) transcription factor family, the second largest transcription factor family in plants, plays essential roles in the response to multiple abiotic stresses, especially salt stress. However, the bHLH gene family in C. paliurus has not been investigated. In this study, 159 CpbHLH genes were identified from the whole-genome sequence data and classified into 26 subfamilies. The 159 members were also analyzed in terms of protein sequence alignment, evolution, motif prediction, promoter cis-acting elements and DNA-binding ability. Based on transcriptome profiling from a hydroponic experiment with four salt concentrations (0%, 0.15%, 0.3%, and 0.45% NaCl), 9 significantly up- or down-regulated genes were screened, and 3 genes associated with salt response were selected according to the GO annotation results, giving a total of 12 candidate genes responsive to salt stress. Moreover, based on expression analysis of the 12 candidate genes in samples from a pot experiment with three salt concentrations (0%, 0.2% and 0.4% NaCl), CpbHLH36/68/146 were further verified to be involved in the regulation of salt tolerance genes, which was also supported by protein interaction network analysis. This study is the first genome-wide analysis of a transcription factor family in C. paliurus, and the findings not only provide insight into the function of CpbHLH gene family members involved in salt stress but also support progress in genetic improvement of the salt tolerance of C. paliurus.


Introduction
Plants are constantly challenged by environmental stresses, and it is estimated that up to 70% of plants can be affected by diverse abiotic stresses from which they cannot escape (Mantri et al., 2012a; Chen et al., 2021). Salinity stress, which affects 8.31 billion hm² of land, is one of the major abiotic stresses impairing plant growth (Li et al., 2020a; Zhang et al., 2020a; Liao et al., 2022). To maintain normal growth and survival, plants activate or suppress many genes through transcription factors (TFs) to regulate physiological and biochemical processes in response to changes in the external environment (Agarwal et al., 2006; Bhatnagar-Mathur et al., 2008). Six major TF families have been noted to have vital regulatory functions in plant resistance to various abiotic stresses: MYBs, basic helix-loop-helix proteins (bHLHs), ethylene responsive element binding factors (ERFs), dehydration responsive element-binding proteins (DREBs), WRKYs and basic region/leucine zipper motif members (bZIPs) (Kavas et al., 2016; Mao et al., 2017). The bHLH TFs are widespread in all eukaryotes and form the second largest TF family in plants (Pires and Dolan, 2010; Feller et al., 2011). The bHLH members possess a highly conserved bHLH domain of approximately 60 amino acids, constituted by two functionally distinct regions (Toledo-Ortiz et al., 2003). The basic region, located at the N-terminus, contains approximately 10-17 amino acids and a binding site for the specific E-box (CANNTG) DNA sequence. In contrast, the helix-loop-helix (HLH) region at the C-terminus consists of roughly 40 amino acids and acts as a dimerization domain, facilitating dimerization between proteins (Atchley et al., 1999).
On account of diverse binding elements, bHLH transcription factors in animals were organized into six groups (Wang et al., 2018b), whereas the classification of plant bHLH proteins has not been determined though 15-32 groups were suggested according to current studies (Pires and Dolan, 2010).
Over the years, many plant bHLH proteins have been identified and characterized. For example, there are 162 bHLH genes in Arabidopsis thaliana (Toledo-Ortiz et al., 2003), 188 in apple (Malus × domestica) (Mao et al., 2017), 113 in strawberry (Fragaria × ananassa) (Zhao et al., 2018), 115 in spine grape (Vitis davidii) and 206 in sweet osmanthus (Osmanthus fragrans). Furthermore, studies on the role of bHLH proteins have revealed that the plant bHLH family participates in numerous processes, including anthocyanin biosynthesis (Hou et al., 2017; Lim et al., 2017), growth and development (Sorensen et al., 2003; Carretero-Paulet et al., 2010) and response to stress (Babitha et al., 2013; Ji et al., 2016; Sum et al., 2021). Among these functions, the regulation of stress tolerance by binding to the promoters of downstream genes has been well characterized in bHLH proteins (Cui et al., 2016). For instance, MdbHLH104 was found to respond to iron deficiency stress in apple by directly binding to the P3 cis-acting element of the MdAHA8 promoter, while ICE1 (AtbHLH116) could increase cold tolerance in A. thaliana by activating expression of the cold-responsive (COR) genes (Chinnusamy et al., 2003). In addition, AtbHLH92 has been shown to function in plant responses to osmotic stress (Jiang et al., 2009), and AtbHLH17 (AtAIB), a nuclear-localized bHLH-type protein, could confer drought tolerance on transgenic plants by regulating ABA signaling (Li et al., 2007). More interestingly, some bHLH TFs play essential roles in regulating the signaling of multiple abiotic stresses simultaneously. For instance, SlICE1a (a tomato bHLH transcription factor) could enhance resistance to cold, osmotic and salt stresses (Feng et al., 2013), while overexpression of TabHLH39 in A. thaliana could increase freezing, salt, and drought tolerance (Zhai et al., 2016).
It was also reported that ATNIG1 regulates downstream gene expression by specifically binding to E-box motifs (CANNTG) in the promoters of salt stress-related genes, thereby enhancing plant tolerance to salt stress (Kim and Kim, 2006).
Wheel wingnut (Cyclocarya paliurus), a multifunctional tree species, belongs to the Juglandaceae family (Fang, 2022). Although now naturally distributed in subtropical mountain areas of China, Cyclocarya has a long fossil record of fruits in North America, Europe and eastern Asia, and went extinct in North America and Europe during the Cenozoic (Manchester et al., 2009; Wu et al., 2017). The leaves of C. paliurus have been used as tea, traditional food and medicine for thousands of years in China (Fang et al., 2006), and have been listed as a new food raw material by the National Health and Family Planning Commission of China since 2013 (Qin et al., 2021). Many studies have demonstrated that extracts from C. paliurus leaves possess antioxidant, antiproliferative and antidiabetic activities (Kurihara et al., 2003; Yao et al., 2015; Zhai et al., 2018; Zhou et al., 2021), and some products derived from the leaves have been developed and put on the market. However, at present, C. paliurus resources are mainly distributed in natural forests, and plantations can only be established at sites where the soil is relatively deep, loose, well-drained, moist and fertile (Fang et al., 2011; Fang, 2022), so the leaf supply cannot meet market demand (Qin et al., 2021). Therefore, given the limitation of land resources in China, a feasible option is to develop C. paliurus plantations with oriented cultivation on potential land resources such as coastal saline areas to meet the requirement for leaf production and medical use. Our previous studies found that the R2R3-MYB transcription factor family affected salt tolerance of C. paliurus (Zhang et al., 2022b), while some bHLH proteins could regulate the accumulation of flavonoid compounds under salt stress by promoting the expression of genes encoding related enzymes in C. paliurus, which provides some evidence for the crucial role of TFs in plant resistance to salt stress. However, so far, no TF family has been systematically identified in the whole genome of C. paliurus. The recent release of high-quality whole-genome sequence data of C. paliurus gives us the opportunity to investigate the bHLH gene family and to identify salt-responsive members. In this study, 159 bHLH transcription factors in C. paliurus were analyzed comprehensively and systematically, and some key bHLH genes associated with salt tolerance were identified. The results not only provide insight into the function of CpbHLH gene family members involved in salt stress, but also support genetic improvement of the salt tolerance of C. paliurus for developing plantations in the coastal saline areas of south-east China.
Materials and methods

Identification and sequence analysis of CpbHLH genes
The whole-genome data of C. paliurus are available from the Genome Sequence Archive (GSA) database (https://ngdc.cncb.ac.cn/gsa), provided by our research group. The Hidden Markov Model (HMM) profile of the HLH domain (PF00010) was obtained from the Pfam database (version 30.0) (Finn et al., 2014) and used as a query in HMMER (version 3.3; http://hmmer.org/) (Johnson et al., 2010) to search all protein sequences in the whole genome with default E-values and to identify genes with the conserved domain. All screened sequences were checked with the online tools Batch CD-search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) (Marchler-Bauer and Bryant, 2004), Pfam, and SMART (http://smart.embl-heidelberg.de) (Letunic et al., 2021) to verify the presence of the conserved bHLH domain. The ExPASy ProtParam tool (https://web.expasy.org/protparam/) was used to obtain the basic physical and chemical characteristics of the bHLH proteins.
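The screening step above starts from a table of HMMER hits. As a minimal sketch, assuming the hits were saved in HMMER's tabular `--tblout` layout (whitespace-delimited, target name in the first column and full-sequence E-value in the fifth), the candidate list could be collected as follows. The function name, the stricter cutoff and the demo rows are hypothetical and are not taken from the study, which used default E-values.

```python
def filter_hits(tblout_text, max_evalue):
    """Return {protein_id: best full-sequence E-value} for hits passing the cutoff."""
    best = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        protein_id = fields[0]      # column 1: target name
        evalue = float(fields[4])   # column 5: full-sequence E-value
        if evalue <= max_evalue:
            # keep the smallest E-value if a protein appears more than once
            best[protein_id] = min(evalue, best.get(protein_id, evalue))
    return best


# Made-up rows for illustration only; IDs and scores are not from the study.
DEMO_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
protA  -  HLH  PF00010  2.1e-15  55.3  0.1
protB  -  HLH  PF00010  0.0031  12.0  0.0
protC  -  HLH  PF00010  4.0e-09  34.8  0.4
"""

candidates = filter_hits(DEMO_TBLOUT, max_evalue=1e-5)  # hypothetical cutoff
print(sorted(candidates))
```

Hits retained this way would still need the domain verification with CD-search, Pfam and SMART described above.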

Phylogenetic analysis, multiple alignment analysis and chromosomal locations
The A. thaliana bHLH sequence data were downloaded from the PlantTFDB database (http://planttfdb.cbi.pku.edu.cn/index). A phylogenetic tree of the bHLH proteins from A. thaliana and C. paliurus was constructed with MEGA X (version 6.0) (Kumar et al., 2018) using the neighbor-joining (NJ) method with 1000 bootstrap replicates. Multiple sequence alignment (MSA) of C. paliurus and A. thaliana bHLH proteins was performed using ClustalX 2.11 (Thompson et al., 1997), and WebLogo 3 (http://weblogo.threeplusone.com/create.cgi) and Jalview (http://www.jalview.org/) were used to visualize and analyze the sequences of conserved domains in CpbHLH proteins. The GFF3 (Generic Feature Format Version 3) file, containing the positional and gene structure information of genes on the chromosomes, was obtained from the whole-genome data of C. paliurus. TBtools (version 1.098774) (Chen et al., 2020) was used to map the CpbHLH genes onto specific chromosomes.

Gene structure, conserved motif, and promoter analysis
The exon/intron structures of CpbHLH genes were visualized with TBtools (version 1.098774) (Chen et al., 2018), whereas conserved motifs were identified using the online tool MEME (http://MEME-suite.org/) in zoops mode, with a maximum of 20 motifs, a minimum motif width of 6 and a maximum motif width of 50 (Bailey et al., 2009). The online tool PLACE (Higo et al., 1999) was used to analyze the cis-acting elements of CpbHLH genes.

RNA-seq data analysis, GO annotation and prediction of the protein interaction network
Raw data were obtained by RNA sequencing of leaves treated with different salt concentrations in the hydroponic experiment. CpbHLHs with reads per kilobase of transcript per million mapped reads (RPKM) or fragments per kilobase of transcript per million mapped reads (FPKM) > 1 were retained for further analysis of the transcriptome data. TBtools was used to generate the heatmap (Chen et al., 2020). Gene ontology (GO) analysis was carried out with the Blast2GO program (Conesa et al., 2005), with the NCBI database selected as the reference database. The results were divided into three categories, namely molecular function, biological process, and cellular component. The NCBI database (https://www.ncbi.nlm.nih.gov/) was used to search the functions of AtbHLHs predicted to be orthologous to CpbHLHs. STRING (https://string-db.org/) (Szklarczyk et al., 2019) was used to predict the functional interaction network of candidate genes with an option value > 0.7.
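The expression filter described above (FPKM > 1) and the log2(FPKM + 1) scaling used for the clustering heatmap can be illustrated with a short sketch. It assumes that a gene is kept when it exceeds the threshold in at least one treatment, which the text does not state explicitly; the gene names and values are hypothetical.

```python
import math

TREATMENTS = ("CK", "LS", "MS", "HS")  # 0%, 0.15%, 0.3%, 0.45% NaCl


def filter_and_transform(fpkm_by_gene, min_fpkm=1.0):
    """Keep genes with FPKM > min_fpkm in any treatment, then apply log2(FPKM + 1)."""
    kept = {}
    for gene, values in fpkm_by_gene.items():
        if max(values) > min_fpkm:  # assumption: one treatment above threshold is enough
            kept[gene] = [math.log2(v + 1.0) for v in values]
    return kept


# Hypothetical FPKM values, one per treatment in TREATMENTS order.
demo = {
    "geneA": [0.0, 1.0, 3.0, 7.0],
    "geneB": [0.2, 0.5, 0.9, 0.1],  # never exceeds 1, so it is dropped
}
result = filter_and_transform(demo)
print(result)
```

Adding 1 before taking the logarithm keeps genes with zero counts in a treatment on the scale instead of producing an undefined value.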
Hydroponic experiment: After three months, seedlings of uniform size (height: 40 ± 2.79 cm) were selected and transplanted to polypropylene containers (50 L) with 1/2-strength Hoagland's nutrient solution (pH 6.0 ± 0.2). Two weeks after transplanting, four salt concentration regimes (0%, 0.15%, 0.3%, and 0.45% NaCl) were implemented in a completely randomized design with three biological replicates per treatment. Detailed information has been described in our previous study.
Pot experiment: After one year of growth in nonwoven containers, the seedlings were transplanted into larger nonwoven containers (25 cm height, 20 cm diameter) and cut back to 3-5 cm in height in early spring 2020. In February 2022, saplings of similar size were selected and all their stems were cut to 120 cm in height, and in early April 2022 the selected saplings were transplanted from the nonwoven containers into plastic pots (26 cm height, 26 cm top diameter and 20 cm bottom diameter) containing a substrate of peat: perlite: rotten bird dung: soil = 5:2:2:1 (v/v/v/v). The plastic pots were placed in plastic trays to prevent NaCl leaching. The substrate was a loam with pH 6.4, and the contents of total N, total P, and total K in the soil were 79.7, 66.5, 2.40, and 9.7 g kg⁻¹, respectively.
Salt treatments were started in early May 2022 using a completely randomized design with three replications per treatment and six plants per replication. Based on previous research (Zhang et al., 2022b), three levels of NaCl concentration were set up: CK (control, distilled water), T1 (0.2% NaCl) and T2 (0.4% NaCl). One liter of solution was gradually added to the soil every three days, and electrical conductivity in the substrate was monitored to keep the soil salt concentration relatively stable. Six complete, mature leaves were collected from the upper, middle and lower positions of each sampled tree 45 days after the start of treatment, when obvious differences were observed (Figure 1), and were immediately frozen in liquid nitrogen and stored at −80°C until further analysis.

RNA extraction and real-time quantitative RT-PCR analysis
Plant materials were ground under RNase-free conditions. A Trizol reagent kit (Invitrogen, Carlsbad, CA, USA) was used to extract RNA from 9 samples of the 3 treatments (CK, 0.2% NaCl and 0.4% NaCl); subsequently, MonScript RTIII All-in-One Mix with dsDNase (Monad, Nanjing, China) was used to synthesize cDNA, following the manufacturer's instructions. qRT-PCR was performed on a Biosystems™ 7500 Real-Time PCR System (Monad, China). Primer Premier 6.0 (Premier Biosoft International, Palo Alto, CA, USA) was used to design qRT-PCR primers for the 12 genes (Supplementary Table S1). The SYBR Premix Ex Taq kit (Takara Biotechnology, Dalian, China) was used for qRT-PCR analysis. cDNA diluted 20-fold was used as the template and an 18S rRNA gene (Chen et al., 2019) as the internal standard. The PCR conditions were 95°C for 3 min, followed by 40 cycles of denaturation at 95°C for 5 s and 60°C for 30 s. Three technical and three biological replicates were used for each sample. After the reaction, the expression level of each target gene relative to the internal reference gene was calculated with the 2^(−ΔΔCT) method (Penfield, 2001).
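For reference, the 2^(−ΔΔCT) calculation named above normalizes the target gene's CT to the reference gene within each sample (ΔCT) and then compares the treated sample with the control (ΔΔCT). The sketch below illustrates the arithmetic only; the function name and the CT values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCT) method."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCT in the treated sample
    delta_control = ct_target_control - ct_ref_control   # dCT in the control sample
    delta_delta = delta_treated - delta_control          # ddCT
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the target amplifies 2 cycles earlier under treatment
# while the reference gene is unchanged, which corresponds to a 4-fold increase.
fold = relative_expression(24.0, 16.0, 26.0, 16.0)
print(fold)
```

A value above 1 indicates higher expression than in the control, and a value below 1 indicates lower expression.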

Statistical analysis
One-way analysis of variance (ANOVA) was conducted to identify significant differences in the related gene expression among the treatments, followed by Duncan's test for multiple comparisons. All statistical analyses were performed using IBM SPSS Statistics Version 22 software package (SPSS Inc., IBM Company Headquarters, Chicago, IL, USA). Data were presented as means ± standard deviation (SD).
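The one-way ANOVA step can be sketched in plain Python to show what the F statistic compares: variation between treatment means against variation within treatments. This is only an illustration with hypothetical values; the study used SPSS, and Duncan's multiple comparison test is not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical relative expression values for CK, T1 and T2 (three replicates each).
f_value, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_value, df_b, df_w)
```

The F value is then compared with the F distribution at (df_between, df_within) degrees of freedom to obtain a p-value, which a statistics package does automatically.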

Identification and sequence analysis of CpbHLH genes
Based on the genome-wide data of C. paliurus, a total of 174 putative CpbHLH proteins were identified using HMMER with default parameters. Subsequently, SMART and CD-

Conserved residues and DNA-binding ability prediction of the CpbHLH genes
To gain in-depth knowledge of the function of the CpbHLH family, the bHLH domains of the CpbHLH proteins were examined and the conserved amino acid residues were analyzed based on multiple sequence alignment. The alignment results (Figure 3) showed that the CpbHLH domains were composed of four conserved regions, namely one basic region, two helix regions and one loop region. Consistent with previous studies (Heim et al., 2003), the basic and helix regions were more conserved than the loop region. The bHLH domains of C. paliurus comprised 79 amino acid residues, of which 24 were highly conserved (> 50% consensus ratio) and 8 were extremely conserved (> 75% consensus ratio). Among the 24 highly conserved residues, six were found in the basic region (His-9, Ala-12, Glu-13, Arg-14, Arg-16, Arg-17), seven in the first helix region (Ile-20, Asn-21, Arg-23, Leu-27, Leu-30, Val-31, Pro-32), one in the loop region (Asp-64), and ten in the second helix region (Lys-65, Ala-66, Ser-67, Leu-69, Ala-72, ...). It is generally believed that the basic region performs DNA-binding functions and is critical for the bHLH family to achieve its biological function (Carretero-Paulet et al., 2010). Therefore, the DNA-binding ability of the 159 CpbHLH proteins was predicted based on the conserved amino acid residues in the basic region (Supplementary Table S3). The 159 CpbHLH members were classified into three categories, G-box-binding (His/Lys-9, Glu-13 and Arg-17), E-box-binding (Glu-13 and Arg-16) and non-E-box-binding (Glu-13 and Arg-16 do not appear together), in accordance with the classification method reported previously (Katiyar et al., 2012). The prediction identified 93 G-box-binding proteins, 43 non-G-box-binding proteins and 23 non-E-box-binding proteins among the 159 CpbHLHs (Supplementary Table S3).
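The residue rules quoted above can be written as a small decision function. This is a sketch of one reading of those rules, not the authors' implementation: it assumes that G-box binders are treated as a subset of E-box binders (so Glu-13 and Arg-16 are checked first), and it takes the one-letter residues at domain positions 9, 13, 16 and 17 as already extracted from the alignment. The function name is hypothetical.

```python
def classify_dna_binding(res9, res13, res16, res17):
    """Classify a bHLH basic region from one-letter residues at positions 9, 13, 16, 17."""
    # E-box recognition requires Glu-13 and Arg-16 together.
    if not (res13 == "E" and res16 == "R"):
        return "non-E-box"
    # Assumed subset rule: E-box binders with His/Lys-9 and Arg-17 bind the G-box.
    if res9 in ("H", "K") and res17 == "R":
        return "G-box"
    return "non-G-box"


print(classify_dna_binding("H", "E", "R", "R"))  # G-box
print(classify_dna_binding("A", "E", "R", "R"))  # non-G-box
print(classify_dna_binding("H", "A", "R", "R"))  # non-E-box
```

Under this reading, the three labels correspond to the three groups counted in Supplementary Table S3.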

Phylogenetic analysis and classification of the CpbHLH genes
To explore the evolutionary relationships among the CpbHLH members, the 159 CpbHLH proteins were aligned with 140 bHLH proteins from Arabidopsis, and a phylogenetic tree was then constructed from all 299 bHLH proteins based on the alignment (Figure 4). In accordance with the classification of bHLH proteins from Arabidopsis and other plants (Heim et al., 2003; Li et al., 2020b; Li et al., 2021), the 299 bHLH protein sequences were classified into 26 subfamilies, named from Ia to XV following the nomenclature of AtbHLHs proposed by Heim et al. (2003). Figure 4 shows that subfamily XII was the largest (35 CpbHLH proteins), while the smallest subfamily (VI) contained only one CpbHLH protein. According to Heim et al. (2003), bHLH proteins in the same subfamily tend to have similar functions; consequently, the clustering in the phylogenetic tree can help predict the functions of CpbHLH proteins.

Gene structure and conserved motif analysis of CpbHLH genes
Diversity of exon-intron structures, which can cause divergence in coding regions, is important in the evolution of multigene families (Xu et al., 2012b). Hence, the gene structural characteristics of the CpbHLH family were investigated. The number of exons in the 159 CpbHLH genes varied from 1 to 13 (Figure 5C). In addition, 20 (12.6%) genes were intronless and distributed across subfamilies IIId, IIIe, VIIIa, VIIIb and VIIIc(2), while 13 (8.2%) genes contained one intron, and the remaining genes had two or more introns. The 159 genes in different subfamilies varied widely in structure, including the number and relative location of introns and exons (Figure 5C). In contrast, the intron/exon patterns of genes in the same subfamily were highly similar, as in subfamilies Ib(1) (five three-exon genes), Ib(2) (five three-exon genes), III(d+e) (nine one-exon genes), IVc (eight five-exon genes), and VIIIb (seven one-exon genes) (Figure 5C).

Figure 3. Multiple sequence alignments of the bHLH domains in CpbHLH proteins. (A) Visualization of conserved amino acids in the bHLH domains of CpbHLH proteins. Amino acids with a conservation degree of more than 50% and their conservation degree are labeled in red and black for easy recognition; the colors have no special meaning. (B) Multiple sequence alignments of the bHLH domains of 159 CpbHLH proteins, using the Clustal color scheme.
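Exon and intron counts of the kind reported in this section can be derived from a GFF3 annotation by counting `exon` features per transcript. The following is a minimal sketch, assuming tab-separated GFF3 rows whose exon features carry a `Parent=` attribute and one transcript per gene; the function names and demo rows are hypothetical and do not come from the C. paliurus annotation.

```python
from collections import Counter


def exon_counts(gff3_text):
    """Count exon features per parent transcript in GFF3 text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # GFF3 rows have 9 columns; column 3 is the feature type
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return counts


def intronless_summary(counts):
    """A transcript with a single exon has no introns (introns = exons - 1)."""
    intronless = sorted(t for t, n in counts.items() if n == 1)
    return intronless, len(intronless) / len(counts)


# Hypothetical annotation rows: tx1 has one exon, tx2 has three.
ROWS = [
    ("chr1", "demo", "exon", "100", "900", ".", "+", ".", "ID=e1;Parent=tx1"),
    ("chr1", "demo", "exon", "2000", "2200", ".", "-", ".", "ID=e2;Parent=tx2"),
    ("chr1", "demo", "exon", "2400", "2600", ".", "-", ".", "ID=e3;Parent=tx2"),
    ("chr1", "demo", "exon", "2900", "3100", ".", "-", ".", "ID=e4;Parent=tx2"),
]
DEMO_GFF3 = "##gff-version 3\n" + "\n".join("\t".join(r) for r in ROWS)

counts = exon_counts(DEMO_GFF3)
intronless, fraction = intronless_summary(counts)
print(dict(counts), intronless, fraction)
```

Applied to the real annotation, the same tally would give the proportion of intronless genes, such as the 20 of 159 (12.6%) reported above.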
It is generally accepted that motifs play prominent roles in interaction and signal transduction between different modules of the gene transcription process (Toledo-Ortiz et al., 2003). To further understand the evolutionary relationships among the CpbHLH proteins, conserved motifs were analyzed using MEME. Twenty motifs were identified, and their sequences and lengths were recorded (Figure 5B, Supplementary Table S4). In addition, eight of the twenty motifs were annotated by Pfam and CD-search (Supplementary Table S4). The motif composition patterns were consistent with the phylogenetic tree and gene structures, being similar among genes within the same group but varying greatly between groups (Figure 5). The number of motifs in the 159 CpbHLHs ranged from one (CpbHLH66) to nine (CpbHLH50). All 159 CpbHLH genes contained motif 1 and motif 2, except CpbHLH66, which contained only motif 1 (Figure 5B). Interestingly, some conserved motifs were restricted to specific groups. For example, motif 13 existed only in group Ia, motif 16 in group VIIIb, motif 18 in group XII, and motif 19 in group IX (Figure 5B). This may explain why the functions of CpbHLH proteins tend to be specific to a particular group.

GO annotation and cis-element analyses of the CpbHLHs
The highly divergent sequences outside the conserved bHLH domain suggest that CpbHLH proteins may have a variety of biological functions. GO annotation of the 159 proteins was performed to understand the biological processes associated with CpbHLH genes (Figure 6; Supplementary Table S5). The identified CpbHLH proteins were classified into three main gene ontology (GO) categories: cellular component (CC), molecular function (MF), and biological process (BP). Within the MF category, the majority of CpbHLH proteins were annotated with "molecular function" (139/159), "nucleic acid binding" and "DNA binding". These functions are closely related to the primary roles of TFs. In the CC category, most of the CpbHLH proteins were assigned to cellular components and the nucleus (139/159). However, a small number of CpbHLH proteins were assigned to the cytoplasm (8/159), organelle part (7/159), cytosol (4/159), symplast (CpbHLH37/117/132) and chloroplast (CpbHLH68/109) (Figure 6; Supplementary Table S5). Furthermore, the BP category showed that CpbHLH proteins participate in various biological processes. The largest number of CpbHLHs (141/159) were annotated as related to multiple biosynthetic and metabolic processes. In addition, CpbHLH proteins may function in regulating biological processes, such as regulation of cellular process (111/159), transcription, DNA-templated (109/159) and gene expression (109/159). The BP analysis also showed that many CpbHLHs could respond to stimuli (46/159), including different types of biotic and abiotic stressors, while CpbHLH38/68/109 were predicted to be involved in the response to salt stress (Figure 6; Supplementary Table S5).
Conserved motifs located in gene promoter regions are recognition and binding sites for proteins. In this study, a large number of cis-regulatory elements (CREs) of CpbHLH genes were identified and classified into three main categories according to their roles: plant growth and development, phytohormone responsiveness, and abiotic and biotic stresses (Figure 7). The CAT-box (105) and O2-site (86), which are involved in meristem expression and zein metabolism regulation, respectively, were the most frequent motifs related to plant growth and development. In contrast, the numbers of HD-Zip 1 (differentiation of the palisade mesophyll cells), AACA-motif (endosperm-specific negative expression) and MSA-like (cell cycle regulation) elements were only 8, 3 and 2, respectively. Additionally, RY-element (seed-specific regulation) and GCN4_motif (endosperm expression) were also identified in the promoters of the CpbHLH genes (Figure 7). The most common elements in the phytohormone-responsive category were ABRE (abscisic acid-responsive element), CGTCA-motif and TGACG-motif (elements involved in MeJA responsiveness) and the TCA element (SA-responsive element) (Figure 7). In the last category, many important CREs related to plant abiotic stress were detected. The most abundant of these were ABRE (drought response element), ARE (anaerobic induced response element), MBS (drought induced response element) and LTR (low temperature response element). Other stress response CREs, such as GC-motif (anoxic specific inducibility element), TC-rich (defense and stress response element) and ERE elements (oxidative stress responsive elements), were also identified (Figure 7).

Expression profiles of CpbHLH genes in salt stress under hydroponic experiment
Analysis of gene expression profiles is an effective way to infer gene functions. Hence, the leaves of C. paliurus treated with different salt concentrations (0%, 0.15%, 0.3%, and 0.45% NaCl) for 30 days in the hydroponic experiment were sequenced and analyzed. The raw sequencing data were submitted to the NCBI BioProject database under project number PRJNA700136. The RPKM (reads per kilobase per million mapped reads) values of the 159 CpbHLH genes were obtained from the transcriptome data to estimate the expression levels of bHLH family members. However, CpbHLH119/121/138/151 were not analyzed because they were absent or expressed at very low levels in the transcriptome data. Figure 8 shows that the remaining 155 genes were expressed under all NaCl treatments with different expression patterns, indicating that the expression of CpbHLH genes is affected by salt stress.
Based on the similarity of expression patterns, the 155 CpbHLH genes were grouped into 8 clusters, named A1-A8 (Figure 8). The genes in cluster A1 were mainly expressed under the middle (0.30% NaCl) or high (0.45% NaCl) salinity condition and did not change significantly under the low (0.15% NaCl) salinity condition. In contrast, CpbHLHs in clusters A6, A7 and A8 were strongly and preferentially expressed under low salt concentration and down-regulated under high salt concentration. In cluster A2, the expression of CpbHLH genes did not change significantly under low and middle salt stress, but reached its highest value under the high salinity treatment. The expression of most genes in cluster A4 varied with salt concentration: these genes were all down-regulated under salt treatments and reached their lowest expression at 0.45% NaCl. Very low expression levels of the genes in clusters A3 and A5 were observed at middle and high salt concentrations, respectively (Figure 8). In particular, among the 155 genes, the expression of some genes was strongly induced or inhibited under salt stress. For example, compared with the CK, the expression of CpbHLH36/74/75 in cluster A4 was down-regulated nearly 3-fold in the low salinity treatment (0.15% NaCl), and CpbHLH74 was down-regulated nearly 9-fold in the high salinity treatment (0.45% NaCl). Similarly, seven differentially expressed genes (DEGs) (CpbHLH68/69/71/108/146/152/158) were identified in clusters A5, A6 and A7, indicating a response to salt stress (Figure 8).

Expression analysis of candidate genes in response to salt in pot experiment
Combining the results of GO annotation and expression profile analysis in the hydroponic experiment, twelve candidate salt-responsive genes (CpbHLH36/38/68/69/71/74/75/108/109/146/152/158) were selected for further qRT-PCR analysis using templates from the pot experiment with three salt concentrations (0% NaCl, 0.2% NaCl and 0.4% NaCl) (Figure 9). Notably, the expression of eight candidate genes (CpbHLH36/68/71/75/109/146/152/158) increased or decreased dramatically under the different salt treatments, indicating that these genes were significantly induced or inhibited by salt stress (Figure 9). Among the eight genes, four (CpbHLH36/146/152/158) were down-regulated under salt stress, with three of them (CpbHLH146/152/158) showing their lowest expression at 0.4% NaCl and one (CpbHLH36) at 0.2% NaCl. In contrast, three genes (CpbHLH68/71/109) were significantly induced by salt stress (Figure 9). In particular, three genes (CpbHLH36/68/146) responded strongly to salt treatments. Relative to the control, the trend of their expression in the pot experiment was highly consistent with that in the hydroponic experiment (Figure 8; Figure 9), indicating their vital functions in the response to salt stress. For example, the expression of CpbHLH36 in both experiments was strongly inhibited under salt stress, and the degree of inhibition was greater at low salt concentration than at high salt concentration.

Interaction network prediction of candidate genes
It has been reported that bHLH proteins exert regulatory effects by forming homodimers or heterodimers with other bHLH proteins or with non-bHLH proteins (Herold et al., 2002; Hernandez et al., 2007). Thus, the interaction networks of the three candidate genes were predicted by STRING (Figure 10), based on the homologs of the CpbHLH genes in A. thaliana. The analysis of CpbHLH146 (MYC2 ortholog) showed that it is involved in light, abscisic acid (ABA), and jasmonic acid (JA) signaling pathways and additively controls subsets of JA-dependent responses together with MYC3 and MYC4 (Figure 10B, Supplementary Table S6). Among the proteins interacting with MYC2, those related to the JA signaling pathway accounted for the majority, including JAZ1, JAZ3, JAZ5, JAZ8, JAZ10, JAZ12 and TIFY7. In addition, PFT1 is a phytochrome and flowering time regulatory protein, and EIN3 probably acts as a positive regulator in the ethylene response pathway (Figure 10A, Supplementary Table S6). The predicted network for CpbHLH36 (NIG ortholog) showed that it plays a central role in regulating various proteins, several of which are also involved in the jasmonic acid signaling pathway (JAZ1 and JAZ10) (Figure 10A, Supplementary Table S6). Other proteins, GSTU1 and GSTU2, may be involved in the conjugation of reduced glutathione to a wide range of exogenous and endogenous hydrophobic electrophiles and have a detoxification role against certain herbicides, whereas bHLH11 and TRFL8 both function in DNA binding (Figure 10A, Supplementary Table S6). Finally, the predicted network (Figure 10C, Supplementary Table S6) also indicated that CpbHLH68 (ortholog of bHLH106) has a crucial role in DNA binding, the same function as most of the proteins that interact with it. In addition, several interacting genes possibly regulate light responses; for example, CRY1 and CRY2 are cryptochromes, and UVR2 and UVR3 are involved in the repair of UV radiation-induced DNA damage. Moreover, PRMT4B has been identified as a positive regulator of oxidative stress tolerance that promotes the expression of antioxidant enzymes such as APX1 and GPX1 (Figure 10A, Supplementary Table S6). Overall, the results of the protein interaction network analysis indicated that the three candidate genes interact with proteins of various functions, suggesting that they are important players in regulating plant growth and stress responses.

Figure 6. Gene ontology (GO) distribution of CpbHLH proteins. GO annotation using a cut-off value of p ≤ 0.05 covered GO terms in the molecular function (MF), biological process (BP), and cellular component (CC) categories; the predominant GO terms were selected to visualize the result.

Systematic and comprehensive genome-wide detection of CpbHLHs in C. paliurus
Based on the whole genome of C. paliurus, 159 bHLH genes were systematically identified in the present study (Supplementary Table S2). The number of CpbHLH genes was the same as that identified in tomato (Sun et al., 2015), smaller than that in Arabidopsis (162 genes) (Toledo-Ortiz et al., 2003) and apple (175 genes) (Yang et al., 2017), and greater than that in grape (94 genes) (Wang et al., 2018a), strawberry (113 genes) (Zhao et al., 2018) and jujube (92 genes) (Li et al., 2019). The 159 CpbHLH proteins were further categorized into 26 subfamilies (Figure 4) according to the phylogenetic tree of C. paliurus and Arabidopsis bHLH proteins and the established nomenclature protocol (Heim et al., 2003), in agreement with results from previous studies (Pires and Dolan, 2010; Sun et al., 2015; Chu et al., 2018). However, the CpbHLHs were distributed almost evenly across 20 subfamilies, similar to Camellia sinensis (Sun et al., 2015) and O. fragrans. Moreover, our results indicated that no CpbHLHs were found in subfamily X, whereas the most CpbHLH members were detected in subfamily XII (Figure 4), with the number of members in this subfamily increasing from 17 in Arabidopsis to 22 in C. paliurus. Differences in the numbers of bHLH genes among plant species may be due to gene duplication events, genome size, or gene loss during evolution (Flagel and Wendel, 2009; Li et al., 2020b).

FIGURE 7
Cis-regulatory elements in the promoter regions of CpbHLH genes. The figure shows the number of each type of motif identified in the promoter sequences of CpbHLH genes.

FIGURE 8
Clustering expression analysis of 159 CpbHLH genes under salt stress based on the hydroponic experiment. CK, LS, MS and HS represent NaCl concentrations of 0%, 0.15%, 0.3% and 0.45%, respectively. Transcript abundance was normalized and hierarchically clustered using log2(FPKM + 1) values compared among genes and treatments. The expression value is presented on the color scale, with red representing high expression and blue representing low expression. A1-A8 represent different clusters, which are marked with lines of different colours on the right for easier distinction.

Based on the analysis of conserved motifs and intron/exon structures (Figure 5B, C), CpbHLHs in the same subfamily of the phylogenetic tree were similar in gene and motif structures, further confirming the accuracy of the subgroup classification in the phylogenetic tree (Figure 4; Figure 5). In total, 20 motifs were identified in the 159 CpbHLH proteins (Figure 5B). Among them, motifs 1 and 2 existed in almost every CpbHLH protein and represented the main components of the bHLH domain, which has a highly conserved DNA-binding capability, suggesting that these two motifs are very important for the functioning of bHLH genes (Zhang et al., 2020b). Nonetheless, the remaining 18 conserved non-bHLH motifs can also occur specifically in CpbHLHs of their respective subfamilies, similar to other plant species (Chu et al., 2018; Li et al., 2020b). For example, most bHLH genes of subfamily III(d+e) in Panax ginseng (Chu et al., 2018) contained MYC-N structures (bHLH-MYC_N domain, Pfam: PF14215), which have been shown to function in regulating the biosynthesis of phenylpropane. In this study, all CpbHLHs of subfamily III(d+e) also contained MYC-N structures (motifs 5, 8 and 10) (Figure 5B; Supplementary Table S4), implying that CpbHLHs of the same subgroup may have similar roles. It has been reported that the gain or loss of exons and introns may result in the functional diversification of gene families (Xu et al., 2012a), that introns are related to gene evolution, and especially that genes with few or no introns are more highly expressed in plants (Chung et al., 2006; Ren et al., 2006). In the present study, the intron-less CpbHLHs were distributed across subfamilies III(d+e) and VIIIb (Figure 5C), in accordance with the pattern in P. ginseng (Chu et al., 2018), apple (Yang et al., 2017) and Osmanthus, suggesting that CpbHLHs of these subgroups could facilitate a rapid and timely response to various stresses (Jeffares et al., 2008).

FIGURE 9
Expression profiles of the 12 candidate CpbHLH genes responding to salt stress treatments in the pot experiment. The standard errors from three biological and three technical replications are presented as error bars. Following analysis of variance, significant differences identified by Duncan's test (p < 0.05), using SPSS v.22, are represented by different letters.

FIGURE 10
Interaction network analysis for CpbHLH36 (A), CpbHLH146 (B) and CpbHLH68 (C). The predicted results are based on the orthologous gene in Arabidopsis. CpbHLH genes are shown in brackets.

Functional prediction and identification of salt tolerance genes of CpbHLHs
Transcriptional regulation is a basic process of gene regulation in response to stress signals, and a large number of TFs are involved in regulating plant responses to a given stress (Riechmann et al., 2000). The results of GO annotation in this study showed that the functions of the CpbHLH genes are diverse (Figure 6; Supplementary Table S5), supporting the view that bHLH TFs play a crucial role in regulating plant growth, development and stress response (Shen et al., 2021). Several lines of evidence showed that salt stress had adverse effects on photosynthesis and the accumulation of secondary metabolites in C. paliurus. Therefore, the detection of salt stress response genes among the CpbHLHs will be helpful for achieving salt-tolerant breeding of C. paliurus.
The transcriptome sequencing analysis of the salt treatments in the hydroponic experiment provided specific expression data for the CpbHLHs, which makes it possible to further study the function of these genes. The RPKM values from the hydroponic experiment showed that a large number of CpbHLH genes were induced or repressed under NaCl stress (Figure 8). According to the RPKM data, ten significantly differentially expressed genes (CpbHLH36/68/69/71/74/75/108/146/152/158) were predicted to function in responding to salt stress (Figure 8). Moreover, the molecular function annotations of the 159 CpbHLHs indicated that three genes (CpbHLH38/68/109) strongly responded to salt stress (Supplementary Table S5). Because CpbHLH68 appears in both lists, the two sets together comprise 12 genes, which were predicted to be candidate genes in response to salt stress and were selected for further qRT-PCR analysis, using salt-treated templates collected from our pot experiment. The qRT-PCR results showed that the expression of three genes (CpbHLH36/68/146) strongly responded to the salt treatments (Figure 9), and the variation trend of their expression levels was highly similar in the two salt stress experiments (Figure 8; Figure 9), suggesting that these genes are specifically involved in the regulation of salt tolerance in C. paliurus.
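As a minimal sketch of the bookkeeping behind the candidate list, the Python snippet below combines the ten differentially expressed genes and the three GO-annotated genes named above; because CpbHLH68 occurs in both lists, the union contains 12 genes. It also shows a log2(x + 1) transform of the kind described in the Figure 8 caption. The variable and function names are illustrative choices, and the expression values are hypothetical placeholders rather than data from this study.

```python
import math

# Gene numbers as listed in the text (CpbHLH36 -> 36, and so on).
DEG_CANDIDATES = {36, 68, 69, 71, 74, 75, 108, 146, 152, 158}  # expression-based
GO_CANDIDATES = {38, 68, 109}                                   # GO-annotation-based


def log2_plus_one(value):
    """Return log2(value + 1), the transform named in the heat map caption."""
    return math.log2(value + 1.0)


# Union of both selection routes, and the genes found by both routes.
candidates = sorted(DEG_CANDIDATES | GO_CANDIDATES)
shared = sorted(DEG_CANDIDATES & GO_CANDIDATES)

# Hypothetical expression values for one gene across CK, LS, MS and HS;
# these numbers are placeholders for illustration only.
hypothetical_profile = {"CK": 0.0, "LS": 1.0, "MS": 3.0, "HS": 7.0}
transformed = {name: log2_plus_one(v) for name, v in hypothetical_profile.items()}

print(len(candidates), "candidate genes:", candidates)
print("selected by both routes:", shared)
print(transformed)
```

Adding 1 before taking the logarithm keeps genes with zero counts at a finite value of 0, which is why this form is commonly used before clustering expression data.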
Phylogenetic analysis can be used to derive orthologous relationships based on sequence similarity and protein structure, and the most closely related bHLH genes in the phylogenetic tree may share a similar function (Wang et al., 2021). Previous research indicated that AtbHLH106 could enhance plant salt tolerance by directly interacting with the G-box of salt-tolerance genes (Ahmad et al., 2015), and CpbHLH68 was clustered with AtbHLH106 in a clade with a high bootstrap value (Figure 4), suggesting that CpbHLH68 may be involved in the response to salt stress. In addition, DNA sequences are decisive factors in the binding specificity between transcription factors and their genomic targets (Gordan et al., 2013), and our analysis of the DNA-binding ability of the 159 CpbHLHs showed that CpbHLH68 was a G-box-binding protein (Supplementary Table S3), which further suggests that CpbHLH68, similar to AtbHLH106, may respond to salt stress by binding to the G-box of target genes. Moreover, AtbHLH6 (ATMYC2) has been reported to exhibit a significant response to salt and drought stresses (Abe et al., 1997; Aleman et al., 2016), while AtNIG1 (a salt stress-responsive gene) was the first known TF reported to participate in salt stress signaling by binding calcium ions, and it binds to the E-box sequence (CANNTG) (Kim and Kim, 2006). Our study showed that CpbHLH36 and CpbHLH146 were clustered in the same clade as AtbHLH6 (MYC2) and AtbHLH28 (AtNIG1) (Figure 4), suggesting that CpbHLH36 and CpbHLH146 are also E-box proteins (Supplementary Table S3) and are very likely to be involved in the regulation of salt stress signaling pathways.
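To make the motif terminology concrete, the following Python sketch scans a DNA string for the E-box consensus CANNTG quoted above and labels the palindromic variant CACGTG as a G-box. It is only an illustration of pattern matching and is not the DNA-binding prediction procedure used in this study; the function name and the promoter fragment are hypothetical.

```python
import re

# E-box consensus CANNTG (N = any base). The lookahead reports every start
# position, so overlapping sites would not be missed.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
G_BOX = "CACGTG"  # the G-box is the E-box variant with CG in the centre


def find_boxes(sequence):
    """Return (position, motif, kind) for each E-box site in the sequence."""
    hits = []
    for match in E_BOX.finditer(sequence.upper()):
        motif = match.group(1)
        kind = "G-box" if motif == G_BOX else "E-box"
        hits.append((match.start(), motif, kind))
    return hits


# Hypothetical promoter fragment, invented for illustration only.
promoter = "ttgaCACGTGtcaaCATATGggCAGCTGa"
hits = find_boxes(promoter)
for position, motif, kind in hits:
    print(position, motif, kind)
```

Because every G-box also matches CANNTG, a G-box-binding protein is a special case of an E-box-binding protein in this scheme.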
In general, the function of a given gene can be inferred from its homologous genes (Yue et al., 2016; Qu et al., 2022). Therefore, Arabidopsis orthologs were used to predict the regulatory network of the three candidate genes (CpbHLH36/68/146) in this study. Previous studies showed that AtMYC2 was involved in the regulation of ABA-inducible genes under drought stress conditions (Gordan et al., 2013) and could provide a possible mechanistic link between ABA signaling and JA signaling (Abe et al., 1997; Zhang et al., 2021). The predicted interactors of CpbHLH146 (MYC2 ortholog) were mainly involved in the regulation of JA signaling (Figure 10A, Supplementary Table S6). The interaction of the plant hormones ABA and JA plays a major role in abiotic stress tolerance (Xiong et al., 2002; Zhang et al., 2012b), and the ABA-dependent pathway is one of the important signal transduction pathways in the abiotic stress response. The promoter regions of most ABA regulatory genes contain many ABA-responsive elements (Leonhardt et al., 2004; Yamaguchi-Shinozaki and Shinozaki, 2005; Fujita et al., 2011). In this study, a high occurrence of ABRE (ABA-responsive element) and CGTCA-motif (MeJA-responsive element) cis-acting elements was detected in the promoter of CpbHLH146 (Figure 7). Thus, it can be inferred that this gene may have an important role in stress resistance by regulating the expression of key genes in the ABA signaling pathway. Furthermore, most predicted interactors of CpbHLH36 (AtNIG1 ortholog) and CpbHLH68 (bHLH106 ortholog) were mainly involved in DNA binding (Figure 10A, Supplementary Table S6), which further supports our hypothesis that these two genes regulate the plant salt stress response mainly by recognizing the G-box of target genes.
Moreover, CpbHLH36 (AtNIG1 ortholog) also interacted with some JA signaling pathway proteins (Figure 10A, Supplementary Table S6), and it was in the same cluster of the phylogenetic tree as CpbHLH146 (MYC2 ortholog) (Figure 4). In addition, CpbHLH36 showed a similar expression trend in the pot and hydroponic experiments, as did CpbHLH146 (Figure 8; Figure 9). Therefore, it can be inferred that CpbHLH36 and CpbHLH146 interact indirectly at the protein level and coordinately control the expression of downstream genes, and that plant salt tolerance may depend upon the co-expression of these two genes.
In short, taken together, the above results indicate that CpbHLH36/68/146 could be the key putative candidates in response to salt stress in C. paliurus. However, the ways in which these three genes are involved in the regulation of salt tolerance varied. Firstly, CpbHLH36/68/146 are all G-box proteins and may respond to salt stress by binding to the G-box of target genes. Secondly, CpbHLH36 may participate in salt stress signaling by binding calcium ions and regulating the expression of key genes in the JA signaling pathway. Thirdly, CpbHLH146 is very likely to be involved in the regulation of salt stress through ABA signaling pathways. Moreover, there appears to be an indirect interaction between CpbHLH36 and CpbHLH146 at the protein level; thus, we speculate that the salt tolerance of C. paliurus may depend upon the co-expression of these two genes.
In conclusion, this is the first report to identify a TF family based on the whole genome of C. paliurus. A total of 159 CpbHLH genes were detected and divided into 26 subfamilies according to their evolutionary characteristics. In addition to investigating their structures and DNA-binding abilities, expression analyses from both the pot and hydroponic experiments and a regulatory network analysis were performed to determine which genes are most active in salt stress responses in this species. A total of 12 candidate genes were selected in response to salt stress, and three of them (CpbHLH36/68/146) were further verified to be involved in regulating the salt tolerance of C. paliurus based on a pot experiment and protein interaction network analysis. Our findings would not only provide a basis for further understanding the regulatory mechanisms of bHLH TFs, but also drive progress in genetic improvement for the salt tolerance of C. paliurus.

Data availability statement
The whole genome sequencing raw data including Illumina short reads, PacBio long reads, Hi-C interaction reads, and transcriptome data have been submitted to the Genome Sequence Archive at the National Genomics Data Center (NGDC), Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS) / China National Center for Bioinformation (CNCB) (GSA: CRA004671 and BioProject: PRJCA005987), and are publicly accessible at https://ngdc.cncb.ac.cn/gsa/.

Author contributions
ZZ: Conceptualization, writing-original draft, visualization, data analysis, bioinformatics analysis. JF: Participated in the pot experiment. SF: Methodology, writing-review & editing, funding acquisition. LZ: Participated in the hydroponic experiment. HJ: Participated in the pot experiment. All authors contributed to the article and approved the submitted version.

Funding
This work was funded by the Key Research and Development Program of Jiangsu Province (BE2019388) and the National Natural Science Foundation of China (32071750).