The BAHD Gene Family in Cacao (Theobroma cacao, Malvaceae): Genome-Wide Identification and Expression Analysis

The benzyl alcohol O-acetyl transferase, anthocyanin O-hydroxycinnamoyl transferase, N-hydroxycinnamoyl anthranilate benzoyl transferase, and deacetylvindoline 4-O-acetyltransferase (BAHD) enzymes play a critical role in regulating plant metabolites and affecting cell stability. In the present study, members of the BAHD gene family were recognized in the genome of Theobroma cacao and characterized using various bioinformatics tools. We found 27 non-redundant putative tcBAHD genes in cacao for the first time. Our findings indicate that tcBAHD genes are diverse based on sequence structure, physiochemical properties, and function. When analyzed with BAHDs of Gossypium raimondii and Corchorus capsularis clustered into four main groups. According to phylogenetic analysis, BAHD genes probably evolved drastically after their divergence. The divergence time of duplication events with purifying selection pressure was predicted to range from 1.82 to 15.50 MYA. Pocket analysis revealed that serine amino acid is more common in the binding site than other residuals, reflecting its key role in regulating the activity of tcBAHDs. Furthermore, cis-acting elements related to the responsiveness of stress and hormone, particularly ABA and MeJA, were frequently observed in the promoter region of tcBAHD genes. RNA-seq analysis further illustrated that tcBAHD13 and tcBAHD26 are involved in response to Phytophthora megakarya fungi. In conclusion, it is likely that evolutionary processes, such as duplication events, have caused high diversity in the structure and function of tcBAHD genes.


INTRODUCTION
Theobroma cacao L. is an economically important species of the plant family Malvaceae (Purseglove, 1968) due to its use in chocolate production, cosmetics, and confectionery (Litz et al., 2020). The chocolate of cacao contains 45-55% fats, and its quality is determined by the aroma (Mustiga et al., 2019), which differs among varieties due to the presence and quantity of specific compounds such as ethyl phenylacetate, ethyl octanoate, phenylethyl alcohol, 3-methylbutanal, and 2-heptanol (Castro-Alayo et al., 2019). The cacao tree grows in up to fifty countries in the humid tropics, providing an important source of income to these economies (Motamayor et al., 2013). However, the high humidity of the growing regions predisposed this plant to various fungal diseases (McElroy et al., 2018). For example, the species of Phytophthora fungus (Phytophthora palmivora, Phytophthora megakarya, and Phytophthora capsici) cause black rod/pod rot, which leads to a 20-30% loss in yield and 10% mortality of cacao (Bridgemohan and Mohammed, 2019).
Several BAHD genes that catalyze the formation of diverse metabolites have been characterized in plant species. Wholegenome analyses provide information about the genetic basis of response to abiotic and biotic stresses (Motamayor et al., 2013). In particular, genome-wide studies of the BAHD family have been reported in various species, including Populus (Yu et al., 2009), Arabidopsis (Yu et al., 2009), soybean (Ahmad et al., 2020b), Cynara cardunculus (Moglia et al., 2016), and in some species of Rosaceae Liu et al., 2020a). The chromosome-level genome assembly of T. cacao (Motamayor et al., 2013;Argout et al., 2017) provides resources for the characterization of gene families, which has led to the identification and characterization of WRKY (Dayanne et al., 2017), NAC (Shen et al., 2020), GASA , and MGT (Heidari et al., 2021a) in cacao. However, genome-wide characterization of the BAHD gene family has not been reported to date.
The aim of the current study is to identify and characterize BAHD genes of T. cacao, since BAHDs play an important role in fat biosynthesis, and chocolate an important product of the cacao seed, comprises up to 55% fats. Here, we provide the first insight into the chromosome-wide distribution of BAHD genes, chemical properties, cis-regulatory elements of promoter regions, subcellular localization, and protein structure in cacao. We also aim to explore the possible role of BAHDs against P. megakarya that causes black rot.

Identification of BAHD Genes in the T. cacao Genome and the Analysis of BAHD Conserved Domains
In the present study, the BAHD family characteristic domain (Pfam: PF02458) was used as a query using the BLAST tool in Ensembl database with an expected value of E −10 to identify BAHD genes in the T. cacao genome (T. cacao Belizian Criollo B97-61/B2) (Argout et al., 2017) and some identified BAHD genes of cacao for further confirmation and validations of previously identified genes. The same procedure was employed to identify BAHD genes in Gossypium raimondii and Corchorus capsularis for phylogenetic analysis. All protein sequences were further analyzed to confirm the presence of conserved domains (Pfam: PF02458) using the Pfam server (http://pfam.xfam.org/) and the Conserved Domains Database (CDD: https://www.ncbi. nlm.nih.gov/Structure/cdd/cdd.shtml) of the National Center for Biotechnology (NCBI). The presence of two other important domains (HXXXD and DFGWG) was confirmed by performing multiple alignments using ClustalW (Larkin et al., 2007) and visualizing in MEGA X (Kumar et al., 2018). Finally, all the sequences that lacked these domains were not included for downstream analyses following the previous study (Liu et al., 2020a) as these are important for the function of BAHD enzymes. We also retrieved coding DNA sequences (CDS), genomic sequences, promoter sequences (1,500 bp upstream of the gene), and protein sequences to study the sequence structure of BAHD genes.

Chromosome Mapping and Characterization of Physiochemical Properties
The position of each BAHD gene and chromosome number has been recorded. We renamed all genes based on the chromosome number and position, as shown in Supplementary Table 1. A phenogram was constructed to show the position of each gene on the chromosome along with duplicated genes using TBtools . We also determined physiochemical properties of proteins including protein length, isoelectric point (pI), and molecular weight (MW) using the ExPASy tool (Gasteiger et al., 2005). In addition, we predicted the subcellular localization of BAHD genes using the BUSCA webserver (Savojardo et al., 2018).

Promoter Site Analysis
We retrieved regions 1,500 bp upstream of BAHD genes and considered these to be promoter sites. PlantCare (Lescot et al., 2002) was used to analyze sequences of these promoter regions to study cis-regulatory elements. Each cis-regulatory element was classified into either hormone responsive elements (REs), stress REs, growth REs, or light REs based on its annotation in the PlantCare database.

Phylogenetic Inference and Analyses of Conserved Proteins Motif
A maximum likelihood tree was constructed for BAHD genes, first in cacao alone and then combined with BAHD genes of two other species (Gossypium raimondii and Corchorus capsularis). The maximum likelihood tree was constructed using IQ-tree (Nguyen et al., 2015) with default parameters and the best fit model JTT+I+G4 based on predictions of ModelFinder (Kalyaanamoorthy et al., 2017). The visualization of both the trees was improved using the interactive tree of life (Letunic and Bork, 2019) and MEGA X (Kumar et al., 2018). The distribution of conserved protein motifs was elucidated in BAHD proteins using MEME v5.3.0 server (http://meme-suite.org/tools/meme) (Bailey et al., 2009), which searched for 10 conserved motifs with a minimum width of motif 6 and a maximum width of motif 30.

Gene Duplications and Synteny Analyses
The BAHD genes of cacao were pairwise aligned in Geneious R8.1 (Kearse et al., 2012), and gene pairs that had similarity of 85% or higher were considered duplicated genes following previous studies (Zheng et al., 2010;Musavizadeh et al., 2021). Duplicated genes that occurred within 200 kb region were considered tandemly duplicated genes, whereas those that were separated >200 kb region or located on different chromosomes were considered segmentally duplicated genes following a recent study (Ahmad et al., 2020a). The rate of synonymous (Ks) and non-synonymous substitutions (Ka) and events of gene duplication were determined using DnaSP v.6 (Rozas et al., 2017). The selection pressure on duplicated genes was determined based on the ratio of Ka/Ks and interpreted as negative (<1), neutral (=1), and positive (>1) (Lawrie et al., 2013). The divergence time of duplication was calculated by a synonymous mutation rate of λ substitutions per synonymous site per year as T = (Ks/2λ) (λ = 6.5 × 10 −9 ) × 10 −6 (Yang et al., 2008). We also analyzed synteny relationships of cacao BAHD genes with G. raimondii and C. capsularis and drawn at chromosome level using Circos software (Krzywinski et al., 2009).

Three-Dimensional Protein Modeling and Molecular Docking
The three-dimensional structure of BAHD proteins was estimated by the protein homology/analogy Recognition Engine Version 2.0 (Phyre2) server (Kelley et al., 2015). The predicted structure of proteins was validated using the Ramachandran plot (Lovell et al., 2003) following a previous study . The Beta Cavity webserver (Kim et al., 2015) was used to estimate molecular voids and pockets in proteins. We also used the ProSA server (Wiederstein and Sippl, 2007) to estimate errors in protein structure and validate 3D-modeled proteins. The P2Rank in PrankWeb software (Jendele et al., 2019) and CASTp tool (Tian et al., 2018) were used for docking analysis of the ligand-binding regions in the modeled proteins. Finally, the results were analyzed using PyMOL (DeLano, 2002).

Expression Analysis of BAHD Genes Under Biotic Stress
The available already processed RNAseq data of biotic stress related to P. megakarya infection is available in GEO DataSets under accession number GSE116041 for tolerant and susceptible cultivars of cacao. The data have been reported in the National Center for Biotechnology Information (NCBI) by Pokou et al. (2019) after doing RNA sequencing of both cultivars after inoculating with fungus at 0, 6, 24, and 72 h. They trimmed the raw reads by trimmomatic (Bolger et al., 2014) and mapped the high-quality trimmed reads by HISAT2 (Kim et al., 2015) to the reference genome of cacao (Criollo genome v2.0) (Argout et al., 2017). The differential expression patterns of genes were determined using DESeq2 with the default setting. The complete details can be seen at Pokou et al. (2019). We downloaded the processed RNAseq data of the aforementioned method and analyzed that to extract the data of BAHD using their gene IDs. The heatmaps of all expressed BAHD genes were drawn by the TBtools package  after log2 transformation for 0, 6, 24, and 72 h fungus inoculation. In the heatmaps, the control condition was 0 h (before infection).

Coexpression Network of BAHD Genes
A coexpression network of tcBAHDs was constructed by String database (Szklarczyk et al., 2019) using their orthologs in Arabidopsis thaliana. The network was created with an interaction score of more than 0.30 and a number of interactors was set to >20 in the first shell and >5 in the second shell. Finally, the network was provided by Cytoscap software (Franz et al., 2016).

Identification of tcBAHD Genes and Their Distributions on Chromosomes Within Genomes
We spotted 27 BAHD genes in cacao genomes (tcBAHD) with functional domains of HXXXD and DFGWG ( Figure 1A) distributed across nine of ten chromosomes ( Figure 1B). The gene ID was renamed from tcBAHD1 to tcBAHD27 based on the existence on the chromosome and the position starting from chromosome 1. If, two or more genes were located on the same chromosome, the gene located first was renamed as first (Supplementary Table 1). Among 27 genes, eight were located on chromosome V, four each on chromosomes III and chromosome IX, three on chromosome II, two each on chromosome IV, VI, and VIII, and one each on chromosome I and X. No BAHD gene was found on chromosome VII. This data revealed the unequal distribution of tcBAHD genes across the cacao genome (Supplementary Table 1, Figure 1B).

Physiochemical Characterization of tcBAHD Proteins
The molecular weight of BAHDs ranged from 46.83 kDa (tcBAHD01) to 55.74 kDa (tcBAHD14) with a length of 364 (tcBAHD11) to 504 (tcBAHD14) amino acids. The isoelectric point (pI), as an indicator to determine the optimal pH, varied among tcBAHD proteins from 4.69 (tcBAHD07) to 8.72 (tcBAHD25). Overall, 14 proteins predicted with pI <6.5, highlighting that some BAHD enzymes are alkaline and some are acidic in nature. Subcellular localization analysis showed the localization of these enzymes in the cytoplasm, chloroplast, mitochondria, and organellar membrane (Supplementary Table 1). Overall, these results illustrate that tcBAHDs are diverse based on their sequence and physiochemical properties.

Evolutionary Analyses of tcBAHD Proteins
Maximum likelihood analysis demonstrated that 65 BAHD sequences of three species were clustered into four groups, including 27 sequences of cacao, 18 of G. raimondii, and 20 of C. capsularis (Figure 2). Groups II and IV were further divided into three and four subgroups, respectively. The tcBAHD proteins of cacao were found distributed in all four groups. Group I comprised three tcBAHDs, group II included five tcBAHDs, one each in II-a and II-b, and three in II-c, group III had six tcBAHDs, and group IV included 13 tcBAHDs one each in IV-a and IV-c, and other eleven in IV-d. Most of the cacao sequences showed sister relationships with the sequences of C. capsularis rather than G. raimondii. Each group contained enzymes of the same functions or of diverse functions, such as group I contained all benzyl alcohol O-benzoyltransferase, whereas group II-a contained putative 10-deacetylbaccatin III 10-O-acetyltransferase (tcBAH01), II-b contained putative omega-hydroxypalmitate O-feruloyl transferase (tcBAHD26), and II-c had all omega-hydroxypalmitate O-feruloyl transferase BAHD enzymes (tcBAHD03, tcBAHD07, and tcBAHD09) as shown in Supplementary Table 1. Group IV was highly diverse, in which IV-a included putative shikimate Ohydroxycinnamoyltransferase (tcBAHD20), IV-c comprised putative brassino steroid-related acyltransferase 1 (tcBAHD04), and IV-d comprised putative vinorine synthase (tcBAHD06,08,12,13,22,27), putative salutaridinol 7-O-acetyltransferase (tcBAHD05, 17, 18), putative acetyl-CoA-benzylalcohol acetyltransferase (tcBAHD02), and putative acylsugar acyltransferase 3 (tcBAHD21). The phylogenetic relationship was also correlated with the gain and loss of protein motifs and introns. Hence, a separate tree of tcBAHD was constructed ( Figure 3A). The analysis of protein motifs also revealed high similarities among the sequences that cluster together (Figure 3). Ten conserved motifs were recognized in the protein sequence of tcBAHDs ( Figure 3B). The sequences of group I, II-c, and III contained the same motif patterns and included all other motifs, except motif 7 which was only found in group IV-b and IV-d and was absent in the sequences of all other groups. In group II-a, tcBAHD01 lacked four motifs (motifs 7, 8, 9, and 10), whereas in group II-b tcBAHD26 lacked three motifs (motifs 7, 8, and 10). In group IV, nine sequences contained all 10 motifs. Hence, the four sequences that lacked some protein motifs are tcBAHD20 of IV-a (lacked motifs 7, 9, and 10), tcBAHD04 of IV-b (lacked motifs 9 and 10), and tcBAHD08 and tcBAHD13 of IV-d lacked motif 10 and 9, respectively. The conserved motifs were more distributed in the region of the conserved transferase domain ( Table 1). The gain and loss of introns showed different results across the four clusters of the maximum likelihood tree (Figure 3C). The gene of group I and IV showed high similarities i.e., all three genes of group I contained one intron, whereas single genes of group IV-a and IV-c contained one and two intron(s), respectively. All the 11 genes of group IV-d lacked introns. The genes of group II and III showed some diversity as the number of introns varied from 0 to 2 within the same cluster, such as single gene of group II-a contains one intron, single gene of II-b lacked intron, and genes of II-c had either one or two introns. The five genes of group III varied not only in the number of introns (0-2 introns) but also in the pattern of distribution (Figure 2C and Supplementary Table 1).

BAHD Genes Duplication and Synteny Analyses
The paralogous relationships among tcBAHD genes were analyzed along with orthologous relationships by comparing them with the BAHD sequences of G. raimondii and C. capsularis. The duplication events were recorded for 14 pairs of genes, ten of which led to the generation of gene with different functions based on predicted physicochemical properties and their reported annotation ( Table 2). These results showed that gene duplications are responsible for the expansion and diversification of function in BAHD genes. Evidence of purifying selection pressure on these genes was seen by estimating the ratio of non-synonymous and synonymous substitutions in DnaSP v.6. The purifying selection pressure indicates that these genes are expressed regularly under high selection pressure. Hence avoiding incorporation of any new amino acid that may either cause malfunctioning or completely disturb the protein structure. The divergence time analyses showed that duplication events mainly occurred recently and ranged from 1.82 to 15.50 MYA ( Table 2). The intraspecies synteny of BAHDs was drawn between cacao and G. raimondii and between cacao and C. capsularis (Figure 4). The 27 tcBAHDs in cacao showed eight syntenic block relationships with BAHDs in G. raimondii ( Figure 4A) and 12 syntenic block relationships with BAHDs in C. capsularis ( Figure 4B). These results showed that tcBAHDs have more syntenic relationships with BAHDs of C. capsularis than G. raimondii.

Pocket Analysis of tcBAHD Proteins
In the present study, the predicted 3D structure analysis of tcBAHDs showed diverse structures (Supplementary Figure 1). Pocket sites related to activation or binding site were highlighted in the structure of tcBAHDs ( Figure 5A). The amino acid residues serine (SER), glycine (GLY), proline (PRO), lysine (LYS), threonine (THR), cysteine (CYS), and arginine (ARG) were more commonly recognized as the critical binding sites in the pocket sites of tcBAHDs (Figures 5A,B). In particular, SER amino acid was more abundant in the binding site of proteins, indicating that it may have potential roles that affect the function of tcBAHDs.

Promoter Regions and Structure Analyses of tcBAHD
We identified several key responsive elements in the promoter region of tcBAHDs. The most prominent responsive elements included those related to stress (45%), hormone (27%), light (21%), and growth (7%) (Figure 6A). Elements related to DNA and protein-binding sites were also recorded FIGURE 2 | Phylogeny analysis of 65 BAHDs in three species including Theobroma cacao, Gossypium raimondii, and Corchorus capsularis. Overall, 27 tcBAHDs (black circles), 20 BAHDs of C. capsularis (yellow circles), and 18 BAHDs of G. raimondii (red circles) were classified into four main groups. The percentage of bootstrap values is provided in the branches. Table 2). Regulatory sites were found for numerous hormones, such as salicylic acid, auxin, gibberellin, MeJA, and ABA (Supplementary Table 2, Figure 6B). We found that tcBAHD genes may be more induced by ABA and MeJA based on the distribution of cis-acting elements in their promoter site. Similarly, regulatory elements were identified for drought, anoxic inducibility, elicitor, seed-specific regulation, anaerobic induction, low temperature, circadian control, and plant defense/stress (Supplementary Table 2, Figure 6C).

Expression Analyses of tcBAHDs in Biotic Stress (Response to P. megakarya)
The role of tcBAHDs was also elucidated against P. megakarya using RNA-seq data of cacao at 0 hours (0 h), 6, 24, and 72 h in fungal resistant cultivars (Scavina; SCA6) and susceptible cultivars (Nanay; NA32) (Figure 7). In the susceptible cultivars, tcBAHD01 was up-regulated after 6 h compared to 0 h (as a control condition), while after 24 h, tcBAHD25 was more induced (Figure 7A).  In the fungal resistant cultivars, tcBAHD13 showed upregulation after 6 h, and 24 h compared to 0 h; after 72 h, tcBAHD26 was more up-regulated in response to P. megakarya infection (Figure 7B). The expression profile also confirmed that tcBAHDs have diverse functions as well as physicochemical properties.

Coexpression Analyses of tcBAHDs
Co-expression analysis revealed that orthologs of tcBAHD genes interact with genes involved in the phenylpropanoid pathway, lignin metabolic process, coumarin biosynthetic process, flavonoid biosynthetic process, response to karrikin and abiotic stress (Figure 8 and Supplementary Table 3).
The hydroxycinnamoyl-Coenzyme A shikimate/quinate hydroxycinnamoyltransferase (HCT), as an ortholog of tcBAHDs, showed a high interaction score in co-expression network with 4-coumarate: CoA ligase (4CL), and phenylalanine ammonia-lyases (PALs), which are both involved in phenylpropanoid metabolic process and response to UV (Figure 8). These findings reveal that BAHDs are associated with several metabolic pathways, which may increase the resistance of plants to abiotic stresses. In addition, the endoplasmic reticulum was identified as a cellular component for the activity of BAHDs and their interactors (Supplementary Table 3).

DISCUSSION
Transfer of acetyl to cellular metabolites can affect their activity and stability. BAHD is an important plant gene family that affects the acetylation of many metabolites (D'Auria, 2006). In the present study, 27 non-redundant putative tcBAHD genes were recognized in the cacao genome for the first time. These FIGURE 4 | Synteny analysis of BAHD genes. The syntenic blocks of Theobroma cacao BAHD genes are compared with Gossypium raimondii (A) and Corchorus capsularis (B). The percentage identity of syntenic blocks shows under the color spectrum that red to blue, the %identity is reducing. The gray stack bars with green and red tips indicate genes of Gossypium raimondii (A) and Corchorus capsularis (B), whereas the white stack bars with orange and green tips indicate genes of T. cacao. The presence of red, orange, or green stack bars on T. cacao and the presence of one or two types of stack bars on genes of G. raimondii and C. capsularis are to show the cover rate between syntenic blocks with the genes of other species.
genes are lowered in number as compared to previous reports of the Rosaceae family, in which 69-141 genes were reported Liu et al., 2020a). However, the number of genes within a gene family can vary among species (Rezaee et al., 2020;Song et al., 2020;Abdullah et al., 2021a;Faraji et al., 2021). The identified tcBAHDs showed high diversity based  on sequence structure, physicochemical properties, functions, and distribution across the chromosome, which is in agreement with previous reports of the BAHD gene family (Moglia et al., 2016;Zhang et al., 2019;Ahmad et al., 2020b;Liu et al., 2020a). The annotation retrieved from the Ensembl indicates the identified tcBAHD enzymes have high diversity in the function of tcBAHD. The benzyl alcohol benzoyl transferase synthesizes the minor constituent of floral aroma benzyl benzoate and other volatile esters in esters, and also provides support during leaf damage and phytopathogenic bacteria stress (D'Auria et al., 2002). The omega-hydroxypalmitate Oferuloyl transferase synthesizes suberin aromatics and also reinforce barrier against the pathogen (Balestrini et al., 2020). Brassino steroid-related acyltransferase is important for plant development and regulation of various biological pathways including the development of flowers and seeds (Singh and Savaldi-Goldstein, 2015) and also protect against various biotic and abiotic stresses (Krishna, 2003). Shikimate O-hydroxycinnamoyl transferase accepts p-coumaroyl-CoA and caffeoyl-CoA as substrates and transfers the acyl group on both quinate and shikimate acceptors (Levsh et al., 2016) and is involved specifically in lignin and phenylpropanoid biosynthesis to support plant growth (Hoffmann et al., 2004). These examples indicate the diverse role of the identified tcBAHDs in cacao growth and development. The recognized BAHDs in cacao, G. raimondii, and C. capsularis clustered into four groups that each contained enzymes with similar or diverse functions. Previous studies also revealed that phylogenetic clustering in the same group is not an indication of the same function (Nawaz et al., 2019;Abdullah et al., 2021a). Interestingly, unlike G. raimondii, the BAHD genes of C. capsularis and T. cacao were closely related in each cluster of the tree. This finding agrees with previous phylogenetic studies of the family Malvaceae which indicate cacao among the earlier diverged species while Gossypium is recently diverged (Abdullah et al., 2019(Abdullah et al., , 2020(Abdullah et al., , 2021b. The tcBAHD genes also differed in terms of intron number and protein motifs. These variations may be responsible for the diverse functions, as they can affect the function of homologous genes and protein-protein interactions Faraji et al., 2021). Hence, it has been suggested that BAHD genes rapidly evolved after divergence (Yu et al., 2009). Tandem and segmental duplication are helpful for domestication, survival, and resistance to biotic and abiotic stresses in plants as they generate structural and functional diversity within genes (Liu et al., 2020b;Schilling et al., 2020;Zan et al., 2020). We identified 14 duplication events within BAHD genes, which mainly led to the generation of genes with diverse functions ( Table 2). This genes duplication may also play a significant role in the evolution and domestication of cacao. The purifying selection on duplicated genes indicates that these genes play important functions in cacao growth and development. Hence, they are expressed regularly and avoid deleterious mutations that cause malfunctions/structure modifications that can terminate/decrease the function of these genes (Page and Holmes, 2009;Cvijović et al., 2018). Moreover, the prediction of subcellular localization revealed that tcBAHDs localize in the cytoplasm, chloroplast, and mitochondria, which further supports the diverse function of these enzymes within the different cell compartments, important for the regulation of cacao.
Surface pocket analysis of tcBAHDs is considered important because this provide insight into the key binding site of protein structures that affecting the enzymatic activity and proteinprotein interaction (Stank et al., 2016). Serine, glycine, and proline were highly observed amino acids at pocket sites, revealing that tcBAHDs may also respond to adverse conditions, including abiotic and biotic stresses (Faraji et al., 2020;Heidari et al., 2021b). Serine was observed in the pocket site of most BAHDs suggesting its key role in regulating the activity of tcBAHDs and cellular pathways belonging to this gene family. Furthermore, a high amount of cis-acting elements related to stress-response in the promoter region of tcBAHD genes suggests that members of this gene family may be induced by transcription factors related to stress stimuli, as observed in other studies (Ahmadizadeh and Heidari, 2014;Heidari et al., 2019). Cis-acting elements related to ABA and MeJA hormones are frequently observed in promoter sites, indicating that tcBAHD genes are more induced by hormones associated with response to stress stimuli. Previous studies also reported that BAHD acyltransferases are involved in diverse pathways related to regulation of plant structure and function, cell stability, and the production of secondary metabolites (Luo et al., 2007;Grienenberger et al., 2009;Li et al., 2018;Kusano et al., 2019). Besides, the coexpression network illustrated that BAHDs are involved in the biosynthesis of secondary metabolites, which may relate to response to abiotic stress. Hence, these may also be important in resistance to biotic and abiotic stresses. We further studied the expression pattern of tcBAHD genes in resistant (Scavina; SCA6) and susceptible (Nanay; NA32) cultivars of cacao in response to fungal infection (P. megakarya). Our results revealed two BAHD genes, tcBAHD13 (a vinorine synthase) and tcBAHD26 (an Omega-hydroxypalmitate Oferuloyl transferase) that were upregulated specifically in resistant cultivars, indicating that these two genes are potentially involved in the P. megakarya fungi response. Although, these results were not validated from qualitative PCR (qPCR), previous studies reported high consistency between the result of RNAseq and qPCR, i.e., in the GASA gene family in soybean (Ahmad et al., 2019) and apple (Fan et al., 2017), extensin genes in tomato (Ding et al., 2020), papain-like cysteine proteases in rice (Niño et al., 2020) and cotton , and auxin/indole-3-acetic acid in pepper (Waseem et al., 2018). Nevertheless, a functional study is required to draw a complete conclusion.

CONCLUSIONS
In the present study, we identified and characterized 27 tcBAHD genes in the cacao genome using bioinformatics tools. Our findings indicated that tcBAHDs have a high degree of structural diversity and a wide range of functions. Various duplication events can be attributed to evolutionary process that produce this increased diversity. Further investigations are required to confirm the role of tcBAHDs in cacao growth and response to biotic/abiotic stresses.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
A and PH: manuscript drafting. A and SF: data analyses and data curation. A, SF, PH, and PP: data interpretation. A, PP, and PH: conceptualization. PH and PP: review and editing of the first draft and supervision. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fevo. 2021.707708/full#supplementary-material Supplementary Table 2 | Promoter important cis-elements engaged in various developmental and stress-responsive pathways in the tcBAHD genes.
Supplementary Table 3 | List of significant GO terms based on the co-expression network of orthologs of tcBAHD genes in the Arabidopsis.