The Evolutionary History and Functional Divergence of Trehalase (treh) Genes in Insects

Trehalases (treh) have been found in many organisms, including bacteria, fungi, yeast, nematodes, insects, vertebrates, and plants. Their biochemical properties are extremely variable and not yet fully understood. Gene expression patterns differ among insect species, suggesting that trehalase enzymes may have diversified functionally during their evolution. A second gene family, encoding enzymes with hypothetical trehalase activity, has been repeatedly annotated in insect genomes as acid trehalases/acid trehalase-like (ath), but its functional role is still unclear. The large amount of genomic data currently available for many insect species may enable a better understanding of the evolutionary history, phylogenetic relationships and possible roles of trehalase-encoding genes in this taxon. The aim of the present study is to infer the evolutionary history of trehalase and acid trehalase genes in insects and to analyze the functional divergence of trehalases during their evolution, combining phylogenetic and genomic synteny/co-linearity analyses.

Since progress in the molecular characterization of trehalase could favor the use of these proteins as novel targets for insecticides, in the present study we analyzed the evolutionary history of the treh/treh-like and ath/ath-like genes in insects, with particular emphasis on phytophagous species, combining genomic analyses (of sequence presence/conservation and of synteny and/or co-linearity) with the phylogenetic distribution of the observed gene duplications.

MATERIALS AND METHODS
The treh/treh-like and ath/ath-like gene families were studied in 40 insect species representative of five different orders. For each species, genes coding for trehalase, trehalase-like, acid trehalase and acid trehalase-like enzymes were identified by retrieving the available coding sequences (even when they were not previously annotated) from the NCBI online databases. When trehalase genes were not annotated, the transcript sequence from the phylogenetically nearest species was used to find trehalase genes in the genome of the target species, e.g., the annotated D. melanogaster treh gene was used for Drosophila pseudoobscura (Diptera, Drosophilidae). M. persicae sequences were retrieved from the AphidBase databases.
Nucleotide sequences of exons from NCBI predictions were used to build the treh and ath gene family phylogenies. For the construction of the trehalase phylogenetic tree, treh/treh-like gene sequences from ecdysozoans other than insects were also analyzed, including the nematode Caenorhabditis elegans (Nematoda: Rhabditidae), the tardigrade Ramazzottius varieornatus (Tardigrada, Hypsibiidae) and the crustacean Artemia franciscana (Crustacea: Branchiopoda).
Exon sequences were analyzed with Virtual Ribosome (Wernersson, 2006) and the predicted coding sequences were aligned with Muscle in MEGA5 (Tamura et al., 2007). The alignment was trimmed to contain only the region between the first and the last conserved domains: VIVPGGR, QWDYPNAWPP, DSKTFVDM, RSQPPL, PRPESYREDY, and ELKAA, and the glycine-rich domain GGGEYE (Barraza and Sánchez, 2013; Xie et al., 2013). The Maximum Likelihood phylogenetic tree was constructed with raxmlGUI (Silvestro and Michalak, 2011) (ML + thorough bootstrap, 10 runs, 1000 reps, jModelTest and GTRGAMMAI, outgroup Escherichia coli treh EU893513.1). The phylogenetic tree was visualized and edited in iTOL (Letunic and Bork, 2006). For each conserved domain in each sequence, the p-distance from the protein consensus sequence was computed and visualized on the tree to show the level of conservation of the different trehalase genes.
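As an illustration of the per-domain conservation measure, the p-distance is the proportion of compared sites at which a sequence differs from the consensus. The following is only a minimal sketch: the function name, the gap handling and the query fragments are assumptions made for illustration and are not taken from the study, whose distances were presumably computed with dedicated software.

```python
def p_distance(seq: str, consensus: str) -> float:
    """Proportion of compared sites at which `seq` differs from `consensus`.

    Both strings must come from the same alignment (equal length).
    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq.upper(), consensus.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Keys are consensus motifs listed in the text; values are invented
# (hypothetical) fragments standing in for one trehalase sequence.
example_fragments = {
    "QWDYPNAWPP": "QWDYPNSWAP",  # two substitutions out of ten sites
    "GGGEYE": "GGGEYE",          # identical to the consensus
}

distances = {
    motif: p_distance(fragment, motif)
    for motif, fragment in example_fragments.items()
}
```

A value of 0 indicates a domain identical to the consensus, and values approaching 1 indicate a poorly conserved domain.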
Since trehalase enzymes have been described in insects in two distinct forms, soluble and membrane-bound (e.g., Gu et al., 2009), the amino acid sequences were analyzed with SignalP 4.1 (Petersen et al., 2011) to predict the presence of signal peptides and with TMHMM Server v. 2.0 to predict transmembrane domains (Sonnhammer et al., 1998).
Genomic scaffolds containing treh and ath genes were compared among insect species for synteny and co-linearity by analyzing the neighboring genes located on the same scaffolds/contigs hosting the treh and ath genes.
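The neighboring-gene comparison described above can be sketched as follows, under stated assumptions: synteny is treated as shared gene content around treh, and co-linearity as conservation of the relative order of the shared genes, allowing for a scaffold read in the opposite orientation. The species labels, gene names and function names are hypothetical, and each gene name is assumed to occur once per scaffold.

```python
def shared_neighbors(orders_by_species):
    """Genes found on the treh-bearing scaffold of every species (synteny)."""
    gene_sets = [set(genes) for genes in orders_by_species.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def is_colinear(order_a, order_b):
    """True if the genes shared by two scaffolds occur in the same relative
    order, read in either orientation (co-linearity)."""
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    return a == b or a == b[::-1]


# Hypothetical gene orders along the scaffold hosting one treh gene.
scaffolds = {
    "species_1": ["g1", "g2", "treh", "g3", "g4"],
    "species_2": ["g4", "g3", "treh", "g2", "g9"],  # inverted orientation
    "species_3": ["g2", "treh", "g3", "g7"],
}

common = shared_neighbors(scaffolds)
inverted_ok = is_colinear(scaffolds["species_1"], scaffolds["species_2"])
rearranged = is_colinear(scaffolds["species_1"], ["g3", "g2", "treh"])
```

Keeping the two checks separate reflects the fact that scaffolds can share neighboring genes (synteny) without preserving their order (co-linearity).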

Identification of treh and ath Genes Currently Available in DNA Databases
The search of the GenBank databases allowed us to identify 160 treh/treh-like genes among 40 insect species and 31 ath/ath-like genes in 14 species (Table 1). Except for Dipterans, the insect taxa analyzed have experienced specific treh gene duplications and maintained multiple treh gene copies in their genomes. Hemipteran genomes showed the highest number of treh gene duplications (54 treh copies in 7 species). The pea aphid A. pisum possessed the highest number of treh gene copies, with 13 treh genes, followed by Aethina tumida (Coleoptera: Nitidulidae) with 11 treh genes and by the two aphid species D. noxia and M. persicae with 8 treh copies each (Table 1 and Supplementary Table S1). The A. pisum genome possesses a treh pseudogene with high similarity to plant trehalases, but with a partial coding sequence due to a large deletion in the gene. P. xylostella possessed two treh genes (LOC105397091 and LOC105395616) with a high level of sequence similarity to a treh gene encoded by Enterobacter cloacae (Bacteria: Enterobacteriaceae; scaffold CP015227).
Diptera, Lepidoptera, Hymenoptera and some Hemipteran species mostly possess a single gene coding for acid trehalase, which was not phylogenetically related to bacterial or fungal acid trehalase (ath) genes. In contrast, multiple genes coding for acid trehalases have been identified in A. pisum, M. persicae, P. xylostella, and Bactrocera oleae (Diptera:

Trehalase Phylogenetic Tree
Phylogenetic analysis showed that at least one member of each of the treh-1 and treh-2 subfamilies was present in every species considered, except for Diptera, which did not possess any treh-1 gene (Figure 1 and Supplementary Figure S1). The treh-1 subfamily is represented by a higher number of members (110 of the 160 treh-1 and treh-2 genes), which encode less conserved trehalase isoforms than those of the treh-2 subfamily. Coleopteran and hemipteran treh genes, for instance, had species-specific duplication events involving treh-1. Conversely, Hymenoptera and Lepidoptera did not show such marked differences between treh-1 and treh-2. A. tumida represented an interesting exception, possessing unique treh paralogs that are more similar to the trehalase genes of the outgroup species than to those of insects.
FIGURE 1 | Phylogenetic tree of treh genes and protein isoform diversity in insects and other taxa. For each paralog only one transcript variant has been considered, while protein diversity represents all the transcript variants predicted by NCBI algorithms.

Gene Synteny and Co-linearity
The comparison of the genomic regions containing the treh and ath genes revealed high levels of synteny and co-linearity among species belonging to the same insect order, but not between different orders (Figure 7).
The highest levels of genomic synteny were found in Hymenoptera, with 3 treh neighboring genes shared by 15 species on 30 genomic scaffolds (Figure 2), as well as in Coleoptera (Figure 3) and Lepidoptera (Figure 3). Hemipterans, in contrast, show very low synteny, with no treh neighboring genes shared by all the species studied (Figure 4). However, within the superfamily Aphidoidea, 13 shared treh neighboring genes were found between A. pisum and D. noxia, considering 44 genomic scaffolds and 54 treh genes (Figure 4).

Paralogs With Protein Functional Specialization
To evaluate the solubility of trehalase and acid trehalase enzymes, the predicted protein sequences were tested with SignalP 4.1 to identify potential signal peptides and with TMHMM Server v. 2.0 to predict transmembrane domains (Figure 6).
In order to test whether treh gene duplications were accompanied by functional divergence, the numbers of genes, transcript variants and protein isoforms were compared among insect species (Figure 7). Isoform diversity was analyzed first by considering the complete protein amino acid sequences and their conservation at the whole-sequence level (isoforms sensu stricto) and second by focusing only on the conservation of the amino acid sequences of the functional domains occurring in these proteins (isoforms sensu lato).
All acid trehalase genes, on the contrary, encoded only one isoform, apparently lacking exons and introns. The amino acid sequences of the predicted ATH isoforms were conserved only within insect orders and never possessed typical bacterial or fungal ath domains (Table 2).
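The distinction between isoforms sensu stricto and sensu lato can be illustrated with a short sketch. It assumes, purely for illustration, that each transcript variant of a gene is available as a record holding its full protein sequence and the sequences of its functional domains; the record layout, sequences and function name are hypothetical and do not reproduce the pipeline used in the study.

```python
def count_isoforms(variants):
    """Return (sensu_stricto, sensu_lato) isoform counts for one gene.

    sensu stricto: number of distinct full-length protein sequences.
    sensu lato: number of distinct combinations of domain sequences.
    """
    stricto = {v["protein"] for v in variants}
    lato = {tuple(sorted(v["domains"].items())) for v in variants}
    return len(stricto), len(lato)


# Hypothetical transcript variants of a single treh gene.
variants = [
    {"protein": "MKLVIVPGGRAAGGGEYE",
     "domains": {"d1": "VIVPGGR", "gly": "GGGEYE"}},
    {"protein": "MRLVIVPGGRAAGGGEYE",  # differs outside the domains only
     "domains": {"d1": "VIVPGGR", "gly": "GGGEYE"}},
    {"protein": "MKLVIVPGGRAAGGGEYE",  # duplicate of the first variant
     "domains": {"d1": "VIVPGGR", "gly": "GGGEYE"}},
    {"protein": "MKLVIVPAGRAAGGGEYE",  # substitution inside domain d1
     "domains": {"d1": "VIVPAGR", "gly": "GGGEYE"}},
]

n_stricto, n_lato = count_isoforms(variants)
```

In this toy example the sensu lato count is lower than the sensu stricto count because one variant differs from another only outside the functional domains.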

Treh Genes Duplication and Trehalase Sub-Functionalization in Insects
Trehalose is commonly present in the haemolymph of most insects, and a role for this sugar in osmoregulation has been suggested in some insects (Iturriaga et al., 2009).
The analysis of the evolutionary history of the treh/treh-like and ath/ath-like genes and of their functional divergence during insect evolution showed that at least two treh paralogs are present in most of the studied species, except for Dipterans, which probably never duplicated the treh gene. Many insect species have experienced specific treh gene duplications and maintained the multiple treh genes as functional copies (paralogs) in their genomes (Avonce et al., 2006; Tang et al., 2018). This is interesting since, according to the literature (Zhang et al., 1998; Zhang, 2003), a loss of function generally occurs for most paralogs. The treh gene family in insects therefore represents an interesting exception for studying the adaptive effect of duplications. Gene duplication is a major evolutionary mechanism that can confer adaptive advantages to organisms through the occurrence of mutations in paralogs, resulting in new genetic variants (Duda and Palumbi, 1999; Lynch and Conery, 2003; Conant and Wolfe, 2008; Warren et al., 2014). Indeed, paralogous genes may be subject to decreased purifying selective pressure, resulting in the fixation of mutations. In some rare cases, no deleterious mutations occur, so that paralogs are maintained functionally active and may undergo sub-functionalization (with paralogous and orthologous genes cooperating in the same function) or neofunctionalization, in which paralogous genes gain new functions with respect to orthologous ones (Hughes, 1994; Zhang et al., 1998; Force et al., 1999; Kellogg, 2003; Zhang, 2003; Innan and Kondrashov, 2010).
The low sequence conservation at the C- and N-termini of the amino acid sequences of trehalase isoforms in Hemiptera and Coleoptera, in comparison to other insects, suggests that functional divergence occurred in the treh family during the evolution of these taxa. Interestingly, most of the Coleopteran treh gene duplications involved the treh-1 gene only, and the paralogs are clustered in the same scaffold. The presence of multiple copies of the treh genes in Coleoptera could be explained as an adaptation to a trehalose-rich diet in insectivorous, detritivorous, and mycophagous species, since this sugar is present at high concentration in fungi (Thevelein, 1984). A. tumida, on the contrary, represents an exception since, despite its ecological adaptation as a beehive parasite, it possesses a higher number of treh copies encoding a high number of trehalase isoforms. In P. xylostella, the finding of trehalases similar to E. cloacae treh probably derives from bacterial DNA contamination of the genomic database, considering that E. cloacae is widely adopted in agriculture as a bio-control agent against pathogens (Duponnois et al., 1999; Watanabe et al., 2001).
Trehalase genes have been duplicated more frequently in Hemipterans than in other insects, but numerous rearrangements (including inversions and conspicuous genomic insertions or deletions) seem to have occurred in their genomes, so that the multiple treh genes were not clustered in the same scaffold. This result is not surprising considering the holocentric nature of their chromosomes, which can confer the ability to retain chromosomal rearrangements such as intrachromosomal translocations and/or chromosomal fissions/fusions (Manicardi et al., 2015). Furthermore, hemipteran trehalases have the longest glycine-rich domain and the highest rate of fixed mutations. Both these aspects seem to be functionally relevant, since the additional amino acids in the glycine-rich regions are likely to influence the interactions of these regions with other proteins or RNA and may facilitate homo- and hetero-meric interactions (Wang et al., 1997; Gsponer et al., 2008). The high diversification of the treh gene family in Hemipterans is particularly interesting, since it suggests that the presence of multiple copies of these enzymes is not the result of an adaptation to a sugar-rich diet (which should favor the presence of multiple copies of highly similar genes), but could be due to different roles of trehalase in these species. Indeed, according to literature data, defective or inhibited trehalases may be associated in insects with altered sugar metabolism (Wegener et al., 2003) or with morphological abnormalities, suggesting that treh gene duplication could result in a sub-functionalization of trehalases in Hemiptera.
The role of acid trehalases in insects is still unknown, and horizontal gene transfer events from bacteria or fungi to insects could be involved. Horizontal gene transfer has indeed already been suggested, for instance, to explain the presence of carotenoid genes in aphids (Moran and Jarvik, 2010; Mandrioli et al., 2016) and the occurrence of seven highly expressed trehalase genes with strong similarity to bacterial trehalases in the rotifer Adineta vaga (Hespeels et al., 2015). However, insect acid trehalases were not phylogenetically related to bacterial and fungal ath and they do not possess the functional domains typically observed in ATH proteins, so that a different origin of these genes should be evaluated. Differently, the presence of bacterial
Trehalase enzymes play important roles in insect metabolism, so that they are related to insect survival (Iturriaga et al., 2009). For this reason, in phloem sap-sucking insects, trehalase enzymes represent molecules that host plants can target to establish efficient defensive strategies. For instance, plants produce trehalose as a signal molecule in response to aphid infestation (Smith and Boyko, 2007; Louis et al., 2012) and the presence of trehalase in aphid saliva may be relevant to modulate the trehalose-based defensive plant pathways (Cooper et al., 2010, 2011; Cui et al., 2012; Vandermoten et al., 2013; Chaudhary et al., 2015). In particular, trehalases in aphid saliva could act as PAD4 suppressors, blocking the local accumulation of trehalose in the wounded plant tissue (Bansal et al., 2013). At the same time, however, plants in turn evolved trehalase inhibitors (Tatun et al., 2014) in a true arms race against phytophagous insects, resulting in a strong selective pressure on the treh gene family that led to the maintenance of duplicated treh copies and to their divergence, allowing aphids and other sap-sucking insects to escape the plant defensive strategies.
The co-evolution between plant trehalase inhibition and duplication/fixation of mutations in the treh genes could be particularly relevant in aphids in view of their peculiar reproductive mode. Indeed, the reproduction of aphids is mainly based (with the exception of a single generation in autumn) on apomictic parthenogenesis, consisting of several thelytokous parthenogenetic generations in which unfertilized eggs develop into females (Nardelli et al., 2017). In the absence of amphigonic reproduction, aphids cannot have any recombination between female and male genomes during spring and summer, causing a reduced gene flow that delays the spread of advantageous alleles. This means that treh alleles with favorable mutations could not spread in aphid populations during the parthenogenetic phase of their life cycle. Interestingly, the occurrence of multiple copies of the treh genes within aphid genomes could allow the presence of multiple alleles in the same genome, making gene duplication and mutation a sort of alternative pathway (with respect to genome recombination) to favor the presence of advantageous alleles.
Aphids seem to be particularly unusual in terms of duplications, since they possess four times the gene duplications observed on average in other arthropods (The International Aphid Genomics Consortium, 2010; Mathers et al., 2017), a feature that they share, together with reproduction based on parthenogenesis, with the water flea Daphnia pulex (Crustacea: Cladocera). In aphids and D. pulex most of the identified duplications are clade-specific, and it has been suggested that duplicates were involved in rapid adaptation to the environment (Pennisi, 2009; Simon et al., 2011). From this point of view, treh gene duplications in aphids could be an effective tool in a molecular strategy evolved to adapt to host plants in the absence of sexual reproduction and allelic recombination. The treh gene family has indeed been involved in many aphid-specific duplication events, and treh paralogous genes (possessing different mutations) could be retained to face the evolution of plant trehalase inhibitors in a sort of aphid-host plant arms race (Shcherbakov and Wegierek, 1991; Hong et al., 2009; Szwedo and Nel, 2011; Liu et al., 2014).
In view of the relevant role that treh genes could play in aphids, understanding the biochemical nature and physiological function of trehalases could therefore be useful not only from an evolutionary point of view, but also at an applied level, since a better understanding of trehalase could be crucial to develop new insecticides (based on trehalase inhibitors) or plant cultivars more resistant to aphids and/or to other sap-sucking agricultural pest insects.

AUTHOR CONTRIBUTIONS
All the authors contributed to the data analysis and interpretation, drafting and revising the manuscript, and approved the final version of the manuscript. The original study design was made by AN and discussed with the other authors.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2019.00062/full#supplementary-material
FIGURE S1 | Phylogenetic distribution of the identified treh genes.