Genome-Wide Identification of Aldehyde Oxidase Genes in Moths and Butterflies Suggests New Insights Into Their Function as Odorant-Degrading Enzymes

Aldehyde oxidases (AOXs) are common detoxifying enzymes in several organisms. In insects, AOXs act in xenobiotic metabolism and as odorant-degrading enzymes (ODEs). These last appear as crucial enzymes in the life cycle of insects, helping to reset their olfactory system, particularly in lepidopterans, which fulfill important ecological roles (e.g., pollination or destructive life cycles). A comprehensive understanding of their olfactory system has provided opportunities to study key chemosensory proteins. However, no significant advance has been made around lepidopteran AOXs research, and even less around butterflies, a recently evolved lineage. In this study we have identified novel AOX gene families in moths and butterflies in order to understand their role as ODEs. Eighteen genomes from both moths and butterflies were used for phylogenetics, molecular evolution and sequence analyses. We identified 164 AOXs, from which 91 are new. Their phylogeny showed two main clades that are potentially related to odorant-degrading function, where both moths and butterflies have AOXs. A first ODE-related clade seems to have a non-ditrysian origin, likely related to plant volatiles. A second ODE-related clade could be more pheromone-biased. Molecular evolution analysis suggests a slight purifying selection process, though a number of sites appeared under positive selection. ODE-related AOXs have changed a phenylalanine residue by proline in the active site. Finally, this study could serve as a reference for further evolutionary and functional studies around Lepidopteran AOXs.


INTRODUCTION
The study of gene evolution in insects has provided outstanding advances in the understanding of evolutionary processes, such as expansion or contraction of gene families (Li et al., 2019). Particularly, lepidopterans represent an extraordinary target due to a clear diversification into moth and butterflies lineages (Kawahara et al., 2019). Thus, the impact of gene evolution can be seen even within moths. For instance, regulation of desaturase genes in two sibling Helicoverpa species (i.e., H. armigera and H. assulta) results in reproductive isolation . Nowadays, the enormous amount of genomic and transcriptomic datasets for insects has provided an opportunity to elucidate novel genes and their evolutionary relationships (Oppenheim et al., 2015), something that can support our understanding of ecological aspects of insects, such as behavior. For many insect species, behavior is mainly driven by olfaction. Olfaction is primarily processed by insect antennae and their small hair-like structures called sensilla, in which a set of proteins work synergistically to maintain an extremely sensitive and dynamic system (Hansson and Stensmyr, 2011;Leal, 2013;He et al., 2019). For instance, odorant-binding proteins (OBPs) and chemosensory proteins (CSPs) function as transporters that carry odorants across the sensillar lymph (Zhou, 2010;Leal, 2013). These odorants reach an heteromeric complex of receptors, such as odorant receptors (ORs), an odorant receptor co-receptor (Orco) and a sensory neuron membrane protein (SNMP), as recently reported (Zhang et al., 2020), to unleash depolarization in olfactory neuron membranes that triggers a behavioral response (Kaissling, 2013). Along with these olfactory proteins, odorant-degrading enzymes (ODEs), such as carboxylesterases (CXEs), glutathione-S-transferases (GSTs) and aldehyde oxidases (AOXs), are responsible for resetting the insect olfactory system through the degradation of odorant molecules (Chertemps and Meïbèche, 2021;Godoy et al., 2021).
Among insects, lepidopterans have attracted special attention due to their establishment as crop pests, some with worldwide distribution. It is known that moths rely heavily on the sense of smell (Weiss, 2001), developing long distance attraction based on volatile chemicals (e.g., sex pheromones) (Chemnitz et al., 2015). In fact, hundreds of these volatiles have been identified since the first one reported for B. mori, the sex pheromone (E,Z)-10,12-hexadecadien-1-ol (bombykol) (Butenandt et al., 1959). On the contrary, butterflies rely heavily on visual cues and short-range chemical communication, understood as a multisensory integration (Costanzo and Monteiro, 2007), and hence have received less attention in terms of olfaction. Moreover, butterflies represent an interesting group for comparative studies considering their transition from moths approximately 98 Mya (million years ago) (Kawahara et al., 2019). Thus, it is believed that comparing AOXs between moths and butterflies might deepen our understanding of their odorant-degrading function.
Considering the difference in olfactory integration during the life cycle of moths and butterflies, we hypothesize that there is a specific clade of AOXs for both moths and butterflies that could be related to odorant-degrading function as well as both moth-and butterfly-specific gene expansions. Therefore, the objective of this study was to identify novel AOX genes from moths and butterflies using genomic and transcriptomic data and analyze them in terms of gene location, phylogeny, evolutionary processes, and structure.

Data Collection
Publicly available genomic data were retrieved from NCBI Genome database 1 and InsectBase 2 for major lepidopteran families, such as Bombycidae, Sphingidae, Noctuidae, Pyralidae, Crambidae, and Plutellidae for moths, whereas Nymphalidae, Pieridae and Papilionidae were used for butterflies (Table 1). Each fully represented genome assembly with Reference Sequence (RefSeq) was downloaded from NCBI Assembly database at either contig, scaffold or chromosome level (Supplementary Table 1).

Identification of Aldehyde Oxidase Family
Bioinformatics pipeline BITACORA (Vizueta et al., 2020) was used to identify already annotated AOX genes and potentially novel related genes from both moth and butterfly genomes. A database for AOX gene family was built using reported protein sequences for lepidopterans (Rybczynski et al., 1989;Merlin et al., 2005;Pelletier et al., 2007;Choo et al., 2013;Ou et al., 2014;Zhang et al., 2014Zhang et al., , 2017bYang et al., 2015;Huang et al., 2016;He et al., 2017;Xu and Liao, 2017;Wang et al., 2021b). To identify family and structural domains for AOXs, InterPro server was used 3 . The identified profile was used to search for HMM profile in PFAM database 4 (PF01315; ID Ald_Xan_dh_C). This process increased the likelihood of identifying sequences encoding members of the AOX gene family. Further processing included the trimming of isoforms (98% cutoff) using a provided script in BITACORA pipeline. Subsequently, BLAST searches were run with the identified proteins for manual annotation. Protein domain finder CDvist 5 (Adebali et al., 2015) was used to identify conserved domains of AOXs, namely two (2Fe-2S), one flavin-containing region (FAD-binding domain) and one molybdenum cofactor/substrate-binding domain. All proteins identified in this study are provided in Supplementary Table 1.

Sequence Analysis and Genome Structure
The genomic organization of identified AOX genes from both moth and butterfly species that use aldehyde-based semiochemicals was analyzed based on Vogt et al. (2015) and Xu and Liao (2017) including some modifications. Moths Bombyx mandarina, P. xylostella, and A. transitella, and butterflies Heliconius melpomene and Bicyclus anynana, were selected for this task. Annotated gene features (in GFF3 format) were retrieved from the gene identification protocol based on the BITACORA pipeline and analyzed manually. Thus, species name as well as source, start, end, strand and attributes were used for each AOX gene. Finally, gene localization was prepared in image editor Inkscape 0.48 software.

Data Preprocessing and Transcriptome Assembly
To both take advantage of transcriptomic data and include Tortricidae and Eriocraniidae families, we retrieved antennal RNA-seq data for moth Lobesia botrana (data from our laboratory) and non-ditrysian moth Eriocrania semipurpurella (SRR5328787). FASTQ files for both moths, one containing left-pair reads and other the right-pair reads, were used for assembly. Ribosomal RNA reads were removed by mapping the libraries using Bowtie2 v.2.3.3.1 (Langmead et al., 2009) against a custom rRNA database created from insect ribosomal sequences downloaded from NCBI 6 , and keeping non-mapped reads using SAMtools v.1.6 (Li et al., 2009). Low-quality reads were removed based on their q-score composition using NGSQC Toolkit v.2.3 (Patel and Jain, 2012), and high-quality reads were concatenated to build de novo transcriptomes using Trinity v.2.6.5 (Grabherr et al., 2011) with a P-value of 0.05 and fold-change value of 2.

Phylogenetic Analysis
A phylogeny for the identified AOX genes in moths and butterflies, including XDHs and AOXs from mosquitoes, beetles and bees as outgroups, was built. Full-length amino acid sequences that include conserved domains were aligned using MAFFT server 7 (Katoh et al., 2019). GUIDANCE2 server 8 was used to check consistency of the multiple sequence alignment (Sela et al., 2015). Briefly, the consistency of the alignment was measured with a score less than 0.5, in which sequences were deleted. It is worth noting that confidence scores near 1 and 0, suggest a highly and poorly consistent alignment, respectively. Finally, phylogenetic analysis was performed using maximumlikelihood method with FastTree software (Price et al., 2010). To highlight clades, specific taxa and functional evidence, the phylogenetic tree was edited using FigTree software 9 and image editor Inkscape 0.48 software.

Molecular Evolution Analysis
In order to identify putative selective pressures on AOXs, a molecular evolution analysis was performed based on the methodologies reported by Engsontia et al. (2014) and Soffan et al. (2018). Two models were used through EasyCodeML software (Gao et al., 2019) to elucidate selective pressures acting on the evolutionary process of 93 lepidopteran AOX genes, 9 XDHs and 11 AOXs from other insect orders. First, site model was applied to detect positive selection for a set of 113 sequences (Yang et al., 2000). Additionally, a branch-site model was applied to test the presence of amino acids that evolved under positive selection in a specific clade represented by 8 AOX sequences (most of them functionally studied). All the amino acid sequences were aligned by ClustalW 10 , and converted to DNA alignment with PAL2NAL server 11 . A maximum likelihood tree was prepared using the DNA alignment by FastTree software under default parameters. Briefly, the software estimated the ratio of normalized non-synonymous (d N ) to synonymous (d S ) (e.g., d N /d S or ω) substitution rate via the maximum likelihood method. The ω value indicates the mode of evolution, where ω > 1 suggests evidence of positive selection with amino acid replacement, whereas ω < 1 refers to purifying selection, and ω = 0 indicates neutral selection. The specific models (M0, M3, M1a, M2a, M7, M8, and M8a) used under the "site model" method are described in detail in previous reports (Yang et al., 2000;Yang and Nielsen, 2002;Swanson et al., 2003). For the branch-site model, the 8 AOX sequences were labeled in the phylogenetic tree as foreground branch with the remaining clades as background branches. The change in ω was evaluated for a set of sites in each foreground branch through an alternative model, whereas neutral evolution was evaluated through a null model. Likelihood ratio tests (LRTs) were used to compare both models and significant results were determined using χ 2 -tests. Finally, Bayes Empirical Bayes (BEB) analysis was used when LRT was significant to identify positive selected sites (PSSs) within each amino acid sequence (Yang et al., 2005).

Sequence Analysis and Protein Structure Prediction
First, a multiple sequence alignment (MSA) was built with 7 AOX sequences also used in molecular evolution analyses, belonging to A. transitella, B. mori, P. xylostella, H. armigera, Papilio xuthus, Papilio machaon, and B. anynana, in which PSSs were identified and indicated (Supplementary Table 1). Sequences from B. mandarina AOX5 (BmanAOX5), Drosophila melanogaster AOX2 (DmelAOX2) and mammal AOXs, such as Mus musculus AOX2 (MmAOX2) and AOX3 (MmAOX3), and Homo sapiens AOX1 (HsAOX1), were also included. MSA was built in Multalin server 12 and ESPript 3.0 13 (Corpet, 1988). The amino acid sequence of AtraAOX2 and BanyAOX2 were submitted to BLASTp available on the NCBI website 14 for template selection. To optimize the structural information available for AOXs, a multiple template-based homology modeling approach was considered as it was reported to increase accuracy in predicted protein models (Sokkar et al., 2011). First, multiple structure alignments were generated by SALIGN command, which is implemented in Modeler 10.1. Five hundred models of each AOX were obtained using Modeler 10.1 15 . The best models were selected according to the lowest discrete optimized protein energy (DOPE) score provided by the software. The coordinates were analyzed via ProCheck 16 to check stereochemical quality. Lepidopteran AOXs were visualized through PyMOL software 17 . The 3D structure of mammal AOXs were retrieved from Protein Data Bank 18 and DmelAOX2 was downloaded from AlphaFold database 19 .

Identification and Annotation of Aldehyde Oxidase Genes
Eighteen genome assemblies were retrieved from NCBI assembly database and InsectBase server. From those, 6 were assembled at chromosome level, whereas 11 at scaffold and 1 at contig level (Supplementary Table 1). The use of BITACORA pipeline resulted in a raw amount of 163 putative AOX genes for moths and 100 for butterflies. After homology searches through BLAST followed by conserved domain analyses, 99 AOX genes were left for moths and 65 for butterflies. The average amino acid length of AOXs is 1272 and 1407 for moths and butterflies, respectively. On the other hand, the moth species that showed a higher number of AOX were Spodoptera frugiperda with 20 sequences, M. sexta with 19 sequences and Trichoplusia ni with 14. In butterflies, B. anynana and Danaus plexippus showed 11 and 10 AOX sequences, respectively. The specific number of AOX genes for each species is summarized in Table 1. Overall, 58 novel AOX genes were identified for moths whereas 33 AOX genes were identified for butterflies. BLAST hits for most of the novel AOX genes were either AOXs from other lepidopteran species or XDHs from the same species. In that sense, 6 new AOXs were identified for B. mandarina and B. mori, 16 for M. sexta, 17 for S. frugiperda and 10 for D. plexippus, as the greater numbers found. No novel AOX genes were found for butterflies Papilio polytes and P. machaon. Although most of the lepidopteran species studied here have a few annotated AOX genes, several of them are not fully annotated nor studied in terms of function. It is worth noting that the amount and length of AOX genes might be dependent on genome sequencing and annotation quality, therefore, previous estimates should be taken into account with caution.

Phylogenetic Relationships and Gene Clusters Between Moths and Butterflies
FIGURE 1 | Phylogenetic tree of AOXs identified from moth and butterfly genomes as well as some AOXs and XDHs from other insect species. Clades A (light blue) and B (yellow) are highlighted as lineage with putative ODE function and lineage with ODE function, respectively. Species in green correspond to butterflies, species in black are moths, and species in blue are AOX outgroups. The clade shaded in red color corresponds to butterfly specific AOXs with no ODE described function, and the clade shaded in gray correspond to moth specific AOXs with no ODE described function. Red circles next to the sequence name represent AOXs that have been functionally studied [MsexAOX1, (E,Z)-10,12-hexadecadienal; AtraAOX2, (Z,Z)-11,13-hexadecadienal; PxylAOX3, (Z)-11-hexadecenal; BmorAOX5, benzaldehyde, salicylaldehyde, vanillic aldehyde, propanal, and heptanal]. Black circles next to the sequence name indicate antennae-or sex-biased expression. Confidence scores are indicated as circles (> 70%) in nodes. All annotated genes and their amino acid sequences are in Supplementary Table 1. 9 moth AOX genes are reported to be enriched in antennae (Figure 1, species indicated with black circles next to their name), from which only M. sexta AOX1 (MsexAOX1) has been related to aldehyde-degrading function (Rybczynski et al., 1989). A secondary odorant degrading-related clade (labeled as B in Figure 1) seems to have evolved by gene duplication. This clade includes 5 moth AOX genes enriched in antennae (Figure 1, species indicated with black circles next to their name), and includes A. transitella (AtraAOX2), P. xylostella (PxylAOX3), and B. mori (BmorAOX5) which were functionally studied (Choo et al., 2013;Zhang et al., 2020;Wang et al., 2021b). Furthermore, butterfly-and moth-specific AOX lineages were identified (highlighted in red and gray, respectively, in Figure 1), but no reported odorant-degrading function was found for these.
Interestingly, butterflies that have aldehyde-related pheromones have AOXs present in at least one of the odorantrelated clades (A or B). There are AOXs of some butterflies that are in these clades, but no aldehyde-related semiochemical has been reported yet, such as those present in D. plexippus, Pararge aegeria, Pieris rapae, P. polytes, P. xuthus, P. machaon, and Vanessa tameamea. On the contrary, H. melpomene, that uses aldehyde-based semiochemicals, showed only 1 AOX (HmelAOX2) in clade A. Likewise, B. anynana (with hexadecanal as a pheromone component), has 2 AOXs in clade A, while 5 in clade B.
In terms of gene location, B. mandarina, A. transitella and P. xylostella, that use aldehydes as semiochemicals and have AOX genes related to ODE function, are far from other AOX genes, with the exception of PxylAOX3 (Figure 2). H. melpomene has 2 grouped AOX genes that suggest the same origin for both. Likewise, B. anynana has two big clusters of 4 and 7 AOX genes. Interestingly, from the bigger cluster of B. anynana, the 7 AOX genes were distributed in odorantdegrading clades A and B. Similarly, AOX1, AOX2 and AOX5 of B. mandarina that are grouped in a single cluster, are distributed in clades A and B. Besides HmelAOX2 present in clade A, gene HMEL011718-PA clustered with HmelAOX2, which appeared in the previously mentioned butterfly-specific clade (red clade in Figure 1).

Selective Pressures on Aldehyde Oxidase Genes
Positive selection was first evaluated for a set of 113 sequences that included XDHs and AOXs of not only butterflies and moths, but also beetles, mosquitoes, and flies ( Table 2). The four models implemented (e.g., M3 vs. M0, M1a vs. M2a, M7 vs. M8 and M8a vs. M8) showed significant differences according to LRT analysis. Interestingly, a purifying selection was suggested as site model (M0) resulted in ω = 0.89.
Additionally, a branch-site model was used to test selective pressures on specific sites (i.e., codons) among 8 closely related AOX sequences, including moths A. transitella AtraAOX2, B. mori BmorAOX5, P. xylostella PxylAOX3, H. armigera HarmAOX2 and Sesamia inferens SinfAOX3, and butterflies B. anynana BanyAOX2, P. xuthus PxutAOX2 and P. machaon PmacAOX2 ( Table 3). As expected, most of the enzymes were found to be under positive selection at many sites. For instance, AtraAOX2 resulted in 23.5% of their amino acids as PSSs, from which 105 sites showed either P < 0.01 or P < 0.001. Similarly, 22.2% of residues in BmorAOX5, 11.9% in HarmAOX, and 11.2% in SinfAOX3 were PSSs, with more than 20 sites identified with P < 0.001. In terms of butterflies, BanyAOX2 resulted in PSSs distributed in 40% of the entire sequence. However, less PSSs resulted for PxutAOX2 and PmacAOX2, representing only a 2-3% of the amino acid sequence length.

Link Between Function, Primary Sequence, and Protein Structure
To complement our previous methods that included annotation, phylogeny and molecular evolution analyses, a MSA was built followed by AOX structure prediction. The MSA was based on   (Terao et al., 2020). From those, Gln772 and Glu1266 at positions equivalent to 739 and 1209 in Figure 3, were found to be conserved between all AOXs (red triangles in Figure 3). Interestingly, Phe919 (in vertebrates) at position 884 (in Lepidoptera) in Figure 3 was conserved among mammal AOXs, but replaced by Pro in all lepidopteran AOXs as well as DmelAOX2 (blue triangle in Figure 3). Lys889 (at position 855 in Figure 3) was also not conserved, with SinfAOX3, HarmAOX2, AtraAOX2, PmacAOX2, PxutAOX2 and PxylAOX3 having Gly, and BmorAOX5, BmanAOX5 and BanyAOX5 having Ser instead (purple triangle in Figure 3).
In terms of structure, we could predict the 3D arrangements for AtraAOX2 and BanyAOX2, which were used to corroborate the identified conserved residues at the active site (Figure 4). The active sites equivalent to MmAOX3 Gln772 and Glu1266 were identified in both Lepidoptera species as well as in DmelAOX2 and vertebrate HsAOX1. Differences in conformation were observed, which are found in large structures that have not been relaxed through molecular dynamics. Nevertheless, our results are consistent with residue locations, supporting, for instance, the role that the insect specific Pro884 plays in the active site.

DISCUSSION
In this study we identified a total of 164 AOX sequences from both moths and butterflies. In the context of an increasing amount of data from genomic studies, we have taken advantage of publicly available genome assemblies to identify and analyze AOX gene families in Lepidoptera. Particularly, AOXs are metal-containing enzymes that metabolize aldehydes into their corresponding carboxylic acids and other sub products (Krenitsky et al., 1972). Their role in insect chemosensation has been studied since 1989 when M. sexta AOX (MsexAOX1) was reported to catalyze (E,Z)-10,12-hexadecadienal (bombykal), the sex pheromone of this species (Rybczynski et al., 1989). However, reports about insect AOXs and their function toward aldehydes took more than 20 years to be published again, when A. transitella AOX2 (AtraAOX2) was comprehensively studied (Choo et al., 2013).
Although AOX genes have been related to metabolism of xenobiotics in mammals as well as in Culex mosquitoes (Hemingway et al., 2000;Coleman et al., 2002;Terao et al., 2020), recent efforts have been focused on insect AOXs that can act as ODEs in olfactory organs, such as antennae and maxillary palps. Here, we report a profile of sequences related to AOX gene family that provides new data sets for several lepidopterans (Table 1 and Supplementary Table 1). For instance, our analyses revealed 5 full-length and 1 partial AOX sequences for P. xylostella, including the only identified AOX so far (PxylAOX3) (Wang et al., 2021b). Similarly, 9 full-length sequences were identified for B. mori, including BmorAOX1, BmorAOX2, and BmorAOX5, the only AOXs reported so far (Pelletier et al., 2007;Zhang et al., 2020). Likewise, we report 2 new AOXs for H. armigera, in which 6 AOXs had previously been reported, including HarmAOX2, suggested to be a candidate pheromone-degrading enzyme (Xu and Liao, 2017). For butterflies, no AOX-related studies have been published to our knowledge. Hence, this study would be the first to report such enzymes in this group.
It can be argued that moths with functionally studied AOXs, namely BmorAOX5, PxylAOX3 and AtraAOX2, are not strictly related to sex pheromone degradation. In fact, AtraAOX2 has not showed specificity for A. transitella sex pheromone [(Z,Z)-11,13-hexadecadienal], being also able to catalyze aldehyde-related plant volatiles (Choo et al., 2013). Similarly, PxylAOX3 was reported able to degrade sex pheromone (Z)-11-hexadecenal as well as plant-derived aldehydes, such as phenylacetaldehyde and non-anal (Wang et al., 2021b). Butterflies, H. melpomene with (Z)-9-octadecenal, octadecanal, (Z)-11-icosenal, icosanal and (Z)-13-docosenal as sex pheromone components (Darragh et al., 2017), and B. anynana with hexadecanal (Nieberding et al., 2008), represent the only butterflies that use aldehydes as semiochemicals in our data sets. Interestingly, one AOX (i.e., HmelAOX2) was present in an ODE-related clade according to our phylogenetic analysis, while B. anynana had seven. It is worth noting that H. melpomene as pollinator (Andersson and Dobson, 2003) and B. anynana as a fruit-feeding butterfly (Lewis and Wedell, 2007), are both exposed to more aldehydes emitted by plants and fruits. For B. anynana, it is noticeable that the seven AOXs (from a total of 12) appear to have emerged independently from the rest.
Something similar to what is found in the gene location of moth species, such as B. mandarina, A. transitella and P. xylostella, where their AOX genes potentially related to ODE function, are far from other AOX genes, with the exception of PxylAOX3.
From the two ODE-related clades in our phylogenetic analysis, clade B resulted highly supported by both functional studies and antennal-enriched expression (Figure 1). The fact that AOXs from butterflies were also present in this clade, further suggests that these could use aldehyde-based volatiles as semiochemicals. Although fewer studies have exploited the semiochemistry of butterflies compared with moths, increasing evidence suggests that several species of butterflies, including H. melpomene and B. anynana, use volatiles as semiochemicals. For example, an early study reported strong antennal responses of H. melpomene to several tropical plant-derived volatiles, such as linalool, linalool oxide I and II, oxoisophoroneoxide and phenylacetaldehyde (Andersson and Dobson, 2003). Recently, 55 compounds exclusive of androconia (specialized units where secretory glands are found) in sympatric Pieridae butterflies that would play a role in mating orientation, were reported (Nobre et al., 2021). On the other hand, some moths and butterflies can share pheromone biosynthetic pathways. It has been reported that in B. anynana the synthesis of hexadecanal and (Z)-9-tetradecenol is mediated by conserved fatty acyl 11-desaturases (Liénard et al., 2014). In that sense, it is expected that other enzymes, such as AOXs, could be conserved between moths and butterflies. Therefore, it appears that AOXs in ODE-related clades could function for aldehyde-related semiochemicals whether derived from plants or conspecific species. Thus, more functional studies focused on both moths and butterflies would be necessary to support a monophyletic pheromone-degrading clade.
In general, the function of AOXs toward aldehydes might resemble the function of XDHs, which are their evolutionary ancestors (Kurosaki et al., 2013;Wang et al., 2016). High levels of similarity between vertebrate XDHs and AOXs have been reported (Terao et al., 2020). For instance, sequence identity between mammal AOXs, namely HsAOX1 and MmAOX1, reaches 83%. On the other hand, sequence identity between lepidopteran and mammal AOXs is ∼30%. More specifically, among lepidopteran AOXs, sequence identity starts decreasing at 67%. This divergence within lepidopterans is evidenced in an important amount of PSSs among those phylogenetically close AOXs that were selected for our molecular evolution analyses. Nevertheless, and as expected, residues that are conserved were not PSSs, such as those from the active site.
Our MSA analysis revealed that highly conserved residues in vertebrate AOXs, namely Glu1266, Phe919, Lys889 and Gln772, may or may not be conserved in lepidopteran and D. melanogaster AOXs. Thus, Glu1266 (at position 1,209 in our MSA, Figure 3) that is reported to be crucial for catalytic activity resulted highly conserved (Coelho et al., 2012), while Phe919 from vertebrates changes to Pro in insect AOXs (at position 884 in our MSA, Figures 3, 4). It is difficult to predict the effect of Pro instead of Phe in insect AOXs. The change from an aromatic side chain toward an aliphatic portion like the one present in Pro, might have some effects on selectivity and stability of aldehyde substrates. In our structural analyses, Pro884 was found in the active site of AtraAOX2, BanyAOX2 and DmelAOX2, keeping the region hydrophobic. In other studies, enzymes such as HCV NS5b polymerase, Pro197 along with Arg200, Cys366, Met414 and Tyr448, were reported to be crucial for ligand selectivity (Li et al., 2010). On the contrary, Pro substitutions in human carbonic anhydrase II led to an increased rigidity of the enzyme and subsequent decreased catalytic activity (Boone et al., 2015). In fact, it is well accepted that Pro restricts protein backbones with the lack of a hydrogen bond donor, disrupting α-helices (Woolfson and Williams, 1990;Van Arnam et al., 2011).
Overall, we believe this study represents the first to group a comprehensive set of AOX genes for several lepidopteran species. We have validated AOX sequences previously described and added 58 more in moths and 33 more in butterflies. We have also uncovered the potential importance of aldehydes as semiochemicals in butterflies, as reflected by the number of AOX present in this group. The information presented herein is a helpful reference for further evolutionary and functional studies in this highly biodiverse order.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
RG, AM, and HV conceived the idea. RG and HV performed genome and structural analysis. LCP and AM analyzed phylogenetics and evolution data. HV and LCP wrote the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by Agencia Nacional de Investigación y Desarrollo (ANID) project N • 11200349 and ANID scholarship N • 21190666. LCP was supported by Wellcome Trust Investigator Award 110117/Z/15/Z.