Identification and Expression Profiling of Chemosensory Genes in Hermetia illucens via a Transcriptomic Analysis

The black soldier fly, Hermetia illucens, is a cosmopolitan insect of the family Stratiomyidae (Diptera). Chemosensory genes encode proteins involved directly in the detection of odorants. In this study, we sequenced the antennal transcriptome of H. illucens adults to identify chemosensory genes. Putative unigenes encoding 27 odorant binding proteins (OBPs), five chemosensory proteins (CSPs), 70 odorant receptors (ORs), 25 ionotropic receptors (IRs), 10 gustatory receptors (GRs) and two sensory neuron membrane proteins (SNMPs) were identified. Tissue-specific expression profiles of the identified OBPs, CSPs and SNMPs were investigated using RT-PCR. Eight OBPs (HillOBP1-2, 9, 11-14, and 17), one CSP (HillCSP5) and one SNMP (HillSNMP1) were predominantly expressed in antennae. Further real-time quantitative PCR analyses revealed that the antennae-enriched unigenes also exhibited significant differences in expression between males and females. Among the sex-biased unigenes, six ORs showed female-biased expression, suggesting that these genes might participate in female-specific behaviors such as oviposition site searching. Sixteen ORs and two OBPs showed male-biased expression, indicating that they may play key roles in the detection of female sex pheromones. Our study is the first attempt to delineate the molecular basis of chemoreception in H. illucens. Our data provide useful information for comparative studies on the differentiation and evolution of Dipteran chemosensory gene families.

The black soldier fly, Hermetia illucens, is a cosmopolitan Dipteran. The larvae of H. illucens play a pivotal role in terms of both environmental and economic aspects for waste disposal and processing. H. illucens larvae are useful in manure management for controlling housefly populations and converting organic waste into useful products such as compost (Newton et al., 2005). Like other many Dipteran species, H. illucens adults have a sensitive olfactory system and use a wide range of environmental chemical cues to locate food, mates, and egg-laying sites (Booth and Sheppard, 1984;Sheppard, 2001, 2002). H. illucens could potentially be used to decompose landfills if adult flies can be guided to deposit their eggs there. A better understanding on the molecular components of the H. illucens olfactory system is an initial step toward this type of practical application. Identification and characterization of chemosensory genes are also important steps toward understanding their evolution and primary functions. The objective of this study is to identify candidate chemosensory genes encoding OBPs, CSPs, ORs, IRs, GRs, and SNMPs by generating and analyzing the antennal transcriptome of H. illucens adults. Genetic and phylogenetic analyses as well as expression profiling of identified chemosensory genes were also carried out to gain insights on their potential functions.

Insects, Tissues Collection and RNA Isolation
Hermetia illucens adults were obtained from a colony maintained year-round in the laboratory of the Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Guangdong Institute of Applied Biological Resources in Guangzhou City, Guangdong Province, China. The colony has been maintained at 28 • C with a photoperiod of 14:10 h (Light: Dark) and 70% relative humidity.
For transcriptomic analyses, 150 pairs of antennae were collected separately from both females and males of H. illucens adults. For RT-PCR analysis, 50 tissues of antennae, mouthparts, foreleg tarsus, wings and genitals were separately obtained from adult males and females. Three replicates were generated for each tissue set.
Total RNA was isolated from homogenized tissues using Trizol reagent (Invitrogen, Carlsbad, CA, United States) following the manufacturer's instructions and then treated with RNase-free DNase I (TaKaRa, Dalian, China) to remove potential genomic DNA contamination. RNA integrity was monitored on 1% agarose gel, and assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, United States). RNA concentration and purity were analyzed on a NanoDrop ND-2000 Spectrophotometer (Nanodrop Technologies, Wilmington, DE, United States).
cDNA Library Construction, Sequencing and de novo Assembly cDNA libraries were constructed with 1.5 µg purified RNA using a TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA, United States) following the manufacturer's instruction. Library preparations were sequenced using the Illumina HiseqTM 2500 platform (San Diego, CA, United States) and paired-end reads were generated. After sequencing, raw reads were firstly processed through in-house perl scripts. Clean reads were obtained from raw data after removing reads containing adapter, unknown (poly-N) and low-quality reads. Clean reads assembly was accomplished using Trinity (Version: r2013-11-10) with the default parameters after combining the male and female clean reads (Grabherr et al., 2011). The largest assembly sequences were deemed to be unigenes. The clean reads from the antennal transcriptome of H. illucens were uploaded to the NCBI Sequence Read Archive (SAMN12915779).

Functional Annotation
BLASTx searches were carried out against sequences in the NCBI non-redundant protein (nr) protein database with a cutof E-value of 10 −5 . Unigenes were also annotated using other databases including NCBI non-redundant nucleotide (nt), Swiss-Prot, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Orthology (GO) and Cluster of Orthologous Groups of proteins (COG).

Gene Identification
To identify chemosensory genes from H. illucens, known OBPs, CSPs, ORs, IRs, GRs, and SNMPs from other Dipteran insects were selected as queries to search the H. illucens antennal transcriptome. Query OBPs were from D. melanogaster, B. dorsalis, Ceratitis capitata, and M. domestica. Query CSPs were from D. melanogaster, B. dorsalis, M. domestica and G. morsitans. Query ORs were from D. melanogaster, B. dorsalis, C. stygia, and M. domestica. Query GRs were from D. melanogaster, C. stygia and C. capitata for IRs; D. melanogaster, B. dorsalis and C. stygia. Query SNMPs were from D. melanogaster, B. dorsalis, and M. domestica (Supplementary Table S1). tBLASTn was used to identify candidate unigenes encoding OBPs, CSPs, ORs, IRs, GRs, and SNMPs against the H. illucens antennal transcriptome with a cut-of E-value of 10 −5 . Identified candidate unigenes were manually checked using BLASTx against the NCBI non-redundant protein sequences database. Potential open reading frames (ORFs) of candidate chemosensory genes were predicted using the NCBI ORF Finder 1 . Alignments of amino acid sequences were performed using MAFFT (Version: 7.308) (E-INS-I parameter set) (Katoh and Standley, 2013) and visualized with Geneious (Version: 9.1.3) (Kearse et al., 2012). Protein domains (including transmembrane domains and signal peptides) were predicted using the InterProScan tool plug-in in Geneious (Quevillon et al., 2005). tBLASTn searches were used to determine the scaffold location, position and intronexon organization of the candidate chemosensory genes in each H. illucens's genomic scaffold (GCA_009835165.1). Candidate unigenes coding for chemosensory genes and their corresponding reference genes were listed in Supplementary Table S2.

Phylogenetic Analysis
Amino acid sequences of candidate OBPs, CSPs, ORs, IRs, GRs, and SNMPs from H. illucens were aligned together with proteins from other Dipterans. The sequences from other Dipterans used for building phylogenetic trees are listed in Supplementary  Table S3. Sequence alignments were generated using Clustal Omega (Sievers et al., 2011), and maximum-likelihood trees were constructed using FastTree2 (Jones-Taylor-Thornton amino acid substitution model) with default settings and 1000 bootstrap replicates (Price et al., 2010). Phylogenetic trees were colored and arranged using FigTree (Version: 1.4.2).

Analyses of Expression Levels Based on FPKM
Clean reads were mapped back onto the assembled unigenes and read count for each unigene was obtained from the mapping results. The expression levels of these unigenes were calculated as the fragments per kilobase per million mapped fragments (FPKM) method (Trapnell et al., 2010).

RT-PCR and Real-Time Quantitative PCR
The expression profiles of 27 OBPs, 5 CSPs, and 2 SNMPs among various tissues (antennae, mouthparts, foreleg tarsus, 1 https://www.ncbi.nlm.nih.gov/orffinder/ wings, and genitals) were initially evaluated using RT-PCR. Total RNA was isolated from these tissues, and cDNA was synthesized using a PrimeScript RT reagent Kit (Takara). In our pre-experiment, α-tubulin (α-TUB) and glyceraldehyde-3phosphate dehydrogenase (GAPDH) were evaluated as the most stable reference genes for gene expression profiling across the different tissues (male antennae, female antennae mouthparts, foreleg tarsus, wings, and genitals) using Normfinder (Andersen et al., 2004) and BestKeeper (Pfaffl et al., 2004) (Supplementary  Table S4). Therefore, α-TUB and GAPDH genes were used as controls to assess the cDNA integrity. PCR reactions were conducted using a Bio-Rad thermal cycler (Bio-Rad, Hercules, CA, United States) with the same cycling parameters as our previous study (Cui et al., 2019). PCR products were analyzed by electrophoresis in 1.5% agarose gels. Each PCR reaction was repeated twice with independently isolated RNA samples.
Based on the RT-PCR results, unigenes encoding OBPs, CSPs, IRs, and ORs that were predominantly or exclusively expressed in antennae were analyzed again for more accurate estimation using real-time quantitative PCR (RT-qPCR). RT-qPCR analysis was conducted using a LightCycler 480 system (Roche Applied Science, Basel, Switzerland). PCR reaction conditions were the same to those used in our previous study (Cui et al., 2019). Each RT-qPCR reaction was performed in three technical replications with three independent biological replications, and PCRs with no template (nuclease-free water) were used as negative controls. RT-qPCR analysis was performed using the LightCycler 480 gene scanning software. Relative gene expression level was quantified using the comparative 2 − CT method (Livak and Schmittgen, 2001) and calculated relative to α-TUB and GAPDH. All genespecific primers were designed using Primer3 (Version: 4.1.0) 2 (Supplementary Table S5).

Statistical Analysis
Data analysis was conducted using SPSS 22.0 (SPSS Inc., Chicago, IL, United States). The significant difference analysis of each gene among various tissues was determined using a one-way nested analysis of variance (ANOVA), following by Duncan's new multiple range test (α = 0.05). Values are presented as mean ± SE. The GraphPad Prism 6.0 software (GraphPad Inc, San Diego) was used to perform the figures.

Transcriptome Sequencing and de novo Assembly
In total, 118.56 million raw reads (17.78 GB raw data) were obtained from the antennal transcriptome of H. illucens. Additionally, 117.99 million of clean reads (17.70 GB clean data) were generated after filtering adapters and low-quality raw sequences. The clean reads were assembled into 70,124 unigenes with an N50 of 2,102 bp, average length of 1,002 bp (Supplementary Table S6). Length distribution analysis indicated 33,523 unigenes, which accounted for 47.8% of all unigenes, were longer than 500 bp (Supplementary Table S7).

Transcript Abundance in H. illucens Antennae
The expression levels of all unigenes are given in Supplementary  Table S8. Unigenes with FPKM values ≥ 1,000 were defined as highly expressed, those with FPKM 200∼1,000 were defined as moderately expressed, and those with FPKM ≤ 200 were defined as weakly expressed. According to these criteria, two unigenes coding for CSPs (named HillCSP1 and 2) and three unigenes coding for OBPs (named HillOBP1, 2, and 3) were abundant in antennae with 24,339, 2,654, 5,664, 5,368, and 4,918 FPKM, respectively, suggesting their potential roles in odorant detection. Other highly abundant unigenes included those coding for the cytochrome oxidase subunit 2 (mitochondrion) (CL10816.Contig1_All, 4,103 FPKM), the cytochrome oxidase subunit 1 (mitochondrion) (CL14318.Contig1_All, 2,942 FPKM), and cytochrome oxidase subunit 3 (mitochondrion) (Unigene12013_All, 2,908 FPKM). Interestingly, a unigene encoding a takeout-like protein was also highly expressed in H. illucens antennae (Unigene14734_All, 2,446 FPKM).   (HillOBP5, 21 and 22) had a full-length ORFs encoding proteins with 130 to 233 amino acid residues. All these predicted proteins have a putative signal peptide at the N terminal region. Sequence identities of predicted OBPs with those from other Dipterans in the NCBI-nr database ranged from 26.55 to 76.55%, with an average of 39.48%. According to the number and location of the conserved cysteines (Hekmat-Scafe et al., 2002), twenty four OBPs (HillOBP1-4, 6-20, 23-27) were classified as classic OBPs, with the typical six conserved C-residues (Supplementary Figure S2), and three OBPs (HillOBP14, 25, and 27) were classified as plus-C OBPs, with four to six additional cysteine residues in addition to the six conserved cysteine residues. Phylogenetic analyses of OBPs from H. illucens and other three Dipteran species (D. melanogaster, B. dorsalis, and M. domestica) showed that all the H. illucens OBPs formed distinct clades based on an insect OBP classification system, and segregated into the classic OBP and plus-C OBP sub-families (Figure 1). FPKM value analysis revealed that five OBPs (HillOBP1-5) were highly expressed in H. illucens antennae (FPKM >1,000) (Supplementary Table S2: Sheet 1).

Candidate Gustatory Receptors (GRs)
Ten GR-encoding unigenes were identified, and four of them encode full-length proteins with 6-8 TMDs (Supplementary Table S2: Sheet 5). The 10 H. illucens GRs were distributed among 10 different scaffolds. Potential functions of GRs identified from H. illucens could be inferred from their phylogenetic relationship with GRs previously well characterized from other Dipteran species (Figure 5). HillGR1, 2, and 3 were clustered with carbon dioxide GRs (DmelGR21a and 63a) (Jones et al., 2007;Kwon et al., 2007). HillGR5 was clustered with the Drosophila saponin receptor DmelGR28b (Sang et al., 2019). HillGR10 were clustered with Drosophila sugar receptors (DmelGR64a)   . All putative GR-encoding genes were expressed at very low levels, with an average FPKM value of 5.87, except three genes encoding carbon dioxide GRs, which showed higher expression levels with FPKM values 24.49, 10.22, and 9.3, respectively.

Candidate Sensory Neuron Membrane Proteins (SNMPs)
Two SNMP-encoding unigenes were identified. These two unigenes have full-length ORFs encoding proteins with two TMDs (Supplementary Table S2: Sheet 6). The two H. illucens SNMPs were distributed among two different scaffolds. Based on a phylogenetic analysis, all SNMPs were classified into two distinct subfamilies, SNMP1 and SNMP2 (Supplementary Figure S4). HillSNMP1 was clustered with the SNMP1 subfamily, while HillSNMP2 clustered with the SNMP2 subfamily. One SNMP unigene, HillSNMP1, was expressed at relatively high levels with an FPKM value of 1,338.07 in H. illucens antennae (Supplementary Table S2: Sheet 6).

Tissue-and Sex-Specific Expression
RT-PCR showed that eight OBP-encoding unigenes 9,(11)(12)(13)(14)and 17) were almost exclusively expressed in antennae, while HillOBP3 and 15 were highly expressed in both antennae and mouthparts (Figure 6). The remaining OBPencoding unigenes were abundant in multiple tissues. Among the CSP-encoding unigenes, HillCSP5 was exclusively expressed in antennae, while other HillCSPs were present in multiple tissues. In addition, HillSNMP1 were mainly expressed in antennae.

DISCUSSION
Hermetia illucens, a representative species of the Stratiomyidae family, is evolutionally different from many well characterized Dipteran species such as D. melanogaster, M. domestica, B. dorsalis, C. stygia, G. morsitans, and M. destructor (Brammer and von Dohlen, 2007). Homology analyses of the identified H. illucens chemosensory genes along with orthologs from other Dipterans show clear separation of the H. illucens genes from the ones from other flies (Supplementary Figure S1B). However, H. illucens adults exhibit both similarity and differences in mating and oviposition behaviors with D. melanogaster and other flies. For example, H. illucens males exhibit territorial lekking behavior during mating (Tomberlin and Sheppard, 2001) and can also detect odors released due to the action of bacteria for oviposition (Zheng et al., 2013). Thus, systematic research on chemoreception may provide valuable information on understanding the evolution of the insect olfactory system. H. illucens larvae play pivotal roles in both environmental and economic aspects of waste disposal and processing. Artificial release of H. illucens adults is a well-known method for waste treatment, but it is difficult to establish populations in the field. Understanding of the molecular recognition mechanisms and the molecular basis for egg-deposition behavior in response to environments has important implications for population establishment after release. With the recent release of the H. illucens genome, chemosensory genes have been annotated (Zhan et al., 2020). In this study, all the chemosensory genes identified in the antennal transcriptome of H. illucens are present in the genome. This result supports strongly the quality of the transcriptomic assembly.
OBPs identified in H. illucens are not very conserved in comparison with those from other Dipteran species. Phylogenetic analysis reveals that only a small subset of OBPs (such as HillOBP1,2,3,5,9,11,and 14) identified in H. illucens have homologs from other Dipterans (Figure 1). Only two types of OBPs are found in H. illucens: classic and plus-C. Minus and dimer OBPs which have been reported from other Dipteran species were not found in H. illucens. The missing of Minus and dimer OBPs may reflect physiological and evolutionary differences between H. illucens and other Dipteran species. Nevertheless, there are some OBPs that are conserved and have orthology relationships with counterparts from other Dipterans. For example, HillOBP1 and 2 are homolog to OBP83b found in B. dorsalis, HillOBP11 is a homolog to Lush found in D. melanogaster, and HillOBP9 is a homolog to DmelOBP59a found in D. melanogaster (Pikielny et al., 1994;Galindo and Smith, 2001;Wu et al., 2015;Sun et al., 2018). Both OBP83b and Lush have been reported to play roles in sensing semiochemical attractants and the pheromone 11-cis vaccenyl acetate (Xu et al., 2005;Wu et al., 2016). DmelOBP59a has been reported with a function in hygroreception (Sun et al., 2018). HillOBP1, 2, 9, and 11 are expressed only in antennae, similar to what has been observed in other Dipterans. OBP genes present in the same genomic cluster show same pattern of expression in antennae. Two pairs of OBPs HillOBP1/2, HillOBP12/17 are predominantly expressed in antennae and are organized in tandem on scaffold 1521 and 2279, respectively. Based on the phylogeny in Figure 1, it suggest that there has been recent gene duplication. In addition to genes encoding OBPs, a gene encoding a CSP, HillCSP5, is also exclusively expressed in antennae. Therefore, HillOBP1, 2, 11, and HillCSP5 are likely to play a role in antennal chemicalrecognition.
A total of 70 OR unigenes are identified from H. illucens antennae, which is a greater number than those identified from the antennae of other Dipterans, such as B. dorsalis (60 OR genes identified) (Jin et al., 2017) C. stygia (50) (Leitch et al., 2015) Episyrphus balteatus (51) and Eupeodes corollae (42) (Wang et al., 2017) D. melanogaster (39)  Scaeva pyrastri (38) (Li et al., 2016), and Chlorops oryzae (25) (Qiu et al., 2018). The high number of ORencoding unigenes may be associated with its ability for diverse host-odor detection in H. illucens. Surprisingly, except Orco, no OR orthologs to the identified H. illucens OR genes are found from other Dipteran species. Apparently, a speciesspecific expansion has happened to form the 69 OR-encoding genes in H. illucens (Figure 3). Gene expansion often reflects the adaptation of a species to its ecological niche. This major OR gene expansion seems to indicate that H. illucens require multiple OR genes to detect a diversity of structurally similar odorants in its living habitat. In addition, several ORs within tandem arrays including HillOR2/3/7/14/23/40/47/55/60, HillOR24/29/33/41/49, and HillOR13/19/26/31/43, form the same phylogenetic clade, respectively (Figure 3). It is likely to result from a relatively recent local gene duplication.
ORs expressed predominantly in female antennae are predicted to function in oviposition-related odorant (Pelletier et al., 2010) or male released pheromones detection (Anderson et al., 2009). ORs expressed equally in the male and female antennae are predicted to function in the detection of general odorants such as feeding attractants (Yan et al., 2015). HillOR32, 44, 56, 57, 60, and 66 are predominantly expressed in female antennae, and therefore are likely involved in regulation of female-specific behaviors, such as localization of oviposition sites and responses to the pheromones released by males. 9,10,14,15,19,39,40,52,59, and 63 are predominantly expressed in male antennae, and therefore, may be associated with the detection of female sex pheromones. The remaining ORs are roughly equally expressed in both female and male antennae, and therefore, may be involved in general odorant detection (Figure 7).
Seventeen antennal IR genes are identified in this study from H. illucens. In Drosophila, IR92a, IR84a/8a, IR76b/IR41a, IR75a/IR8a, IR64a/IR8a have been reported to be involved in, respectively, sensing ammonia and amines  phenylacetaldehyde and phenylacetic acid  polyamines (Hussain et al., 2016) acetic acid (Prieto-Godino et al., 2016), and other acids (Ai et al., 2010). IR93a/IR68a/IR40a and IR21a/IR25a have been reported to be responsible for, respectively, temperature and moisture detection (Knecht et al., 2016(Knecht et al., , 2017 and cool sensing . IR genes identified here in H. illucens show sequence similarity to some of these characterized IRs and they may have similar functions. Ten genes encoding GRs are identified from this study in H. illucens, which is similar to the 12 GR genes reported in the antennae of D. melanogaster . However, the number of GRs found in H. illucens is much fewer than those reported in other Dipteran species, such as C. stygia (21 GR genes reported) (Leitch et al., 2015). GRs are known to function as taste and contact receptors and are often involved in host-specific pollination behavior (Jauker et al., 2009;Lucie et al., 2013). Some of the identified GRs from H. illucens show orthologous relationship with GRs from other insect species. For example, HillGR5 is homologous to DmelGR28b, a saponin receptor in D. melanogaster, and HillGR10 is homologous to DmelGR64a, a sugar receptor. Those GR homologs identified here in H. illucens may play similar roles as reported in other insects.
Two SNMP unigenes are identified in this study. Both SNMPs are conserved compared with other holometabolous insect species. SNMP1 is usually expressed in pheromonesensitive olfactory sensory neurons, and mediates responses to lipid pheromones (Jin et al., 2008;Nichols and Vogt, 2008;Vogt et al., 2009;Gomez-Diaz et al., 2016). In H. illucens, SNMP1 is predominantly expressed in antennae, supporting a role of SNMP1 in pheromone detection.

DATA AVAILABILITY STATEMENT
All data generated during this study are included in this article/Supplementary Material.

AUTHOR CONTRIBUTIONS
QX performed the experiments. ZW analyzed the data. QX, ZW, XA, and XZ wrote and revised the manuscript.

ACKNOWLEDGMENTS
We are grateful to Dr. Mingshun Chen (Kansas State University, United States) for editorial assistance and comments on the manuscript.