Protandric Transcriptomes to Uncover Parts of the Crustacean Sex-Differentiation Puzzle

Hermaphrodite systems offer unique opportunities to study sexual differentiation, due to their high degree of sexual plasticity and to the fact that, unlike gonochoristic systems, the process is not confined to an early developmental stage. In protandric shrimp species, such as Hippolyte inermis and Pandalus platyceros, male differentiation is followed by transformation to femaleness during adulthood. The mechanisms controlling sexual differentiation have not been fully elucidated in crustaceans, but a key role has been attributed to the insulin-like hormone (IAG) produced by the androgenic gland (AG), a crustacean masculine endocrine organ. To uncover further transcriptomic toolkit elements affecting the sexual differentiation of H. inermis, we constructed eye and whole body RNA libraries of four representative stages during its protandric life cycle (immature, male, young female and mature female). The body libraries contained transcripts related to the reproductive system, among others, while the eye libraries contained transcripts related to the X-organ-sinus gland, a central endocrine complex that regulates crustacean reproduction. Binary pattern analysis, performed to mine for genes expressed differentially between the different life stages, yielded 19,605 and 6,175 transcripts with a specific expression pattern in the eye and body, respectively. Prominent sexually biased transcriptomic patterns were recorded for the IAG and vitellogenin genes, representing, respectively, a key factor within the masculine IAG-switch, and a precursor of the yolk protein, typical of feminine reproductive states. These patterns enabled the discovery of novel putative protein-coding transcripts exhibiting sexually biased expression in the H. inermis body and eye transcriptomes of males and females. Homologs to the above novel genes have been found in other decapod crustaceans, and a comparative study, using previously constructed transcriptomic libraries of another protandric shrimp, P. platyceros, showed similar sexually biased results, supporting the notion that such genes, mined from the H. inermis transcriptome, may be universal factors related to reproduction and sexual differentiation and their control in other crustaceans. This study thus demonstrates the potential of transcriptomic studies in protandric species to uncover unexplored layers of the complex crustacean sex-differentiation puzzle.


INTRODUCTION
Naturally occurring sexual shifts during hermaphrodite life cycles offer unique opportunities to study sexual plasticity and sexual differentiation. The latter processes are currently attracting worldwide interest due to accumulating evidence that environmental factors and endocrine-disrupting chemicals affect hormonal regulation, sexual development and fertility in animals (Olmstead and LeBlanc, 2000;Rodriguez et al., 2000Rodriguez et al., , 2007Ford et al., 2003Ford et al., , 2004Hayes et al., 2010;Ford, 2012).
Important examples of sexual plasticity may be found among the crustaceans (Pandian, 2016), an ancient highly diverse group of animals in which a variety of reproductive strategies are represented, including gonochorism (separate sexes) (Juchault, 1999), intersexuality (combination of male and female features within a gonochoristic species) (Goldschmidt, 1938;Reinboth, 1975;Sagi et al., 1996;Abdel-Moneim et al., 2015;Levy et al., 2020c), asexual reproduction (e.g., parthenogenesis) (Scholtz et al., 2003;Martin et al., 2007), and different types of hermaphroditism (Levy et al., 2018;Benvenuto and Weeks, 2020). The last of these strategies constitutes a means of reproduction in which a particular individual bears both ovarian and testicular tissues [ovotestis (Hoffman, 1972;Stentiford, 2012)] and produces gametes of both sexes (Benvenuto and Weeks, 2020). Hermaphroditism may be either simultaneous, in which the individual functions both as a male and a female (Bauer and Holt, 1998;Baeza, 2007), or sequential, in which the individual first matures as one sex and then transforms irreversibly into the other (Hoffman, 1968;Subramoniam, 1981;de Almeida and Buckup, 2000). Of relevance to this study, hermaphroditism may be exploited as a valuable model system for studying sexual plasticity, since the sex-differentiation process does not occur as a single event limited to the early developmental stages of the organism.
A fairly well researched case study of hermaphroditism in crustaceans is that of the shrimp Hippolyte inermis, formerly known as H. viridis (Reverberi, 1950), a caridean protandric species inhabiting seagrass (Posidonia oceanica) meadows in shallow waters of the Mediterranean Sea and the Atlantic coasts of Spain . While protandric species are commonly born as males, followed by a transitional stage before transforming to females (Yaldwyn, 1966), H. inermis diverts from this pattern in that it lacks the ovotestis-containing transitional stage (Cobos et al., 2005;Mutalipassi et al., 2018) in both of its reproductive cycles, one in the spring and the other in the fall (Figure 1). In the spring, these shrimp exhibit a first reproductive burst, and approximately three months after hatching, both immature males and females are present in the population. In contrast, at the end of the second reproductive burst in the fall, only males are present; these animals transform into females in the following spring (Zupo, 1994). It has been suggested that the spring sexual shift in H. inermis is promoted by a diet based on diatoms (Zupo, 2000), namely, the species of Cocconeis, that are abundant in their seagrass habitat, resulting in an early transformation of males at a younger age and, hence, in the presence of small females in the population by the end of the spring (Zupo, 1994(Zupo, , 2000. However, the lower abundance of these diatoms in the fall leads to a normal process of protandric development, with a predominance of young males in the population following this second reproductive burst. Physiologically, it has been suggested that compounds present in the Cocconeis spp. diatoms cause apoptosis of the crustacean androgenic gland (AG) and, consequently, control the sexual shift from maleness to femaleness . This suggestion is in keeping with the universal mechanism of control of crustacean sexual differentiation by the insulin-like androgenic hormone (IAG) -secreted by the AG -in a process involving many upstream and downstream genes, termed the "IAG-switch" . When the switch is "turned on" (i.e., in normal males or through activation of the AG and expression or induction of IAG), masculinization occurs, whereas when it is "turned off " (i.e., in normal females or through AG ablation or silencing or inactivation of the IAG), the result is feminization (Charniaux-Cotton, 1958, 1962Nagamine et al., 1980a,b;Ventura et al., 2012;Levy et al., 2016).
Another case study of crustacean hermaphroditism -one that we also exploit in this study -is that of the protandric Northern spot shrimp, Pandalus platyceros, which is widely distributed in the North Pacific Ocean (Butler, 1964). Unlike H. inermis, in which the transformation from maleness to femaleness takes place up to few months following hatching (Zupo, 1994), the transformation in P. platyceros is slower, occurring at the age of 3 to 5 years (Butler, 1965;King and Moffitt, 1984;Iversen et al., 1993;Kimker et al., 1996). Therefore, unlike gonochoristic species in which the IAG-switch-based sexual differentiation process is limited to early developmental stages, protandric shrimps, such as H. inermis and P. platyceros, may serve as models in the study of sex-controlling toolkits, because in such species the process is not confined to early development but rather occurs later in the life cycle, when adult males transform into females. Protandry may thus be regarded as offering an opportunity to obtain new insights into sexual differentiation that cannot be obtained from studies of gonochoristic species.
In the present study, we collected samples of H. inermis at different stages of the species' protandric life cycle and constructed RNA libraries that yielded both known and novel, yet unannotated, sex-specific genes that are assumed to be associated with sexual differentiation and reproduction. To obtain insight into the function of these novel genes, we also performed a comparative in silico analysis with P. platyceros, taking advantage of RNA libraries of specific tissues previously obtained over the life cycle of P. platyceros (Levy et al., 2020a). P. platyceros is particularly suitable for such an analysis for two reasons: (i) taxonomically, the two species, H. inermis and P. platyceros, belong to families -Pandalidae and Hippolytidae, respectively (Martin et al., 2009) -that are closely related within the infraorder Caridea (Christoffersen, 1990;Wolfe et al., 2019;Levy et al., 2020c), and (ii) technically, the RNA for the H. inermis libraries was extracted from the entire body of the animal (due to its small size that prevented dissection of specific tissues), while the RNA for constructing P. platyceros libraries was obtained by dissecting out specific tissues (Levy et al., 2020a). Therefore, our working hypothesis for the in silico analysis part of this study was that a search for homologs to the newly discovered H. inermis FIGURE 1 | Two reproductive cycles in the life cycle of Hippolyte inermis. Two reproductive seasons are shown. Animals that hatch in April, due to a higher abundance of Cocconeis spp. diatoms, become either immature males or females by July and, following a summer grow-out, reproduce in September. Animals that hatch in September, due to a lower abundance of the diatoms, become males by December and, following a winter grow-out, reproduce in September.
genes in transcriptomic libraries of P. platyceros (and other decapod crustaceans) might reveal the function of those genes in a more general context that includes a wider range of crustaceans.

Animal Collection and Life Stage Definition
Based on the reproductive stages comprising the protandric life history of H. inermis (immature animals, males and females) throughout the course of the year (Zupo, 1994), shrimp were collected during September 2018 and April 2019 in Lacco Ameno d'Ischia (Gulf of Naples, Italy) at depths of 3-15 m by using a plankton net of 400 mm diameter and 100 µm mesh size, as previously described (Mutalipassi et al., 2018; Figure 2). Sorting into the different reproductive stages was performed manually, both visually and under a Leica M16 macroscope, as previously described (Zupo, 1994), based on the total body length and on the presence or absence of the appendix masculina (AM), a prominent external male character (Tombes and Foster, 1979;Nagamine et al., 1980b;Zupo et al., 2008;Mutalipassi et al., 2018;Levy et al., 2020b). The animals were sorted into four stages: I -immature animals (total length of 1-6 mm and the absence of an AM), M -males (total length of 7-11 mm and the presence of an AM), YF -young females (total length of 7-11 mm and the absence of an AM), and MF -mature females (total length of 12-33 mm and the absence of an AM) (Figure 2).

RNA Library Preparation of H. inermis From the Different Reproductive Stages
To characterize the transcriptional patterns of genes related to sexual development in different stages of the H. inermis life cycle, RNA was extracted using EZ-RNA Isolation kit (BI, Cromwell, CT, United States). Ideally, tissues related to reproduction and its control should be dissected and extracted separately, but due to the small size of H. inermis, it was impossible to separately dissect out tissues such as the AG or gonads. Therefore, we separated the eyestalks, which contain a prominent endocrine controlling organ in crustaceans (Keller, 1992;Wilder et al., 1994), from the rest of the body, which contains the reproductive system, for immature animals (n = 90), males (n = 72), young reproductive females (n = 78), and mature females (n = 15). To compensate for low RNA quantity yielded from a single individual animal, for RNA extraction, three bulks of equal number of different body and eye tissues, for each reproductive stage, were pooled separately. RNA was then extracted and the 24 pooled RNA samples (2 tissues × 3 replicates × 4 stages) were sent for sequencing (Novogene, Hong Kong) on an Illumina NovaSeq 6,000 platform.

RNA Libraries
Bioinformatic analyses were carried out at the Bioinformatics Core Facility of Ben-Gurion University with a NeatSeq-Flow platform (Sklarz et al., 2018) and in-house R scripts. Reads were quality trimmed with TrimGalore 1 . Ribosomal RNA was filtered out as follows: reads were aligned to a database of crustacean rRNA sequences downloaded from the NCBI with bwa mem (Li, 2013) using default parameters. Reads that aligned to the rRNA database (4.9-18%) were discarded using samtools (Li et al., 2009). A total of 2,458,778,342 clean reads were retained for further analysis. Two reference transcriptomes were de novo assembled, one from the eye samples and the other from the body samples. The transcriptomes were assembled using Trinity version 2.8.4 (Grabherr et al., 2011) and then filtered to exclude transcripts with very low expression. To this end, the clean reads were aligned to the transcriptome, and only transcripts for which at least one of the experimental groups had at least two replicates with counts of three or more transcripts per million were retained. The resulting filtered transcriptomes contained 244,523 and 261,562 transcripts (> 200 bp) from 112,747 and 119,618 putative genes, in the eye and body, respectively. Each transcriptome was quality assessed using Quast (Gurevich et al., 2013) and BUSCO (Simão et al., 2015) vs. the Metazoa_odb9 database. The eye and body transcriptomes included 98.8 and 98.6% of BUSCO proteins, respectively. For transcriptome annotation, the most highly expressed transcript 1 https://github.com/FelixKrueger/TrimGalore per gene (i.e., the representative transcript) was selected using the filter_low_expr_transcripts.pl script from the Trinity software suite. These representative transcripts were annotated using Trinotate (Trinotate.github.io) by searching Swissprot and PFAM-A, and performing RNAmmer predictions. Best blastp hits of TransDecoder translated transcripts having e-value < 1E-5 were reported. In addition, the representative transcripts were searched against RefSeq Proteins using blastx, and the top 20 best hits of each transcript having e-value < 1E-3 were submitted to Blast2GO "Blast Description Annotator" algorithm.

Differential Expression Analysis of Genes From the RNA Libraries
Clean reads from the eye and body RNA samples were aligned to the respective filtered reference transcriptome (as described above) with bowtie2 (Langmead and Salzberg, 2012), and gene expression was estimated with RSEM (Li and Dewey, 2011). Statistical analyses were carried out for the eye and body libraries separately. For quality assessment, counts were subjected to variance stabilizing transformation [DESeq2; (Love et al., 2014)] and then to sample-wise correlation analysis and principal component analysis (PCA). One eye sample of the immature stage was found to be an outlier and was thus excluded from further analysis. Statistical testing for differential expression was carried out using DESeq2 (Love et al., 2014), a method specifically tailored to count data using negative binomial generalized linear models. All six possible contrasts were assessed (i.e., I vs. M, I vs. YF, I vs. MF, M vs. YF, M vs. MF and YF vs. MF). Genes were considered to be differentially expressed in a certain contrast if they had a false discovery rate (FDR) adjusted p-value of < 0.05 and a linear fold change > 1.3 or < −1.3, where the plus and minus signs denote up-and down-regulation, respectively. A unified list of differentially expressed genes in any of the contrasts was constructed for the eye and body libraries separately. Hierarchical clustering of the differentially expressed genes in each library, after z-scoring of their variance-stabilized expression values, was carried out using the pheatmap function of R. Also, to facilitate comparison between body and eye gene expression, a joint reference transcriptome was constructed from the clean reads of all body and eye samples. Transcriptome assembly and filtering were performed as described above, yielding 296,530 transcripts (> 200 bp) from 161,343 putative genes. Annotation was achieved using Trinotate. Read alignment and quantification of the body and eye samples to the joint transcriptome were done as described above. Counts were normalized using DESeq2 "counts" function with the "normalized" parameter set to TRUE, after excluding the exceptional eye sample. Genes with exclusive expression in either the body or the eye tissues were identified according to the following criteria: (1) All replicates of at least one of the developmental stages in the tissue of interest had more than 300 counts each. (2) All samples (in all stages) of the other tissue had less than 30 counts each.

Functional Enrichment Analyses
GO assignments and gene length data were extracted from trinotate results using trinotate-provided scripts. Gene Ontology (GO) enrichment analysis of the differentially expressed genes was carried out using the GOSeq R package (Young et al., 2010).

Binary Gene Expression Pattern Analysis
The unified list of genes expressed differentially in the eye and that for the body were subjected to binary gene expression pattern analysis, as previously described (Abehsera et al., 2015;Shaked et al., 2020). Briefly, a list of all possible binary patterns was constructed (n = 14), excluding "0000" and "1111" (i.e., low and high expression in all stages, respectively). For each gene, Pearson's correlation coefficient was computed between the gene's variance-stabilized counts across all samples and an artificial expression profile of each of the patterns. For example, for the body (having 3 replicate samples per developmental stage), the pattern 0110 was represented by the artificial profile "000111111000." It is noteworthy that a high correlation of a gene to a pattern does not indicate a "presence" or "absence" status, but rather relatively "high" and "low" expression levels at different stages. Genes were assigned to the pattern with which they had the highest Pearson's correlation, if the correlation was higher than 0.8. Hierarchical clustering of the genes in each pattern, after z-scoring of their variance-stabilized expression values, was carried out using R's pheatmap function. To aid in the choice of candidate genes for further study, we subsequently applied an additional cutoff, namely, a "counts cut-off, " requiring that in developmental stages considered as "1" in the binary pattern, the normalized counts in all replicate samples would be larger than 500. Normalized counts were computed using the DESeq2 "counts" function with the "normalized" parameter set to TRUE.

Characterizing the AG and IAG in H. inermis
Even though the AG is prominently situated at the base of the fifth pereiopod in Crustacea (Charniaux-Cotton, 1962), it was nearly impossible to isolate the AG from the very small H. inermis males (total body length of 8-10 mm). Therefore, to enable histological analysis of this organ, the whole body of a male specimen was fixed in 4% buffered formalin for 48 h at room temperature, as previously described (Levy et al., 2020a). Samples were gradually dehydrated through a series of increasing alcohol concentrations (70%, 80%, 90% and 100%), incubated in xylene, and embedded in Paraplast (Kendall, Mansfield, MA, United States). Serial dorsoventral sections of 5 µm were then placed on silane-coated slides (Menzel-Gläser, Braunschweig, Germany) and stained with hematoxylin and eosin, for morphological observations, as follows: The slides were dipped in xylene for 5 min × 2 and then in 100, 90, 80 and 70% EtOH for 1 min each, followed by tap water for 1 min, and hematoxylin for 5 min. The slides were then washed in tap water for 5 min, in acidic 70% EtOH for 10 s (for removing background staining), and again in tap water for 4 min. The final stage comprised dipping the slides in 95% EtOH for 15 s, eosin for 5 min, 95% EtOH for 5 min, 100% EtOH for 5 min × 2, and xylene for 5 min × 2 and then covering the section with a cover slip.
To find the sequence of the IAG mRNA in H. inermis (Hi-IAG), we aligned the IAG sequence from P. platyceros [Pnp-IAG; GenBank accession number: KX619617.1 (Levy et al., 2020a)] to the H. inermis body transcriptome generated in this study. From the IAG sequences that are available for dozens of decapod species, with different reproductive strategies and from various clades within the Crustacea , Pnp-IAG was chosen for alignment due to the similar protandric nature of P. platyceros and H. inermis and their close evolutionary relationship (Wolfe et al., 2019;Levy et al., 2020c). After finding the Hi-IAG mRNA transcript in the body transcriptome of H. inermis, the predicted structure of the protein was inferred from its deduced amino acid sequence.

Mining for Sex-Related Genes in the Different Stages of the H. inermis Life Cycle and Comparative Transcriptomic Analysis vs. P. platyceros
After successfully sequencing the male-representing IAG gene, as described above, we searched the six IAG-switch related genes previously described in P. platyceros (i.e., Pandalus platyceros IAG-switch-like proteins 1-6; (Levy et al., 2020a)) in the H. inermis body transcriptome. Also, we set out to sequence the female-representing vitellogenin (Vg) gene in H. inermis. Vg is a precursor of vitellin (yolk protein), which is synthesized in the hepatopancreas and, in some species, also in the ovary (Yano and Chinzei, 1987;Quackenbush, 1989;Tsukimura, 2001;Okumura et al., 2007). We note that this protein was found to reflect the physiological reproductive state of females along the life cycle of P. platyceros (Levy et al., 2020b). Guided by the same considerations as those described above for IAG sequencing (Wolfe et al., 2019;Levy et al., 2020c), we aligned the Vg sequence from P. platyceros [Pnp-Vg; GenBank accession number: MK070912.1 (Levy et al., 2020b)] to the H. inermis body transcriptome generated in this study.
In addition, to find novel sex-specific genes in H. inermis (beyond previously characterized sex-related genes), the list of genes (Supplementary Table 1) in the eye and body transcriptomes with binary patterns of 0100 (male-specific) and 0001 (mature female-specific) was investigated, focusing on unannotated transcripts that passed the counts cut-off threshold. Candidate genes with a coding region were investigated for the presence of conserved domains, using SMART 2 [ (Schultz et al., 2000)], and a signal peptide, using SignalP-5.0 Server (Almagro Armenteros et al., 2019).
As described above, it was not feasible to separate out specific tissues from the bodies of the sampled H. inermis shrimps and to sequence their transcriptomes. Therefore, to determine in which tissue the genes under investigation in this study were expressed, we performed an in silico comparative transcriptomic analysis of the candidate genes between H. inermis and P. platyceros. The coding regions of the IAG and Vg genes in H. inermis and the novel H. inermis male-and female-specific genes found in the present study were aligned to the available P. platyceros transcriptome, by using the tblastn module in BLAST (Altschul et al., 1990;Gertz et al., 2006). The first hit with the highest score from the BLAST alignment of each gene was considered as the homolog sequence in P. platyceros. In addition, to examine whether the novel male-and female-specific genes identified in this study are conserved among decapod crustaceans, blastx of the nucleotide sequences in the NCBI server was performed to Transcriptome Shotgun Assembly proteins (TSA), as was tblastn of the amino acid sequences to TSA (Organism: Decapoda (taxid: 6683)).

In vitro Verification of Sex-Specific Genes in Different Life Cycle Stages
Total RNA was extracted as described above, and cDNA was synthesized using qScript cDNA Synthesis kit (Quanta, Beverly, MA, USA) from the pooled H. inermis body samples at each reproductive stage: I (n = 3), M (n = 3), YF (n = 3) and MF (n = 3). Relative quantification (RQ) of transcript levels was performed using Roche Diagnostics FastStart Universal Probe Master Mix (Basel, Switzerland) and Roche Universal Probe Library probes. The primers and probes that were used for the different qPCR assays are listed in Table 1. The qPCR reactions were performed in the QuantStudio 1 Real-Time PCR System, Applied Biosystems (Foster City, CA, USA). The transcript levels of all samples in each qPCR assay were normalized against the sample with the lowest RQ within the same assay.

Statistical Analyses
For statistical analysis of the qPCR relative transcript levels in all the tested genes, the RQ data was first logarithmically transformed. The transformed data were then analyzed using one-way ANOVA, followed by a post hoc Dunnett test. For Hi-IAG and the tested novel, yet uncharacterized, male gene, the quantification in different stages was compared to the expression in the male stage. For Hi-Vg and the tested novel, yet uncharacterized, female gene, the quantification at different stages was compared to the expression in the mature female stage. Statistical analyses were performed using Statistica v13.3 software (StatSoft Ltd., Tulsa, OK, United States).

H. inermis RNA Library and Representative Sex-Specific Differentially Expressed Genes
De novo assembly of all 2,458,778,342 clean reads yielded filtered transcriptomes with 244,523 and 261,562 contigs in H. inermis eye and body, respectively. Total contig length was 319,146,505 and 349,094,053 bp in the eye and body transcriptomes, respectively. The sequencing depth was 485x and 713x with contig average lengths of 1,305.18 and 1,334.65 bp and N50 values of 2,382 and 2,467 bp in the eye and body transcriptomic libraries, respectively. Following the DESeq2 analysis, totals of 11,821 and 32,317 genes were found to be differentially expressed (FDR adjusted p-value < 0.05 and absolute linear fold change > 1.3) between the reproductive stages (immature, male, young female and mature female) in the body (Figure 3A and Supplementary Table 2) and eye ( Figure 3B and Supplementary Table 3), respectively. Alignment of Pnp-IAG to the body transcriptome revealed one transcript (Hippolyte_Body_TRINITY_DN25817_c0_g1; see line 2937 in the "MF vs. M" sheet in Supplementary Table 2) that was found to contain the IAG sequence in H. inermis (Hi-IAG; GenBank accession number: MZ222390) and was annotated as such. Further analysis of the genes differentially expressed between males and mature females in the body transcriptome yielded 11 transcripts that were annotated as vitellogenin. Further analysis revealed that homologs for the five longest transcripts in other decapods could be other copies of vitellogenin as two of them matched to vitellogenin, other two to vitellogenin 2 and one to vitellogenin-like gene. Among them, the transcript with the highest absolute linear fold change (Hippolyte_Body_TRINITY_DN538_c0_g1, Linear FC = 172; see line 7 in the "MF vs. M" sheet in Supplementary Table 2 and Supplementary Figure 1) was defined as Hi-Vg (GenBank accession number: MZ222391).

Binary Patterning Analysis, GO Enrichment Analysis and Novel Sex-Specific Genes
The binary pattern analysis of the differentially expressed genes in the body (Figure 4 and Table 2) yielded 6,175 significant

-AGATGAGAGCATCACCCAAGA -3 -CAGGATCTGGTTGACAGCAT -3 -ATCGAATCGACACAGAAGAAGT TTGG -3
Hi-Vg The gene ID in the transcriptome (in brackets) and accession number, together with primers and probes used for the qPCR, are given. genes, of which 949 passed the counts cut-off threshold (i.e., the normalized counts in all replicate samples was larger than 500). Similar analysis of the differentially expressed genes in the eye ( Figure 5 and Table 2) yielded 19,605 significant genes, of which 2,665 passed the counts cut-off threshold. In the body transcriptome, 819 genes were male specific (0100 pattern); of those, 277 were annotated. Finally, 992 genes were female specific (0001 pattern), and of those, 470 were annotated. In the eye transcriptome, 386 genes were male specific (0100, with 123 being annotated), while 1,368 genes were female specific (0001), with 595 being annotated. In addition, 41 (13 annotated) and 44 (21 annotated) immature-specific genes (1000 pattern) were found in the body and the eye transcriptomes, respectively. GO enrichment analysis of the differentially expressed genes in the body transcriptome (Supplementary Table 4) yielded 56 and 70 terms characterized as molecular function (MF) and biological process (BP), respectively, and in the eye transcriptome (Supplementary Table 5), 127 and 191 terms characterized as MF and BP, respectively.
Mining for novel sex-specific genes by focusing on unannotated female-specific genes with a binary pattern of 0001 and male-specific genes with a binary pattern of 0100 that had passed the counts cut-off threshold yielded numerous novel sex-specific genes. Among the novel female-specific genes in the body, a representative unannotated transcript with a clear protein-coding region (Hippolyte_Body_TRINITY_DN8088_c0_g2, Linear FC = 68.9; see line 20 in the "MF vs. M" sheet in Supplementary  Table 2) was defined as "Uncharacterized female gene" (Hi-UCF; GenBank accession number: MZ222394). While no predicted domains of the Hi-UCF putative protein were found according to SMART (see Text Footnote 2; (Schultz et al., 2000)), the first 24 amino acids of the predicted protein were found to code a signal peptide according  Genes were assigned to the pattern with which they had the highest Pearson's correlation, as long as the correlation was higher than 0.8. Genes that passed the counts cut-off threshold required that the normalized counts in all replicate samples will be larger than 500 in order to be considered as "1".
to  Table 2) was defined as "Uncharacterized male gene" (Hi-UCM; GenBank accession number: MZ222393). However, no conserved domains were found within the Hi-UCM putative protein sequence, and the probability of the protein containing a signal peptide was found to be low. In the eye transcriptome, representative unannotated malespecific (Hi-UCMe; GenBank accession number: MZ995266) and female-specific (Hi-UCFe; GenBank accession number: MZ995265) were defined as "Uncharacterized male eye gene" and "Uncharacterized female eye gene", respectively. Both Hi-UCMe (Hippolyte_Eye_TRINITY_DN22603_c0_g1, Linear FC = −3.09; see line 2066 in the "MF vs M" sheet in Supplementary Table 3) and Hi-UCFe (Hippolyte_Eye_TRINITY_DN39764_c0_g1, Linear FC = 4.19; see line 579 in the "MF vs. M" sheet in Supplementary Table 3) had a clear protein-coding region but only Hi-UCMe contained a signal peptide. The full nucleotide and amino acid sequences of Hi-UCF, Hi-UCM, Hi-UCFe and Hi-UCMe are given in Data S1 and the identified bodyand eye-specific genes in the joint reference transcriptome are presented in Supplementary Table 6.

AG and IAG in H. inermis
A dorsoventral histological section of an H. inermis male revealed the AG, as in other crustaceans (Charniaux-Cotton, 1962), at the base of the fifth pereiopod, adjacent to the sperm duct. The AG appeared to consist of dense nucleated cells ( Figure 6A). The Hi-IAG mRNA sequence ( Figure 6B) included fragments of 50, 426 and 419 bp for the 5 UTR, ORF and 3 UTR, respectively. The deduced structure of the Hi-IAG hormone, according to the predicted ORF, was found to contain a signal peptide (19 aa), a B chain (42 aa), an A chain (38 aa), and a C peptide (42 aa).
In silico and in vitro Expression of Sex-Specific Genes Throughout the H. inermis Life Cycle and Their Homologs in P. platyceros As expected, Hi-IAG and Hi-UCM relative transcript levels (Figure 7 -left panel) were found to be significantly different between the stages sampled in this study (ANOVA, F (3 , 8) = 17.775 and F (3 , 8) = 18.346, P < 0.05). Hi-Vg and Hi-UCF relative transcript levels (Figure 8 -left panel) were also found to be significantly different between the stages sampled in this study (ANOVA, F (3 , 8) = 45.099 and F (3 , 8) = 57.704, P < 0.05). More specifically, according to the post hoc Dunnett test, Hi-IAG and Hi-UCM transcript levels were significantly higher in the male stage compared to the immature and female stages, while Hi-Vg and Hi-UCF levels were significantly higher in the mature female stage compared to the immature, male and young female stages (P < 0.05). It is noteworthy that all qPCR results regarding relative transcript levels were consistent with the normalized read counts of the tested genes acquired from the in silico analyses of the body transcriptome (left and middle panels of Figures 7, 8).
The comparative analysis of the above H. inermis genes with the P. platyceros transcriptome revealed that the Hi-IAG homolog (Pnp-IAG) is exclusively expressed in the AG of P. platyceros males, while the Hi-UCM homolog is expressed in the P. platyceros eye with the highest transcript level in males, with the level decreasing as the animal transforms toward the female stage (Figure 7 -right panel). In contrast, the Hi-Vg homolog (Pnp-Vg) is expressed in the hepatopancreas of P. platyceros with a relatively low transcription level in males, with the level increasing as the animal transforms toward femaleness. The Hi-UCF homolog was found to be expressed in the gonad of P. platyceros with the highest transcript levels in females compared to male and transitional stages (Figure 8 right panel). Furthermore, according to the blastx and tblasn results against the TSA database, Hi-UCF protein homolog sequences were found in 21 other decapod species, including prawns and shrimps belonging to the families Palaemonidae, Lysmatidae, Alpheidae, Alvinocarididae, Atyidae, and Penaeidae, while the homolog sequence for Hi-UCM was found in 8 other decapod species, including prawns, shrimps and a crab from the families Palaemonidae, Lysmatidae, Alpheidae, and Portunidae ( Table 3). All corresponding sequences to Hi-IAG, Hi-Vg, Hi-UCF and Hi-UCM in P. playceros are given in Data S2. Moreover, homolog sequence for Hi-UCFe was found in 26 other decapods including prawns, shrimps, crayfish, lobsters and crabs from the families Portunidae, Penaeidae, Cancridae, Nephropidae, Lithodidae, Varunidae, Palaemonidae, Atyidae, Grapsidae, Astacidae, Gecarcinidae and Cambaridae while homolog sequence for Hi-UCMe was found in seven other shrimps and prawns from the families Palaemonidae and Lysmatidae (Supplementary Figure 2).
As per the IAG-switch-like protein from P. platyceros, only IAG-switch-like protein 1, 4, 5 and 6 corresponded to homologs in the H. inermis body transcriptome. Among them, only the homolog for IAG-switch-like 1 was malespecific (Hippolyte_Body_TRINITY_DN12510_c0_g1, Linear FC = −2.4; see line 1376 in the "MF vs. M" sheet in Supplementary Table 2) while the rest were not sexually biased (Supplementary Table 7).

DISCUSSION
In gonochoristic species, sexual differentiation processes are confined to early developmental stages -from embryogenesis to early post-larvae -while in hermaphrodite species they also occur during adult stages (Levy et al., 2018;Benvenuto and Weeks, 2020;Levy and Sagi, 2020). Indeed, in the present study, thousands of genes were recorded as differentially expressed and sexually biased during the life cycle of the protandric shrimp H. inermis, and homologs to representative genes were also found in P. platyceros and other decapod crustaceans. While the gene list generated in this study includes previously known key genes, such as the male-specific IAG and the female-specific Vg genes, which were used as reference genes, Hi-UCM, Hi-UCF, Hi-UCMe and Hi-UCFe were just four, among numerous novel (not yet annotated) candidate genes from the body and eye transcriptomes of H. inermis, that appear to be promising for further studies.
Hi-Vg, which was used as a female reference gene had, as expected, a transcriptional pattern that was negligible in H. inermis immature individuals and males but increased when the younger shrimp began to mature as females, similar to other protandric shrimps, such as P. platyceros (Levy et al., 2020b) and P. hypsinotus (Tsutsui et al., 2004;Okumura et al., 2005). In these two Pandalus species there is a clear transitional stage, in which vitellogenin gene and protein levels are intermediate between those during maleness and those during femaleness. In contrast, in H. inermis the transitional stage is absent (Cobos et al., 2005;Mutalipassi et al., 2018), and consequently the Hi-Vg level begins to rise at the young female stage and a sharp increase is observed in the mature female stage, as shown in this study. Also, the fact that 11 transcripts were annotated as vitellogenin in the H. inermis body transcriptome and some of them corresponded to vitellogenin 2 or vitellogeninlike genes supports the previous evidence that multiple copies of vitellogenin may be found in decapods (Zhao et al., 2021). In the male stage, the transcriptional pattern of Hi-IAG, which controls male differentiation and served as a male reference gene in H. inermis, differed from the pattern in P. platyceros: In the latter species, the IAG transcript level was sixfold higher in juveniles than in males (Levy et al., 2020a), but in H. inermis the IAG transcript level was negligible in immature animals and high in males.
The above findings could be explained by the differences in the lengths of the maturation period from juveniles to males: in H. inermis, this period lasts only about a month and is contained within a single molt cycle (Reverberi, 1950;Zupo et al., 2008), whereas in P. platyceros the maturation period lasts at least three years and thus extends over several molt cycles (Butler, 1965;King and Moffitt, 1984;Iversen et al., 1993;Kimker et al., 1996;Levy et al., 2020b). The very short maturation period in H. inermis makes it easy to miss the immature stage during which the IAG transcript level begins to rise. In addition, for this reason, AMs may be absent in immature stages of H. inermis, because sex maturation is a process expressed and contained in a very short period-a character unique to H. inermis among the crustaceans. Indeed, taken together, the absence of the AM in the immature H. inermis samples collected in the present study and the presence of a small AM in the juvenile P. platyceros samples collected in a previous study (Levy et al., 2020a) may imply that the immature H. inermis individuals in this study were collected at a very early stage before the development of the AG, compared to the juvenile P. platyceros in which the development of the AG already begun. Also, the fact that H. inermis homologs to four out of the six conserved IAG-switch like proteins that were previously described in P. platyceros, one of them with the same male-specific pattern, supports the claim that IAG-switch related genes are conserved among the Crustacea (Levy et al., 2020a).
While the IAG and Vg genes in H. inermis are reported here for the first time, the vast potential of the transcriptomic approach presented in this study is exemplified by the discovery of four representative novel transcripts, two male-specific and two female-specific, out of numerous potential candidates from the H. inermis body and eye transcriptomes that were found to be highly conserved among decapods. This is supported by the fact that the transcriptional pattern of Hi-UCF in the H. inermis body is positively correlated with the Hi-Vg transcript. Moreover, the in silico transcript level of the Hi-UCF homolog in the P. platyceros gonad (low in males and then increasing along the transition to the female stage) had similar transcription pattern to the Vg gene in the hepatopancreas of P. platyceros. However, although Vg is a prominent glycolipoprotein that is expressed in some decapods in both the hepatopancreas and the ovary of females (Okumura et al., 2007;Bai et al., 2015) and the Hi-UCF homolog in P. platyceros is expressed in the ovary and correlated with Vg transcription in the hepatopancreas, Hi-UCF is clearly different from Vg. Hi-UCF thus could be part of the vitellogenin toolkit or a controlling element in the vitellogenesis process.
A homolog to the representative body male-specific gene in H. inermis, Hi-UCM, was also expressed in the transcriptome of P. platyceros (Levy et al., 2020a), exclusively in the eye, with the expression being the highest in males and transitionals and decreasing as the animals transformed towards femaleness. Generally, a novel sex-specific gene found to be exclusively expressed in crustacean eyes could potentially be a sexcontrolling-related candidate gene, since the X-organ-sinus gland complex of the eyestalk in crustaceans produces numerous neurohormones that regulate key physiological processes, including molting, reproduction, and even AG development (Webster and Keller, 1986;Keller, 1992;Aguilar et al., 1996;Khalaila et al., 2002;Chang and Mykles, 2011;Katayama et al., 2013;Li et al., 2015;Pitts et al., 2017). However, unlike the behavior of its homolog in P. platyceros, Hi-UCM transcript levels were negligible in all stages sampled for the H. inermis eye transcriptome. Thus, Hi-UCM might be produced in the thoracic ganglia (TG), a component of the central nervous system (CNS), found in the crustacean body (Suwansa-Ard et al., 2015;Nguyen et al., 2018) rather than in the eyestalk. But, since Hi-UCM was not annotated as one of the well-known neurohormones that modulate reproduction and development [e.g., gonad-inhibiting hormone (GIH)/vitellogenesis-inhibiting hormone (VIH), the molt-inhibiting hormone (MIH), the crustacean hyperglycemic hormone (CHH) and the mandibular organ inhibiting hormone (MOIH) (Webster and Keller, 1986;Keller, 1992;Aguilar et al., 1996;Khalaila et al., 2002;Chang and Mykles, 2011;Katayama et al., 2013;Li et al., 2015;Pitts et al., 2017)], it might be a part of the genomic male-differentiation toolkit, upstream or downstream to the IAG-switch (which governs the activity of such hormones) rather than being a neurohormone per se.
To summarize, crustacean sexual development and differentiation and their control have largely been studied in gonochoristic species and, within these species, mostly in species of global importance to the growing aquaculture industry. However, to study genes that control sex-differentiation in such gonochoristic species, one must investigate the animals at very early developmental stages, which is not always easy and is sometimes impossible. According to the data generated from this study (i.e., a long list of novel sex-specific unannotated genes), the main finding of this study is that the solution to this problem may lie in investigating protandric species, because in these species the expression of genes related to the sexual differentiation mechanism are not limited to early developmental stages. Also, as exemplified here, it is likely that candidate genes found in protandric species (with or without intermediate stages) are conserved among crustaceans. Hence, a gene could be discovered in a hermaphrodite model species and then studied further in gonochoristic species. Eventually, the strategy adopted successfully in the current study, expands the debate over protandric life histories, previously based solely on histological evidence, by involving molecular tools utilizing protandric species to uncover parts of the crustacean sex-differentiation puzzle.