Phylogenetic and transcriptomic characterization of insulin and growth factor receptor tyrosine kinases in crustaceans

Receptor tyrosine kinases (RTKs) mediate the actions of growth factors in metazoans. In decapod crustaceans, RTKs are implicated in various physiological processes, such molting and growth, limb regeneration, reproduction and sexual differentiation, and innate immunity. RTKs are organized into two main types: insulin receptors (InsRs) and growth factor receptors, which include epidermal growth factor receptor (EGFR), fibroblast growth factor receptor (FGFR), vascular endothelial growth factor receptor (VEGFR), and platelet-derived growth factor receptor (PDGFR). The identities of crustacean RTK genes are incomplete. A phylogenetic analysis of the CrusTome transcriptome database, which included all major crustacean taxa, showed that RTK sequences segregated into receptor clades representing InsR (72 sequences), EGFR (228 sequences), FGFR (129 sequences), and PDGFR/VEGFR (PVR; 235 sequences). These four receptor families were distinguished by the domain organization of the extracellular N-terminal region and motif sequences in the protein kinase catalytic domain in the C-terminus or the ligand-binding domain in the N-terminus. EGFR1 formed a single monophyletic group, while the other RTK sequences were divided into subclades, designated InsR1-3, FGFR1-3, and PVR1-2. In decapods, isoforms within the RTK subclades were common. InsRs were characterized by leucine-rich repeat, furin-like cysteine-rich, and fibronectin type 3 domains in the N-terminus. EGFRs had leucine-rich repeat, furin-like cysteine-rich, and growth factor IV domains. N-terminal regions of FGFR1 had one to three immunoglobulin-like domains, whereas FGFR2 had a cadherin tandem repeat domain. PVRs had between two and five immunoglobulin-like domains. A classification nomenclature of the four RTK classes, based on phylogenetic analysis and multiple sequence alignments, is proposed.

RTK classes are distinguished by the functional domains in the N-terminal extracellular region (1,6).InsRs are characterized by two leucine-rich repeats (Receptor L1 and L2) flanking a furin-like cysteine-rich domain and two fibronectin type 3 (FN3) domains in the a subunit (7,13).EGFRs are characterized by L1 and L2 domains alternating with two furin-like cysteine-rich domains (6,8,14).FGFRs have three immunoglobulin-like domains (D1, D2, and D3), with a seven or eight amino acid "acid box" linking D1 and D2 (6,9).PDGFRs and VEGFRs are structurally related, which suggests a common origin.PDGFR and VEGFR have five or seven immunoglobulin-like domains (D1 to D7), respectively, that are involved in ligand binding (6,11,15).
In crustaceans, RTKs have been implicated in diverse physiological processes, particularly those involving reproduction, development, immunity, and growth.EGFR plays a role in ovarian development in the mud crab, Scylla paramamosain (16).EGFR and FGFR are linked to the ability of S. paramamosain and red swamp crayfish (Procambarus clarkii) to mount immune responses to pathogens (17)(18)(19).Knockdown of Mr-EGFR slows organismal growth, but it has no effect on molting frequency in freshwater prawn, Macrobrachium rosenbergii (20).By contrast, knockdown of Mr-InsR has no effect on organismal growth, but results in abnormalities in development of male sex characteristics and reproductive organs (21,22).In Chinese mitten crab, Eriocheir sinensis, Es-InsR expression is increased in limb regenerates and blocking InsR signaling with GSK1838705A slows regenerate growth (23).A male-specific InsR may be involved in sexual differentiation in Pacific whiteleg shrimp, Litopenaeus vannamei; Chinese shrimp, Fenneropenaeus chinensis; Eastern spiny lobster, Sagmariasus verreauxi; and S. paramamosain (24-27).VEGFR/PDGFR signaling is involved in immune responses to viral infection in L. vannamei; in hemopoiesis in signal crayfish, Pascifasticus leniusculus; and in regulating lipid metabolism in S. paramamosain (28)(29)(30).
Transcriptomics has assisted in the identification of RTKs in crustacean tissues, but these receptors have not been fully characterized (17,26,(31)(32)(33)(34)(35)(36)(37)(38)(39).Moreover, annotation and characterization of RTKs in diverse crustacean taxa has been hampered by databases that are limited to a relatively small number of species and taxonomic groups.Consequently, the number of RTK genes and/or isoforms present in crustaceans is unknown.Additionally, the patterns of evolution and diversification of RTKs across the Pancrustacea remain to be elucidated (31).CrusTome, a comprehensive multi-species database of crustacean transcriptomes (40), was used to identify contiguous sequences encoding insulin, EGF, FGF, and PDGF/ VEGF (PV) receptors in malacostracan and non-malacostracan crustaceans.A similar approach was used to identify G proteincoupled receptor candidates for crustacean hyperglycemic hormone neuropeptides (41).Characterization of decapod RTKs was emphasized, particularly in blackback land crab, Gecarcinus lateralis and green shore crab, Carcinus maenas, which have served as models for molting physiology for decades (42)(43)(44)(45)(46)(47)(48)(49)(50)(51)(52).Moreover, C. maenas is an invasive species that has established populations in temperate coastal regions (53).Its rapid growth and tolerance of a wide range of environmental conditions have contributed to its success (54)(55)(56).In G. lateralis, Gl-InsR, Gl-EGFR, Gl-FGFR, and other RTK signaling genes are expressed in transcriptomes of the molting gland (Y-organ), suggesting that growth factors have a direct effect on the synthesis of steroid molting hormones (ecdysteroids) (32,33,38,46).RTKs in C. maenas have not been characterized.Phylogenetic analysis and multiple sequence alignments revealed a rich diversity of RTK genes and isoforms.A classification nomenclature, based on InsR, EGFR, FGFR, and PVR clades and subclades, is proposed.

Materials and methods
Protein reference sequences for each receptor were collected from the NCBI GenBank database with a focus on arthropod sequences when available (Supplementary Material 1).Four iterative BLAST searches using these reference sequences against the CrusTome database (v.0.1.0)were then conducted to ensure that all possible matching sequences were found for a comprehensive phylogenetic analysis (40,57).Using Multiple Alignment using Fast Fourier Transform (MAFFT; v.7.490; (58), the BLAST search hits and the original reference sequence dataset were aligned with settings optimized for multi-domain proteins (as per (59) and to place a higher importance on accuracy rather than speed (-dashoriginalseqonly -genfpair -maxiterate 1000).The -dash parameter allowed MAFFT to refine the alignment by employing sequences from the Database of Aligned Structural Homologs (DASH; (60), which includes structural information to improve the alignment processes.Subsequently, ClipKIT (61), with the smart-gap parameter, was used to trim the alignment gaps while retaining phylogenetically informative sites for the most accurate phylogenetic inference.A maximum-likelihood phylogenetic reconstruction was undertaken with IQ-TREE (62) to accurately create a phylogeny of the sequences found for each given receptor using the models of evolution indicated by ModelFinder (63); VT+R8 for InsR, JTT+I+I+R6 for EGFR, VT+F+R7 for FGFR, and WAG+F +I+I+R7 for PDGFR/VEGFR).These trees were refined to reduce partial sequences (less than 200 aa for EGFR, 350 aa for InsR, 200 aa for PDGFR/VEGFR, 170 aa for FGFR), sequences with ambiguous or unknown residues (often found in Daphnia predicted transcriptomes), and any sequences that confidently lacked the domain organization of RTKs.Final phylogenies were reconstructed using the pruned input dataset.All final trees, their corresponding input files, and the alignments for G. lateralis and C. maenas can be found in Supplementary Material 2. Branch support for the finalized phylogenetic reconstructions was assessed via two complementary methods, the Ultra-Fast Bootstrap approximation (UFBoot; 1,000 iterations) and an approximate Bayes test (64)(65)(66).
A multiple sequence alignment restricted to brachyuran species was performed following the MAFFT strategy outlined above, to identify putative residues of structural and/or functional significance conserved across taxa (Supplementary Material 3).This alignment was subsequently used as input for the Motifs from Annotated Groups in Alignments (MAGA) tool (67) to identify motifs that could be employed to discriminate between RTK classes without the need of large-scale phylogenetic analyses.This tool consisted of a supervised method to detect motifs that can identify sites of structural, functional, and/or evolutionary significance based on sequence conservation within and across groups, as defined by the previous phylogenetic analyses.Multiple sequence alignments were produced to assess sequence content and conservation across receptor types and subclades among select decapod species.These alignments were generated with the MAFFT strategy and subsequently visualized with a custom script (code available at https://github.com/invertome/scripts/tree/main/plots; from (41).Additionally, the script generated sequence logo plots depicting the proportion of each residue found per alignment site.Amino acid residue colors that are proximal in color space, in both the alignments and logo plots, denote similarities in physicochemical characteristics of the corresponding residues (68).Additionally, further examination using NCBI's Conserved Domain Database (69) assisted with the comparison and identification of sequences.In this study, the protein sequences were analyzed using a database of recognized domains, which revealed commonly-found domains in RTKs, as well as domains that suggested a non-RTK identity.

Phylogenetic analysis of receptor tyrosine kinases
Maximum-likelihood phylogenetic analysis of crustacean RTKs in the CrusTome database produced a well-supported tree with four major clades, corresponding to InsR, EGFR, FGFR, and PVR classes (Figure 1).The EGFR class consisted of a single monophyletic group, designated EGFR1 (Figure 1).The other classes segregated into subclades denoting possible receptor subtypes.The analyses supported a classification nomenclature based on these clades and subclades.The InsR clade had three subclades, designated InsR1, InsR2, and InsR3; the FGFR clade had three subclades, designated FGFR1, FGFR2, and FGFR3; and the PVR clade had two subclades, designated PVR1 and PVR2 (Figure 1; full unedited tree provided in Supplementary Material 2).
Table 1 summarizes the distribution of RTK sequences obtained from pancrustacean and tardigrade transcriptomes in the CrusTome database.The 51 decapod species had the highest number of RTK sequences, which included 60 InsR sequences (Table 1).Fewer InsR sequences were identified in non-decapod taxa; the next highest number was eight sequences in 22 isopod species, followed by four sequences in two euphausiid species (Table 1).InsR sequences were not obtained from transcriptomes from the other 11 taxa (Table 1).By contrast, growth factor receptor sequences were well represented in seven pancrustacean taxa: a total of 344 in decapods; 128 in 22 isopod species; 97 in 26 amphipod species; 95 in two branchiopod species; 55 in two euphausiid Phylogenetic analysis of pancrustacean receptor tyrosine kinases.A phylogenetic tree synthesized using all the initial references sequences, all decapod sequences, and the identified sequences found in other species from other studies using a WAG+F+R9 substitution model of evolution.The pink clade represents an overall InsR identity.The green clade is EGFR, and the blue and purple are FGFR and PVR, respectively.Non-RTK and non-decapod sequences were removed for clarity.The full unedited tree can be found in Supplementary Material 2.  1).
Multiple sequence alignment of the identified decapod InsR sequences kinase domain with a Drosophila melanogaster reference revealed remarkable conservation of ATP-binding sites (10 out of 12 sites), including those outside (Figure 4; reference alignment positions #1823, #1846, #1848, #1895, #1897, and #1901) and in the loop regions (reference alignment positions #1967, #1968, #1970, and #1984).Peptide-binding residues on the other hand were only conserved across InsR subtypes in 5 out of 11 identified sites (Figure 4, reference alignment positions #1967, #1968, and #2006; positions #2015, #2050 in the loop region).MAGA search identified conserved motifs in decapod InsR proteins (67).A VHRDLAARNC motif, located in the catalytic loop, was conserved in all decapod InsRs, which distinguished the InsRs from the decapod growth factor RTKs (Table 3, Figure 4, reference alignment positions #1960 to #1969; Supplementary Material 1).The three InsR subclades were distinguished by motif sequences in a 20-amino acid stretch located proximal to the beginning of the first FN3 domain in the Nterminus.There were four residues in the motif that were conserved in all decapod InsRs (Table 3, Supplementary Material 4).The 20-amino acid sequence was highly conserved in InsR1 (Table 3, Supplementary Material 4, reference sequence positions #840 to #859).Although the motif sequences varied among InsR2 and InsR3 subclades, there were consistent differences in the sequences to distinguish the two subclades (Table 3).
isoforms showed that the four sequences, designated Gl-EGFR1-A1, -A2, -A3, and -A4, were likely products of a single gene (Figure 5B, Table 4, Supplementary Material 2).Four C. maenas isoforms, designated Cm-EGFR1-A1, -A2, -A3, and -A4, grouped proximally to G. lateralis and other brachyurans (Figure 5B, Table 4, Supplementary Material 2).In S. paramamosain, three contig sequences obtained from the CrusTome database grouped with three previously-described EGFR1 coding sequences (17) (Figure 5B, Table 4, Supplementary Material 2).A conserved domain search for the G. lateralis and S. paramamosain EGFR sequences ( 17) and a D. melanogaster reference sequence showed a highly conserved domain organization.The N-terminal region contained two leucine repeat (Receptor L) domains, a furin-like cysteine rich region, and a growth factor receptor domain IV (Figure 6).The C-terminal region had an EGFR-like protein tyrosine kinase catalytic domain.The D. melanogaster sequence and Sp-EGFR1 sequence had an additional furin-like repeat (Figure 6).Multiple sequence alignment of the catalytic domain of the decapod EFGR with a Drosophila reference revealed high conservation of ATP-binding sites in the catalytic and activation loop regions (Figure 7).Ten peptide-binding residues were identified based on homology to Drosophila, nine of which were conserved across Decapoda and Hexapoda (Figure 7, reference alignment positions #1080, #1109, #1111, #1112, #1113, #1115, #1116, #1125, #1128).Only one peptide binding site differed between Drosophila and the decapods investigated (Figure 7, reference alignment position #1126), with both presenting hydrophilic residues (arginine and glutamine, respectively) in the aforementioned position.
Multiple FGFR contigs were identified in decapod species (Figure 8, Table 5).Analysis of G. lateralis FGFR1 sequences and C. maenas FGFR1, FGFR2, and FGFR3 sequences showed that the isoforms were products from a single gene for each subclade.In G. lateralis, two isoforms, designated Gl-FGFR1-A1 and -A2, were apparently alternatively-spliced products of the same gene, based on highly conserved regions in the DNA alignment (Table 5, Supplementary Material 2).There were also two C. maenas isoforms of a single gene (Cm-FGFR1-A1 and Cm-FGFR1-A2; Table 5, Supplementary Material 2).The FGFR2 and FGFR3 subclades had one G. lateralis contig sequence in each subclade (Gl-FGFR2 and Gl-FGFR3) and two C. maenas isoform sequences in the FGFR2 subclade (Cm-FGFR2-A1 and Cm-FGFR2-A2) (Table 5, Supplementary Material 2).Sequence alignment of C. maenas FGFR3 showed that the sequences were nearly identical, suggesting either allelic variation or slight discrepancies caused by the difference in tissue types (YO vs. CNS; Table 5).
The decapod FGFR sequences showed a similar domain organization.Analysis of the G. lateralis Gl-FGFR1-A1 sequence, the Sp-FGFR3 and Pc-FGFR4 sequences from previous studies (18,19), and a D. melanogaster reference FGFR sequence showed two to three immunoglobulin-like domains in the N-terminal region and a protein tyrosine kinase catalytic domain in the Cterminal region (Figure 9).Gl-FGFR1-A2 was a partial sequence missing a portion of the N-terminal sequence; only one immunoglobulin-like domain was identified (Figure 9).Gl-FGFR2 and Gl-FGFR3 were partial sequences that lacked immunoglobulin domains (Table 5, Figure 9).Interestingly, the N-terminus of Gl-FGFR2 had a cadherin tandem repeat domain (Figure 9).
The decapod PVR2 sequences were separated into two wellsupported groups, designated PVR2-A and PVR2-B (Figures 11C,  D, Table 7).Two G. lateralis contigs, designated Gl-PVR2-A1 and Gl-PVR2-A2, differed in single nucleotide polymorphisms, suggesting that they were products of two genes (Table 7, Supplementary Material 2).By contrast, two C. maenas isoforms of one gene were identified (Cm-PVR2-A1a and Cm-PVR2-A1b; Table 7).Two C. maenas contigs, one from CNS and the other from YO, were assigned to Cm-PVR2-A1a, due to their high similarity in sequence identity (Table 7, Supplementary Material 2).Four sequences were assigned to Lv-PVR2-A and two sequences were  2).
A conserved domain search of the G. lateralis, L. vannamei, and P. leniusculus PVR sequences and a D. melanogaster reference sequence revealed a similar domain organization.The N-terminal region had between two and five immunoglobulin-like domains (Figure 12).The C-terminal region had a protein tyrosine kinase catalytic domain (Figure 12).
MAGA search and a multiple sequence alignment of the contigs in Table 7 identified a 46-amino acid motif in the catalytic domain that distinguished the PVR1 and PVR2 subclades (Table 8) (67).The motif was bounded a conserved "HGDLA" at the N-terminal end and a conserved "PxKW" at the C-terminal end (Table 8, Figure 13, reference alignment positions #1348-1352 and #1390-1393, respectively).It should be noted that the glycine (G) in the Multiple sequence alignment and logo plot of the catalytic domain of Drosophila and decapod insulin receptors.Includes representative species from Table 2 (Gecarcinus lateralis, Carcinus maenas, Cancer borealis, Sagmariasus verreauxi, Fenneropenaeus chinensis, Scylla paramamosain, Macrobrachium rosenbergii, and Eriocheir sinensis) and Drosophila melanogaster INSR1 (Accession: AAC47458.1)as a reference for comparison.The alignment illustrates the composition and length of conserved regions within subclades that reflect putative differences in ligands and/or binding affinities between receptor types.Catalytic loop and activation loop regions are demarcated by red and blue rectangles, respectively.ATP-binding and peptide-binding amino acid residues are annotated with green circles and orange squares, respectively, above the reference position.Partial sequences were excluded for ease of visualization and interpretation.MSA color scheme corresponds to similarities in physicochemical properties of amino acid residues.Logo plot illustrates conserved amino acid residues as a proportion of all the sequences included.6, 8).Multiple sequence alignment of the catalytic domain of decapod PVRs with a Drosophila melanogaster reference revealed structural diversity between and within PVR subtypes (Figure 13).The four amino

Discussion
Phylogenetic analysis of the CrusTome database yielded the most extensive catalog of Pancrustacea RTK contig sequences to date.The large number of species from major Crustacea taxa provided a higher confidence in distinguishing RTK types and identifying genes and isoforms.A total of 988 contigs encoding    (6,9,11,72).This is consistent with the inferred evolutionary histories with the ancestral versions being and/or containing InsR and EGFR domains, and FGFR, PDGFR, and VEGFR constituting later evolved receptors (72,73).Interestingly, while the other receptors distributed into a number of subclades, EGFR was highly conserved across crustacean taxa, suggesting that its role in Multiple sequence alignment and logo plot of the catalytic domain of Drosophila and decapod EGF receptors.Includes representative species from Table 4 (Gecarcinus lateralis, Carcinus maenas, Cancer borealis, Litopenaeus vannamei, Scylla paramamosain, and Eriocheir sinensis) and Drosophila melanogaster EGFR (Accession: NP476759.1)as a reference for comparison.The alignment illustrates the composition and length of conserved regions within subclades that reflect putative differences in ligands and/or binding affinities between receptor types.Catalytic loop and activation loop regions are demarcated by red and blue rectangles, respectively.ATP-binding and peptide-binding amino acid residues are annotated with green circles and orange squares, respectively, above the reference position.Partial sequences were excluded for ease of visualization and interpretation.MSA color scheme corresponds to similarities in physicochemical properties of amino acid residues.Logo plot illustrates conserved amino acid residues as a proportion of all the sequences included.
physiological processes is conserved across the Metazoa.RTK subclades often contained diverse pancrustacean taxa, and their topologies mirrored pancrustacean evolutionary history (74).This suggests that ancient duplication events gave rise to the diversity of RTKs observed today (31).In addition, the aforementioned phylogenetic reconstructions and classification resulted in a high diversity of newly characterized arthropod RTK sequences.
isoforms), M. rosenbergii (Mr-IR), and C. borealis (Figure 2, Table 2, Supplementary Material 2).Gl-InsR1 contained a nucleotide sequence of 1264 bp; Gl-InsR3-A1, -A2, and -A3 contained nucleotide sequences of 5647 bp, 2530 bp, and 2206 bp, respectively.The de novo assemblies produced only partial sequences, possibly due to low levels of expression in the sequenced tissues.A full-length Gl-InsR3-A1 sequence was constructed from three overlapping partial contigs (Table 2).Gl-InsR3-A1 was similar to cDNAs encoding M. rosenbergii insulin receptor (Mr-IR) and E. sinensis insulin-like receptor (Es-InR); the sequences were assigned to the R3 subclade (Table 2, Supplementary Material 2) (21,23).An InsR binds insulin-like androgenic gland hormone (IAG), an insulin-like peptide (ILP) that determines male sexual characters by the androgenic gland (75)(76)(77).InsR2 subclade members in S. paramamosain, S. verreauxi, L. vannamei, and F. chinensis (Table 2) appear to be IAG receptors, as InsR2 is only expressed in male reproductive tissues (e.g., testis, sperm duct, terminal ampullae, and androgenic gland) and RNAi knockdown of InsR2 reduces testicular development (25,27).Moreover, in vitro binding assays show interactions between IAG and InsR2 (25,26).The ligands of the InsR1 and InsR3 subclades are unknown.dsRNA knockdown of Mr-IR/Mr-InsR-R3 did not result in sex reversal, suggesting that a different InsR gene is involved (21).However, Tan et al. (2020) reported sex reversal in one or two M. rosenbergii individuals with dsRNA or siRNA knockdown of Mr-IR (22).Es-InR/ Es-InsR3 is implicated in limb regeneration, as Es-InR is up-regulated in limb regenerates and an InR inhibitor (GSK1838705A) suppresses limb regenerate growth (23).Seventy-seven decapod EGFR sequences were organized into a single monophyletic clade (EGFR1; Figure 1, Table 1).The assignment of the EGFR1s to a single clade was supported by the high amino acid sequence identity in the catalytic domain (Figure 7).Multiple isoforms were common (Figure 5, Table 4).G. lateralis had one EGFR gene with four isoforms obtained from the YO transcriptome; the contigs ranged from 4280 bp to 5550 bp and classified as Gl-EGFR1-A1, -A2, -A3, and -A4 (Figure 5, Table 4).Four C. maenas EGFR isoforms were also obtainedtwo from the YO transcriptome and two from the CNS transcriptome (Figure 5, Table 4).This compares to a single 6864-bp M. rosenbergii EGFR sequence obtained from the SRA database (Table 4) (20).Single EGFR sequences were also obtained from L. vannamei, E. sinensis, and C. borealis transcriptomes (Table 4).Three distinct EGFR transcripts varying between 5076 bp and 5457 bp have been identified in S. paramamosain (Table 4) (17).cDNAs encoding two genes, designated Sp-EGFR1 and Sp-EGFR2, were obtained by PCR of genomic DNA, followed by RACE of RNA from hepatopancreas (17).Sp-EGFR1 produces a single coding sequence, whereas Sp-EGFR2 produces two alternatively-spliced isoforms, designated Sp-EGFR2a and Sp-EGFR2b (17).Previously, a full-length Sp-EGFR sequence was cloned from ovary (16).As the protein sequences of Sp-EGFR and Sp-EGFR1 are identical, it is likely that they are products of the same gene.Comprehensive phylogenetic analyses and multiple sequence alignments in the present study suggest that all three Domain organization of Drosophila and decapod FGF receptors.Listed sequences include a model organism (D. melanogaster), all G. lateralis sequences found, and identified genes in other species using the classification as listed in the original referenced studies (Table 5).TABLE 6 Motif sequences distinguishing the three decapod FGFRs.

Receptor
Consensus sequences , and FGFR3 were distinguished by a 118-amino acid motif sequence in FGFR1 and by 117-amino acid sequences in FGFR2 and FGFR3, located in the catalytic domain in the C-terminal region.The 16-amino acid catalytic loop is underlined.Residues that are identical between all the sequences from Table 5 are indicated by bold font.Consensus sequences were obtained using the MAGA tool (67) and multiple sequence alignment (Figure 10).
sequences are isoforms of one gene and not two separate gene products as previously by Cheng et al. (17).Three partial contig sequences identified in the CrusTome database matched the three S. paramamosain cDNA sequences (Table 4, Figure 5, Supplementary Material 2).Thus, there are three Sp-EGFR coding sequences, which are designated Sp-EGFR1-A1, -A2, and -A3 (Table 4).Decapod EGFRs, which are widely expressed in tissues, mediate physiological processes involving growth and differentiation.Mr-EGFR is expressed in thoracic ganglion, heart, hepatopancreas, muscle, ovary in females, and testis and sperm duct in males (20).dsRNA knockdown of Mr-EGFR in male prawns inhibits moltincremental growth; inhibits growth of a male-specific secondary sexual characteristic (appendix masculina); and disrupts eye ommatidia organization (20).In S. paramamosain, EGFRs are expressed in all tissues (16,17).Sp-EGFR/Sp-EGFR1-A1 is expressed in 14 tissues, with higher expression in heart, YO, ovary, gill, and stomach (16).Sp-EGFR1/Sp-EGFR1-A1, Sp- Multiple sequence alignment and logo plot of the catalytic domain of Drosophila and decapod FGF receptors.Includes representative species from Table 5 (Gecarcinus lateralis, Carcinus maenas, Procambarus clarkii, Eriocheir sinensis, Cancer borealis, and Scylla paramamosain) and Drosophila melanogaster FGFR (Accession: BAA03617.1)as a reference for comparison.The alignment illustrates the composition and length of conserved regions within subclades that reflect putative differences in ligands and/or binding affinities between receptor types.Catalytic loop and activation loop regions are demarcated by red and blue rectangles, respectively.ATP-binding and peptide-binding amino acid residues are annotated with green circles and orange squares, respectively, above the reference position.Partial sequences were excluded for ease of visualization and interpretation.MSA color scheme corresponds to similarities in physicochemical properties of amino acid residues.Logo plot illustrates conserved amino acid residues as a proportion of all the sequences included.Domain organization of PDGF/VEGF (PV) receptors.Listed sequences include a model organism (D. melanogaster), all G. lateralis sequences found, and identified genes in other species using the classification as listed in the original referenced studies (Table 7).The shaded light green sections are the kinase-insert domains in mammalian PDGFRs and VEGFRs (6).
One hundred and twenty-nine decapod FGFR sequences were organized into three clades, designated FGFR1, FGFR2, and FGFR3 (Figure 1, Table 1).FGFR1 contigs were identified in G. lateralis (2 isoforms), C. maenas (2 isoforms), L. vannamei, E. sinensis, C. borealis, and S. paramamosain (Table 5).A cDNA encoding Sp-FGFR3 was cloned from S. paramamosain hemocytes (18).As the Sp-FGFR1 contig sequences and the Sp-FGFR3 sequence were similar (Figure 8A), Sp-FGFR3 was assigned to the FGFR1 subtype (Sp-FGFR1; Table 5).Likewise, a cDNA encoding Pc-FGFR4, which was cloned from P. clarkii hemocytes and hepatopancreas (19), clustered with other decapod FGFR1 sequences (designated Pc-FGFR1; Figure 8A).Gl-FGFR1 proteins with less than three immunoglobulin domains were partial sequences (Figure 9).FGFR2 and FGFR3 contigs were identified in G. lateralis, C. maenas (2 isoforms), L. vannamei, E. sinensis, C. borealis, and S. paramamosain; all seven FGFR2 and all seven FGFR3 sequences were novel (Table 5).The N-terminal region of Gl-FGFR2 and Gl-FGFR3 lacked immunoglobulin domains (Figure 9).Interestingly, Gl-FGFR2 had a cadherin tandem repeat domain, which occurs in other RTKs (78).This illustrates the challenge of using sequence-similarity based methods for growth factor receptor identification.However, their identity as FGFRs was confirmed by the conserved protein tyrosine kinase domain shared by all the decapod sequences (Figures 9, 10).Residues that are identical between all the sequences from Table 7 are indicated by bold font.Consensus sequences were obtained using the MAGA tool (67) and multiple sequence alignment (Figure 13).Multiple sequence alignment and logo plot of the catalytic domain of Drosophila and decapod PDGF/VEGF (PV) receptors.Includes representative species from Table 7 (Gecarcinus lateralis, Carcinus maenas, Cancer borealis, Eriocheir sinensis, and Scylla paramamosain) and Drosophila melanogaster PVR (Accession: NP001260235.1)as a reference for comparison.The alignment illustrates the composition and length of conserved regions within subclades that reflect putative differences in ligands and/or binding affinities between receptor types.Catalytic loop and activation loop regions are demarcated by red and blue rectangles, respectively.ATP-binding and peptide-binding amino acid residues are annotated with green circles and orange squares, respectively, above the reference position.Partial sequences were excluded for ease of visualization and interpretation.MSA color scheme corresponds to similarities in physicochemical properties of amino acid residues.Logo plot illustrates conserved amino acid residues as a proportion of all the sequences included.There are few reports on the functions of FGFRs in decapods, and those studies are restricted to members of the FGFR1 subclade.The functions of FGFR2 and FGFR3 are unknown.In P. clarkii and S. paramamosain, FGFR1 is involved in innate immunity.Viral and bacterial infection increases mRNA levels of Sp-FGFR3 in the hepatopancreas and Pc-FGFR4 in hemocytes and hepatopancreas (18,19).Moreover, RNAi knockdown of Pc-FGFR4 and Sp-FGFR3 or FGFR inhibitor (Pemigatinib) decreased mRNA levels of immunity-related genes (18,19).FGFR1s are broadly expressed in crustacean tissues, with higher Pc-FGFR4 mRNA levels in eyestalk ganglia, stomach, heart, intestine, and hepatopancreas and higher Sp-FGFR3 levels in hepatopancreas, muscle, intestine, and heart (18,19).Given their wide tissue expression, it is likely that FGFRs are involved in other processes.For example, in crayfish and other decapods, FGF controls blastemal growth during the initial stage of limb regeneration (79).
The PVRs were the most diverse of the four RTK classes.A total of 138 decapod PVR sequences were divided into two major subclades (PVR1 and PVR2; Figures 1, 11, Table 7).PVR2 was further divided into PVR2A and PVR2B sequences, with PVR2B brachyuran-specific (Figures 11C, D, Table 7).The PVR tree was constructed by using PDGFR and VEGFR sequences jointly, as vertebrate VEGFR and PDGFR are not clearly differentiated in invertebrates (80).The evolution of VEGFRs and PDGFRs parallels the diversification and expansion of VEGFs in metazoans (81).An ancestral VEGFR/PDGFR ortholog, originally discovered in Drosophila, was designated PVR (PDGFR and VEGFR-Related Receptor), which diverged and led to PDGFR and VEGFR genes in vertebrates (80, 82).The lack of a clear distinction between PDGFR and VEGFR genes has contributed to inconsistencies in the annotation of homologous sequences in invertebrates.According to the classification proposed in PVR signaling is implicated in diverse physiological processes in decapods.In L. vannamei, five VEGFs and two VEGFRs are part of the immune response to viral infections; knockdown of VEGF and VEGFR expression reduces mortality, suggesting that PVR signaling supports viral replication (70,71,(83)(84)(85).VEGF-and VEGFR-like immunoreactivities are localized in the eyestalk ganglia of the swamp ghost crab (Ucides cordatus), suggesting that VEGF is involved in neuron and glial cell differentiation and maintenance (86).In S. paramamosain, a VEGF-like gene (Sp-vegfb) has a role in lipid accumulation in the hepatopancreas and other tissues (30).In P. leniusculus, PVR signaling controls hematopoiesis by affecting extracellular transglutaminase (TGase) activity.Pl-PVR1 is expressed in hemocytes and hematopoietic tissue (HPT) (29).Sunitinib malate, a PVR inhibitor, decreases HPT progenitor cell migration and round cell morphology and increases HPT cell spreading and extracellular TGase activity (29).
Multiple sequence alignments of the catalytic domain aided the identification and classification of decapod RTKs.All four RTK classes shared three consensus motifs: the glycine-rich loop (GxGxFG), which plays a role in ATP binding; the aspartatephenylalanine-glycine motif (DFG) near the activation loop; and the histidine-arginine-aspartate-leucine-alanine (HRDLA) motif in the catalytic loop (Supplementary Material 3) (87,88).The only variation was in the PVR sequences, in which glycine replaced the arginine in the catalytic loop motif (HGDLA, Figure 13).Members of the EGFR class were readily identified by the high conservation in the catalytic domain; there were only four positions in the entire 305-amino acid sequence that differed (Figure 7).All the InsR had the same "VHRDLAARNC" sequence in the catalytic loop (Table 3; Figure 4), but the three InsR subtypes differed in sequences of a 20amino acid motif located in the first FN3 domain in the N-terminal region (Table 3).The three FGFR subtypes differed in motif sequences (118 amino acids in FGFR1 and 117 amino acids in FGFR2 and FGFR3) that included the catalytic loop (Table 6, Figure 10).While conserved motifs are certainly useful in discriminating RTK types and subtypes, further work is required to elucidate their functional relevance.The complete conservation of the residues involved in ATP binding and peptide binding in EGFR1 (Figure 7) suggests that all members of the clade share the same catalytic properties.By contrast, the residues involved in ATP binding and peptide binding in the InsR, FGFR, and PVR sequences were not always conserved (Figures 4, 10, 13), suggesting that the subtypes within each clade differ in catalytic properties.
Processes such as development, growth, homeostasis, cell proliferation, and metabolism are regulated by growth factors, many of which are mediated by RTKs (3,4,82).In insects, RTK signaling controls molting by stimulating mechanistic target of rapamycin (mTOR)-dependent synthesis and secretion of molting hormones (ecdysteroids) by the prothoracic gland (82,(89)(90)(91)(92)(93).By contrast, the control of mTOR-dependent YO ecdysteroidogenesis by growth factor/RTK signaling has not been established (46).In G. lateralis, previous identification of Gl-EGF, Gl-FGF, Gl-EGFR, Gl-FGFR, and Gl-InsR in the YO transcriptome suggested that growth factors stimulate ecdysteroidogenesis, possibly through an autocrine mechanism (32,33,38,46).The identification of multiple subtypes and isoforms provides a comprehensive catalog of RTK genes for functional analysis.Many of these RTKs were expressed in G. lateralis and C. maenas YO transcriptomes (Tables 2, 4, 5, 7).The YO is primarily regulated by molt-inhibiting hormone (MIH), a neuropeptide that binds to a G protein-coupled receptor to inhibit ecdysteroid synthesis (41,45,46).A drop in MIH release from neurosecretory neurons in the eyestalk ganglia activates the YO and the animal enters early premolt (45).Growth factor receptors may sustain high rates of ecdysteroid synthesis by the committed YO during mid-and late premolt (46).For example, EGFR signaling in the prothoracic gland supports ecdysteroidogenesis during the lava to pupa transition in Drosophila (92).

Conclusions
Bioinformatic and phylogenetic analysis using the CrusTome database yielded a rich diversity of hundreds of RTK contigs distributed across all crustacean taxa.The sequences were organized into InsR, EGFR, FGFR, and PVR clades, subclades, and isoforms, providing a framework for a classification nomenclature.Moreover, this extensive catalog of crustacean RTKs facilitates a systematic analysis of InsR, EGFR, FGFR, and PVR functions in various physiological processes, including, but not limited to, molting and growth, reproduction, regeneration, development and metamorphosis, nutrition and metabolism, and immunity, as well as their interactions with environmental stressors arising from climate change (94)(95)(96).Moreover, a greater understanding of growth factor/RTK signaling has important applications to sustainable aquacultural practices and the development of entirely new bioindustries, such as cellular agriculture and cultivated meats (28,(97)(98)(99)(100)(101)(102).

2
FIGURE 2 Insulin receptor phylogeny.A phylogenetic tree of InsR exhibiting an array of species, including two G. lateralis genes and three C. maenas genes.Inset depicts entire tree, divided into sections (A) (InsR1 and InsR2) and (B) (InsR3 and EGFR), for orientation.Further analysis of the conserved domains of the ROT67026.1 sequence suggested an identity other than an RTK, making it an outgroup for the tree.Support values correspond to the approximate Bayes test and the Ultra-Fast Bootstrap approximation with the VT+R8 substitution model of evolution.Images from PhyloPic: Homarus (lobster) by Steven Traver; Caridina multidentate (shrimp) by Douglas Teles da Rosa; Metacarcinus magister (Dungeness crab) by Harold Eyster; Pagurus pubescens (hermit crab) by T. Michael Keesey; and Sophophora melanogaster (fly) by Andy Wilson.

FIGURE 3
FIGURE 3Domain organization of Drosophila and decapod insulin receptors.Listed sequences include a model organism (D. melanogaster), G. lateralis sequences, and identified genes in other species using the classification as listed in the original referenced studies (Table2).

FIGURE 4
FIGURE 4 HGDLA sequence was replaced by an arginine (R) in Lv-VEGFR2 (Supplementary Material 2) (70).The PVR motif was located Nterminal to the FGFR motif, with the HGDLA/HRDLA sequence marking the N-terminal and C-terminal boundaries of the PVR and FGFR motifs, respectively (Tables

FIGURE 6
FIGURE 6Domain organization of Drosophila and decapod EGF receptors.Listed sequences include a model organism (D. melanogaster), G. lateralis sequences, and identified genes in other species using the classification as listed in the original referenced studies (Table4).
Contigs encoding PVRs in the CrusTome 1.0 database and previously identified PVRs in other decapods.Gene names are the proposed classification, based on clades and subclades from taxonomically comprehensive phylogenetic analyses.The C. maenas sequences with the same classification are the same version of a gene/isoform from different tissues with the small differences in sequences.Species: Gecarcinus lateralis, Carcinus maenas, Pacifastacus leniusculus, Cancer borealis, Eriocheir sinensis, Litopenaeus vannamei, and Scylla paramamosain.Tissue sources: CNS, central nervous system; HeTC, Hematopoietic Tissue Cells; N, neural tissues; MD, multiple developmental stages of whole larvae; PW, pooled whole organism; W, whole organism; and YO, Yorgan.Sequences available in Supplementary Material 1. Asterisk (*) indicates partial sequence; open reading frame incomplete. 1 from (29).

TABLE 1
Summary of CrusTome pancrustacean and tardigrade receptor tyrosine kinase sequences.

TABLE 2
Classification of decapod insulin receptors.

TABLE 3
Motif sequences distinguishing decapod insulin receptors.InsR3 were distinguished by a 20-amino acid motif sequence located near the N-terminal end of the first FN3 domain in the N-terminal region.All InsRs had a conserved 10-amino acid sequence in the catalytic loop in the catalytic domain in the Cterminal region.Residues that are identical between all the sequences from Table2are indicated by bold font.Consensus sequences were obtained using the MAGA tool (67) and multiple sequence alignment (Figure4).

TABLE 4
Classification of decapod EGF receptors.

TABLE 8
Motif sequences distinguishing the decapod PVRs.