Evolution and Potential Function in Molluscs of Neuropeptide and Receptor Homologues of the Insect Allatostatins

The allatostatins (ASTs), AST-A, AST-B and AST-C, have mainly been investigated in insects. They are a large group of small pleotropic alloregulatory neuropeptides that are unrelated in sequence and activate receptors of the rhodopsin G-protein coupled receptor family (GPCRs). The characteristics and functions of the homologue systems in the molluscs (Buccalin, MIP and AST-C-like), the second most diverse group of protostomes after the arthropods, and of high interest for evolutionary studies due to their less rearranged genomes remains to be explored. In the present study their evolution is deciphered in molluscs and putative functions assigned in bivalves through meta-analysis of transcriptomes and experiments. Homologues of the three arthropod AST-type peptide precursors were identified in molluscs and produce a larger number of mature peptides than in insects. The number of putative receptors were also distinct across mollusc species due to lineage and species-specific duplications. Our evolutionary analysis of the receptors identified for the first time in a mollusc, the cephalopod, GALR-like genes, which challenges the accepted paradigm that AST-AR/buccalin-Rs are the orthologues of vertebrate GALRs in protostomes. Tissue transcriptomes revealed the peptides, and their putative receptors have a widespread distribution in bivalves and in the bivalve Mytilus galloprovincialis, elements of the three peptide-receptor systems are highly abundant in the mantle an innate immune barrier tissue. Exposure of M. galloprovincialis to lipopolysaccharide or a marine pathogenic bacterium, Vibrio harveyi, provoked significant modifications in the expression of genes of the peptide precursor and receptors of the AST-C-like system in the mantle suggesting involvement in the immune response. Overall, our study reveals that homologues of the arthropod AST-systems in molluscs are potentially more complex due to the greater number of putative mature peptides and receptor genes. In bivalves they have a broad and varying tissue distribution and abundance, and the elements of the AST-C-like family may have a putative function in the immune response.


INTRODUCTION
Molluscs are the second most diverse animal group after the insects and belong to the speciose Lophotrochozoan clade. Their success is linked to their adaptation to a wide variety of habitats, and they are found from the abysses of the sea to mud flats and even as parasites dwelling in other animals. Unlike the more popular protostome models of the nematodes and insects that have substantial genome rearrangements and gene loss (1)(2)(3)(4), the molluscs have a more similar genome organisation and gene repertoire to deuterostomes. Their success, exquisite diversity in form and function and less rearranged genomes makes the molluscs of particular interest for evolutionary studies directed at deciphering the evolution, diversification and role of neuroendocrine factors (1,5).
The allatostatins (AST) are a diverse group of small neuropeptides that have low sequence conservation but have overlapping functions as insect allatoregulatory peptides (6). Three AST peptide families exist and are named allatostatin A (AST-A), B (AST-B) and C (AST-C). They were first described in insects as inhibitors of the biosynthesis by the corpora allatum (CA) gland, of juvenile hormone (JH), an important regulator of arthropod development and reproduction (7)(8)(9)(10). The ASTs are now recognized to be involved in a diversity of other activities and play a key role in arthropod physiology (8,11) and immunity (12)(13)(14). The ASTs are best studied in insects and despite the relatively low gene sequence conservation between the AST families their function and distribution has been conserved. This provides an interesting opportunity to assess if functional constraints have shaped AST evolution in the same way across the protostomes.
The members of each AST family derive from distinct precursor proteins that are assumed to undergo proteolytic cleavage to generate multiple peptides with a similar structure and sequence. The exception is the precursor of AST-C which encodes a single peptide. AST peptides bind and activate members of the rhodopsin G-protein coupled receptor (GPCR) superfamily. AST-A and AST-C activate receptors of the rhodopsin-gamma GPCR cluster while AST-B activates receptors of the rhodopsin-beta GPCR cluster (15)(16)(17). Sequence orthologues of ASTs and of their receptors have been identified in other protostomes outside the arthropod phylum, such as the molluscs, the second most diverse protostome phylum after the arthropods. The evolution and function of the AST families in molluscs that lack a CA are at present poorly described (18)(19)(20)(21)(22)(23)(24)(25)(26)(27).
AST-A was the first AST to be described and was initially isolated from the cockroach, Diploptera punctata (28,29) and the peptides are characterized by a conserved C-terminal FGL-amide motif. In arthropods a single peptide precursor exists and it is mostly expressed in the nervous system and mid-gut (30,31) and depending on the species encodes a differing number of peptides from 14 to 13 in Blatoidea (Periplaneta americana and Diploptera punctata) to 4 to 5 in Diptera (Drosophila melanogaster and Anopheles gambiae) (31). The regulation of JH biosynthesis by AST-A was demonstrated in cockroaches, termites and crickets and across the insects the most conserved physiological role for this peptide is regulation of food intake, inhibition of insect gut motility and regulation of digestive enzyme metabolism (30,(32)(33)(34)(35).
Recently we and others revealed that in the blood-feeding mosquitos the AST neuropeptides and GPCRs may regulate blood digestion and reproduction (31,36). A single AST-A receptor (AST-AR) has been described in arthropods but in Diptera there are two receptor genes, that shared a common evolutionary origin with the vertebrate KISS (KISSR) and galanin (GALR) receptors (15,31,37,38). In molluscs, buccalins are orthologues of insect AST-A, and were first identified and functionally described in the gastropod Aplysia californica where they regulate muscle contraction and feeding (18,24). Currently buccalins have been reported in several molluscs where they are suggested to regulate reproduction and spawning in the Sydney rock oyster (Saccostrea glomerata) (39) and in the Mediterranean mussel (Mytilus galloprovincialis) their presence in the mantle has been linked to a role in shell formation (37).
The functions of AST-B and AST-C peptides have received much less attention. AST-B peptides are encoded by a precursor that in different species generates a variable number of small peptides. In D. melanogaster the AST-B precursor gives rise to 5 peptides but in Rhodnius prolixus 12 peptides are generated (40) and they are characterized by the presence of two conserved tryptophan residues and a C-terminal amide (WxW-amide). AST-B peptides were first isolated from the brain-corpora cardiaca-corpora allata-suboesophageal ganglion complex of the migratory locust, Locusta migratoria and were shown to inhibit hindgut and oviduct contractions, and thus were initially named myoinhibiting peptides (MIPs) (41). The allatostatic properties of AST-B/MIP peptides were only discovered after their isolation in the two-spotted cricket (Gryllus bimaculatus) (9). Subsequently other functions have been identified for AST-B/MIP including regulation of ecdysis in the silkworm (Bombyx mori) (42,43), the circadian clock in Drosophila (D. melanogaster) (44), feeding and locomotion in the cockroach (Leucophaea maderae) (45), reproduction in the locust (Locusta migratoria) (46) and the immune response in the green mud crab (Scylla paramamosain) (14). In insects, the AST-B/MIP precursor is most commonly found in the nervous system (45,47,48). The functions of AST-B/MIP have also been described in other protostomes such as annelids (where they are known as MIP) and regulate larval settlement and feeding (49). The AST-B/MIP peptides activate the same GPCRs that are activated by insect sex peptide (23,50,51) and they are proposed to be proximate with the orphan receptors, GPR142 and GPR139, in vertebrates (15,16).
AST-C was first isolated from the head of the tobacco hornworm (Manduca sexta) (52) and unlike other insect ASTs, the precursor produces a single peptide, although other genes, AST double C and triple C exist in the genomes of some arthropods including insects (53,54). The AST-C mature peptide has a conserved C-terminal PISCF motif (8,55,56). The function of AST-C varies and includes inhibition of muscle contraction, feeding suppression, reduced fecundity and inhibition of ovarian development and regulation of circadian activities (57)(58)(59)(60)(61)(62). In D. melanogaster AST-C was also recently proposed to regulate nociception and immunity (13). Receptors for AST-C are proposed to have a common origin with the vertebrate Somatostatin receptors (SSR) (15,63,64).
The present study revisits and enriches understanding of AST-like peptide and receptor evolution in the molluscs and takes a comparative genomics approach integrating results from previous genome and transcriptome studies (15, 19-21, 26, 65). To elucidate the potential functions of the AST-like families and GPCRs in molluscs we searched bivalve transcriptomes and targeted their potential role in the mantle, a multifunctional tissue well recognized for its contribution to shell production (37,(66)(67)(68)(69) and with an emerging role as an innate immune barrier (70,71). Taking into consideration the emerging role of AST family members in immunity of insects (12)(13)(14), the response of specific AST-like family members on the mantle of the Mediterranean mussel (M. galloprovincialis) exposed to an immune challenge with bacteria lipopolysaccharide (LPS) and heat inactivated Vibrio harveyi, a common pathogenic bacteria of the marine environment, was established (72)(73)(74).
To increase the number of receptor sequences from the phylum Mollusca, selected genomes of representatives of different orders available from the NCBI Genomes database (https://www.ncbi. nlm.nih.gov/genome/) were also procured using the tblastn algorithm with the M. galloprovincialis deduced protein sequences of AST-like receptors as the query (Supplementary Table 1). The available genome assemblies of the Polyplacophora class were also searched. All molluscs mentioned above and the general non-redundant protein sequences (nr) database at NCBI were also searched with the Capitella teleta GALR-like (ELU13887.1) for other protostome orthologues.
Searches were also performed in mantle transcriptomes available from the Mediterranean mussel (M galloprovincialis, SRP 063654) (68) and the hard-shelled mussel (M. coruscus, kindly provided by Yifeng Li and Jin-Long Yang, SHOU, China) to identify the corresponding gene transcripts. Putative receptors were retrieved based on their sequence similarity (cut off < e -30 ) with M. galloprovincialis and D. melanogaster homologues or using transcriptome annotations. All sequences retrieved were translated using the Expasy translate tool (https://web.expasy. org/translate/) and their identity confirmed as described above.

Peptide Precursors
Mollusca buccalin precursors (37) and M. galloprovincialis and M. coruscus orthologues of the insect AST-B/MIP and AST-C peptide precursors were initially obtained by exploring the species-specific mantle transcriptome annotation available "in house". The deduced protein sequence of MIP-like and AST-Clike were used to identify the corresponding coding genes and additional gene isoforms (Supplementary Table 1). The mussel precursor protein sequences were translated with the Expasy translate tool (https://web.expasy.org/translate/) and the localization of the predicted mature peptides was manually deduced by a) the identification of monobasic, dibasic, or tribasic consensus cleavage sites (RR, KR, KK) and b) the identification of the conserved mature peptide motifs such as the two conserved tryptophan (W) residues (W( X )W-amide) in AST-B/MIPs and the two conserved cysteine (C) residues in AST-Cs. The mature peptides and the full peptide precursors were procured in other molluscans (bivalves, gastropods cephalopods and the polyplacophoran) using a similar strategy. Homologues from other representatives of the lophotrochozoan clade such as one brachiopod and two annelids were also characterized for comparisons (Supplementary Figure 1).

Sequence Comparisons and Phylogenetic Analysis
Multiple sequence alignments (MSA) of the deduced peptides and receptor protein sequences were performed using the MUSCLE algorithm available in Aliview 1.18 (75). For phylogenetic analysis the receptor alignments were manually inspected, and sequence gaps removed, and the edited alignments were used to construct Bayesian Inference (BI) and Maximum Likelihood (ML) phylogenetic trees. The Buccalin-R/AST-AR and AST-C-like/ AST-C (members of the Rhodopsin g family) trees and the MIP-R/AST-BR trees (members of Rhodopsin b family) included sequences from representatives of 27 molluscs (11 bivalves, 11 gastropods, 4 cephalopods, 1 polyplacophor), 1 brachiopod, 2 annelids, 1 cephalochordate and 2 vertebrates (Supplementary Table 1) and were built in the CIPRES Science Gateway v3 using an LG model (selected using model test-ng 0.1.5) since they best fitted the data (76). The BI trees were built in MrBayes (77) run on XSEDE v3.2.7a with 1.000.000 generation sampling and probability values to support tree branching. The ML trees were built with the RAxML v8.2.12 (78) method with 100 bootstrap replicates. The Buccalin-R/AST-AR and AST-CR-like/AST-CR trees were mid-rooted according to previous models proposed for receptor sequence evolution (15), and MIP-R/AST-BR were rooted with the H. sapiens (NP_000721) and L. oculeatus (XP_006629714) cholecystokinin receptor type A (CCKAR) branch. To build the phylogenetic trees for the AST-Rhodopsin g family GPCRs the sequences of the metazoan KISSRs and GALRs were also included as they are suggested to have evolved from the same ancestral gene as protostome Buccalin-R/AST-AR [sequences obtained from (37)] and also from database searches using the predicted receptor proteins of M. galloprovincialis as queries. For the trees of the AST-Rhodopsin b family GPCRs the related protostome receptors from the FMRFamide (FMRFaR), RGWamide (RGWaR), Proctolin (ProctR) and Myosupressin/ Myomodulin (MyoR) families were also included as well as the vertebrate sequence orthologues GPR142 and GPR139 from H. s a p i e n s ( G P R 1 4 2 , N P _ 0 0 1 3 1 8 0 0 5 . 1 a n d G P R 1 3 9 , NP_001002911.1) and L. oculeatus (GPR142, XP_006635495 and GPC139, XP_006637109) (15,16,79).
Receptor sequence alignments and percentage of sequence identity was displayed and calculated in the GenDoc programme (http://www.nrbsc.org/gfx/genedoc/). The mature peptide alignments were established using Clustal W (https://www. genome.jp/tools-bin/clustalw). Receptor structures were predicted in Protter (http://wlab.ethz.ch/protter) using the default settings and the outputs annotated in Inkscape to highlight the conserved aa positions across species. The receptor signal peptide was predicted using the SignalP-5.0 Server (80) and the DeepLoc1 server was used to explore protein cellular localization (81).

Neighbouring Gene Analysis (Short Range Synteny)
The neighbouring gene environments of the annelid C. teleta GALR-like and Buccalin-R that map to SuperContig CAPTEscaffold_148 (scaffold_148) and SuperContig CAPTEscaffold_45 (scaffold_45), respectively and H. sapiens GALRs and D. melanogaster AST-ARs (DAR1 and DAR2) were characterized to infer potential ancestral origin and to support the identification of the new protostome sister clade of the deuterostome GALRs. Ten genes upstream and downstream of the C. teleta gene loci were collected and they were used to search for gene homologues in the H. sapiens GALR1 (chromosome 18), GALR2 (chromosome 17), GALR3 (chromosome 22) and KISSR (chromosome 19) and in D. melanogaster DAR1 (chromosome X) and DAR2 (chromosome 3R) genome regions. The BioMart tool available from Ensembl was used to compare the C. teleta and D. melanogaster genome regions. The deduced C. teleta buccalin-R and GALR-like neighbouring protein sequences were searched against the species-specific H. sapiens (taxid:9606) and D. melanogaster (taxid:7227) datasets available from NCBI (https://www.ncbi.nlm.nih.gov). The gene loci of the top protein hits (with the lowest e value) were retrieved based on the genome assemblies available from NCBI (D. melanogaster, Release 6.32, and H. sapiens, GRCh38.p13). The O. bimaculoides GALR-like genome region on SuperContig KQ419625 was also analysed and the five neighbouring genes were retrieved and searched against C. teleta and H. sapiens genomes.

Animal Manipulation and Sampling
Mediterranean mussels (M. galloprovincialis) were obtained from a local producer in the Ria Formosa (Olhão, Portugal). For the experimental immune challenge mussels (length 4.35 ± 0.34 cm, soft body dry mass 1.61 ± 0.46 g) were transported live to the Centre of Marine Sciences (CCMAR) where they were cleaned and acclimatized for a week in 5 litres of aerated seawater (SW) prior to the immune challenge at 20 -22°C. Animals were fed daily with a mixture of a commercial dried microalgae diet (PHYTOBLOOM, Necton, Portugal). For tissue sampling animals were opened by cutting the adductor muscle and the mantle edge from the region most distal to the umbo (referred to as the posterior region) was dissected out and snap frozen in dry ice and stored at -80°C for RNA extraction. For tissue distribution, cDNA samples (n = 3 for each tissue) from gills, digestive gland, mantle edge and haemolymph available in the lab were used. Immune Challenge Mussels were exposed to heat-inactivated V. harveyi by introducing them into the bathing seawater. The V. harveyi (kindly donated Dr M. Manchado, IFAPA, Puerto Santa Maria, Spain) was grown in TSB medium supplemented with 1% NaCl and the number of cfu/ml was determined on TSA/1% NaCl agar plates. For the bacterial challenge 5 x 10 7 cfu/ml of the heat inactivated bacteria suspended in 1 L of sterile seawater was used. The V. harveyi bacteria was heat inactivated by boiling the culture for 2 hours.
For the challenge experiments mussels (n = 80) were randomly distributed in triplicate tanks. The experiments were performed in 2 L plastic tanks containing 1 L of filtered seawater (0.45 mm) that was collected from their natural environment. Each of the control (3 tanks) and exposed (3 tanks) tanks contained 12 -13 individuals (total 40 animals per group). The seawater in the experimental tanks was constantly aerated with aquarium air-pumps and the temperature was 20 -22°C, pH was 8.1 ± 0.1 and salinity 37 ppt and the experiments were conducted under natural photoperiod for March 2021 in the Algarve. Control mussels were maintained in seawater and transferred after 15 h to new tanks and the immune challenged mussels were exposed for 15 h to heat-inactivated V. harveyi (5 x 10 7 cfu/litre) and then transferred to new tanks containing clean filtered seawater. Specimens (n = 6 per timepoint) from control and challenged tanks were sampled (as outlined above) at 0, 6, 12, 24 and 36h post exposure. Animals were not fed during the experiment and no mortality was observed.

RNA Extraction and cDNA Synthesis
Total RNA (tRNA) from control and immune challenged mantle edge was extracted using an E.Z.N.A kit (VWR, USA) and DNase treatment was performed after elution using a Precision DNase kit (Primer design, UK) according to the manufacturers protocol. For extraction, collected tissues were defrosted in the lysis buffer and homogenized by mechanical disruption with two iron beads (5 mm) using a Tissue lyser II Qiagen and 1 cycle of 3 min at room temperature.

Tissue Expression Analysis
To characterise the tissue distribution publicly available control tissue transcriptomes of four bivalves: M. galloprovincialis, M. coruscus, C. gigas and M. yessoensis were retrieved and analysed. Public transcriptome data available for control tissues including the gills, muscle, mantle, digestive gland/hepatopancreas, haemocytes and nerve ganglia were searched using the species-specific nucleotide sequences identified for the peptide precursors and receptors. Searches were carried out using Blastn and the corresponding sequence read archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) for each of the species analysed (Supplementary Table 2). Maximum target sequences were adjusted to 1000 and sequence hits with > 98% nucleotide identity were selected. FPKM counts were calculated taking into consideration the number of reads, gene length and the transcriptome sequencing depth.
The involvement of the bivalve homologues of the arthropod ASTs and receptors in the immune response was initially assessed using mantle edge transcriptomes of M. galloprovincialis challenged with Lipopolysaccharide (LPS, E. coli LPS 0111:B4, Sigma-Aldrich, USA) a major component of the outer membrane of Gram-negative bacteria. Candidate transcripts were identified by exploring available in-house DEG data (p-adj < 0.05, log2-fold > 2) for the mantle edge transcriptomes of control M. galloprovincialis (injected with 1x PBS) and LPS exposed M. galloprovincialis (injected with 50 ml of 0.5 mg/ml of bacterial LPS in the adductor mussel) from samples collected in the context of another study.
To further explore the involvement of the AST-C-like system in the bivalve response to pathogenic marine bacteria, expression analysis of the M. galloprovincialis members was assessed using cDNA (n = 3) from normal tissues (gills, digestive gland, mantle edge and haemocytes) and from the mantle edge of control and exposed specimens to heat-inactivated pathogen, V. harveyi, at 0 (n = 6), 6 (n = 6), 12 (n = 6), 24 (n = 6) and 36 h (n = 6) post exposure. Specific primers for the M. galloprovincialis AST-Clike peptide precursors and for four AST-CR-like were designed and the PCR products were sequenced, and their identity confirmed ( Table 1). It was not possible to design specific primers for the AST-CR-like VDI60978.1 as its sequence was smaller and the predicted nucleotide coding region was highly identical to other AST-CR-like (97%). The activation of the immune response in M. galloprovincialis was confirmed by determining the expression of three humoral factors that have previously been shown in bivalves to respond to Vibrio spp., Toll-receptor TLRa (82), Lysozyme goose-type 1 LYG1 (83) and complement-factor C3-like (Peng et al., submitted) in cDNA of the mantle edge of control (n = 3) and exposed (n = 3) specimens from 0, 6 and 12 h post exposure ( Table 1). Changes in gene expression were assessed by quantitative realtime PCR (qRT-PCR) using SsoFast EvaGreen Supermix (Bio-Rad, Portugal) in a 10 µl final reaction volume containing 200 nM of forward and reverse gene specific primers ( Table 1) and 2 µl of template cDNA (diluted 1:2). Elongation factor 1-alpha (EF1a) and 18s ribosomal subunit (18S) were used as reference genes (cDNA diluted 1:50 and 1:500, respectively). QRT-PCR analysis was performed in duplicate reactions (< 5% variation between replicates) using a CFX Connect ™ Real-Time PCR Detection System for 96-well microplates (Bio-Rad). Cycling conditions were 95°C, 30 sec; 44 cycles of 95°C, 5 sec; the most appropriate primer annealing temperature, 10 sec ( Table 1). Melting curves were performed to detect the presence of non-specific products and primer dimers. Reverse transcriptase (RT-) and PCR control reactions were included in each PCR plate to confirm the absence of genomic or PCR contamination. QRT-PCR efficiencies and R 2 (coefficient of determination) were established ( Table 1), and data was normalized using the geometric mean of the expression levels of the reference genes.

Statistical Analysis
Results are presented as the mean ± SEM. Statistical differences were detected using One-Way ANOVA for the tissue distribution and for gene expression between the control and immune challenged mussels with Two-Way ANOVA using a Sidak's multiple comparison test. Analysis was executed using GraphPad Prism version 8.0 for Mac OS X (USA, www.graphpad.com).

Nomenclature
The AST neuropeptides were named due to their inhibitory (allatostatic) actions on JH biosynthesis from the insects CA gland. In molluscs no equivalent organ has been described and JH is specific to insects. In molluscs the sequence orthologues of the arthropod AST-A are known as buccalin and in annelids the orthologues of the arthropod AST-B/MIP are known as MIP. For the lophotrochozoan AST-Cs no nomenclature has yet been proposed. Throughout the manuscript we use the existing lophotrochozoan nomenclature and have named AST-C as AST-C-like ( Table 2).

Orthologues of the Arthropod AST Precursors and Receptors in the Molluscs
Sequence homologues of the three arthropod AST peptide precursors and receptors were found in the 27 mollusc species included in the study and the number of peptide precursor genes and predicted mature peptides and putative receptor genes varied across the species (Figure 1 and Supplementary Table 1, Supplementary Figures 1A-C). In other lophotrochozoan phyla the peptide precursors shared a similar organization to the molluscs (Supplementary Figure 1A).

Buccalin Precursors and Buccalin Receptors
The Mollusca buccalin precursors are the orthologues of the arthropod AST-As and a single precursor gene was found in most molluscs and encoded for multiple mature peptides ( Figure 1 and Supplementary Figure 1A). The exception was the gastropods Conus ventricosus (CM031604.1 and CM031615.1) and Pomacea maculata (XP_025108540.1 and XP_025087986.1) where two putative genes were found. The number of buccalin-R was more variable and while most species possessed a single buccalin-R gene, other species genomes contained two receptor genes (the bivalves Pinctada imbricata, Margaritifera margaritifera; the gastropods Alviniconcha marisindica, Haliotis laevigata, Pomacea maculata and the cephalopod Nautilus pompilius but the bivalve C. gigas and the gastropod Candidula unifasciata contained three buccalin-R genes. The majority of the mollusca duplicate buccalin-R genes were localized within the same genome region and thus are likely to be tandem gene duplicates (Supplementary Table 1).
The buccalin precursors encoded for a different number of peptides. In the Mytilidae family 7 mature peptides were predicted in M. coruscus and 9 in M. galloprovincialis. In the Ostreidae family, the buccalin precursors have the potential to generate 10 mature peptides in C. gigas and M. hongkongensis and 9 in C. virginica and in the Pectinidae, P. yessoensis, 11 mature peptides were predicted (Supplementary Figure 1A). In other bivalves, such as the Margaritifera margaritıfera (Margaritiferidae family), Pinctada imbricata (Pteriidae family) and in Tegillarca granosa (Arcidae family) at least 8, 8 and 10 putative mature peptides respectively, were encoded, and the full-length precursors still remain to be isolated. No buccalin precursor genes were retrieved from the bivalves Sinonovacula constricta and Ruditapes philippinarum genomes.
The buccalin precursor in the gastropod A. californica (Aplysiidae family) contained the largest number of predicted peptides (29 mature peptides) and the Gigantopelta aegis (Peltospiridae family) precursor was the second most rich. In the cephalopod O. bimaculoides and most cephalopod genomes explored our searches failed to identify the buccalin precursor except in the Nautilus pompilius (Nautilidae family) where 29 peptides were predicted (Supplementary Figure 1A). In the polyplacophore Acanthopleura granulata a single buccalin precursor was found (Supplementary Figure 1A).
Previously, a putative buccalin-like precursor with a similar organization to the buccalin precursor was identified in molluscs, which generates a series of highly identical V-amide mature peptides with a conserved PFDRLASGLV/Iamide sequence (37) (Supplementary Figure 2). These buccalin-like peptides were recently reclassified and proposed as a new Mollusca neuropeptide family, the LASGLI/V-amide peptide family (19,25). For this reason, the buccalin-like peptides (LASGLI/V-amide peptide) were excluded from the present analysis.

MIP Precursors and MIP Receptors
In Mollusca a single MIP-B peptide precursor encoding multiple peptides was found (Supplementary Figure 1B) in most species, the exception was the gastropod, C. unifasciata, where two precursors were retrieved. The number of MIP-Rs was variable ( Figure 1). A single receptor gene was found in the gastropods, cephalopods and in the polyplacophore genomes analysed but gene number was very variable in bivalves where multiple receptors were found with 3 in M. galloprovincialis and 6 in T. granosa (Supplementary Table 1). No MIP or MIP-R was found in the gastropod B. glabrata. In bivalves, the number of mature MIP peptides varied from 11 in the Mytilidae to 12 in the Pectinidae (Figure 1 and Supplementary Figure 1B). The gastropod peptide precursors encoded the greatest number of peptides, and 25 MIP mature peptides were predicted in L. gigantea and 16 in A. californica. In other gastropods the minimum number of mature peptides found was 8 but most precursors were deduced from species genomes and were incomplete and it seems likely that more peptides may be produced. In the cephalopod O. bimaculoides the MIP precursor encoded the least number of peptides (only 4) but in other taxa at least 6 peptides existed (Supplementary Figure 1B). In the polyplacophore a single MIP precursor was found, which encoded at least 6 mature peptides.

AST-C-Like and AST-C-Like Receptors
In Mollusca a single AST-C-like peptide precursor that encodes a single mature peptide was found in all the species analysed ( Figure 1 and Supplementary Figure 1C). The only exception was the bivalve T. granosa where two identical mature peptides precursor genes localized in the same genomic fragment (JABXWC010000007.1) were found. In contrast, in other molluscs receptor number was variable and 5 putative AST-CR-likes were retrieved from bivalves of the Mytilidae family and four were found in the gastropod C. unifasciata (Arionidae family). The other representatives of the diverse Mollusca classes possessed 1 to 2 AST-CR-like ( Figure 1). Genes encoding the AST-C-like peptide precursors were not predicted in available protein coding gene data for M. galloprovincialis and M. coruscus but searches in mantle transcriptomes identified a single AST-C-like gene transcript in M. coruscus which mapped with 100% identity to the genome (CM029607.1). An orthologue sequence was identified in the genome of M. galloprovincialis (UYJE01007806.1) when the M. coruscus AST-C-like sequence was used for searches.

Phylogeny of the Mollusca Receptors
The retrieved receptors from Mollusca were compared with the orthologue receptors in other lophotrochozoans, arthropods and deuterostomes by building BI phylogenetic trees (Figures 2A, B and Figure 3). The ML tree had a similar branch topology (Supplementary Figures 3A, B). The protostome Buccalin-R/ AST-AR and AST-CR-like/AST-CR family members belonged to the GPCR-rhodopsin g subfamily. The protostome MIP-R/AST-BR family members belonged to the GPCR-rhodopsin b subfamily. All the Mollusca sequences within each receptor family clustered together based on their sequence homology in the phylogenetic trees and the lophotrochozoan receptors formed sister branches with the arthropod receptor homologues. The receptors of the Mytilidae family, M. galloprovincialis and M. coruscus, always clustered in proximity and the same was observed for the Ostreidae family, C. gigas and C. virginica. Clustering of the Mollusca receptors revealed that the variable number of members found within each family resulted from lineage and species-specific duplication events.

Buccalin-R
The topology of the phylogenetic tree confirmed that no Buccalin-R/AST-ARs existed in deuterostomes and that Mollusca as well as other protostome receptors shared a common origin with the metazoan KISSR and GALR (15,31,37,88,89) (Figure 2A). Our searches also confirmed that homologues of the deuterostome KISSRs exist in bivalves and gastropods and that GALR-like sequences were only found in cephalopods. The cephalopod GALR-like sequences clustered with the annelid GALR-like from C. teleta (ELU13887) and P. dumelii (AKQ63078) and with the deuterostome GALRs ( Figure 2A) irrespective of the phylogenetic tree building method. This reveals for the first time the presence in Mollusca of a GALR-like clade, which is apparently absent from the bivalves, gastropods and polyplacophore.

MIP-R
Homologues of the arthropod AST-BRs (MIP-Rs) were identified in molluscs and other lophotrochozoans and the clustering of the retrieved bivalve sequences suggested that there were three MIP-R subtypes (Figure 3). Type I MIP-Rs was assigned to the Mollusca receptors that were present in most species and that clustered with the other lophotrochozoans and in the same branch as the arthropod AST-BRs. The other two MIP-R clusters were named type II and type III and contained sequences from bivalves ( Figure 3). Identification of 3 MIP-R subtypes in bivalves and their absence from other protostomes suggests that they emerged prior to the divergence of the ecdysozoan and lophotrochozoan lineages (Figure 3). The protostome RGWaR, FMRFaR, ProctR and MyoR that are proposed to have radiated from the same common ancestor as MIP-R/AST-BRs were included in phylogenetic tree (15,16,79) and the tree topology confirmed that all receptors shared a common origin with the orphan deuterostome GPR139/142 (15,16).

AST-CR-Like
The arrangement of the sequences within the Mollusca AST-CRlike branch confirmed that the receptors shared a common origin with vertebrate SSRs ( Figure 2B). A large gene family expansion occurred within the Mytilidae family and originated four of the five receptor isoforms retrieved from M. galloprovincialis and M. coruscus. In addition, the clustering of one of the Ostreidae duplicate receptors from C. gigas, C. virginica and M. hongkongensis in distinct branches from the other lophotrochozoan receptors suggested that the two gene copies were under different evolutionary pressure ( Figure 2B). Genome mapping revealed that the duplicate Ostreidae receptor genes mapped in proximity in the same genome fragment and are likely to be the result of a tandem gene duplication (Supplementary Table 1).

Buccalin Mature Peptides
In molluscs the buccalin peptide precursors encoded L-amide peptides (as found in A. californica (18) and a different number of peptides (Supplementary Figure 1A). The gastropod and the cephalopod N. pompilius precursors encoded the largest number of mature peptides (Figure 1 and Supplementary Figure 1). The bivalve and cephalopod buccalin precursors encoded peptides with a conserved C-terminal L-amide (like D. melanogaster AST-A) ( Figure 4) but in gastropods it also encoded I-amide peptides and all predicted peptides were less than 55% identical in sequence to the D. melanogaster AST-A. The number of predicted peptides in the buccalin precursor varied across the molluscs. In bivalves, the buccalin precursor in the representatives of the Mytilidae family, encoded 9 and 7 mature L-amide peptides, respectively in M. galloprovincialis and M. coruscus that all had different sequences ( Figure 4 and Supplementary Figure 1A).
In gastropods, the A. californica buccalin precursor encoded the largest number of predicted peptides (29 mature peptides) and 25 were L-amine and the remaining 3 were C-terminal I-amide and   Figure 1A). In O. bimaculoides and other cephalopod genomes searches failed to identify the buccalin precursor except in N. pompilius and 29 L-amide peptides were predicted, and most of them were identical in sequence and only 3 different types of mature peptides were produced (Figure 4 and Supplementary Figure 1A). The buccalin precursor in the polyplacophore A. granulata encoded 25 L-amide peptides with 15 different sequences (Figure 4 and Supplementary Figure 1A). Conservation of the mollusc mature peptides encoded within the precursor was distinct and the amino acid sequence identity of the M. galloprovincialis mature peptides varied from 16 -72% and the predicted Mga_c and Mga_i peptides were most divergent and Mga_g was the most conserved (72% aa identical) with Mga_d, e and f.

MIP Mature Peptides
The MIP precursors in common with the mollusc buccalin precursor have the potential to generate multiple peptides by proteolysis of a single polypeptide precursors and the highest peptide numbers were identified in the gastropods (Figures 1, 4 and Supplementary Figure 1B). For MIP a single peptide precursor was found and in the bivalve Mytilidae the number of mature peptides varied from 11 in M. coruscus to 12 in M. galloprovincialis but in the gastropod C. unifasciata two MIP precursors existed and they encoded for different peptides (Figure 1 and Supplementary Figure 1B). The mollusca MIP mature peptides varied from 7-9 aa in length and the sequence identity in the M. galloprovincialis precursor revealed that 3 peptides (Mga_c, d and e) were identical and the peptides overall, shared 22 to 62% aa identity ( Figure 4). The MIP precursor in A. californica produced 16 peptides of which 7 had different sequences and overall, they shared 25-85% aa identity. The gastropod L. gigantea MIP precursor was the most peptide rich (25 MIPs in total) and it encoded putative mature peptides of different sizes, and the majority were slightly longer (12 aa) than most molluscan peptides due to their extended N-terminal region (Supplementary Figure 1B). The same occurred for the cephalopod S. pharaonic and A. dux MIPs but N. pompilius had peptides of a similar length to other mollusca MIPs and all the encoded peptides (6) differed and the same was true for the polyplacophore A. granulata (Figure 4 and Supplementary Figure 1). All the mollusc mature peptides possessed two conserved tryptophan (W) residues separated by 4-7 aa residues (W(x 4-7 )W, where x represents any aa). In D. melanogaster there were five AST-B/MIP peptides that contained two conserved W residues separated by 6 aa (W(x 6 )W). The 12 MIP mature peptides in the bivalve M. galloprovincialis shared 38 to 78% aa sequence identity with the D. melanogaster orthologues.

AST-C-Like Mature Peptides
A single AST-C-like peptide was encoded by the AST-C precursor and in the molluscs analysed, the peptide was 13-15 aa in length (Figure 4 and Supplementary Figure 1C). AST-C in molluscs like AST-C and AST-CC of D. melanogaster possessed two conserved cysteine (C) residues that form a disulphide bond in the peptide ( Figure 4). Within molluscs the aa conservation varied from 33 to 76% and the mollusc AST-C-like mature peptide shared higher sequence conservation with D. melanogaster AST-C (25 -33%) than with AST-CC (13 -17%).

Structure of the Mollusca Receptors
The Mollusca receptors shared a conserved protein structure with their homologues from D. melanogaster and H. sapiens and receptors possessed seven transmembrane domains linked by three extracellular loops that alternated with three intracellular loops ( Figure 5 and Supplementary Figures 4A-C). Multiple sequence alignments of the mollusc receptors with the other protostome homologues revealed that the TM domains shared the highest sequence conservation across species. The conserved aa motifs involved in receptor activation, DRY in ICL2 between TM3 and TM4 and NSxxNPxxY (where x represents any aa) within TM7 were present (90, 91) ( Figure 5 and Supplementary  Figures 4A, C). The exception was the protostome MIP-Rs in which aspartic acid of the motif DRY was mutated to QRY (Glutamine) and degeneration of the motif in TM7 occurred ( Figure 5 and Supplementary Figure 4B). Two conserved cysteine residues, which potentially form a disulphide bond between ECL1 and ECL2 and determine receptor structure and stability were conserved in the mollusc and arthropod receptors. A C-terminal cysteine residue for potential palmitoylation after TM7, which is a characteristic of the rhodopsin-GPCRs, was also identified in the Buccalin-Rs and AST-CRs-like ( Figure 5). No signal peptide was predicted for the M. galloprovincialis receptors or the D. melanogaster homologues except for DAR1. However, DeepLoc1 analysis predicted a cell membrane localization for the M. galloprovincialis receptors. Several N-glycosylation sites were found in the N-terminal region of M. galloprovincialis and other bivalve receptors ( Figure 5 and Supplementary Figures 4A-C).
The deduced protein sequence of M. galloprovincialis and M. coruscus buccalin-Rs shared 96% aa identity, 40 -54% identity with the homologues from the Ostreidae family and 34 -35% with the two receptor homologues in D. melanogaster (DAR1 and DAR2). The three M. galloprovincialis MIP-Rs were only 25 -29% identical suggesting that they may have different functions. The M. galloprovincialis type I receptor was most similar to D. melanogaster AST-BR (35%), while the other paralogues type II and type III only shared 20% identity with D. melanogaster AST-BR. The amino acid sequence of the five M.galloprovincialis AST-CR-like shared between 37 to 80% identity and VDI08560.1 and VDI60978.1 (that arose due to a recent gene duplication) shared the highest aa sequence identity. The M. galloprovincialis and M. coruscus AST-CR-like shared 36 -94% aa identity and 27 to 34% aa identity with the two D. melanogaster receptors.

Comparisons of the Gene Environment of Buccalin-R And GALR-Like With the Homologue Regions in Human and D. melanogaster
Previously, our studies on the characterization of the Buccalin-R/ AST-ARs and the sequence homologues of the vertebrate KISSR and GALRs in Arthropods and Mollusca suggested that they were most closely related to the metazoan KISSRs (31,37). However, the topology of the present phylogenetic trees (Figure 2A) that included a larger number of molluscan representatives indicated that the protostomes, arthropod AST-ARs and lophotrochozoan Buccalin-Rs, tended to cluster with the vertebrate GALRs ( Figure 2A). Moreover, a novel receptor clade containing homologues of the deuterostome GALRs were identified in molluscs (cephalopods), annelids and other lophotrochozoans. To gain further insight into the relationship of the lophotrochozoan Buccalin-R and GALR-like members with the KISSR/GALR, the gene environment in the annelid C. teleta and O. bimaculoides, which have the two receptor types was characterized ( Figure 6). The O. bimaculoides GALR-like gene mapped to a short genome fragment (SuperContig KQ419625) and only 5 gene neighbours were mapped, and searches against C. teleta and the human genome regions failed to identify gene homologues in the neighbourhood of the GALRlike and GALRs genes.
In C. teleta the GALR-like homologue mapped to scaffold_148 and gene homologues of human CEP131 and HID1 located near the loci of human GALR2 in chromosome 17 were identified ( Figure 6A). In addition, homologues of three other neighbouring genes were found in D. melanogaster AST-AR chromosomes, TRIM and RIM in proximity with the DAR2 gene on chromosome 3R and TRAM in proximity with DAR1 on chromosome X. This suggests that the C. teleta GALR-like genome region shares similarity with both arthropod AST-AR and human GALR2 genome regions.
The C. teleta AST-AR genome region was also characterized, and the gene mapped to scaffold_45 and shared conserved homologue neighbouring genes with the D. melanogaster DAR1 and DAR2 genome regions suggesting that both annelid Buccalin-R and insect AST-AR shared a common origin ( Figure 6B). No homologue neighbouring genes of C. teleta AST-AR loci were found in the proximity of the human GALR1 (chromosome 18), GALR2, GALR3 (chromosome 22) or KISS1R (chromosome 19) genome regions. Comparison of the gene environment of human KISSR and AST-ARs in D. melanogaster (chromosomes 3R and X) and A. gambiae (chromosome 2R) using the conserved neighbouring genes previously identified, PTBP1, EVIL5L, DOT1L and ODF3L2 (31), failed to retrieve putative homologues in the annelid C. teleta KISSR-like and GALR-like genome regions.

Expression Analysis in the Bivalves and Targets of Immunity
Transcriptome data available for the bivalves, M. galloprovincialis, M. coruscus, C. gigas and M. yessoensis (Supplementary Table 2) was used to infer potential AST system function in molluscs. The expression profile obtained suggested that buccalin, MIP and AST-C-like peptide precursors and receptors have a widespread tissue distribution and that they seemed to be most abundant in the mantle (Figure 7). Of the tissue transcriptomes analysed the haemocytes had the lowest expression except for the MIP system in C. gigas. The widespread tissue distribution and different relative abundance (RPKM) of the peptides and receptors suggests that they may have different functions (Figure 7). In M. yessoensis all homologues of the arthropod ASTs and receptors were present in the nerve ganglion (transcriptome, SRR6407589), and were far more abundant in this tissue than in the mantle ( Figure 6). Analysis of a mantle transcriptome of M. galloprovincialis challenged with bacterial LPS revealed that only the AST-CR-like (VDI60978.1), was significantly down-regulated 12h post challenge (Supplementary Figure 5). These results suggest that the bivalve AST-C-like system may be involved in the immune response, as described in arthropods. To test this hypothesis M. galloprovincialis were challenged with the pathogenic marine bacteria V. harveyi.

Exposure of M. galloprovincialis to an Immune Challenge and Response of the AST-C-Like System in Mantle
Quantitative PCR (qRT-PCR) analysis of M. galloprovincialis tissues confirmed the AST-C-like precursor and two AST-CR-like (VDI53419.1 and VDI15122.1) were detectable in the mantle (Figure 8). Other tissues such as the gills, the digestive gland and the haemocytes also expressed the peptide precursor and receptors. The abundance of AST-C-like was similar in all tissues analysed. AST-CR-like VDI15122.1 was significantly more expressed in the gills compared to haemocytes (p < 0.05) and AST-CR-like VDI53419.1 was significantly more expressed (p < 0.01) in the gills than in other tissues. AST-CR-like VDI53419.1 was most abundant and its expression in tissue was approximately 10-fold higher than the other receptor gene VDI15122.1 and the gene encoding the peptide precursor ( Figure 8). No amplification product was obtained for VDI08560.1 and VDI13242.1. The AST-CR-like gene, VDI60978.1, which shared 97% identity with VDI08560.1 was not analysed as isoform specific qRT-PCR primers could not be designed.
To assess if the AST-C-like system was modified by an immune challenge, the immune response of M. galloprovincialis to heatkilled V. harveyi was assessed by measuring by qRT-PCR the change in expression of innate immune-related genes. TLRa was significantly up-regulated 6h (p < 0.001) and 12 h (p < 0.05) after an immune challenge and LYG1 was significantly down-regulated at 6 h (p < 0.05) and complement-like C3-like was not significantly changed (Supplementary Figure 6). The AST-C-like peptide precursor and receptor expression was modified after exposure to heat-killed V. harveyi (Figure 9). The AST-C-like precursor was significantly up-regulated 6 h (p < 0.05), 12 h (p < 0.0001) and 24 h (p <0.01) after an immune challenge. AST-CR-like VDI15122.1 was down-regulated at 6 h (p < 0.01), 12 h (p < 0.001) and 24 h (p < 0.0001) and AST-CR-like VDI53419.1 was down-regulated at 6 h (p < 0.0001) and 12 h (p < 0.01) after exposure to heat-killed V. harveyi (Figure 9). The qRT-PCR expression data corroborated the LPS transcriptome data of the AST-C-like system (peptide precursor and receptors) under an immune challenge in M. galloprovincialis.

DISCUSSION
The homologues of the arthropods AST-A, AST-B/MIP and AST-C systems and their function have been poorly described in the second largest animal phylum, the Mollusca. To better understand the peptide and receptor evolution and function a comparative approach was taken paying particular attention to the shelled molluscs (the bivalves) a group of ecologically and commercially important species. In molluscs, homologues of the three arthropod peptide groups (named buccalin, MIP and AST-C-like) and their corresponding receptors were identified. The sequence similarity between the mollusc and insect peptides and receptors was low but aa motifs involved in peptide and receptor structure were well conserved. In molluscs the peptide precursor and receptors evolved via lineage and species-specific events with receptor gene family expansions found in some species. The buccalin and MIP precursors encoded several mature peptides with differing aa sequences and sizes suggesting that they may have differing affinity or potencies for the corresponding receptors and may modulate distinct biological functions. Expression of all elements of the buccalin, MIP and AST-C-like systems were detected in the bivalve mantle and changes in the AST-C-like peptide and receptor transcripts in response to a bacterial immune challenge in M. galloprovincialis revealed that this neuropeptide system may contribute to the immune response in Mollusca.

Diversity of Mollusca Peptides and Receptors
In molluscs homologue peptide precursors and receptors of the arthropod AST-systems were found. The potential peptides produced, and their putative receptors were more numerous than in D. melanogaster and other insects. In common with the insects and other arthropods AST-A and AST-B/MIP, the buccalin and MIP mature peptides in Mollusca were encoded by long protein precursors containing multiple mature peptides with distinct amino acid sequences but well conserved motifs important for bioactivity (18-20, 25, 37, 92). The results of phylogenetic analysis from the present and previous studies  Table 2). FPKM counts were calculated having in consideration the number of reads, gene length and the transcriptome sequencing depth. Only tissues common to all species are represented. Expression data for C. gigas mantle corresponded to the average of inner (SRX093415) and outer (SRX093411) transcriptomes and for the M. yessoensis muscle the average of smooth (n=3) and striated (n=3) muscles. Other M. yessoensis transcriptome data also reflect the average FPKM of 2 to 3 datasets (except the nerve ganglia, n = 1). Colour grading highlights differences in transcript abundance in the tissue libraries analysed. na-not available. (15,16,37) revealed that the Mollusca and Arthropoda receptors shared a common origin. The general sequence and structural conservation of the mollusc peptides and receptors with the arthropod AST systems homologues suggests functional conservation is likely across the protostomes.
In molluscs, buccalins were originally isolated from neuronal elements in the accessory radula closer (ARC) muscle of the gastropod A. californica and regulate food-induced arousal (24). In arthropods including insects a single gene for the AST-A precursor has been described. In D. melanogaster the AST-A peptides share a conserved FGL-amide terminus but the sequences of the orthologue peptides in molluscs was very different although a conserved L-amide C-terminus existed but, in some gastropods and cephalopods I-amide C-terminal peptides were also found. A second AST-A peptide precursor previously proposed to contain buccalin-like peptides that terminated with a V-amide (37) was exclusive to molluscs and has recently been assigned to the LASGLV-amide family (19,20,25). The two buccalin precursors recently described in the gastropod Pacific abalone (Haliotis discus hanna) are authentic as they both encode C-terminal L-amide peptides (92) as well as the C. ventricosus and P. maculate precursors identified in this study. The longest and most peptide rich buccalin precursors were found within the gastropods, in the cephalopod N. pompilius and in the polyplacophore A. granulata. In bivalves, the number of buccalin L-amide peptides varied from 7 to 11 and peptide sequence conservation within each species precursor was distinct and may indicate they have different functions. Despite the large variety of putative buccalins only a single homologue of the arthropod AST-AR was found in most of the species analysed suggesting that alternative receptors activated by buccalins may exist. It would be interesting to test if molluscan buccalins are the cognate peptide activators of KISSR/GALR-like since protostome AST-A and AST-ARs are proposed to share a common evolutionary origin with the metazoan KISSR and GALRs (15,16,31,37,38). Putative KISSR-like and GALR-like were FIGURE 9 | Effect of an immune challenge on AST-C-like precursor and receptor expression in the mantle of the bivalve M. galloprovincialis. Mantle edge samples were collected at 0, 6, 12, 24 and 36 hours after exposure to heat-inactivated V. harveyi. The expression of two AST-CRs-like (VDI15122.1 and VDI53419.1) was analysed. Gene expression levels were normalized using the geometric mean of two reference genes (EF1a and 18S). The results are presented as the mean ± SEM of six (n= 6) biological replicates per group/sampling point. Significant differences between control and immune challenged mussel mantle at each timepoint are indicated by different letters and were detected using two-way ANOVA and a Sidak's multiple comparison test in GraphPad Prism version 8.0.0 software for Mac OS X. identified in molluscs and in annelids (this study and (31, 37) but our searches in Mollusca failed to identify orthologues in the genome of several of the species.
The MIP peptide precursors in molluscs in common with arthropod homologues encodes for multiple peptides with a similar structure. All mollusc and arthropod deduced mature peptides were C-terminal amides and possessed two conserved tryptophan residues previously shown to be important for peptide bioactivity (55). However, the consensus W(X 6/7 )Wamide sequence in Arthropoda AST-B/MIP peptides (8) was modified to W(X 4-7 )W-amide in Mollusca MIPs. Gastropod MIP precursors produced the largest variety of peptides and in L. gigantea 25 peptides were predicted, which was 5 times higher than in the insect D. melanogaster. Like the insects, D. melanogaster and T. castaneum a single AST-BR/MIPR was identified in most gastropods and in cephalopods. In contrast, bivalves possessed three receptor type genes that arose prior to the divergence of the molluscs and arthropods by gene duplication. In insects, AST-B/MIP share the same receptor as insect sex peptide, a.k.a. accessory gland peptide, which in D. melanogaster is 36 aa in length. The single gastropod A. californica MIP-R has been deorphanized but the species MIP peptides (23) and the newly identified bivalve MIP-Rs (type II and III) found in this study remain to be characterized with the MIP peptides as the obvious choice of ligand. The degeneration of functionally important rhodopsin-family receptor motifs such as DRY (in ICL2) and NSxxNPxxY (where x represents any aa) within TM7 previously reported in arthropod AST-BR/MIP-R also existed in the mollusc homologue receptors (88,87,8). MIP and MIP-R were present in all species analysed, except B. glabrata and it is unclear if this was due to the genome quality or if they were deleted from the genome.
In molluscs the AST-C-like peptide precursor generated a single peptide as observed in arthropods. The AST-C mature peptides were characterised by two conserved cysteine motifs in arthropods (19,20,25,54,92), which were proposed to determine peptide structure and function and they were also conserved in mollusc AST-C-like suggesting they may be important for function. Nonetheless, despite superficial sequence similarities the evolutionary model underpinning the AST-C/AST-C-like systems in arthropods and molluscs diverged. In arthropods three genes encode three different peptide precursors (AST-C, CC and CCC) (54, 93) and generally only one or two AST-CR exist (63,94,95). In the case of molluscs, a single gene encodes the peptide precursor (except in the gastropod C. unifasciata that contains two identical mature peptide genes), but a larger number of putative AST-CR-like exist compared to arthropods. In molluscs the receptor number was generally conserved apart from in the bivalves and specifically the Mytilidae family where lineage and species-specific events caused expansion of the AST-CR-like (type I). The oysters possessed the protostome AST-CRlike type I and an additional AST-CR-like type II that according to the phylogeny diverged from other lophotrochozoan receptors. Genome analysis of the lophotrochozoans revealed that AST-CR-like type II are tandem duplicates with the AST-CR-like type I gene suggesting that after gene duplication in the ancestral Ostreoidea they considerably diverged.
The identification in molluscs of homologues of the arthropod AST peptide precursors with the potential to produce multiple mature peptides with distinct sequences and of AST-Rs that emerged from lineage/species-specific expansions in molluscs suggests that the diversity and complexity of this neuropeptide system is higher than in arthropods. The functional role of the arthropod AST-like system in the molluscs and other Lophotrochozoans remains to be studied but its diversity suggests it may be pluripotent. In the future, deorphanization of the multiple receptors within the Mollusca with the corresponding peptides will help to understand the peptide-receptor interactions and provide a route towards the establishment of function.

Evolutionary Scenarios for the Protostome AST-Like System and Discovery of GALR-Like Receptors in Molluscs
The lack of sequence conservation and distinct peptide precursor organization of the three protostome AST-like peptide families (AST-A/buccalin, AST-B/MIP and AST-C/AST-C-like) is indicative of their different evolutionary origins. This contrasts with the common evolutionary origin proposed for cognate receptor families. The presence of both peptide precursors and receptors in arthropods and molluscs suggested that they emerged prior to the ecdysozoans-lophotrochozoan divergence and shared a common origin and functional conservation with homologue systems in vertebrates. The protostome AST-A/ Buccalin systems are homologues of the highly studied deuterostome GALR/KISS systems and the AST-C/AST-C-like system in protostomes is suggested to be the counterpart of the vertebrate SST-system (15,31,(63)(64)(65)95). For AST-B/MIP no homologue neuropeptide system has been described but receptor clustering with the vertebrate orphan receptors, GPR139/142, suggests that they may be evolutionary related. The function of GPR139/142 is poorly understood, and in mammals GPR139 is mostly expressed in the CNS and is suggested to regulate movement and food consumption/metabolism (96).
In our study we uncovered a novel mollusc GALR-like clade which only persisted in cephalopods. Previously we showed that putative GALR-like receptors were also present in annelids (31,37) and the identification of homologues in other lophotrochozoans revealed that the GALR-like receptor clade emerged early but was selectively deleted from the genomes of several species. The GALR-like is the protostome orthologue of the deuterostome GALRs and changes the currently accepted paradigm that AST-AR/Buccalin-R are the protostome GALR representatives. In a previous study we proposed that the AST-ARs/Buccalin-Rs were more related with KISSR than with the GALRs. However, the inclusion of more family members in the present study clustered the mollusca and other lophotrochozoan Buccalin-Rs and insect AST-ARs closer to the GALRs suggesting an alternative evolutionary scenario. Comparisons of the gene environment of AST-AR and GALRs in the annelid (C. teleta), insect (D. melanogaster) and vertebrate (H. sapiens) revealed orthologue genes. The GALR-like genome region in C. teleta possessed genes shared in the genome region flanking arthropod AST-AR and human GALR2 suggesting that the protostome GALRlike and vertebrate GALR genome regions are related and that the ancestral genes of metazoan GALR and protostome AST-ARs may have emerged from a common genome region in the early bilaterian genome prior to the protostome and deuterostome divergence. The Buccalin-R genome regions in C. teleta only shared genes with the equivalent AST-AR genome regions in D. melanogaster and no orthologues were found in human GALR or KISSR genome regions. Nonetheless, the evolutionary scenario that gave rise to the protostome ancestral AST-ARs, metazoan KISSR and GALR genes is difficult to resolve since protostomes and deuterostomes are suggested to have diverged approximately 535 million years ago (97,98) according to fossil records and global, lineage and speciesspecific genome events may have blurred evolutionary traits.

AST-C-Like System and Mantle Immune Function in M. galloprovincialis
In protostomes innate immunity is the main defence against pathogens. This system provides an immediate and effective response and depends on haemocytes that circulate in the haemolymph and humoral factors like lysozymes, complementlike molecules, peptidoglycan-recognition proteins (PGRPs) amongst others (71,99,100). Recent studies have revealed that immune factors in bivalves and molluscs are highly diverse due to species-specific expansions [(71, 101-106), Peng et al., submitted]. The diversity of bivalves, their adaptation to highly diverse environments, the plethora of pathogenic microorganism to which they are exposed, and their very different susceptibilities makes them excellent model systems to understand how environmental adaptation has shaped the immune system.
In arthropods, the role of the AST families in immunity is poorly resolved. In the haemolymph of the cockroach Diploptera punctate ASTs were suggested to be important regulatory peptides of the immune response (107) and in Litopenaeus vannamei, transcripts for the three AST peptide precursors (AST-A, AST-B/MIP and AST-C) were up-regulated in the haemocytes 6 hours after infection with the white spot syndrome virus (12). In Scylla paramamosain both AST-B/MIP and AST-BR/MIP-R were significantly induced in haemocytes and in the hepatopancreas after immune challenge and animals treated with AST-B/MIP had increased expression of pro-inflammatory cytokines and immune-effectors and receptor knock-down impaired the innate immune response to bacterial proliferation (14,108). The only report that links the AST-C system to immunity was performed in D. melanogaster where ASTC-R1 and R2 were up-regulated by pathogenic bacteria and ASTC-R2 knock-down significantly decreased survival (13). The effect of the AST-C system was linked to modulation of the "inhibition of immune deficiency" (Imd) pathway (13). The Imd pathway controls anti-bacterial peptide gene expression in the fat body in response to gram-negative bacterial infections and the pathway is suggested to be similar to the mammalian TNF-a pathway (109,110).
The orthologues of the arthropod AST families had a particularly high expression in the bivalve mantle (5,68,(111)(112)(113) a mucosal surface where host-pathogen interactions are initially established (70). Expression analysis (qRT-PCR and transcriptome analysis) indicated that the AST-C-like system was highly expressed in the bivalve mantle and mantle transcriptome data revealed modified AST-CR-like expression after a bacterial LPS challenge. The activation of the immune response in M. galloprovincialis exposed to a pathogen challenge was revealed by the significantly modified mantle expression of TLRa and LYG1 and was associated with a significant change in expression of the AST-C-like system. However, in contrast to what was described in D. melanogaster down-regulation of the receptors occurred in M. galloprovincialis mantle. The basis for the divergent response of the AST-C/AST-C-like system between D. melanogaster and M. galloprovincialis was not determined but may result from the character of the immune challenge used. In molluscs, homologues of the vertebrate TNF-a and TNFR have been described and respond to infection (114,115). However, if the AST-C-like system in bivalves modulates TNF-a and TNFR is unstudied and further work using targeted gene knock-down approaches or overexpression studies and defining the peptidereceptor pairs and their response to different pathogens will be required to characterise in detail the molecular basis of the innate immune response.

CONCLUSION
Peptide and receptor homologues of the arthropod AST-like families existed in molluscs. The diversity of AST peptides and receptor members found, and their widespread tissue distribution in molluscs suggests they are pluripotent factors that modulate a diversity of physiological processes. The mature peptides and receptor members (Buccalin, MIP and AST-C-like) were distinct across mollusc species, and this suggests that the AST families may be as complex as the neuropeptide system described in insects and other arthropods.
Phylogenetic analysis confirmed that receptors are specific to protostomes and emerged early in evolution prior to the Lophotrochozoa and Ecdysozoa divergence. In a previous study (37) we proposed that AST-ARs/Buccalin-Rs were most closely related to the metazoan KISSRs. In the present study by including AST-CRs in the phylogenetic analysis we found that they grouped with metazoan GALRs. Short-range gene linkage analysis of annelid genomes (the only protostomes where genes for Buccalin-R, GALR and KISSR persisted) confirmed that AST-AR and GALRs may be more evolutionary proximate, and we revised our previous evolutionary model for the protostome AST-ARs/Buccalin-Rs. In addition, we revealed for the first time the existence of Mollusca GALR-like genes in cephalopod genomes, and they were also present in annelids and a few other lophotrochozoans. This reveals that AST-AR/Buccalin-Rs evolved as a separate lineage and are not the orthologues of vertebrate GALRs as previously accepted.
Peptide precursors that encode for multiple mature peptides that diverge in sequence and the existence of lineage and speciesspecific duplicate receptors in molluscs suggests that the function of the Buccalin, MIP and AST-C-like systems may be distinct across species and that adaption to different environments may have affected gene evolution. Meta-analysis of tissue transcriptomes revealed that the Buccalin, MIP and AST-C-like families have a broad tissue distribution and varying abundance in several different bivalves, indicating it is an omniscient regulatory factor of the molluscan neuropeptide repertoire. The administration of an immune challenge to M. galloprovincialis significantly changed the expression of the AST-C-like peptide and receptor genes supporting its role in immunity and hinting at conservation of this function across protostomes. Overall, our study provides for the first time a comparison of the homologues of the three arthropod AST-systems across different molluscs and contributes to a better understanding of their evolution and function in protostomes.

DATA AVAILABILITY STATEMENT
The sequence, accession sequence numbers and datasets analyzed in this study are included in the Figures and also available in article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
JC and DMP planned and supervised the study. JC and JI performed the bioinformatic and comparative analysis. ZL and MP performed the experimental challenge, transcriptome analysis and expression studies. JC and DMP analysed the results and wrote the manuscript. All authors contributed to the article and approved the submitted version. AST-C-like peptide precursors. Each protein precursor encodes multiple small peptides except for AST-C-like where a single mature peptide exists. The predicted mature peptide sequences are highlighted in green. The protease cleavage sites are predicted based on the identification of dibasic residues (KR) and are in yellow. The predicted glycine that forms the Cterminal amide is annotated in blue. The predicted peptides were numbered with letters according to the order in which they appear in the precursors and are aligned in Figure 3. Sequences that were derived from mantle transcriptomes are annotated with a "*" and those deduced from the genome are indicated with "#". The orthologue peptide precursors and the localization of the mature peptides in the brachiopod L. anatina, two annelids P. dumerilii and C. teleta and from the insects D. melanogaster and T. castaneum are also represented for comparison. The conserved amino acids that are important for the peptides structure and function are highlighted in bold.
Supplementary Figure 2 | Deduced amino acid sequences of previously designated Mollusca buccalin-like precursors. The peptide precursors are now known to be members of the LASGLI/V-amide peptide family. Each precursor protein contains multiple small peptides. The predicted mature peptide sequences are highlighted in green. The protease cleavage sites are predicted based on the identification of dibasic residues (KR, RK) and are in yellow. The predicted glycine that forms the C-terminal amide are annotated in blue. Shading denotes amino acid conservation; dark grey represents 80% and black 100% conservation. Other sequence motifs suggested to be important for receptor function the motif DRY in ICL2 between TM3 and TM4 and NSxxNPxxY (where x represents any aa) within TM7 are marked with "*". The putative disulphide bridge between the ECL1 and ECL2 is annotated and the C-terminal cysteine for potential palmitoylation after TM7 is marked with "#". The N-terminal N-glycosylation sites conserved across the species are annotated with ".". The accession number of receptor genes are indicated. . Expression levels were normalized using the geometric mean of two reference genes (EF1a and 18S). The results are represented as the mean ± SEM of three (n= 3) biological replicates per group/sampling point. Significant differences (*p< 0.05 and ***p< 0.001) between the control and immune challenged groups of the same sampling period were detected using two-way ANOVA and a Sidak's multiple comparison test using GraphPad Prism version 8.0.0 software for Mac OS X.

Supplementary
Supplementary Table 1 | Accession numbers of the mollusc, brachiopod and annelid peptide precursors and receptor genes. The genome localizations (scaffold or genome fragment) and positions (first coding exon, Mb) are indicated. Accession numbers that correspond to scaffolds and genome scaffolds are represented in italics. The species taxonomic classification was based on http://www.marinespecies.org/.