Structural and evolutionary insights into astacin metallopeptidases

The astacins are a family of metallopeptidases (MPs) that has been extensively described from animals. They are multidomain extracellular proteins, which have a conserved core architecture encompassing a signal peptide for secretion, a prodomain or prosegment and a zinc-dependent catalytic domain (CD). This constellation is found in the archetypal name-giving digestive enzyme astacin from the European crayfish Astacus astacus. Astacin catalytic domains span ∼200 residues and consist of two subdomains that flank an extended active-site cleft. They share several structural elements including a long zinc-binding consensus sequence (HEXXHXXGXXH) immediately followed by an EXXRXDRD motif, which features a family-specific glutamate. In addition, a downstream SIMHY-motif encompasses a “Met-turn” methionine and a zinc-binding tyrosine. The overall architecture and some structural features of astacin catalytic domains match those of other more distantly related MPs, which together constitute the metzincin clan of metallopeptidases. We further analysed the structures of PRO-, MAM, TRAF, CUB and EGF-like domains, and described their essential molecular determinants. In addition, we investigated the distribution of astacins across kingdoms and their phylogenetic origin. Through extensive sequence searches we found astacin CDs in > 25,000 sequences down the tree of life from humans beyond Metazoa, including Choanoflagellata, Filasterea and Ichthyosporea. We also found < 400 sequences scattered across non-holozoan eukaryotes including some fungi and one virus, as well as in selected taxa of archaea and bacteria that are pathogens or colonizers of animal hosts, but not in plants. Overall, we propose that astacins originate in the root of Holozoa consistent with Darwinian descent and that the latter genes might be the result of horizontal gene transfer from holozoan donors.

In this article, we both dissected reported molecular structures and calculated new high-confidence computational models to analyse the molecular determinants of the most relevant astacin domains. Based on structural and molecular specifications of the prototypic astacin catalytic domain (CD), we further performed comprehensive sequence similarity searches to identify potential family members outside vertebrates to locate the origin of astacins according to Darwinian descent. Finally, we screened and reviewed the literature available for functional and evolutionary implications of the distinct astacin subfamilies outside vertebrates.

2 Results and discussion
2.1 Architecture and function of relevant astacin-family domains
Astacins across all phyla minimally comprise a zinc-binding CD, which is preceded by an upstream propeptide or prodomain (PRO) for latency and a signal peptide (S) for targeting to the plasmalemma or the extracellular space in animals [(Gomis-Rüth et al., 2012a); Figure 1]. However, most astacins are multidomain proteins, which have acquired a diverse set of additional domains (Figure 1). We retrieved reported experimental crystal structures of a CD, a "MAM" domain [first identified in meprin, A5 protein and receptor protein tyrosine phosphatase μ; (Cismasiu et al., 2004)], a "TRAF" domain [reminiscent of tumour-necrosis-factor receptor-associated factor; (Park, 2018)] and PRO domains, and computed high-confidence computational models of the "CUB" domain (first identified at the sequence level in complement subcomponents C1r/C1s, Uegf and BMP1) and epidermal growth factor (EGF)-like domains (see the Methods section) for their molecular analysis.
The CD is ascribed to protein family Pfam-01400 (Figure 1) and spans ~200 residues. It contains two or three disulfide bonds at variable positions (Gomis-Rüth et al., 2012a) and is divided into an upper N-terminal subdomain and a lower C-terminal subdomain by an extended active-site cleft, as first revealed by the crystal structure of archetypal crayfish astacin (Bode et al., 1992; Gomis-Rüth et al., 1993) (Figure 2A). The N-terminal subdomain is rich in regular secondary structure and contains a hallmark five-stranded β-sheet (β1-β5; Figure 2A), whose lowermost strand β4 frames the upper rim of the active-site cleft when viewed in the standard orientation of MPs (Gomis-Rüth et al., 2012b) (Figure 2A, left), and two helices: the "backing helix" and the "active-site helix". The latter encompasses most of a characteristic zinc-binding motif (HEXXHXXGXXH; amino-acid one-letter code; X stands for any residue), which is found across astacins and other metzincin families (Stöcker et al., 1993; Yiallouros et al., 2000; Gomis-Rüth et al., 2012a; Cerdà-Costa and Gomis-Rüth, 2014; Arolas et al., 2018). The helix includes the first two zinc-liganding histidines and the general base/acid glutamate required for catalysis (Arolas et al., 2018). After the glycine of the motif, the chain undergoes a sharp turn and enters the C-terminal subdomain, which is more irregular and encompasses just a short β-ribbon (β6-β7) and a "C-terminal helix" as regular secondary structure (Figure 2A). The C-terminal subdomain provides two more zinc ligands, viz., the third histidine of the motif and a downstream tyrosine, which is swung out in a "tyrosine-switch" motion upon substrate binding to stabilize the reaction intermediate during catalysis (Grams et al., 1996; Yiallouros et al., 2000). This tyrosine is found two positions after another conserved element within metzincins, the "Met-turn" methionine (Tallant et al., 2010), which creates a hydrophobic base for the metal-binding site (Tallant et al., 2010). The tyrosine and the methionine are embedded in a characteristic SIMHY-motif in astacins.

FIGURE 2
Representative structures of the most relevant astacin domains. (A) Ribbon-type plot of the mature Astacus astacus crayfish astacin catalytic domain [PDB 1AST; residues 50-251, see UniProt P07584; (Bode et al., 1992; Gomis-Rüth et al., 1993)], which is shown in the standard orientation of MPs [(Gomis-Rüth et al., 2012b)] and vertically rotated by 90 degrees (right). Regular secondary structure elements are shown as yellow β-strands (β1-β7) and aquamarine α-helices (αA-αC). The first five strands constitute the typical five-stranded β-sheet of astacins (Gomis-Rüth et al., 2012a) and the helices are dubbed "backing helix" (αA), "active-site helix" (αB) and "C-terminal helix" (αC). The latter is split in two by a kink. Unbound mature astacin has its catalytic zinc cation (magenta sphere) bound in trigonal-bipyramidal coordination by the three histidines (①-③) of a characteristic zinc-binding motif (HEXXHXXGXXH) plus a more distal downstream tyrosine (④) and the catalytic solvent molecule [small red sphere; (Arolas et al., 2018)]. The glutamate within the motif (⑤) is the general base/acid for catalysis (Arolas et al., 2018). The "Met-turn" with the conserved methionine [⑥; (Tallant et al., 2010)] is shown as an orange ribbon. The mature N-terminal residue (labelled N) is bound to the family-specific glutamate (E103) [⑦; (Gomis-Rüth, 2003)] after the third zinc-binding histidine. The C-terminus is also labelled (C) and the two disulfide bonds of the structure (C42-C198 and C64-C84) are further displayed with sulphur atoms in green. (B) The structure of the unique EGF-like domain of human BMP1 predicted with AlphaFold (Jumper et al., 2021) shows two β-ribbons and three disulfide bonds. Two orthogonal orientations are displayed. (C) Experimental structure of the MAM domain of meprin β [PDB 4GWM; (Arolas et al., 2012)] in two orthogonal orientations. The β-sandwich domain (residues 259-427, see UniProt Q16820) features two disulfide bonds and a structural sodium cation (blue sphere) octahedrally coordinated by six protein oxygens. (D) Structure of the first CUB domain of human BMP1 predicted with AlphaFold in two orthogonal orientations, which show a β-sandwich architecture with two disulfide bonds. (E) Experimental structure of the TRAF domain of meprin β [PDB 4GWM; (Arolas et al., 2012)] in two orthogonal orientations. The β-sandwich domain (residues 428-597, see UniProt Q16820) has two short helices and a β-ribbon grafted into strand-connecting loops. (F-I) Experimental zymogen structures as Cα-traces in standard orientation (top panels) and after a vertical 90-degree rotation (bottom panels) of (F) crayfish astacin [PDB 3LQ0; (Guevara et al., 2010)], (G) human meprin β [PDB 4GWM; (Arolas et al., 2012)], (H) myroilysin from the bacterium Myroides sp. [PDB 5GWD; (Xu et al., 2017)] and (I) astacin from the horseshoe crab Limulus polyphemus [PDB 8A28; (Guevara et al., 2022)]. Only the PROs (aquamarine) and CDs (sandy brown) are displayed for clarity, together with the catalytic zinc ions (magenta spheres) and the side chains of the respective aspartate/cysteine-switch residue.

Frontiers in Molecular Biosciences
frontiersin.org

2020; Guevara et al., 2022). Moreover, the new N-terminus binds the "family-specific" glutamate immediately after the third zinc-binding histidine (Gomis-Rüth, 2003), either directly through its side chain or through the α-amino group via a solvent molecule (Figure 2A). This feature is unique among MPs and reminiscent of trypsin-like serine endopeptidases, which dedicate an aspartate next to the catalytic serine to bind the likewise buried mature N-terminus (Bode et al., 1986). The astacin glutamate is immediately followed by an XXRXDRD motif (Gomis-Rüth, 2003) whose charged residues establish interactions relevant for domain stability.
A MAM domain is found after the CD in meprins α and β, Limulus and Hydra astacins and other (potential) family members (Figure 1) (Arolas et al., 2012; Eckhard et al., 2021; Guevara et al., 2022). The crystal structure of human meprin β (Arolas et al., 2012; Eckhard et al., 2021) reveals that its MAM domain is a β-sandwich consisting of a four- and a five-stranded antiparallel β-sheet, which are twisted and rotated ~25 degrees relative to each other (Figure 2B). The domain conforms to a jelly-roll architecture featuring two four-stranded Greek key motifs and is connected by two disulfide bonds. Furthermore, the domain has a sodium-binding site, at which the cation is octahedrally coordinated by six oxygens from side chains and the main chain of the protein (Figure 2B). The overall architecture of the domain conforms to the structural criteria defined for the MAM protein family (Pfam-00629), which was identified in silico in meprin α and β, A5 protein, and receptor protein tyrosine phosphatase μ. Comparison with other MAM domains reveals that the central β-sandwich is conserved but the loops responsible for functionality deviate, as do the metal-binding capacity and arrangement (Aricescu et al., 2006; Yelland and Djordjevic, 2016). This domain appears to have adhesive functions and, in meprin β, it contributes to dimerization by bringing the CD and TRAF domains together (Arolas et al., 2012; Eckhard et al., 2021).
Uniquely for astacins, meprins α and β exhibit a TRAF domain downstream of the MAM domain (Figure 1) (Arolas et al., 2012; Eckhard et al., 2021). The crystal structure of human meprin β (Arolas et al., 2012; Eckhard et al., 2021) shows that this moiety features two twisted four-stranded antiparallel β-sheets, which are rotated ~40 degrees relative to each other (Figure 2C, left) and give rise to a flatter sandwich than in MAM (compare Figure 2B, right and Figure 2C, right). The strands are connected by loops of variable length, which include two short helical segments plus a short β-ribbon and give rise to a double Greek key architecture. The second Greek key is inserted into the first one but does not form a jelly roll. The only cysteine of this domain (C492) is buried and unbound. The N- and the C-terminus are on contiguous β-strands of the front β-sheet (Figure 2C, left), and the C-terminus protrudes from the top surface of the domain (Figure 2C, right). In general, the TRAF domain of meprin β resembles tumor-necrosis-factor receptor-associated factors, which are mediators of cell activation engaged in homo- and heterodimerization and originated the TRAF protein family (Pfam-00917) (Zapata et al., 2001).
Further relevant for astacins are CUB domains (Figure 1), which were first identified in complement subcomponents C1r/C1s, Uegf and BMP1 and form protein family Pfam-00431. They occur in BTP-subfamily astacins including BMP1, as well as in echinoderm astacins, a paralogue within A. astacus and several other orthologues in up to five copies (Figure 1). According to a highly reliable AlphaFold computational model (see Figure 2D and the Methods section), the first CUB domain of human BMP1 would be a β-sandwich made of an antiparallel four-stranded β-sheet and a mixed parallel/antiparallel five-stranded β-sheet, which would both be partially twisted. Their strands would be nearly parallel (Figure 2D, left), in contrast to MAM (Figure 2B, left) and TRAF (Figure 2C, left), and connected by mostly short loops. Two disulfide bonds would crosslink the domain. CUB domains were apparently present in the last common ancestor of eumetazoans and are currently found in synaptic proteins (González-Calvo et al., 2022). Remarkably, combinations of CUB and MAM domains are found in neuropilins, which are receptors for axon guidance cues and play synaptic roles (González-Calvo et al., 2022). Moreover, a CUB domain is engaged in the "Venus-flytrap" mechanism of inhibition of endopeptidases by the human pan-peptidase tetrameric inhibitor α2-macroglobulin. It participates in a major structural rearrangement of the C-terminal half of the protomer, which further includes three more domains (Marrero et al., 2012; Goulas et al., 2017; Luque et al., 2022). A CUB domain also participates in the "snap-trap" mechanism of monomeric α2-macroglobulin-related inhibitors from commensal and pathogenic bacteria such as Escherichia coli and Salmonella enterica (Wong and Dessen, 2014; Garcia-Ferrer et al., 2015; Goulas et al., 2017).
Next, EGF domains (Pfam-00008) are widely present in up to six copies in several astacins, including meprins, BTPs and proteins from nematodes and echinoderms (Figure 1). Generally, they are found in many animal proteins in the extracellular part of membrane-bound or secreted proteins (Bork et al., 1996). In meprin β, the EGF-like domain is considered a hinge domain, which moves the dimer from a membrane-proximal position for cleavage of transmembrane substrates, such as the amyloid precursor protein, to a membrane-distal position upon binding to its endogenous inhibitor fetuin B (Karmilin et al., 2019; Eckhard et al., 2021). We obtained a generally reliable AlphaFold computational model (see the Methods section) for the EGF-like domain of human BMP1 (see Figure 2E). It revealed a ~40-residue structure with two β-hairpins, cross-connected by three disulfide bonds for structural integrity, which overall conforms to the standard architecture of these domains (Wouters et al., 2005).
Finally, large diversity is found across astacin PROs, which range between 34 and 486 residues and just share the motif FXGDI among animal orthologues (Guevara et al., 2010; Gomis-Rüth et al., 2012a). The PROs of crayfish astacin (Figure 2F), human meprin β (Figure 2G), bacterial myroilysin (Figure 2H) and horseshoe crab astacin (Figure 2I) have been structurally characterized. They revealed essentially unstructured peptides running along the cleft of the CD in the opposite direction of a true substrate, which precludes their intramolecular cleavage (Guevara et al., 2010; Arolas et al., 2012; Xu et al., 2017; Guevara et al., 2022). The catalytic solvent molecule bound to the catalytic zinc ion in the mature CD (Figure 2A) is replaced by either the aspartate of the motif in the three animal zymogens (Guevara et al., 2010; Arolas et al., 2012; Guevara et al., 2022) or a cysteine in the bacterial enzyme (Xu et al., 2017), which lacks the motif. These residues operate according to an "aspartate-switch" or "cysteine-switch" mechanism of latency, respectively. Such mechanisms have also been reported, among others, for the MPs fragilysin-3 from Bacteroides fragilis (Goulas et al., 2011) and matrix metalloproteinases (Springman et al., 1990; Rosenblum et al., 2007), respectively.
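As an illustration of how the FXGDI motif mentioned above could be located in a prosegment sequence, the following minimal Python sketch reports the position of the motif aspartate. The function name `aspartate_switch_position` and the example string are hypothetical and not taken from the study; the pattern demands an exact match, so bacterial sequences that lack the motif (as reported for myroilysin) simply return no position.

```python
import re
from typing import Optional

# FXGDI written as a regular expression; "." stands for X (any residue).
PRO_MOTIF = re.compile(r"F.GDI")


def aspartate_switch_position(pro_seq: str) -> Optional[int]:
    """Return the 1-based position of the D in the first FXGDI match.

    Returns None when the motif is absent. Illustrative helper only.
    """
    match = PRO_MOTIF.search(pro_seq.upper())
    if match is None:
        return None
    # match.start() is the 0-based index of F; the D lies three residues
    # further on, so its 1-based position is start + 4.
    return match.start() + 4


# Synthetic toy prosegment (not a real astacin sequence).
print(aspartate_switch_position("AAFAGDIKK"))
```

An exact-match pattern was chosen for brevity; a position-specific profile would be the more tolerant alternative if substitutions within the motif had to be accommodated.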

Astacins possibly originate in Holozoa
Multicellularity presumably originated several times in unicellular opisthokont holozoans, which have been suggested as precursors of metazoans (Sebé-Pedrós et al., 2017; Berman, 2019). To better understand the hierarchical clustering of the distinct phyla within Holozoa, which originated 1.3 billion years ago (Berman, 2019), and to put our phylogenetic studies into context, we tentatively assembled a consensus dendrogram based on current literature (Figure 3) given the apparent disparity in the available models (see Section 3.2). This hypothesis entails that Holozoa would split into Teretosporea, themselves consisting of Corallochytrea/Pluriformea (alias Opisthokonta incertae sedis) and Ichthyosporea, and Filozoa. These, in turn, would divide into Filasterea and Choanozoa. The latter would consist of Choanoflagellata, which are unicellular flagellates, and Metazoa, which encompass the multicellular animals and date back to about 760 million years ago (Berman, 2019). Up the tree, Bilateria would encompass animals with a plane of symmetry (including Xenacoelomorpha), except echinoderms, which evince post-larval (secondary) pentaradial symmetry. They sequentially would team up with Cnidaria, Placozoa, Porifera and Ctenophora to eventually form Metazoa (Figure 3).
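The basal splits described in this paragraph can be written down as a small tree structure. The Python sketch below is only a transcription of the branching order stated in the text, down to Metazoa; the nested-tuple layout and the helper names `to_newick` and `lineage` are illustrative assumptions and do not reproduce Figure 3 or any tool used by the authors.

```python
# Each node is (name, list_of_children); leaves have an empty list.
# Branching order transcribed from the paragraph above.
HOLOZOA = ("Holozoa", [
    ("Teretosporea", [
        ("Corallochytrea/Pluriformea", []),
        ("Ichthyosporea", []),
    ]),
    ("Filozoa", [
        ("Filasterea", []),
        ("Choanozoa", [
            ("Choanoflagellata", []),
            ("Metazoa", []),
        ]),
    ]),
])


def to_newick(node):
    """Serialise a node as a Newick string without the final semicolon."""
    name, children = node
    if not children:
        return name
    return "(" + ",".join(to_newick(child) for child in children) + ")" + name


def lineage(node, target, path=()):
    """Return the list of clade names from the root to target, or None."""
    name, children = node
    path = path + (name,)
    if name == target:
        return list(path)
    for child in children:
        found = lineage(child, target, path)
        if found:
            return found
    return None


print(to_newick(HOLOZOA) + ";")
print(lineage(HOLOZOA, "Metazoa"))
```

The tree stops at Metazoa on purpose, because the order in which the metazoan phyla join is exactly the point on which the cited models disagree.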
We performed searches for astacins in several protein and gene databases (see Section 3.1), which revealed > 25,000 entries for potential peptidases of the M12A family. This is how astacins are defined in the MEROPS database of peptidases and their inhibitors [www.ebi.ac.uk/merops; (Rawlings and Bateman, 2021)]. In addition, > 12,000 sequences from > 1,000 species of identified and putative family members were found within family PF01400 within the PFAM database (Mistry et al., 2021). At this point, high-confidence manually curated sequence searches were performed with the sequence of the mature CD of crayfish astacin. The resulting hit sequences were verified to span the entire CD and contain the intact zinc-binding motif, as well as the family-specific glutamate followed by the XXRXDRD and SIMHY motifs, with just minimal conservative substitutions (Figure 4 reproduces selected aligned example sequences). They were further checked to contain a PRO with the zinc-blocking aspartate. A subgroup of sequences was chosen for alignments, phylogenetic tree construction and physiological considerations (Supplementary Table S1; Sections 2.4-2.7). In addition, Table 1 presents a selection of described and potential non-vertebrate metazoan astacins.
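The motif criteria used to verify the hits can be sketched as a simple sequence filter. This Python example is a minimal illustration under stated assumptions, not the curated procedure applied in the study: the function name `looks_like_astacin_cd` and the toy sequence are hypothetical, and the regular expressions require exact matches, whereas the actual verification tolerated minimal conservative substitutions and additionally checked for a PRO with the zinc-blocking aspartate.

```python
import re

# HEXXHXXGXXH immediately followed by the family-specific glutamate and
# the XXRXDRD motif, i.e. HEXXHXXGXXHEXXRXDRD; "." stands for X.
ZINC_SITE = re.compile(r"HE..H..G..HE..R.DRD")
# Downstream motif carrying the Met-turn methionine and the zinc-binding tyrosine.
MET_TURN = re.compile(r"SIMHY")


def looks_like_astacin_cd(seq: str) -> bool:
    """Strict check: zinc-binding motif plus EXXRXDRD, then SIMHY downstream."""
    seq = seq.upper()
    zinc = ZINC_SITE.search(seq)
    if zinc is None:
        return False
    # SIMHY must occur after the end of the zinc-binding stretch.
    return MET_TURN.search(seq, zinc.end()) is not None


# Synthetic toy sequence built only to satisfy the patterns (not a real protein).
TOY_CD = "MKT" + "HEAAHAAGAAHEAARADRD" + "GGGG" + "SIMHY" + "KK"
print(looks_like_astacin_cd(TOY_CD))
```

Requiring SIMHY downstream of the zinc-binding stretch mirrors the order of the elements in the CD; a profile-based search would be needed to capture the conservative substitutions that this strict filter rejects.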
We consistently found astacin sequences from humans down the tree of life until the root of subphylum Vertebrata (a selection is listed in Supplementary Table S1). Vertebrata associate with the subphyla Tunicata/Urochordata, which includes sea squirts and the base tunicate Ciona intestinalis, and Cephalochordata, which features the lancelet (amphioxus), to form the phylum Chordata (Figure 3). These taxa also evinced abundant astacins. In addition, we could find sequences for the phyla Echinodermata and Hemichordata within Ambulacraria, which together with Chordata form Deuterostomia. Within Ecdysozoa, we could retrieve sequences from Tardigrada and Arthropoda but not

Schulze and Kawauchi, 2021). Phylum Chordata is further shown for its constituting subphyla Vertebrata, Tunicata/Urochordata and Cephalochordata. The first two give rise to Olfactores.
A selection of invertebrate and vertebrate sequences (Supplementary Table S1) enabled us to construct a phylogenetic tree for metazoan astacins (Figure 5), which was solely based on a sequence alignment of the respective CDs. Three sequences encompassed multiple CDs, which originate in the most basal sponge Amphimedon queenslandica (UniProt code [UP] A0A1X7U9V1), the nematode hookworm Ancylostoma caninum (UP A0A368GTC8) and the blow fly Lucilia cuprina (UP A0A0L0BYD0). This tree does not mirror the phylogenesis of organisms presented in Figure 3 since it contains both orthologous and paralogous astacins. Indeed, members of the distinct subfamilies evinced separate clustering (Figure 5). The tree is not comprehensive, since only a selection of astacins comprising the minimal setup of typical sequential and structural motifs (as outlined above) was included. Nevertheless, certain clusters of astacins can be recognized. These are, starting clockwise in the upper left quadrant of the circular tree (Figure 5), the 1) hatching enzymes, which originate from the same root as ovastacin; 2) ShKT-carrying astacins from chordates, nematodes, cnidarians, priapulids and arthropods (chelicerates and crustaceans), which include the prototypal crayfish astacin; 3) a clade containing the meprins; 4) a second cluster of ShKT-astacins, mostly from cnidarians (right centre); 5) proteins rich in MAM, EGF and CUB domains (right bottom); and 6) the BTPs (left bottom).

FIGURE 5
Phylogenetic tree based on the catalytic domains of a selection of 147 astacins. Species and UniProt and GenBank accession numbers are listed in Supplementary Table S1. The asterisk in the top right quadrant indicates the position of the prototypical name-giving enzyme astacin from the crayfish Astacus astacus. Crayfish astacin is translated with a signal peptide for extracellular targeting, a prodomain conferring latency and a catalytic protease domain (see also Figure 1). The domain compositions of astacins consisting merely of these three domains are omitted for clarity. Astacin-like proteases with more complex domain structures are shown schematically. A detailed list of domains with Prosite database accession numbers is contained in Figure 1 and in Supplementary Table S1.

Scattered presence of astacins outside Holozoa
Searches in non-holozoan Eukaryota including plants, fungi and the fungus-like Oomycota revealed merely < 400 sequences scattered across Alveolata, Stramenopila, Rhizaria, Archaeplastida (Haptista), Excavata (Discoba) and some Amoebozoa clades. By contrast, no astacins were detected in any of the other eukaryotic taxa except for the silver malletwood Rhodamnia argentea [GenBank code (GB) XP_030553468], the only plant orthologue retrieved. However, this entry was debunked as a contamination with a tolloid-like chelicerate astacin from the wheat curl mite Aceria tosichella (UP A0A8B8R4B3).
The observation of astacin-like proteins within Stramenopila is remarkable since this taxon alone already accounts for ~200 of the hits. These are heterokonts that were formerly grouped into fungi (Holomycota) but currently are considered to be closer to brown algae than to fungi (Sebé-Pedrós et al., 2017). They further include the Oomycetes ("egg fungi"), a clade containing many parasitic/saprophytic organisms, which in turn encompass most of the hits within Stramenopila. Among them is Aphanomyces astaci, a well-known parasite of the North American crayfish Cambarus clarkii, which developed resistance against this pest. However, when American crayfish were brought to Europe in the late 19th century, A. astaci infection caused the "crayfish plague" in the native crayfish population (A. astacus), which almost caused its extinction (Diéguez-Uribeondo and Söderhäll, 1993). Oomycetes are known champions of horizontal gene transfer (HGT) (Koonin et al., 2001; Keeling and Palmer, 2008), thereby gathering enzymes useful to target their prey (Judelson, 2017). This genetic transfer route could thus explain the generally scattered but locally focused presence of astacin CDs, which is inconsistent with Darwinian descent, in eukaryotes outside Holozoa.
HGT could also account for the sporadic occurrence of astacin-like peptidases in archaea, bacteria and viruses, which would thus also be xenologues (Koonin et al., 2001). Examples of archaeal sequences were found in Candidatus Korarchaeota (UP A0A662SFB1), Nitrosopumilus sp. (GB MCA9827382), Methanotrichaceae archaeon (GB MBN1323470) and Halobacteriales archaeon QH_6_64_20 (GB PSP40402). Viral sequences were restricted to Lutzomyia reovirus 2 (UP A0A0H4M9A8). The more populous bacterial examples were from Bacillus cereus (GB WP_235610182), Acinetobacter baumannii (GB WP_207273295), Klebsiella pneumoniae (GB NAU77905), Bacillus thuringiensis (GB WP_228528809), Legionella pneumophila (GB WP_061484376) and Bacillus mycoides (GB WP_186320991), among others. Overall, the vast majority of bacterial astacin hosts live in intimate contact with animals, which would facilitate HGT of genes from eukaryotes to prokaryotes. Among them are those of the biochemically studied proteins flavastacin from Flavobacterium meningosepticum [also known as Elizabethkingia meningoseptica; UP Q47899; (Tarentino et al., 1995)] and myroilysins from Myroides profundi (UP B5B0E6) and Myroides sp. CSLB8 (UP A0A0P0DZ84) (Xu et al., 2017; Ran et al., 2020). In the latter case, zymogenic latency was shown to follow a different mechanism from the animal forms [see Section 2.1; (Guevara et al., 2022)], which would further support an HGT event as the origin of its presence in the bacterium. This is reminiscent of the aforementioned fragilysin-3, which originates in a member of the human colon microbiota. Its CD was proposed to be an adamalysin/ADAM xenologue acquired by HGT from the host that separately evolved to derive a distinct mechanism of latency (Goulas et al., 2011; Goulas et al., 2013).

Functional and evolutionary aspects of meprin metallopeptidases
Two other human astacin genes, mepa and mepb, encode meprin α and meprin β for which orthologs have only been detected among vertebrates. Both meprins are membrane bound but meprin α is released already in the trans-Golgi network by furin cleavage and stays membrane bound only in association with meprin β. The latter is a "sheddase", which releases cell-surface proteins such as growth factors, cytokines, receptors, as well as amyloid precursor protein through cleavage at its β-secretase site. Deregulation of meprins leads to neurodegenerative diseases, changes in barrier function (such as in the blood-brain barrier), inflammatory bowel disease, fibrosis, nephritis and cancer (Arolas et al., 2012; Becker-Pauly and Pietrzik, 2016; Arnold et al., 2017; Eckhard et al., 2021; Gindorf et al., 2021; Bayly-Jones et al., 2022; Werny et al., 2022). The unique domain composition of meprins includes MAM, TRAF and EGF-like domains (Figures 1, 5), and although MAM- and EGF-containing astacins have been identified in other metazoan phyla (see Figures 1, 5; Supplementary Table S1), none of these are apparently membrane bound or exhibit comparable physiological potential to vertebrate meprins. Finally, our database searches unravelled meprin-like astacin-CDs also in basal vertebrates such as lamprey (MEPβ_PETMA) and hagfish (MEPα_EPTBU), which just encompass the S, PRO, CD and MAM moieties (see Figures 1, 5; Supplementary Table S1), similarly to a reported horseshoe crab enzyme [LASTMAM_LIMPO; (Becker-Pauly et al., 2009; Guevara et al., 2022)].

Astacins in animal reproduction
The sixth human astacin is ovastacin, which is encoded by the astl gene and is expressed only in oocytes among mammals (Burkart et al., 2012). Absence of ovastacin results in subfertility, since it is released during the cortical reaction after intrusion of a sperm cell and causes hardening of the zona pellucida of the extracellular matrix surrounding the egg. This provides rigidity and robustness to the resulting embryo until its implantation in the uterus (Stöcker et al., 2014;Körschgen et al., 2017). Ovastacin has the basic domain composition S-PRO-CD, which is followed by a disordered domain of unknown function. This domain stays connected with the oolemma after the release of the enzyme into the perivitelline space during the cortical reaction, which suggests a function in membrane anchoring and shedding of ovastacin (Körschgen et al., 2017). A similar function to ovastacin was reported for alveolin from the medaka fish Oryzias latipes, which likewise hardens the envelope of the fertilized egg (zygote) in bony fishes (Shibata et al., 2000).
In egg-laying vertebrates like fishes, amphibians, reptiles and birds, a specialized group of astacin MPs termed hatching enzymes has evolved (Nagasawa et al., 2022). They are absent from mammals and involved in the cleavage of the eggshell. They optionally contain an additional pair of cysteine residues in the N-terminal subdomain of their mature CDs compared to crayfish astacin, as well as extra C-terminal CUB domains (Figures 1, 5). Hatching enzymes are also present in egg-laying invertebrates, such as the crayfish A. astacus, which in addition to the prototypic digestive astacin archetype possesses the "Astacus embryonic astacin" (AEA_ASTAS in Figure 5; Supplementary  Table S1; Geier and Zwilling, 1998). Finally, a reproductive astacin was also described from Drosophila seminal plasma (SEMP1_DROME; Figure 5; Supplementary Table S1). This MP is involved in a proteolytic cascade that triggers sperm capacitation and thus regulates fertility in the fruitfly (LaFlamme et al., 2012;LaFlamme and Wolfner, 2013;LaFlamme et al., 2014).

Astacins of helminths and cnidarians
The genome analysis of the roundworm Caenorhabditis elegans uncovered 40 astacins termed "nematode astacins" (Möhrlen et al., 2003;Park et al., 2010). Moreover, in a comprehensive analysis of 154 helminth species of the phyla Nematoda and Platyhelminthes, many of them parasitic, an enormous radiation of astacins was also observed (Martín-Galiano and Sotillo, 2022). Most remarkable are the > 100 different additional domains that occur downstream of the CD in variable combinations, thus yielding an enormous functional versatility for these proteins. These domains do not only include protein-protein and protein-carbohydrate interacting domains, but also additional enzymatic functions, such as trypsin-like serine peptidases and hydroxylases.
Particularly striking are astacins containing "ShKT" domains, which mimic a toxin from the sea anemone Stychodactyla helianthus that blocks potassium channels (Castañeda et al., 1995). We found such ShKT-astacins in Nematoda (CAEEL, ONCVO, ANCCA and TRISP; see Figure 5; Table 1; Supplementary Table S1), Plathyhelminthes (SCHMD, ECHMU and SCHJA), Arthropoda (PENVA and DAPMA), Priapulida (PRICA), Bryozoa (BUGNE and LINUN), Mollusca (MYTCO) and Cnidaria (HYDVU, HYDEC, PODCA, NEMVE), as well as in the lower chordates sea squirt (HALRO and CIOIN) and lancelet (BRABE). In Figure 5, ShKT-astacins are labelled with pink branch tips (see Supplementary Table S1). These astacins are mostly expressed in epithelia forming barriers to the environment or in the digestive tract, which suggests functions in protection and defence, as well as preservation of the epithelial integrity (Park et al., 2010;Isolani et al., 2018). However, considering the parasitic lifestyle of many of the organisms harbouring ShKTastacins, the combination of proteolytic and toxin domains may also challenge the respective host (Moran et al., 2013; Frontiers in Molecular Biosciences frontiersin.org Martín-Galiano and Sotillo, 2022). Similarly, toxicity has also been reported for astacins lacking ShKT domains from spider venoms (Trevisan-Silva et al., 2010), in which other proteins may take over the role of the latter domains in linking proteolytic activity with specific toxicity. This is reminiscent of the snake venom MPs from the adamalysin/ADAM family, for which forms spanning only the CD are not haemorrhagic while those encompassing further C-terminal disintegrin-like and cysteine-rich domains may be haemorrhagic (Fox and Serrano, 2009;Gomis-Rüth et al., 2013;Herrera et al., 2018). Finally, many ShKT-astacins carry additional CUB, EGF, MAM, etc. domains that may serve to modulate activity. 
Examples are the morphogenetically active Hydra vulgaris ShKT-proteins HMP1, HMP2 and HAS7 (Figure 5; Supplementary Table S1). While HMP1 is involved in head formation and head regeneration (Yan et al., 1995), HMP2 has a function in foot morphogenesis (Yan et al., 2000) and HAS7 specifically cleaves Hydra WNT3 during head morphogenesis and thereby restricts head organizer formation (Ziegler et al., 2021; Holstein, 2022).

New functions of astacins
A surprising function for an astacin peptidase was very recently reported for the sea star Asterias rubens, which underpins the enormous versatility of nature's toolbox. In zoology textbooks [e.g., (Hickman et al., 2020)], the locomotion of Asteroidea on a surface is usually explained as the concerted action of a multitude of tiny sucking pods in the podia of the tube feet lining the oral face of the animal's arms. However, pod attachment is apparently not based on suction but on a secreted glue consisting of adhesive matrix proteins that are left on the surface after detachment. The latter is mediated by an astacin MP spanning a CD and a CUB domain (CUB_ASTRU; see Figure 5; Supplementary Table S1), which is specifically secreted by de-adhesive gland cells and releases the adhesive material from the surface of the tube feet (Algrain et al., 2022).

Compilation of a dendrogram for holozoans
The currently available trees for holozoans in zoology textbooks such as (Hickman et al., 2020) present several discrepancies with models proposed by recent research publications (Cannon et al., 2016; Giribet et al., 2019; Laumer et al., 2019). These are based on massive sequence data generated in the last decade by increasingly affordable sequencing methods, first through the Illumina MiSeq platform and more recently through MinION sequencing (Santos et al., 2020), which are also partially applicable to natural history collections (Folk et al., 2021). As an example, phylum Chaetognatha was considered a separate clade at the same level as Spiralia and Ecdysozoa, while it is currently envisaged as a sister clade of Gnathifera within Spiralia. Likewise, setting phylum Porifera at the root of Metazoa contradicts recent models, which place Ctenophora at this position (Giribet et al., 2019; Laumer et al., 2019). Finally, recent work proposed new models at the base of Holozoa, with Ichtyosporea and Corallochytrea/Pluriformea forming Teretosporea, which together with Filozoa give rise to Holozoa (Torruella et al., 2015; Arroyo et al., 2020). Accordingly, we assembled a dendrogram for holozoans based on consensus information extracted from these and other recent publications (Ryan et al., 2010; Ruggiero et al., 2015; Torruella et al., 2015; Cannon et al., 2016; Kocot et al., 2017; Lu et al., 2017; Sebé-Pedrós et al., 2017; Whelan et al., 2017; Adl et al., 2019; Giribet et al., 2019; Laumer et al., 2019; Marlétaz et al., 2019; Sogabe et al., 2019; Schoch et al., 2020; Schulze and Kawauchi, 2021).

Computation of alignments and phylogenetic trees
Amino-acid sequence alignments and phylogenetic trees were computed using the Seaview program (http://doua.prabi.fr/software/seaview) (Galtier et al., 1996; Gouy et al., 2010). Alignments were performed with Clustal Omega within Seaview (Sievers et al., 2011) using default parameters. Manual adjustment of the S1′ regions in Figure 4 was performed based on overlays of the X-ray crystal structures of crayfish astacin [Protein Data Bank (PDB) access codes 1AST, 1QJI] and zebrafish hatching enzyme 1 (PDB 3LQB). Phylogenetic trees were calculated with PhyML using the maximum likelihood approach as implemented in Seaview. The BLOSUM62 scoring matrix was used and 100 bootstrap replications were computed in each case. Trees were represented with Figtree (http://tree.bio.ed.ac.uk).
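The bootstrap replications mentioned above rest on resampling alignment columns with replacement, after which a tree is inferred from each pseudo-alignment. The following Python sketch illustrates only this resampling step and is not the PhyML or Seaview implementation; the function names, the seed and the three toy sequences are invented for illustration and do not correspond to real astacin sequences.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-alignments (100 matches the text above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (invented strings, not database entries).
toy_alignment = {
    "taxonA": "HEIGHALGFYH",
    "taxonB": "HELMHAIGFFH",
    "taxonC": "HEFLHTLGLYH",
}

replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
print(len(replicates), "replicates of length", len(replicates[0]["taxonA"]))
```

In a full analysis, each replicate would be passed to the tree-building program and the fraction of replicate trees recovering a given branch would be reported as its bootstrap support.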

Computation of three-dimensional structural models
Precise and accurate computational models of specific astacin domains were obtained with the AlphaFold program (Jumper et al., 2021; Tunyasuvunakool et al., 2021). To this aim, the program was locally installed in a high-performance computing cluster running Linux and the amino acid sequences were fed into the program, which was run employing standard parameters. Quality of the predicted models was monitored through the average predicted local distance difference test (pLDDT) value. Values exceeding 90% are considered to originate in high-accuracy models, while those above 80% correspond to generally correct models for the backbone (Tunyasuvunakool et al., 2021). The unique EGF-like domain of human BMP1 (residues 547-590, see UP P13497) was predicted as a generally correct model for the backbone given an average pLDDT value of 82.5%. Moreover, the first CUB domain of human BMP1 (residues 321-434, see UP P13497) was predicted highly accurately (average pLDDT = 91.5%). Three-dimensional structure figures were prepared using Chimera (Goddard et al., 2018).
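As an illustration of how such an average confidence value can be obtained, the Python sketch below reads the per-residue pLDDT from the B-factor column of the Cα atoms in a PDB-format model, which is where AlphaFold stores it, and applies the 90% and 80% thresholds quoted above. This is a minimal sketch rather than the procedure actually used here; the function names and the two-residue demo model with its coordinates and scores are hypothetical.

```python
def mean_plddt(pdb_text):
    """Average pLDDT over CA atoms, read from the B-factor column (cols 61-66)."""
    values = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values)


def classify_model(average):
    """Apply the thresholds quoted in the text (> 90 and > 80)."""
    if average > 90:
        return "high accuracy"
    if average > 80:
        return "generally correct backbone"
    return "lower confidence"


def demo_atom(serial, name, resseq, plddt):
    """Build a fixed-column ATOM record for the hypothetical demo model."""
    return (f"ATOM  {serial:5d} {name:<4s} ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Hypothetical two-residue model; the scores are invented, not real predictions.
demo_model = "\n".join([
    demo_atom(1, " N", 1, 94.0),
    demo_atom(2, " CA", 1, 94.0),
    demo_atom(3, " N", 2, 90.0),
    demo_atom(4, " CA", 2, 90.0),
])

average = mean_plddt(demo_model)
print(f"{average:.1f}", classify_model(average))
```

Averaging over Cα atoms counts each residue once, which avoids weighting residues by their number of atoms.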

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors upon reasonable request.

Author contributions
FXG-R and WS conceived and supervised the project, performed calculations and sequence searches, and wrote the manuscript.

Acknowledgments
Special thanks to I. Ruiz-Trillo and M. Leger from the Institute of Evolutionary Biology (CSIC-UPF) for their assistance in identifying astacin sequences at the root of Holozoa.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.