DM9 Domain Containing Protein Functions As a Pattern Recognition Receptor with Broad Microbial Recognition Spectrum

The DM9 domain was first identified in Drosophila melanogaster and was subsequently found, alone or combined with other protein domains, across a wide range of invertebrates and vertebrates. In the present study, a member of the DM9 domain containing protein (DM9CP) family from the marine invertebrate Crassostrea gigas (designated CgDM9CP-1), which is composed solely of two DM9 domains, was taken as a protein model to study the biological functions of the DM9 domain and its molecular determinants. CgDM9CP-1 exhibited high binding specificity and avidity toward d-mannose residues. It served as a pattern recognition receptor (PRR) with a broad recognition spectrum, binding various pathogen-associated molecular patterns, including lipopolysaccharide, peptidoglycan, mannan, and β-1,3-glucan, in a d-mannose-dependent manner, as well as bacteria and fungi. To reveal the molecular mechanism underlying its pattern recognition activity, the crystal structures of the wild-type protein and loss-of-function mutants were solved, and Asp22 and Lys43 were found to be the critical residues for ligand recognition. Moreover, CgDM9CP-1 protein was mainly distributed on the surface of C. gigas hemocytes, and it could be translocated into the cytoplasm and colocalized with the engulfed microbes during hemocyte phagocytosis. The present results clearly indicate that CgDM9CP-1 is a PRR and provide an important clue for better understanding DM9CP function.

domains. The cytotoxic activity was probably associated with the ETX/MTX2 domain rather than the DM9 domain, because natterin-like proteins containing only the former toxin domain displayed transmembrane pore formation activity, which killed the target cells (4). Recently, an increasing number of DM9 domain containing proteins (DM9CPs) have been discovered in both invertebrates and vertebrates. In D. melanogaster, a DM9CP (CG16775) was found to be significantly upregulated after oral infection by entomopathogenic Pseudomonas entomophila (5), while another three DM9CPs (CG3884, CG10527, and CG13321) were revealed to be part of functional complexes involved in the engulfment of microbial pathogens, intracellular trafficking, and phagosome modulation (6,7). In addition, the expression of a DM9CP named Plasmodium responsive salivary 1 (PRS1) was significantly upregulated in the lateral lobe of the salivary glands of Anopheles gambiae after invasion by the protozoan pathogen Plasmodium, and its expression level increased proportionally to the number of infecting sporozoites (8,9). In vertebrates, DM9CPs were found not only in the venomous fish T. nattereri but also in the viperid snake Bothrops jararaca, which could cause cell necrosis, edema, and even permanent disabilities in humans (10,11). These studies strongly suggest that DM9CPs participate in the immune response. However, their detailed biological functions and the underlying structural basis are still not well understood.
Pattern recognition is an evolutionarily conserved immune process vital for multicellular organisms to discriminate self from non-self (12,13). Unlike adaptive immunity with its huge repertoire of lymphoid cell-surface receptors, the innate immune system employs a limited number of receptors, called pattern recognition receptors (PRRs). These PRRs recognize pathogen-associated molecular patterns (PAMPs) presented exclusively on the surface of microorganisms (14). So far, a number of PRRs have been identified with important biological functions in microbial pathogenesis and immune responses (15). For example, some Toll-like receptors (TLRs) and NOD-like receptors recognize various PAMPs through their leucine-rich repeats and modulate downstream inflammatory responses (16,17). RIG-like helicases act as cytoplasmic sensors for virally derived dsRNA through their helicase domain and subsequently activate antiviral responses (18). PRRs play important roles in the initiation of innate immune defense as well as the activation of adaptive immunity through different mechanisms (19). Some PRRs are mainly associated with signaling activation; for example, the interaction of TLR4 with PAMP ligands can activate the antimicrobial response in macrophages (20). Other PRRs predominantly function as phagocytic receptors. For instance, when the triggering receptor expressed on myeloid cells 2 was expressed in Chinese hamster ovary cells, it promoted the binding and phagocytosis of Escherichia coli and Staphylococcus aureus (21). A growing number of PRRs are currently being discovered in both invertebrates and vertebrates, which helps us to better understand the molecular mechanisms of microbial pathogenesis and host immune defense.
Recently, Unno et al. reported the isolation and characterization of a DM9CP (designated CgDM9CP-1 in the present study) from the marine invertebrate Crassostrea gigas (22). It was found to possess high binding specificity toward mannose and high mannose-type N-glycans, with potential application as a research and clinical tool for probing glycans. As a parallel study, we started the identification and characterization of DM9CP in early 2013, and the biological function of CgDM9CP-1 was further investigated in the present study with the aim of revealing its role in innate immunity and the potential molecular determinants. CgDM9CP-1 was found to serve as a PRR with extensive microbial binding and agglutination activities. The crystal structures of the wild-type protein and loss-of-function mutants revealed the molecular mechanism underlying its pattern recognition activity. In addition, molecular phylogeny combined with WebLogo analyses indicated that DM9CPs are ubiquitously distributed and sequence conserved across biological kingdoms, which provides an important clue for the functional study of the DM9CP family during evolution.

MATERIALS AND METHODS

Cultivation of Animals
Adult C. gigas, 10-15 cm in length and 150-200 g in weight, were collected from a farm in Qingdao, Shandong Province, China, and acclimated in aerated seawater at 18°C for two weeks prior to use. BALB/c mice were purchased from the Qingdao Institute for the Control of Drug Products. All the experiments were conducted according to the regulations of local and central government. All animal-involving experiments of this study were approved by the Ethics Committee of Institute of Oceanology, Chinese Academy of Sciences (23).

The Extraction of Crude Protein from C. gigas
The shelled fresh C. gigas were crushed and homogenized with a Dounce tissue grinder (Sigma, USA), and 500 g of the wet mass was suspended and extracted with 1,000 ml TBS (50 mM Tris-HCl, pH 7.4, 150 mM NaCl) at 4°C with continuous agitation overnight. The extract was centrifuged at 12,000 g for 30 min, and the crude proteins in the supernatant were precipitated with 80% (w/v) ammonium sulfate at 4°C overnight. The protein precipitate was collected by centrifugation at 12,000 g for 1 h, followed by extensive dialysis against TBS three times at 4°C. The supernatant was collected after centrifugation at 15,000 g for 1 h and filtered through a 0.45 µm membrane.

Preparation of Carbohydrate-Coupled Sepharose 6B Matrix
Carbohydrate coupled Sepharose 6B matrix was prepared according to our previous study (24). In brief, epoxy-activated Sepharose 6B matrix (GE Healthcare, Sweden) was washed extensively with distilled water, and then mixed with four kinds of carbohydrates, including d-lactose, N-acetyl-d-glucosamine (d-GlcNAc), d-mannose, and l-fucose (Sigma-Aldrich, Buchs, Switzerland), at a final concentration of 200 µM in distilled water (pH 13.0). The mixtures were incubated at 30°C with gentle shaking for 16 h, and 1 M ethanolamine (pH 8.0) was used to block the remaining active groups after TBS washing. The carbohydrate coupled Sepharose 6B matrix was then washed extensively with TBS, and

Sequence Retrieval, Domain Prediction and Alignment of DM9CPs
The coding sequence of CgDM9CP-1 was identified by searching the peptide mass fingerprint (PMF) against the C. gigas genome database. The homology searches of the amino acid sequence of CgDM9CP-1 were conducted with BLAST at the National Center for Biotechnology Information (NCBI). 1 The protein domain was predicted with the simple modular architecture research tool (SMART) 2 and the conserved domain database (CDD) search service. 3 Multiple alignment was performed with Clustal X 4 and visualized with the alignment viewer Jalview. 5 Conserved amino acid residues among DM9CPs were identified based on the sequence alignment and presented using WebLogo V3. 6

Phylogenetic Analysis
The phylogenetic tree was constructed as previously reported (27). Briefly, the amino acid sequences of DM9CPs were searched against NCBI and the Joint Genome Institute. 7 Alignments were generated using Clustal W 8 with default parameters. The alignment was imported into the phylogenetic analysis program MEGA, 9 and a maximum likelihood tree was generated. A circular phylogenetic tree was then constructed using the interactive tree of life server. 10

Cloning of the CgDM9CP-1 Gene
Total RNA was isolated from the hemocytes of C. gigas using Trizol reagent (Invitrogen, USA) according to the manufacturer's instruction. RQ1 RNase-free DNase (Promega, USA) was used to digest genomic DNA, and Moloney murine leukemia virus reverse transcriptase (Promega, USA) was used to synthesize cDNA from 1 µg of total RNA. The CgDM9CP-1 coding sequence was amplified using ExTaq polymerase (TaKaRa, Japan) and the primers listed in Table 1 with the following temperature profile: an initial 5 min denaturation step at 95°C, followed by 30 cycles of 30 s at 94°C, 20 s at 50°C, and 30 s at 72°C, and completed by a 10 min extension step at 72°C. The amplified product was purified and cloned into the pET-30a vector (Novagen, USA) according to the manufacturer's instruction. The coding sequence of CgDM9CP-1 was confirmed by sequencing, and the plasmid was then transformed into E. coli Transetta (DE3) cells.

Site-Directed Mutagenesis
The mutagenic primers were designed using the PrimerX tool 11 (Table 1). CgDM9CP-1 mutants, including D22A, K43A, and H52A, were constructed by PCR amplification. The coding sequences of the mutants were cloned and inserted into the pET28a (+) vector (Novagen, USA), which encoded a 6× His-tag at the N-terminus of the mutants. All the mutants were confirmed by DNA sequencing, and the plasmids were then transformed into E. coli Transetta (DE3) cells.

Glycan Microarray Analysis
To determine the carbohydrate-binding specificity of rCgDM9CP-1, glycan array screening was performed by the Consortium for Functional Glycomics (Core H) 12 according to the standard protocol (28). The glycan array (version 5.1) was printed with 610 different natural and synthetic glycans. In the glycan array screening, His-tagged wild-type rCgDM9CP-1 was incubated with the slides, and the bound rCgDM9CP-1 was detected with fluorescence-labeled anti-His tag antibodies. The fluorescence intensity was measured using a ScanArray 5000 confocal scanner (PerkinElmer, USA). ImaGene image analysis software (BioDiscovery, USA) was used to analyze the image. The relative binding for each glycan was expressed as the mean relative fluorescence unit (RFU) of four of the six replicates, after the highest and lowest RFU values were removed.
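The replicate handling described for the glycan array (six replicates per glycan, highest and lowest discarded, remaining four averaged) can be illustrated with a minimal Python sketch. The function name `trimmed_mean_rfu` and the example readings are hypothetical and are not data from this study.

```python
from statistics import mean


def trimmed_mean_rfu(replicates):
    """Average the replicates left after dropping one highest and one lowest value."""
    if len(replicates) < 3:
        raise ValueError("need at least three replicates to trim both extremes")
    ordered = sorted(replicates)
    # Discard the single lowest and single highest reading, keep the rest.
    return mean(ordered[1:-1])


# Hypothetical RFU readings for one glycan (six replicate spots).
example_spots = [1520.0, 1480.0, 1610.0, 950.0, 1550.0, 2300.0]

if __name__ == "__main__":
    print(trimmed_mean_rfu(example_spots))  # mean of 1480, 1520, 1550, 1610
```

Trimming one value from each end limits the influence of a single misprinted or saturated spot on the reported RFU.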
Isothermal Titration Calorimetry (ITC)
Isothermal titration calorimetry experiments on the interaction of rCgDM9CP-1 and its mutants with carbohydrates were performed at 25°C with a VP-ITC isothermal titration calorimeter (Microcal, USA). The freshly purified wild-type rCgDM9CP-1 was dialyzed overnight in PBS (pH 7.4) at 4°C, and the protein concentration in the microcalorimeter cell (1.4478 ml volume) was adjusted to 0.05 mM. Carbohydrate solutions, including d-mannose, l-mannose, d-glucose, d-galactose, d-lactose, l-fucose, and d-GlcNAc, at a final concentration of 5 mM in PBS were placed in the syringe. After the first injection of 4 µl, 27 injections of 10 µl were conducted with a stirring rate of 300 rpm. The dilution heats of the carbohydrates were measured by injecting the different carbohydrate solutions into buffer alone and were subtracted from the experimental curves prior to data analysis. The interactions between the mutants and d-mannose were determined by the procedure described above, except that the d-mannose concentration in the syringe was adjusted to 2.5 mM in PBS. The experimental data were fitted to a theoretical titration curve using Microcal ORIGIN software supplied with the instrument, and the standard molar enthalpy change for the binding, $\Delta_b H_m^0$, and the dissociation constant, $K_d$, were derived. The standard molar free energy change, $\Delta_b G_m^0$, and the standard molar entropy change, $\Delta_b S_m^0$, for the binding reaction were calculated by using the following thermodynamic equations:

$$\Delta_b G_m^0 = RT \ln K_d$$

$$\Delta_b G_m^0 = \Delta_b H_m^0 - T \Delta_b S_m^0$$

where $R$ is the gas constant and $T$ is the absolute temperature.

The purified proteins were concentrated to 10 mg/ml by a centrifugal filter with a 3 kDa cutoff (Millipore, MA, USA) at 4°C. The protein precipitate was removed by centrifugation at 12,000 g for 30 min, and the supernatant was further filtered through a 0.45 µm filter (Millipore, MA, USA).
The protein sample was loaded onto a Superdex 200 10/300 GL gel-filtration column (GE Healthcare, Sweden) equilibrated with TBS at a flow rate of 0.5 ml/min on an ÄKTA avant chromatography system. The eluates corresponding to the peak areas were collected and used for crystallization. Standard crystallization screening was carried out using a

Data Collection and Processing
Native data were collected at beamline X10SA at the Swiss Light Source (SLS). Long wavelength data for native single-wavelength anomalous diffraction (SAD) phasing were collected at beamline X06DA at SLS. The crystal was reoriented twice during data collection using the PRIGO multiaxis goniometer (29). According to the single crystal native SAD phasing strategy described before (30)

Structure Solution
The structure was solved using the SHELXC/D/E pipeline with HKL2MAP as the graphical user interface. Four sites were readily identified with SHELXD with a resolution cutoff of 2.6 Å, resulting in CCall and CCweak of 46.0 and 28.5, respectively. Density modification in SHELXE was carried out for 60 cycles using the high resolution native data up to 1.1 Å resolution with a solvent content of 0.45. The initial model was auto built using BUCCANEER and completed with 144 of 144 residues built. The other structures were determined by the molecular replacement method using the native model DM9CP (PDB ID: 5MH0) as an ensemble in the program PHASER (33).

Refinement
The structures were refined initially using REFMAC5 (34) and with PHENIX REFINE (35) for the final stages. Necessary model improvements as well as the search for solvent molecules were performed using COOT (36) and "update water" in PHENIX REFINE. Anisotropic thermal displacement factors were refined at 1.3 Å or better resolution; otherwise, the TLS model was used. Glycerol was tentatively built into the model to account for the corresponding electron density, as was d-mannose in the density of DM9CPm.

Culture of Bacterial and Fungal Cells
Vibrio splendidus was grown in 2216E media at 28°C, 220 rpm for 12 h. E. coli, S. aureus, and Bacillus subtilis were grown in LB media at 37°C, 220 rpm for 8 h. Pichia pastoris and Yarrowia lipolytica were grown in YPD media at 30°C, 220 rpm for 24 h. All microbes were grown to mid-log phase, harvested by centrifugation at 6,000 g for 15 min, and washed three times with PBS.

Preparation of FITC-Labeled Microbes
Microbes were collected, fixed with 4% paraformaldehyde (PFA) as previously reported (37), and mixed with 1 mg/ml FITC (Sigma, USA) in 0.1 M NaHCO3 (pH 9.0) buffer with continuous gentle stirring at room temperature overnight. The FITC-labeled microbes were washed three times with PBS to eliminate free FITC molecules.

Microbial Binding and Agglutination Assay
Microbes including E. coli, V. splendidus, B. subtilis, S. aureus, P. pastoris, and Y. lipolytica (10⁸ cells/ml) were incubated with 3% BSA in PBS for 1 h to block the non-specific binding sites. After three washes with PBS, the microbes were incubated with or without rCgDM9CP-1 (control group) at a final concentration of 0.5 mg/ml for 30 min at room temperature. The cells were then washed three times with PBS and incubated with FITC-labeled anti-His tag antibody for 1 h at room temperature. After extensive washing, the samples were examined by flow cytometry to detect the microbial binding activity and analyzed by fluorescence microscopy to determine the microbial agglutination activity.

Antibiotic Assay
The antibiotic assay was performed as described previously (38). Briefly, microbes (10⁸ cells/ml) were incubated with 3% BSA in PBS for 1 h, followed by incubation with rCgDM9CP-1 at a final concentration of 0.5 mg/ml at room temperature for 2 h. The cells were then washed three times with PBS and incubated with propidium iodide (5 µg/ml) at room temperature for 10 min. After extensive washing, the samples were examined by flow cytometry.

PAMP Binding Assay
Pathogen-associated molecular pattern binding activity was determined by modified enzyme-linked immunosorbent assay (ELISA eluted IgG was concentrated and the binding specificity toward rCgDM9CP-1 was determined by western blotting.

Western Blotting Analysis of CgDM9CP-1 Expression
Hemolymph samples from 10 adult C. gigas were prepared as described previously (39). After centrifugation at 800 g, 4°C for 10 min, the supernatant was collected, and the hemocytes were pelleted. Hemolymph was centrifuged at 12,000 g, 4°C for 10 min to remove cell debris. Hemocytes were washed three times with PBS (pH 7.2) at 800 g, 4°C for 10 min, and lysed in RIPA buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1% Nonidet P-40, 0.5% deoxycholate, and 0.1% SDS) on ice for 15 min. The cell lysate was collected after centrifugation at 12,000 g, 4°C for 10 min. Hepatopancreas, gill, mantle, and adductor muscle from 10 C. gigas were collected and homogenized in RIPA buffer using Dounce tissue grinders (Sigma, USA). The supernatant was collected by centrifugation at 12,000 g, 4°C for 20 min. The protein concentration was determined with a BCA Protein Assay Kit (Pierce, USA). Samples containing the same amount of protein (30 µg) were separated by SDS-PAGE. The proteins were transferred from the gel to polyvinylidene difluoride membranes (Millipore, USA), and the membrane was blocked with 5% BSA in TBST. The membrane was then incubated with antibodies against rCgDM9CP-1 and β-tubulin for 1 h, respectively, followed by incubation with HRP-labeled secondary antibodies for 1 h. The immunoreactive protein bands were visualized using an enhanced chemiluminescence kit (Pierce, USA).

Flow Cytometric Analysis of CgDM9CP-1
Hemolymph was extracted from the posterior adductor muscle sinus using a 2 ml syringe equipped with a 23 G sterile needle, and immediately mixed with pre-chilled anticoagulant citrate dextrose solution A (ACD-A, 0.1 mol/l trisodium citrate, 0.11 mol/l dextrose, and 71 mmol/l citric acid monohydrate) at a volume ratio of 7:1. The hemocytes were harvested by centrifugation at 1,000 g at 4°C for 10 min, washed twice, and suspended in modified Leibovitz L15 medium (supplemented with 0.54 g/l KCl, 0.60 g/l CaCl2, 1.00 g/l MgSO4, 3.90 g/l MgCl2, and 20.20 g/l NaCl). After incubation with 3% BSA for 1 h, the hemocytes were incubated with polyclonal IgG against rCgDM9CP-1 for 1 h, while the control hemocytes were incubated with isotype IgG under the same conditions. After extensive washing, the hemocytes were incubated with FITC-labeled goat anti-mouse IgG for 1 h. The fluorescence intensity and the mean fluorescence intensity (10,000 cells) of hemocytes were determined on a FACSAria II flow cytometer (BD Biosciences, USA).

Confocal Microscopy
Hemocytes were prepared as described above, plated on glass-bottom culture dishes, and incubated at 18°C for 3 h. For the CgDM9CP-1 distribution analysis, 4% PFA was added to fix the cells at 4°C for 15 min, followed by three washes with PBS, and 0.1% Triton X-100 was added for permeabilization for 10 min. After PBS washing, 3% BSA in PBS was added to block the nonspecific binding sites for 1 h. Polyclonal IgG against rCgDM9CP-1 was incubated with the hemocytes for 1 h, and Alexa Fluor 488 labeled goat anti-mouse IgG was incubated with the hemocytes for 1 h after extensive PBS washing. Hemocytes were further incubated with DiI for 30 min to stain the cytoplasmic membrane and with DAPI for 5 min to stain the cell nucleus. For the phagocytosis assay, hemocytes were precultured with FITC-labeled microbes for 1 h, and then fixed and permeabilized as described above. Polyclonal IgG against rCgDM9CP-1 was incubated with the hemocytes for 1 h. After PBS washing, Alexa Fluor 594 labeled goat anti-mouse IgG was incubated with the hemocytes for 1 h, followed by DAPI staining for 5 min. The hemocytes were monitored and the fluorescent images were taken using a Carl Zeiss LSM 710 confocal microscope (Carl Zeiss, Germany).

Statistical Analysis
The two-sample Student's t-test was used for comparisons between groups. Statistical analysis was performed with GraphPad Prism 5 software. Results are shown as means ± SEM, and statistical significance was defined as P < 0.05.
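As an illustration of the quantities named above, the following Python sketch computes a group mean ± SEM and the pooled-variance two-sample Student's t statistic using only the standard library. The function names and the example numbers are hypothetical; the study itself used GraphPad Prism, and the P value would still have to be obtained from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), math.sqrt(variance(values) / len(values))


def student_t(group_a, group_b):
    """Pooled-variance two-sample Student's t statistic.

    Returns (t, degrees_of_freedom); assumes equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


if __name__ == "__main__":
    # Hypothetical fluorescence readings for a control and a treated group.
    control = [1.0, 2.0, 3.0]
    treated = [2.0, 3.0, 4.0]
    print(mean_sem(control))
    print(student_t(control, treated))
```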

RESULTS

The High Binding Specificity and Avidity of CgDM9CP-1 toward d-Mannose
Carbohydrate affinity chromatography was employed to isolate glycan binding proteins from the crude protein extract of C. gigas. A single protein was highly enriched by d-mannose affinity chromatography, whereas d-lactose, N-acetyl-d-glucosamine (d-GlcNAc), and l-fucose affinity chromatography yielded far less (Figures S1A,B in Supplementary Material). The d-mannose binding protein was analyzed by MALDI-TOF/TOF-MS, and the amino acid sequence was identified by searching ten tryptic peptides against the genome of C. gigas (Figures S1C,D and Table S1 in Supplementary Material). The d-mannose binding protein was identical to the recently reported protein (22). Sequence alignment revealed that CgDM9CP-1 was not homologous to any other known lectins or carbohydrate binding proteins. Protein domain analysis using NCBI's CDD and SMART showed that CgDM9CP-1 was composed solely of two DM9 domains (Figure S2A in Supplementary Material), which shared 33% sequence identity (Figure S2B in Supplementary Material). The carbohydrate binding specificity of rCgDM9CP-1 was determined by a high-throughput glycan microarray printed with 610 different glycans (version 5.1, the Consortium for Functional Glycomics). His-tagged rCgDM9CP-1 was found to bind nonreducing mannosyl glycans with the highest binding specificity. These glycans can be either branched high-mannose oligosaccharides or mannosylated bi- or triantennary hybrid oligosaccharides (Figure 1A; Table S2 in Supplementary Material). The former, for example, glycans #2, #5, #6, and #8, are uniformly composed of mannose residues, while the latter, for instance, glycans #1, #3, #4, and #7, contain at least one branch antenna modified with non-reducing mannose termini.
rCgDM9CP-1 also exhibited strong binding activity toward glycan #9 [6S(3S) Galβ1-4(6S)GlcNAc], which does not contain mannose residues, suggesting that there might exist another carbohydrate binding mechanism different from that for mannosylated glycans. The carbohydrate binding capacity of rCgDM9CP-1 was further examined by ITC. The calorimetric data for d-mannose binding were fitted to a one-binding-site model, and the dissociation constant (Kd) was determined to be 122.6 ± 19.7 µM (Figure 1B, left panel and Table 2). Compared with the relatively high binding affinity to d-mannose, rCgDM9CP-1 showed almost no binding affinity to other carbohydrates, including d-glucose, d-galactose, d-lactose, l-fucose, and d-GlcNAc (Figure S3 in Supplementary Material). Moreover, it did not exhibit any binding activity toward the l-isomer of mannose (Figure 1B, right panel).
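To make the reported dissociation constant easier to relate to other binding quantities, the short Python sketch below converts Kd into the association constant (Ka = 1/Kd) and into a standard binding free energy via the textbook relation ΔG° = RT ln Kd. The Kd value (122.6 µM) and the temperature (25°C) come from the text; the function names, the 1 M standard state, and the use of this relation here are illustrative assumptions and are not taken from Table 2.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def association_constant(kd_molar):
    """Ka (M^-1) as the reciprocal of Kd (M)."""
    return 1.0 / kd_molar


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in J/mol, assuming dG = RT ln(Kd) and a 1 M standard state."""
    return GAS_CONSTANT * temperature_k * math.log(kd_molar)


KD_MANNOSE = 122.6e-6  # M, reported for rCgDM9CP-1 binding to d-mannose

if __name__ == "__main__":
    print(f"Ka = {association_constant(KD_MANNOSE):.3g} M^-1")            # about 8.16e+03
    print(f"dG = {binding_free_energy(KD_MANNOSE) / 1000:.1f} kJ/mol")    # about -22.3
```

A Kd in the 100 µM range therefore corresponds to a binding free energy of roughly −22 kJ/mol at 25°C under these assumptions.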
Structural Basis for the Specific Binding to d-Mannose
Recombinant CgDM9CP-1 was prepared for crystallization by d-mannose-Sepharose 6B affinity chromatography followed by gel filtration chromatography (Figure 2A; Figure S4A in Supplementary Material). The crystal structure of rCgDM9CP-1 was determined at 1.24 Å resolution using the single crystal native SAD phasing strategy (PDB: 5MH0, Figures S5A,B in Supplementary Material). The diffraction data collection and structure refinement statistics are summarized in Table 3. The crystal structure of rCgDM9CP-1 in complex with d-mannose was solved to reveal the potential residues involved in d-mannose binding (5MH1, Figure S5C in Supplementary Material), and it exhibited features similar to those of native CgDM9CP-1 reported by Unno et al. (22). Moreover, the side chain from Gly128 and four water molecules were also found to participate in the formation of the hydrogen bond network between d-mannose and rCgDM9CP-1 (Figure 2E). In order to confirm the determinants of the interaction between rCgDM9CP-1 and d-mannose, the residues Asp22, Lys43, and His52 were individually mutated to alanine, and the His-tagged mutants were purified by Ni-NTA affinity chromatography (Figures 2B,C), while the mutation of H52A exhibited a limited effect on the d-mannose binding activity (Figure 2D). The crystal structures of D22A (PDB: 5MH2) and K43A (PDB: 5MH3) were each superimposed with that of rCgDM9CP-1 in complex with d-mannose. Compared with the wild-type protein, both mutants exhibited much smaller nonpolar side chains, which significantly impaired the hydrogen bond formation between d-mannose and rCgDM9CP-1 (Figures 2F,G).

The Involvement of CgDM9CP-1 in Phagocytosis toward Microbes
The distribution of CgDM9CP-1 in different tissues was examined by Western blotting. CgDM9CP-1 was highly expressed in hepatopancreas, mantle, and hemocytes, expressed at a lower level in gill, and hardly detectable in muscle and hemolymph (Figure 5A; Figure S6 in Supplementary Material). Flow cytometric analysis revealed that CgDM9CP-1 was distributed on the outer membrane of hemocytes with high abundance (Figure 5B). Confocal analysis further confirmed that CgDM9CP-1 was mainly distributed on the hemocyte membrane, and less in the cytoplasm, under non-challenged conditions (Figure 5C). When the hemocytes were incubated with microbes, including E. coli, V. splendidus, S. aureus, and Y. lipolytica, to induce phagocytosis, CgDM9CP-1 was internalized from the cell membrane into the cytoplasm and was found to colocalize with or surround the engulfed microbes (Figures 5D-G). Moreover, CgDM9CP-1 was found to exhibit antibiotic activity toward Y. lipolytica, while its antibiotic activity toward Gram-negative and Gram-positive bacteria was weak (Figure S7 in Supplementary Material).

DM9CPs Are Ubiquitously Distributed and Sequence Conserved
To date, up to 477 DM9CPs have been annotated in the released genomes of a wide range of organisms from the Procaryotae, Fungi, Protista, and Animalia Kingdoms (Figure S8A and Table S3 in Supplementary Material). DM9CPs are found in multiple copies in animal species, especially in invertebrates, with approximately ten similar genes in each Drosophila species (Figure 6A; Figure S8B in Supplementary Material). In C. gigas, seven DM9CPs were annotated, and they all contained two DM9 domains and shared high similarity with each other (Figures S2C,D in Supplementary Material). In order to further analyze the potential biological function of DM9CP family members, CgDM9CP-1 was aligned with another 476 DM9CPs using WebLogo (Figure 6B). The primary structure alignment illustrated that the amino acid sequences of DM9 domains were highly conserved throughout evolution, including Asp22 and Lys43, which were found to be essential for ligand binding in CgDM9CP-1.
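Sequence logos summarize conservation as the information content of each alignment column. The Python sketch below shows that calculation for a toy protein alignment; the sequences and the function name `column_information` are hypothetical, and the sketch omits the small-sample correction that logo tools may apply, so it only illustrates the idea behind the conservation plot rather than reproducing the WebLogo analysis.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(residues).values()
    )
    # A fully conserved protein column reaches log2(20), about 4.32 bits.
    return math.log2(alphabet_size) - entropy


# Hypothetical three-column alignment of four sequences (not DM9CP data).
toy_alignment = ["DGK", "DAK", "DGR", "DSK"]
columns = ["".join(col) for col in zip(*toy_alignment)]

if __name__ == "__main__":
    for index, column in enumerate(columns, start=1):
        print(index, column, round(column_information(column), 2))
```

In this toy case the invariant first column scores highest, which is how strictly conserved positions such as Asp22 and Lys43 would stand out in a logo.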

DISCUSSION
The potential biological functions of the DM9 domain have been reported in several invertebrates and vertebrates. In invertebrates, a DM9CP (CG16775) of unknown function from D. melanogaster was found to be specifically upregulated after oral infection by microbial pathogens but not after septic injury (5). A systems biology analysis of the phagosome combined with the protein interaction network of D. melanogaster revealed that three DM9CPs (CG3884, CG10527, and CG13321) could interact with other proteins to form functional complexes involved in the phagocytosis of microbial pathogens and phagosome modulation in innate immunity (6,7). A DM9CP named PRS1 was predominantly expressed in the distal part of the lateral lobe of the salivary glands, where the protozoan pathogen Plasmodium infects A. gambiae (9). After infection, the expression level of PRS1 was upregulated not only in salivary glands but also in midgut epithelial cells, which are the critical barriers that the pathogens must pass through to develop in mosquitoes. In addition, PRS1 was found to be highly concentrated in vesicle-like structures in infected cells (8). Another three DM9CPs with high similarities have been identified in human liver flukes, and two of them, from the liver flukes Fasciola hepatica and Opisthorchis viverrini, respectively, were both highly distributed on the surface of the tegument, which is the outermost surface and the major interface between host and environment (40,41). The other DM9CP, from Fasciola gigantica, was found to localize into cytoplasmic vesicle-like structures after bacterial infection, which was similar to the behavior of PRS1 (42). In vertebrates, two toxins containing DM9 domains were found to be highly expressed in the venomous fish T. nattereri and the snake B. jararaca (10,11). Increasing evidence suggests that DM9CPs play important roles in host immune responses. However, the biological function of DM9CPs in innate immunity still remains largely unknown.
Recently, CgDM9CP-1 was isolated and characterized as a member of the lectin family (C. gigas lectin 1) with potential application as a research and clinical tool for probing glycans (22). Structurally, CgDM9CP-1 is composed solely of two DM9 domains without any other domain, which provides an ideal protein model to study the biological function of the DM9 domain. The amino acid sequence analysis revealed that CgDM9CP-1 neither showed any homology to the known lectins nor contained any lectin domains or known carbohydrate binding motifs, but exhibited high similarity and identity to other DM9CP family members (Figure 6B). More notably, CgDM9CP-1 exhibited strong binding activities and a broad binding spectrum toward different PAMPs as well as microbes, which is essential for the pattern recognition of microbes during the innate immune response. In addition, a number of immune receptors with carbohydrate binding activities are designated according to their biological functions. For example, dendritic cell specific intercellular adhesion molecule-3 grabbing nonintegrin (DC-SIGN), a mannose binding C-type lectin, was proved to be an important PRR toward various types of pathogens and to be involved in the modulation of immune responses of dendritic cells (43,44). Langerin, a myeloid C-type lectin present on the cell surface of Langerhans cells, functions as a PRR possessing a broad microbial recognition spectrum with highly diverging avidity and selectivity (45,46). Fibrinogen-related protein 3, with high binding specificity to galactose, was involved in microbial phagocytosis in the snail Biomphalaria glabrata (47). Collectively, our findings strongly support that DM9CPs represent a novel type of PRR family, which sheds new light on the functional study of DM9CPs in innate immunity.
Mannose binding receptors have been reported to play important roles in pattern recognition. For example, the macrophage mannose receptor could recognize mannan presented on fungal pathogens and activate the Th17 signaling pathway during the pathogen-specific host immune response (48). Dectin-2, a direct receptor for mannose-capped lipoarabinomannan, could induce pro- and anti-inflammatory cytokine production through the Dectin-2-FcRγ signaling axis in autoimmune encephalitis (49). In the present study, the glycan microarray, which had a much higher throughput and was complementary to that reported in Unno's study, showed that CgDM9CP-1 possessed high binding specificity toward mannosylated glycans. The association constant (Ka) of the rCgDM9CP-1 interaction with d-mannose was determined to be 8.16 ± 0.19 × 10³ M⁻¹ (Table 2), while a value of 2.00 ± 0.46 × 10³ M⁻¹ was reported by Unno et al. (22). The difference in Ka might be due to post-translational modification of the native protein; for example, the N-terminus of native CgDM9CP-1 was found to be modified by an acetyl group (22). Similar to the pattern recognition activities of known mannose receptors, rCgDM9CP-1 exhibited strong binding activity toward mannan. Moreover, rCgDM9CP-1 also displayed strong pattern recognition activities toward LPS, PGN, and β-1,3-glucan, indicating a more extensive microbial binding profile than that of previously reported mannose binding receptors. In C. gigas, CgDM9CP-1 was highly expressed in the tissues important for pathogen recognition and innate immune defense, such as hepatopancreas, mantle, and hemocytes (50), but not in hemolymph, suggesting that CgDM9CP-1 was not an opsonin. When the hemocytes encountered microbes, CgDM9CP-1 was internalized from the cell membrane into the cytoplasm along with hemocyte phagocytosis and colocalized with the engulfed microbes, indicating that CgDM9CP-1 was involved in the direct interaction with microbes and the modulation of hemocyte phagocytosis.
It has been reported that some cell surface-bound PRRs can recognize their ligands and activate intracellular signaling pathways to provoke potent immune responses against pathogens, while other cell membrane PRRs mainly function as phagocytic receptors involved in the modulation of phagocytosis of microbial pathogens (51,52). Herein, CgDM9CP-1 seemed to act mainly as an immune receptor involved in hemocyte phagocytosis in the innate immunity of C. gigas. Although CgDM9CP-1 lacks a transmembrane domain, its cell surface distribution was probably due to the interaction of CgDM9CP-1 with other membrane cofactors. For example, the cell membrane protein CD14, which has no transmembrane domain, was found to be located on the cell membrane via a glycosylphosphatidylinositol linkage (53). MD2, an important immune receptor, has been shown to be distributed on the cell membrane through its interaction with the integral membrane protein TLR4 (54,55). The molecular mechanism of CgDM9CP-1 membrane localization remains to be investigated.
As a previously unidentified PRR, CgDM9CP-1 displayed its PAMP recognition activity with a distinctive structural basis. In the previous study, CgDM9CP-1 was computationally predicted to be dimeric using the PISA (Proteins, Interfaces, Structures and Assemblies) program. Herein, the molecular weight of rCgDM9CP-1 was estimated to be approximately 17 kDa according to the main elution peak (elution volume of about 17.5 ml) from gel filtration chromatography, which is similar to the theoretical mass of the monomeric protein (about 16.5 kDa). Western blotting analysis revealed both monomeric and dimeric states of rCgDM9CP-1, and the monomeric form was much more abundant than the dimeric form (data not shown). These results collectively indicated that rCgDM9CP-1 existed in both monomeric and dimeric forms with different abundances. In order to reveal the molecular determinants underlying its pattern recognition activity, the crystal structures of rCgDM9CP-1, the corresponding mutants, and the complex with the d-mannose ligand were solved using the single-crystal native SAD phasing strategy. The crystal structure analysis revealed that rCgDM9CP-1 did not show any three-dimensional homology to other known PRRs. In the present study, the ligand binding site was located at the boundary between the two DM9 domains, and amino acid residues from both DM9 domains assembled together to form the ligand recognition motif. The side chains of Asp22 and Lys43 were found to be essential for d-mannose and PAMP binding, and mutation of these two residues abolished the PAMP binding activity. 
Although the side chain of His52 was presumed to participate in stacking interactions with the hydrophobic portion of d-mannose and thereby stabilize carbohydrate binding, the H52A mutation did not significantly decrease d-mannose or PAMP binding activity, indicating that His52 might not be an essential residue for pattern recognition.
The DM9 domain has been found in various proteins from a number of species. The existence of DM9CPs in prokaryotes, such as Legionella pneumophila and Enterobacter cloacae, indicates that the DM9 domain is an ancient protein domain that probably originated in prokaryotes (56). Although DM9 domains are widely found in vertebrates, such as bony fishes, reptiles, and birds, it is quite unexpected that the domain is absent in mammals. It is also noteworthy that DM9CPs have not been identified in the kingdom Plantae, suggesting that this protein domain has probably been lost during evolution. The uneven phylogenetic distribution of the DM9 domain is likely to reflect natural selection during the molecular evolution of innate immunity. Moreover, DM9 domains usually exist as tandemly arranged repeats in proteins, especially in different Drosophila species (Figure S9 in Supplementary Material), which might contribute to enhanced ligand binding activity. Interestingly, a number of DM9 domains are fused with other protein domains. For example, the natterins reported in the fish T. nattereri and the viperid snake B. jararaca contain both a DM9 domain and a pore-forming ETX/MTX2 domain (10,57). In the desert locust Schistocerca gregaria and the Mediterranean fruit fly Ceratitis capitata, DM9CPs were found to contain both DM9 domains and farnesoic acid O-methyl transferase domains (58,59). The fusion of DM9 domains with other domains suggests that these proteins probably exert multiple biological functions involved in different biological pathways.

ETHICS STATEMENT
All experiments involving animals in this study were approved by the Ethics Committee of the Institute of Oceanology, Chinese Academy of Sciences.
AUTHOR CONTRIBUTIONS
SJ performed the native protein purification and glycan array analysis. SJ, ZJ, and XS performed the cloning, expression, and purification of recombinant proteins. MH, HZ, and LQ performed the molecular interaction and microbial binding experiments. SJ, MH, JW, and GP crystallized the proteins. TW, EW, and GP solved the crystal structures. SJ and CL constructed the phylogenetic tree. SJ, LW, GP, and LS designed the research and wrote the manuscript.