Abstract
Galectins are a family of carbohydrate-binding proteins found in vertebrates in great abundance and diversity in terms of structure, ligand-binding properties, and physiological function. Proteins with clear relationships to vertebrate galectins are already found in primitive Bilateria. The increasing number of accessible, well-annotated bilaterian genomes has allowed us to develop, through synteny analyses, a new hypothesis about the phylogenetic history of the galectin family in this animal group. Thus, we can trace the genomic localization of the putative ancestral Bilateria galectin back to the scallops as a still very primitive, slow-evolving bilaterian lineage. Intriguingly, our analyses show that the primordial galectin of the Deuterostomata most likely exhibited galectin-8-like characteristics. This basal galectin is of the tandem-repeat type with two carbohydrate recognition domains and has a sialic acid-binding N-terminal domain, which is typical of galectin-8. With the help of synteny, the amplification of this potential primordial galectin to the broad galectin cosmos of modern jawed vertebrates can be reconstructed. It is therefore possible to distinguish between the paralogs resulting from small-scale duplication and the ohnologues generated by whole-genome duplication. Our findings support a substantially new hypothesis about the origin of the various members of the galectin family in vertebrates and allow us to propose new theories on the kinship relationships of the galectins of Gnathostomata. In addition, we focus for the first time on the galectins of the Cyclostomata, which, as a sister group of jawed vertebrates, provide important insights into the evolutionary history of the entire subphylum. Our studies also highlight a previously neglected member of the galectin family, galectin-related protein 2.
This protein appears to be a widespread ohnologue of the original tandem-repeat ancestor within Gnathostomata that has not been the focus of galectin research due to its nonclassical galactose binding sequence motif and the fact that it was lost during mammalian evolution.
1 Introduction
Galectins are small glycan-binding proteins that originally received their name because of their galactose-binding property. These lectins are widespread in the animal kingdom. Even the Porifera possess galectin-like proteins. However, a multifaceted galectin family is found only in Bilateria. These proteins share a complex secondary structure composed of β-strands (named S1-S6 and F1-F5), which form the carbohydrate recognition domain (CRD). Only the S-strands and the loop regions connecting them together form the glycan-binding groove. The galectin cosmos of Gnathostomata is particularly complex. Based on structural features, galectins are assigned to three subgroups. The prototypical galectins possess only a single CRD. The same is true for the chimeric galectins. However, these proteins additionally have a longer N-terminal binding domain, which enables them to form pentamers. Tandem-repeat galectins, on the other hand, have two CRDs connected by a linker region that can vary in size. In addition, galectin-related proteins (GRPs) structurally correspond to prototypical galectins. However, minor amino acid changes in the specific binding motifs resulted in a loss of their galactose-binding capacity. Based on the overall similarity of the CRD at the protein and gene structure level, the GRPs can be readily assigned to the galectin family.
Each galectin or, more precisely, each galectin CRD of Gnathostomata has its own carbohydrate-binding preference. For instance, the N-terminal CRD of galectin-8 preferentially binds sulfated and sialylated carbohydrates. The C-terminal domain of this galectin, on the other hand, has a preference for histo-blood group antigen-like and poly-N-acetyllactosamine glycans. In addition, many galectins have been shown to interact with a protein-binding partner. Prototypic galectins, for example, form homo- or heterodimers with each other but can also form dimers with chemokines. Another well-known example is the binding of galectin-8 to the autophagy receptor NDP52. This interaction is a crucial step in galectin-mediated selective autophagocytosis after vesicle damage and is critical in defense against intracellular pathogens. Galectins can serve diverse functions both intracellularly and extracellularly. The role of galectins in the immune system is particularly multifaceted, ranging from pattern-recognition receptor function in pathogen recognition to regulatory functions on specific immune cells. Several Gnathostomata galectins (including galectin-1, -3, and -9) have important roles in the regulation of the adaptive immune response. In particular, their relevance in T-cell activation and B-cell differentiation is noteworthy. Thus, a correlation between the expanding galectin family and the increased complexity of the immune system during vertebrate evolution might be speculated. The redundancy that occurs after whole genome duplication (WGD) usually results in subsequent massive gene loss. The galectin ohnologues, by contrast, were ubiquitously retained in vertebrate genomes and diversified in function. This reflects their importance, probably in particular for the regulation of the immune system.
The various members of the galectin family have been well studied in mammals, especially in humans and mice, as the most prominent model animals. In addition, there are some studies on individual galectins, especially of Tetrapoda (e.g., chicken) and Teleostei.
It has long been known that the nomenclature of galectins based on structural similarities does not provide any information about their evolutionary relationship. Moreover, the previous hypothesis on the evolution of vertebrate galectins is largely based on the findings of Houzelstein et al. from 2004. It was assumed that the last common ancestor of Protostomia and Deuterostomia possessed an ancestral galectin with a CRD that corresponds in exon structure to the N-terminal CRD (hereafter referred to as CRD1) of modern Gnathostomata tandem-repeat galectins. In chordates, the C-terminal CRD (hereafter CRD2) with its typical exon structure, characterized by a shorter middle exon, is supposed to have evolved by tandem duplication. Thus, the protovertebrate ancestors possessed a tandem-repeat galectin with two different CRDs (CRD1 and CRD2), the exon structure of which corresponds to that of modern tandem-repeat Gnathostomata galectins. In the course of the two rounds of WGD that occurred during Gnathostomata evolution, one tandem-repeat galectin became four. According to this hypothesis, the tandem-repeat galectins of the jawed vertebrates, namely galectin-4, -8, -9 and -12, therefore represent ohnologues. In vertebrates, small-scale gene duplication is supposed to have resulted in the formation of the mono-CRD galectins. Another hypothesis suggests that the mono-CRD galectins-1 and -2 evolved from a prevertebrate mono-CRD galectin. The increasing quantity and quality of genomic data now make it possible to look deeper into the phylogeny and evolution of galectins. These findings may improve our understanding of functional relationships in this protein family. To correctly identify the major galectins of Gnathostomata, we first looked at the genes surrounding individual galectins to identify overlapping syntenies.
To understand how this variety of Gnathostomata galectins evolved during the evolution of Bilateria and where the original galectin locus is located, we used microsynteny studies as well as recent results on the reconstruction of the proto-vertebrate genome. Analyses of gene or protein structure also provide insights into the phylogenetic history of galectins. Finally, based on all these analyses, we can generate a model that explains the diversification of the various galectins of modern jawed vertebrates. To our knowledge, these findings allow us to propose the first substantial new hypothesis on the evolution of vertebrate galectins since 2004. This might help future research to develop new theories about the functional relationships of galectins based on their evolutionary context.
2 Material and methods
2.1 Sequences and genomic organization
Sequence information regarding the specific galectins and their syntenic genes, their genomic organization as well as chromosomal localization, and exon−intron structure was obtained from the NCBI Gene and Genome Data Viewer databases (https://www.ncbi.nlm.nih.gov/gene/ and https://www.ncbi.nlm.nih.gov/genome/gdv/). To cover the diverse Gnathostomata taxa, we manually screened the NCBI databases for the genomes of at least two representatives of each of the Chondrichthyes, Holostei (Actinopterygii without 3rd WGD, only Lepisosteus oculatus analyzed), Teleostei, Coelacanthidae (only Latimeria chalumnae), Dipnomorpha (only Protopterus annectens), Amphibia, Lepidosauria, Testudines, Aves, Monotremata, Metatheria, and Eutheria. We considered species whose genomes have a good assembly level. Exceptions are species that represent key stages of evolution, such as L. chalumnae or Callorhinchus milii. These were used for the analyses despite their relatively low level of annotation.
For the macrosynteny analysis, the data published by Nakatani et al. were used. By manual screening of the galectin localizations in the human genome, their positions could be assigned to the chromosomal regions derived from the respective scallop chromosomes.
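The microsynteny comparison underlying this section (collecting the genes that flank a galectin and checking which of them recur next to the same galectin in another species) can be sketched in Python. This is only an illustration and not the procedure used in the study, which relied on manual screening of the NCBI viewers. The function names, the window size, and the toy gene orders are hypothetical; only the gene symbols are borrowed from Table 1.

```python
def neighbors(gene_order, target, window=3):
    """Return the set of genes within `window` positions of `target`."""
    i = gene_order.index(target)
    lo = max(0, i - window)
    return set(gene_order[lo:i] + gene_order[i + 1:i + 1 + window])


def shared_synteny(order_a, order_b, target, window=3):
    """Genes that flank `target` in both species (conserved microsynteny)."""
    return neighbors(order_a, target, window) & neighbors(order_b, target, window)


# HYPOTHETICAL gene orders for two species; these are not real coordinates.
species_a = ["nf1", "wsb1", "ksr1", "LGALS9", "nos2", "lyrm9", "nlk"]
species_b = ["geneX", "ksr1", "LGALS9", "nos2", "wsb1", "geneY"]

shared = shared_synteny(species_a, species_b, "LGALS9")
print(sorted(shared))
```

A larger window tolerates local inversions and insertions at the cost of more chance matches, so in practice the window would have to be chosen per lineage.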
2.2 Sequence alignments
Protein sequence alignments were performed using the T-Coffee algorithm implemented in Jalview version 2.11.2.5.
2.3 Sequence-based phylogenetic analyses
Phylogenetic analyses were performed based on the sequences of 222 galectin proteins (Supplementary Table 1). Amino acid sequences of CRD1 and CRD2 were aligned with the T-Coffee algorithm. Maximum likelihood phylogenetic trees with 1000 bootstrap replicates were generated using MEGA X and edited with the iTOL tool (Interactive Tree of Life).
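For readers unfamiliar with bootstrap support, the resampling step can be sketched as follows. This is a generic illustration of column resampling with replacement, not the MEGA X implementation; the function name and the toy sequences are made up for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Toy alignment of three equally long sequences (illustrative only).
toy_alignment = ["HFNPRF", "HFNPRW", "HYNPRF"]

rng = random.Random(0)  # fixed seed so the run is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]), len(replicates[0][0]))
```

A tree is then inferred from every replicate, and the support value of a clade is the percentage of replicate trees in which that clade appears.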
2.4 Protein structure analysis and molecular modeling
The structures of LGALS8-like proteins from Petromyzon marinus and Lytechinus variegatus were predicted using the intensive mode of the Phyre2 tool. The structure of human LGALS8 was obtained from the AlphaFold database (AF-Q96DT0-F1).
Computation of the electrostatic potential using the Adaptive Poisson-Boltzmann Solver (APBS) and mapping to the molecular surface were performed using Python Molecule Viewer version 1.5.7 (https://ccsb.scripps.edu/mgltools/).
3 Results and discussion
3.1 Genomic organization of Gnathostomata galectins
The increasing number of sequenced and well-annotated genomes allowed us to study the genomic distribution of galectins within Gnathostomata. Syntenic analyses helped to establish the identities of the galectin homologs as far back as the Chondrichthyes. We have restricted our analysis to the galectin-encoding genes (LGALS) that are present in a majority of the Gnathostomata classes, thus forming the backbone of the galectin family. This excludes, for example, LGALS10, which is found only in primates, or the large number of galectins similar to placenta-galectin, which are found almost exclusively in the Simiiformes. These are located in close proximity to LGALS4, just like LGALS7, which is present in many Amniota. Table 1 shows the genes colocalized with the respective galectin in close proximity. A well-conserved synteny of galectins exists throughout jawed vertebrates. One exception is LGALS8, which exists in a different genomic context in Actinopterygii compared with Chondrichthyes and Sarcopterygii. Furthermore, in Chondrichthyes, we find only bloc1s3 in close proximity to galectin-4. Ryr1, actn4, ech1, and hnrnpl are located on the same chromosome, but they are separated by several million nucleotides. Moreover, LGALS12 is only found in Dipnotetrapodomorpha. The GRP LGALSL2, also called LGALSLA, exists in all Gnathostomata classes except Mammalia, where it has been lost.
Table 1
| LGALS1 / LGALS2 | LGALS1B | LGALS3 | LGALS4 (LGALS7, 13, 16, 14, 10*) | LGALS8 (Chondrichthyes & Sarcopterygii) | LGALS8 (Actinopterygii) | LGALS9 | LGALS12 | LGALSL | LGALSL2 | GRIFIN |
|---|---|---|---|---|---|---|---|---|---|---|
| elfn2 | shroom3 | socs4 | bloc1s3 | ryr2 | tmem242 | nf1 | hnrnpul2 | ugp2 | vti1b | chst12 |
| gga1 | septin11 | fbxo34 | hnrnpl | mtr | rrm2 | wsb1 | chrm1 | vps54 | zfyve26 | lfng |
| tmem184b | sowahb | atg14 | ech1 | actn2 | klf11 | ksr1 | naa40 | peli1 | plek2 | eif3b |
| cacna1i | ccni | ktn1 | actn4 | heatr1 | cebpz** | nos2 | frmd8 | | | brat1 |
| cacng2 | | peli2 | ryr1 | cebpz** | | lyrm9 | pc | | | |
| | | | | | | nlk | | | | |
Syntenic genes of Gnathostomata galectins.
* tandem duplication of LGALS4 N-terminal CRD.
** in the wider vicinity.
LGALS1B, which is structurally closely related to LGALS1, is unique to Sauropsida (reptiles and birds). In these animals, it is located on a different chromosome than LGALS1. In Tetrapoda, LGALS1 and LGALS2 are in close genomic proximity to each other. Although galectins are present in the genomic context of LGALS1 or LGALS2 in all gnathostomes, only Teleostomi have a galectin with an amino acid sequence typical of LGALS1 in the glycan-binding groove that spans β-strands S4 and S5 and loop L4 between them. A galectin similar to tetrapod LGALS2 in this important glycan-binding region is already found in the West African lungfish, but on Chr 1 part0 (NC_056725.1), the LGALS1-like LOC122794646 and the LGALS2-like LOC122801768 are approximately 700 million nucleotides apart. In most Gnathostomata, LGALSL and LGALS8, as well as LGALSL2 and LGALS3, are located on the same chromosome, although not in close proximity. In Mammalia, LGALS3 and LGALSL are found on different chromosomes, while LGALSL2 appears to be lost.
3.2 Origin of vertebrate galectins since the emergence of Bilateria
Scallops are genomically well annotated and exhibit relatively slow genome evolution, making them good model organisms for studying the evolution of bilaterians. For this reason, we examined the genomic distribution and structure of galectins from the bivalve Pecten maximus to gain insights into the origin of vertebrate galectins. Six galectin genes are found in the genome of P. maximus, three on chromosome 3, two on chromosome 9, and one on chromosome 15 (Figure 1). Two neighboring galectins on chromosome 3, galectin-4-like (LOC117323441) and LOC117323440, have a typical bivalve galectin structure of four tandem-repeat CRDs. This particular type of galectin has previously been studied in the oyster Crassostrea virginica, which also has two of these proteins, CvGal1 and CvGal2. Synteny, gene structure and amino acid sequence show a close relationship of these galectins between oyster and scallop. In the oyster, CvGal1 and CvGal2 are produced by hemocytes, the phagocytes of the bivalve. Galectins bind exogenous glycans, for example, from pathogens, as well as endogenous glycans on the surface of hemocytes via their different CRDs. Thus, they can function as phagocytosis-mediating opsonins.
Figure 1
Moving one evolutionary step closer to vertebrates and considering Echinodermata as primeval deuterostomes, we find only one galectin locus on one chromosome. The sea urchin Lytechinus variegatus is a good model organism to assess genomic galectin structure, as it has now been completely annotated at the chromosomal level. L. variegatus has a classic tandem-repeat galectin with two CRDs, galectin-8-like (LOC121418813). The homologous galectin-8 (LOC576472) of the purple sea urchin Strongylocentrotus purpuratus has been detected in high amounts in coelomic fluid and is expressed in coelomocytes, the phagocytes of Echinodermata. This suggests that in sea urchins, too, a major function of galectins is pathogen defense. The sea cucumber Apostichopus japonicus has two galectins arranged in tandem in the genome. Again, these are tandem-repeat galectins with two (QCW05467, galectin-8) or three CRDs (PIK56913, AjGal1). AjGal1 is mainly expressed in coelomocytes, binds microorganisms and has antimicrobial activity.
The tunicates, as the primitive chordates and closest living relatives of vertebrates, are not a good model to study the phylogenetic evolution of galectins because they are among the fastest evolving metazoans and have poor synteny conservation. However, Ciona intestinalis has three tandem-repeat galectins with two CRDs each on chromosomes 4 and 6. The galectin-6 and galectin-6-like genes are arranged in tandem on chromosome 6 and show very high sequence identity. Galectin-9 (CiLgals-a) and galectin-6 (CiLgals-b) are expressed in hemocytes of the pharynx of C. intestinalis. This expression is induced by inflammatory processes.
To date, nothing is known about the galectins of Cyclostomata on a functional level. In the genome of the sea lamprey Petromyzon marinus, seven galectins or galectin-related genes are found, one each on chromosomes 17, 29, and 31 and two each on chromosomes 33 and 53. Only two, galectin-8 and galectin-8-like (LOC116948446), are classical tandem-repeat galectins. In addition, a galectin-3-like (LOC116953901) and two galectin-related genes (LOC116943127 and LOC116953921) were detected with only one CRD. The uncharacterized protein LOC116947959 on chromosome 31 has four potential CRDs.
3.3 Exon−intron structure of bilaterian galectins
For the classification of bilaterian galectins, it is instructive to look at exon−intron organization. In Gnathostomata, each individual CRD is known to be encoded by three exons. In tandem-repeat-type galectins, the large middle exon of the first CRD comprises the codons encoding the amino acids of β-strands S3-F4 (Figure 1A). In the second CRD, the middle exon is smaller and encodes S3-F3. Most Gnathostomata galectins that have only one CRD correspond in their exon−intron structure to the second CRD of tandem-repeat galectins. This is the case for LGALS1, LGALS1B, LGALS2, LGALS3, Grifin, and the galectin-related genes LGALSL and LGALSL2. Only the mono-CRD galectins located in the neighborhood of LGALS4, namely, LGALS7, LGALS10, LGALS13, LGALS14, and LGALS16, correspond in their exon−intron structure to the first CRD of the tandem-repeat galectins (Figure 1B).
In the scallop, all galectins have an exon−intron structure with a large S3-F4 coding exon (Figure 1B). A short S3-F3 coding exon does not occur. Exceptions are the galectins with four tandem-repeat CRDs. In these, the first CRD is encoded by only two exons. A small exon comprises the S1 and F2 β-strands, and a larger exon comprises all the others, namely, S3-F1.
Typical for deuterostomes is the appearance of the exon−intron structure characteristic of Gnathostomata galectins with a longer S3-F4 exon in the N-terminal and a shorter S3-F3 exon in the C-terminal CRD. This structure is found in tandem-repeat galectins of echinoderms, Tunicata and Cyclostomata (Figure 1B). Additionally, LOC116947959 of P. marinus encoding four CRDs has both S3-F4 (CRD1 and 3) and S3-F3 exons (CRD2 and 4). Exceptions are LGALS6 and LGALS6-like of C. intestinalis. Both tandem CRDs of these galectin genes have only the longer S3-F4 exon. In Cyclostomata, for the first time, in addition to the tandem-repeat type, galectins or galectin-related genes with a mono-CRD emerge. All three mono-CRD galectins of P. marinus have the short S3-F3 exon.
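The exon-based classification used in this section can be written down as a small rule set. The following Python sketch is illustrative only: the strand order along the sequence is an assumption taken from the canonical galectin fold (it is not stated explicitly in the text), and the function names are hypothetical.

```python
# Order of β-strands along the CRD sequence. ASSUMPTION: canonical galectin
# fold order, consistent with the exon boundaries described in the text.
STRAND_ORDER = ["S1", "F2", "S3", "S4", "S5", "S6", "F3", "F4", "F5", "S2", "F1"]


def strands_encoded(first, last):
    """Strands covered by an exon running from `first` to `last` (inclusive)."""
    i, j = STRAND_ORDER.index(first), STRAND_ORDER.index(last)
    return STRAND_ORDER[i:j + 1]


def classify_middle_exon(first, last):
    """Label a middle exon as the long (S3-F4) or short (S3-F3) type."""
    if (first, last) == ("S3", "F4"):
        return "long"
    if (first, last) == ("S3", "F3"):
        return "short"
    return "other"


def crd_pattern(middle_exons):
    """Classify the middle exon of every CRD of a galectin, N- to C-terminal."""
    return [classify_middle_exon(first, last) for first, last in middle_exons]


# Exon spans as described in the text for three representative cases.
examples = {
    "deuterostome tandem-repeat galectin": [("S3", "F4"), ("S3", "F3")],
    "C. intestinalis LGALS6": [("S3", "F4"), ("S3", "F4")],
    "P. marinus mono-CRD galectin": [("S3", "F3")],
}
for name, exons in examples.items():
    print(name, crd_pattern(exons))
print(strands_encoded("S3", "F4"))
```

Under this encoding, the typical deuterostome tandem-repeat galectin yields the pattern long/short, whereas the first CRD of the four-CRD bivalve galectins (S3-F1) falls into the "other" class.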
It should be mentioned that the evolutionarily more ancient Porifera, which form a sister group to Eumetazoa, also possess galectins. An example is the genomically sequenced and annotated sponge Amphimedon queenslandica, which has been used as a model organism to study the evolution of Metazoa. A. queenslandica has three galectin genes, LOC105313191, LOC109582911, and LOC105315566. All three galectins have only one CRD and are encoded by only one exon. Interestingly, all three have a signal peptide sequence (Sec/SPI) (Supplementary Table 2). In contrast, deuterostome and bivalve galectins typically lack such a classical secretion signal. If bilaterian galectins are secreted, they are transported via an unconventional pathway.
3.4 Amino acid sequence-based phylogenetic analysis
To estimate the ancestral relationship between the scallop, Echinodermata, Tunicata and Cyclostomata galectins and members of the Gnathostomata galectin family, we started with a traditional sequence alignment-based phylogenetic analysis. For this, the amino acid sequences of CRD1 and CRD2 were separately subjected to maximum likelihood analysis (Figure 2). As described in earlier studies, these analyses clearly show clustering of the individual Gnathostomata galectin groups. However, for some galectins this is only weakly supported by bootstrap (BT) analyses. This is particularly evident for galectins-3, -4 and -9, while galectin-8 and grifin, for example, show strong sequence similarity from cartilaginous fish to mammals. In the individual alignments of the CRDs of galectin-3, -4 and -9, a distinct divergence is particularly evident in the flexible loop regions connecting the β-strands S3 and S4 as well as S4 and S5. It is known that the amino acid residues in these loops are important for the oligosaccharide binding specificity of galectins. Therefore, it could be speculated that the binding specificities of galectins-3, -4, and -9 differ between the Gnathostomata classes. In contrast, the high sequence homology of galectin-8 suggests that its oligosaccharide binding specificity is highly conserved across gnathostomes.
Figure 2
With respect to Cyclostomata galectins, it appears that both CRD1 and CRD2 of galectin-8-like from P. marinus cluster with those of galectin-4 of cartilaginous fish, albeit with low BT support of 60% and 39%, respectively. In contrast, the CRD1 of lamprey galectin-8 clusters with galectin-4 of bony vertebrates, as do the two CRD1s of LOC116947959. Again, this clustering is only very weakly supported by the BT test. The CRD of galectin-3-like on chromosome 53 of P. marinus clusters into the galectin-3 group of Gnathostomata. The most remarkable result of this phylogenetic analysis is that both galectin-related protein-like proteins of the sea lamprey cluster with a high BT support of 99% with the galectin-related protein and galectin-related protein 2 of the jawed vertebrates (Supplementary Figure 1). This could be an indication that these previously understudied members of the galectin family evolved very early in the evolution of vertebrates, even before the gnathostome-cyclostome split.
However, the sequence-based phylogenetic analysis does not allow valid statements concerning a possible evolutionary ancestry of the vertebrate galectins from the more basal deuterostomes and scallops. The CRD1s of Echinodermata, C. intestinalis and the scallop P. maximus form an outgroup to the vertebrate galectins without a clear ancestral relationship among them. Furthermore, the CRD2 of the galectin-8 proteins of the Echinodermata clusters only very weakly with the galectin-8 proteins of the Gnathostomata.
Due to the well-known limitations of sequence-based analyses, we characterized the path of galectin genes through evolution by considering synteny.
3.5 Syntenic analyses
To investigate the origin of galectins in higher vertebrates, we looked at the chromosomal distribution, in the more primitive bilaterian species, of the genes that are conserved in Gnathostomata and colocalized with each galectin. Interestingly, almost all of the genes colocalized with the galectins LGALS3, LGALS8, LGALS12, LGALSL, and LGALSL2 in Gnathostomata were found on chromosome 3 in P. maximus (Supplementary Figure 2A). Three galectin genes are located on this chromosome in the scallop. The Gnathostomata genes colocalized with Grifin and LGALS1 and LGALS2 are on chromosome 1 in P. maximus, and those of LGALS9 are on chromosome 8. There is no galectin gene on either of these two P. maximus chromosomes. The only galectin of the echinoderm L. variegatus is located on chromosome 7, and all galectin synteny genes of P. maximus chromosome 3 are also found on this chromosome (Supplementary Figure 2B). This suggests that L. variegatus chromosome 7 has its origin in chromosome 3 of P. maximus and that the galectin-8 of the sea urchin can be traced back to one of the three bivalve galectins located there. Both the scallop and the sea urchin have 19 chromosomes. Apparently, the other P. maximus galectins on chromosomes 9 and 15 were not inherited by the deuterostomes. Synteny analyses in the ancestral chordates, the tunicates, are unfortunately not useful for our analyses because these organisms underwent extensive genomic rearrangements compared to the other chordate subphyla.
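The chromosome-level reasoning in this paragraph boils down to asking on which chromosome of a reference species most synteny genes of a given galectin are found. A minimal Python sketch of that tally is given below. The gene-to-chromosome assignments are HYPOTHETICAL placeholders chosen to mimic the pattern summarized in the text; they are not the annotated positions in P. maximus, and the function name is likewise illustrative.

```python
from collections import Counter


def dominant_chromosome(genes, gene_to_chrom):
    """Return (chromosome, fraction) for the chromosome carrying most of `genes`."""
    located = [gene_to_chrom[g] for g in genes if g in gene_to_chrom]
    if not located:
        return None, 0.0
    chrom, count = Counter(located).most_common(1)[0]
    return chrom, count / len(located)


# HYPOTHETICAL chromosome assignments in the reference species (toy data).
reference_chrom = {
    "ryr2": 3, "mtr": 3, "actn2": 3, "heatr1": 3,
    "nf1": 8, "wsb1": 8, "ksr1": 8, "nos2": 1,
}
# Synteny genes per galectin, symbols taken from Table 1.
synteny_genes = {
    "LGALS8": ["ryr2", "mtr", "actn2", "heatr1"],
    "LGALS9": ["nf1", "wsb1", "ksr1", "nos2"],
}

for galectin, genes in synteny_genes.items():
    print(galectin, dominant_chromosome(genes, reference_chrom))
```

A fraction clearly below 1 would flag loci whose neighborhood has been broken up by rearrangements, as reported here for the tunicates.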
In vertebrates, the analysis of galectin genes in the genome becomes more complex due to ancient polyploidization events. During the evolution from invertebrate chordates to gnathostomes, two rounds of WGD occurred, which tremendously increased the complexity of the genomes. It is suggested that the gnathostome-cyclostome split occurred most likely soon after the 1st WGD, followed by a cyclostome-specific genome triplication. Interestingly, the genomes of modern lampreys seem to indicate remarkably low rates of interchromosomal rearrangement following hexaploidization. The sea lamprey P. marinus has a total of seven galectins on five chromosomes (Supplementary Figure 2C). Since the colocalized genes known from the scallop and sea urchin are also found on these chromosomes, it can be concluded that these genomic regions can be traced back to chromosomes 3 of P. maximus and 7 of L. variegatus, respectively (Figure 3). This suggests that the lamprey galectins represent paralogs of a protovertebrate galectin that is likely related to the Echinodermata galectin. The paralogs probably arose from both polyploidy events and small-scale duplication.
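The copy-number argument behind this paragraph can be made explicit with a back-of-the-envelope calculation. The sketch below assumes a single ancestral galectin locus and ignores gene loss, so it yields an upper bound on the copies that polyploidization alone can produce; the function name is hypothetical.

```python
from math import prod


def max_ohnologues(ancestral_loci, multipliers):
    """Upper bound on copies produced by polyploidization alone (no gene loss)."""
    return ancestral_loci * prod(multipliers)


# Gnathostomes: two rounds of whole-genome duplication.
gnathostome_max = max_ohnologues(1, [2, 2])
# Lamprey: 1st WGD followed by a cyclostome-specific triplication.
lamprey_max = max_ohnologues(1, [2, 3])

observed_lamprey = 7  # galectin genes reported for P. marinus
min_small_scale = max(0, observed_lamprey - lamprey_max)
print(gnathostome_max, lamprey_max, min_small_scale)
```

Under these simplifying assumptions, the seven lamprey galectins exceed what polyploidization of a single locus can yield, so at least one copy would have to stem from small-scale duplication. Gene loss after each event would increase that number.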
Figure 3
Among the gnathostomes, there are nine galectins or galectin-related proteins widely distributed across the different classes, which we have examined in more detail. These are the tandem-repeat galectins LGALS4, 8, 9 and 12, the prototypical galectins LGALS1/2 and Grifin, LGALS3 as the only representative of the chimeric type, and the galectin-related proteins LGALSL and LGALSL2 (also referred to as LGALSLA). We further consider LGALS1 and LGALS2 as one locus, since they appear in tandem only late, from Dipnotetrapodomorpha onward. In addition, a number of other galectins have arisen in the individual Gnathostomata classes by small-scale duplication. One example is LGALS7, which arose in the Amniota probably by tandem duplication of the N-terminal CRD of LGALS4.
Since Gnathostomata have undergone two tetraploidization events during their evolution, theoretically four ohnologues should have arisen from the single ancestral galectin locus. In addition, after each of the two genome duplications, extensive chromosome rearrangements (fusions, fissions and translocations) have occurred, which complicates the identification of the ohnologues or can also lead to the loss of ohnologues. Nakatani et al. (
3.6 The sialic acid binding domain of galectin-8 is typical for Deuterostomia
In terms of its amino acid sequence, tandem-repeat galectin-8 is one of the most conserved galectins within gnathostomes. It is essential for the vertebrate lineage and is found in all classes of jawed vertebrates. LGALS8 plays a critical role in intracellular pathogen defense as well as autophagy (
Figure 4

Comparison of the Primary and Tertiary Structure of Human Galectin-8 with Selected Deuterostomata and Scallop Galectins. (A) Sequence alignment of CRDs of galectin-8 and galectin-8-like proteins from H. sapiens, C. milii, P. marinus and L. variegatus as well as the P. maximus galectins localized on chromosome 3. The intensity of the blue shades with which the amino acids are labeled illustrates the degree of conservation. Amino acids relevant for the binding of sulfated and sialylated oligosaccharides and for the binding of extended carbohydrates are marked in green and yellow, respectively. The arginine of the long S3-S4 loop, which is critical for the sialic acid interaction, is highlighted in red. The positions of β-strands S2-S6 involved in carbohydrate ligand binding are indicated as black bars above the alignment. (B) 3D models of the CRDs of human galectin-8 and the galectin-8-like proteins of P. marinus and L. variegatus. The electrostatic potential of the protein surface is indicated by red (negative) and blue (positive) colors. Regions corresponding to the known subsites of the ligand binding grooves of human galectin-8 are indicated by dashed circles. The positions of amino acids relevant for binding are indicated by arrows. Subsites and amino acids relevant for lactose recognition are marked in dark blue, those for sialylated and sulfated oligosaccharides in green (red: the arginine critical for the strong affinity of human galectin-8 CRD1 to sialic acid), and those for extended oligosaccharides in yellow.
Sialic acids are found prominently expressed in deuterostomes and almost absent in the protostome lineage (
Considering the predicted 3D structure of the CRDs of the galectin-8 proteins from humans, P. marinus and L. variegatus, a positively charged carbohydrate binding groove can be identified in the N-terminal CRDs (Figure 4B). Additionally, clearly visible are three subsite pockets, which mediate the binding of lactose (blue site), sialylated and sulfated oligosaccharides (green site) and extended carbohydrates of longer oligosaccharides (yellow site) in human galectin-8 (
3.7 Hypothetical model of the evolution of the Gnathostomata galectin cosmos
Based on the assumption that the proto-vertebrate ancestor, similar to Echinodermata, possessed only one galectin locus in the genome, we have attempted to elucidate the evolutionary complexity of the galectin family of modern Gnathostomata via synteny comparisons (Figure 5). The tandem-repeat galectin typical of Deuterostomia, with an N-terminal CRD characterized by a large S3-F4 exon and a C-terminal CRD possessing a smaller S3-F3 exon, should also have been present in the proto-vertebrate ancestor. During the 1st round of WGD, this tandem-repeat galectin was duplicated together with the whole chromosome. Thereafter, the proto-cyclostomes split off. Chromosomal rearrangements and fusions subsequently occurred in the proto-Gnathostomata lineage (
Figure 5

Schematic Representation of the Hypothesis of Gnathostomata Galectin Genesis from the Protovertebrate Ancestor. The color of the arrows indicates the homology of chromosomal regions with P. maximus chromosomes 3 (red), 1 (green) and 8 (blue). Syntenic genes that, based on their ohnology and position, allow predictions about the origin of the respective galectins during the two rounds of WGD as well as their evolutionary relationship are presented in boxes below. Created with BioRender.com.
LGALS9 is the last of the nine major galectins of Gnathostomata. This galectin is located in a chromosomal region corresponding to chromosome 8 of P. maximus and thus differs from the other eight galectins with respect to its localization. It is very likely that it arose by duplication and subsequent translocation of one of the tandem-repeat galectins after the 2nd WGD was completed. Based on the sequence similarity, it can be assumed that LGALS8 represents the donor gene. In most previous studies, LGALS9 is classified, based on its structure, as one of the four ohnologues of a proto-vertebrate tandem-repeat galectin, together with LGALS4, 8, and 12. Based on the more recent macrosynteny data, we conclude that this is not the case.
4 Conclusion
Galectins have evolved within Bilateria from proteins originally functioning as opsonins to a large family of regulatory factors of diverse physiological processes. In Mollusca and in the invertebrate Deuterostomata, the main function of galectins seems to be the initiation of phagocytosis of pathogens. For this mechanism of innate immune defense, galectins bind both pathogens and primitive immune cells. It can be assumed that in vertebrates, together with the formation of a more complex immune system and the development of the adaptive immune response with very different cell types, the immunoregulatory roles of galectins have become more important. We were able to infer how this vertebrate galectin cosmos fanned out using various approaches, particularly synteny analyses. Notably, we also included previously neglected galectins, such as the galectin-related protein LGALSL2. This galectin deserves further analysis in the future, including functional analysis, as it may have broader relevance in all jawed vertebrates except Mammalia. Furthermore, we hope that the results of our study will direct the attention of future research not only to the classical galectins, but also to the galectin-related proteins, which have so far been rather neglected. These proteins seem to have evolved early in vertebrate evolution and therefore probably belong to the basic set of vertebrate galectins. However, little is known about their physiological function. Moreover, a closer look at the galectins of Cyclostomata as a sister group of gnathostomes could provide interesting insights into the function of the immune system. Cyclostomata, similar to Gnathostomata, have convergently evolved an adaptive immune system (
Statements
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Author contributions
JG and SG conceived the study. JG performed the analyses and wrote the original draft. JG and SG reviewed and edited the manuscript. All authors contributed to the article and approved the submitted version.
Funding
The publication of this article was funded by the Open Access Fund of the Research Institute for Farm Animal Biology (FBN).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2023.1147356/full#supplementary-material
Supplementary Figure 1. Multiple sequence alignment of the CRD of the galectin-related protein and galectin-related protein 2 of Gnathostomata and the two galectin-related protein-like proteins of P. marinus. Amino acids are colored according to the Clustal color scheme.
Supplementary Figure 2. Schematic overview of the chromosomal localization of galectins and their syntenic genes in scallop, green sea urchin, and sea lamprey. (A) P. maximus chromosome 3 (NC_047017.1), (B) L. variegatus chromosome 7 (NC_054746.1), and (C) P. marinus chromosomes 17 (NC_046085.1), 29 (NC_046097.1), 31 (NC_046099.1), 33 (NC_046101.1), and 53 (NC_046121.1). Red, galectin genes.
Supplementary Table 1. List of galectins used in the phylogenetic analyses.
Supplementary Table 2. Signal peptide prediction of galectins using SignalP-5.0 (Eukarya) (https://services.healthtech.dtu.dk/service.php?SignalP-5.0) (
References
1
Gardères J, Bourguet-Kondracki M-L, Hamer B, Batel R, Schröder HC, Müller WEG. Porifera lectins: diversity, physiological roles and biotechnological potential. Mar Drugs (2015) 13(8):5059–101. doi: 10.3390/md13085059
2
Sanjurjo L, Broekhuizen EC, Koenen RR, Thijssen VLJL. Galectokines: the promiscuous relationship between galectins and cytokines. Biomolecules (2022) 12(9). doi: 10.3390/biom12091286
3
Di Lella S, Sundblad V, Cerliani JP, Guardia CM, Estrin DA, Vasta GR, et al. When galectins recognize glycans: from biochemistry to physiology and back again. Biochemistry (2011) 50(37):7842–57. doi: 10.1021/bi201121m
4
Modenutti CP, Capurro JIB, Di Lella S, Martí MA. The structural biology of galectin-ligand recognition: current advances in modeling tools, protein engineering, and inhibitor design. Front Chem (2019) 7:823. doi: 10.3389/fchem.2019.00823
5
Ideo H, Matsuzaka T, Nonaka T, Seko A, Yamashita K. Galectin-8-N-domain recognition mechanism for sialylated and sulfated glycans. J Biol Chem (2011) 286(13):11346–55. doi: 10.1074/jbc.M110.195925
6
Stowell SR, Arthur CM, Slanina KA, Horton JR, Smith DF, Cummings RD. Dimeric galectin-8 induces phosphatidylserine exposure in leukocytes through polylactosamine recognition by the C-terminal domain. J Biol Chem (2008) 283(29):20547–59. doi: 10.1074/jbc.M802495200
7
Hundelshausen Pv, Wichapong K, Gabius H-J, Mayo KH. The marriage of chemokines and galectins as functional heterodimers. Cell Mol Life Sci (2021) 78(24):8073–95. doi: 10.1007/s00018-021-04010-6
8
Li S, Wandel MP, Li F, Liu Z, He C, Wu J, et al. Sterical hindrance promotes selectivity of the autophagy cargo receptor NDP52 for the danger receptor galectin-8 in antibacterial autophagy. Sci Signal (2013) 6(261):ra9. doi: 10.1126/scisignal.2003730
9
Kim B-W, Hong SB, Kim JH, Kwon DH, Song HK. Structural basis for recognition of autophagic receptor NDP52 by the sugar receptor galectin-8. Nat Commun (2013) 4:1613. doi: 10.1038/ncomms2606
10
Arthur CM, Baruffi MD, Cummings RD, Stowell SR. Evolving mechanistic insights into galectin functions. Methods Mol Biol (2015) 1207:1–35. doi: 10.1007/978-1-4939-1396-1_1
11
Vasta GR. Galectins as pattern recognition receptors: structure, function, and evolution. Adv Exp Med Biol (2012) 946:21–36. doi: 10.1007/978-1-4614-0106-3_2
12
Liu F-T, Stowell SR. The role of galectins in immunity and infection. Nat Rev Immunol (2023), 1–16. doi: 10.1038/s41577-022-00829-7
13
Zhang J. Evolution by gene duplication: an update. Trends Ecol Evol (2003) 18(6):292–8. doi: 10.1016/S0169-5347(03)00033-8
14
Kuzmin E, VanderSluis B, Nguyen Ba AN, Wang W, Koch EN, Usaj M, et al. Exploring whole-genome duplicate gene retention with complex genetic interaction analysis. Science (2020) 368(6498). doi: 10.1126/science.aaz5667
15
Rapoport EM, Matveeva VK, Kaltner H, André S, Vokhmyanina OA, Pazynina GV, et al. Comparative lectinology: delineating glycan-specificity profiles of the chicken galectins using neoglycoconjugates in a cell assay. Glycobiology (2015) 25(7):726–34. doi: 10.1093/glycob/cwv012
16
García Caballero G, Flores-Ibarra A, Michalak M, Khasbiullina N, Bovin NV, André S, et al. Galectin-related protein: an integral member of the network of chicken galectins 1. From strong sequence conservation of the gene confined to vertebrates to biochemical characteristics of the chicken protein and its crystal structure. Biochim Biophys Acta (2016) 1860(10):2285–97. doi: 10.1016/j.bbagen.2016.06.001
17
Vasta GR, Ahmed H, Du S-J, Henrikson D. Galectins in teleost fish: zebrafish (Danio rerio) as a model species to address their biological roles in development and innate immunity. Glycoconj J (2004) 21(8-9):503–21. doi: 10.1007/s10719-004-5541-7
18
Houzelstein D, Gonçalves IR, Fadden AJ, Sidhu SS, Cooper DNW, Drickamer K, et al. Phylogenetic analysis of the vertebrate galectin family. Mol Biol Evol (2004) 21(7):1177–87. doi: 10.1093/molbev/msh082
19
Verkerke H, Dias-Baruffi M, Cummings RD, Arthur CM, Stowell SR. Galectins: an ancient family of carbohydrate binding proteins with modern functions. Methods Mol Biol (2022) 2442:1–40. doi: 10.1007/978-1-0716-2055-7_1
20
Johannes L, Jacob R, Leffler H. Galectins at a glance. J Cell Sci (2018) 131(9). doi: 10.1242/jcs.208884
21
Nakatani Y, Shingate P, Ravi V, Pillai NE, Prasad A, McLysaght A, et al. Reconstruction of proto-vertebrate, proto-cyclostome and proto-gnathostome genomes provides new insights into early vertebrate evolution. Nat Commun (2021) 12(1):4489. doi: 10.1038/s41467-021-24573-z
22
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics (2009) 25(9):1189–91. doi: 10.1093/bioinformatics/btp033
23
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol (2018) 35(6):1547–9. doi: 10.1093/molbev/msy096
24
Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res (2021) 49(W1):W293–6. doi: 10.1093/nar/gkab301
25
Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc (2015) 10(6):845–58. doi: 10.1038/nprot.2015.053
26
Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res (2022) 50(D1):D439–44. doi: 10.1093/nar/gkab1061
27
Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci U.S.A. (2001) 98(18):10037–41. doi: 10.1073/pnas.181342398
28
Sanner MF. Python: a programming language for software integration and development. J Mol Graph Model (1999) 17(1):57–61.
29
Sanner MF, Olson AJ, Spehner JC. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers (1996) 38(3):305–20. doi: 10.1002/(SICI)1097-0282(199603)38:3<305::AID-BIP4>3.0.CO;2-Y
30
Bhat R, Chakraborty M, Glimm T, Stewart TA, Newman SA. Deep phylogenomics of a tandem-repeat galectin regulating appendicular skeletal pattern formation. BMC Evol Biol (2016) 16(1):162. doi: 10.1186/s12862-016-0729-6
31
Hsieh T-J, Lin H-Y, Tu Z, Huang B-S, Wu S-C, Lin C-H. Structural basis underlying the binding preference of human galectins-1, -3 and -7 for Galβ1-3/4GlcNAc. PLoS One (2015) 10(5):e0125946. doi: 10.1371/journal.pone.0125946
32
Wang S, Zhang J, Jiao W, Li J, Xun X, Sun Y, et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat Ecol Evol (2017) 1(5):120. doi: 10.1038/s41559-017-0120
33
Vasta GR, Feng C, Tasumi S, Abernathy K, Bianchet MA, Wilson IBH, et al. Biochemical characterization of oyster and clam galectins: selective recognition of carbohydrate ligands on host hemocytes and Perkinsus parasites. Front Chem (2020) 8:98. doi: 10.3389/fchem.2020.00098
34
de La Ballina NR, Maresca F, Cao A, Villalba A. Bivalve haemocyte subpopulations: a review. Front Immunol (2022) 13:826255. doi: 10.3389/fimmu.2022.826255
35
Dheilly NM, Raftos DA, Haynes PA, Smith LC, Nair SV. Shotgun proteomics of coelomic fluid from the purple sea urchin, Strongylocentrotus purpuratus. Dev Comp Immunol (2013) 40(1):35–50. doi: 10.1016/j.dci.2013.01.007
36
Nair SV, Del Valle H, Gross PS, Terwilliger DP, Smith LC. Macroarray analysis of coelomocyte gene expression in response to LPS in the sea urchin. Identification of unexpected immune diversity in an invertebrate. Physiol Genomics (2005) 22(1):33–47. doi: 10.1152/physiolgenomics.00052.2005
37
Zhang C, Xue Z, Yu Z, Wang H, Liu Y, Li H, et al. A tandem-repeat galectin-1 from Apostichopus japonicus with broad PAMP recognition pattern and antibacterial activity. Fish Shellfish Immunol (2020) 99:167–75. doi: 10.1016/j.fsi.2020.02.011
38
Berná L, Alvarez-Valin F. Evolutionary genomics of fast evolving tunicates. Genome Biol Evol (2014) 6(7):1724–38. doi: 10.1093/gbe/evu122
39
Vizzini A, Parrinello D, Sanfratello MA, Salerno G, Cammarata M, Parrinello N. Inducible galectins are expressed in the inflamed pharynx of the ascidian Ciona intestinalis. Fish Shellfish Immunol (2012) 32(1):101–9. doi: 10.1016/j.fsi.2011.10.028
40
Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier MEA, Mitros T, et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature (2010) 466(7307):720–6. doi: 10.1038/nature09201
41
Popa SJ, Stewart SE, Moreau K. Unconventional secretion of annexins and galectins. Semin Cell Dev Biol (2018) 83:42–50. doi: 10.1016/j.semcdb.2018.02.022
42
Bum-Erdene K, Leffler H, Nilsson UJ, Blanchard H. Structural characterisation of human galectin-4 N-terminal carbohydrate recognition domain in complex with glycerol, lactose, 3’-sulfo-lactose, and 2’-fucosyllactose. Sci Rep (2016) 6:20289. doi: 10.1038/srep20289
43
Som A. Causes, consequences and solutions of phylogenetic incongruence. Brief Bioinform (2015) 16(3):536–48. doi: 10.1093/bib/bbu015
44
Kuzmin E, Taylor JS, Boone C. Retention of duplicated genes in evolution. Trends Genet (2022) 38(1):59–72. doi: 10.1016/j.tig.2021.06.016
45
Nielsen MI, Stegmayr J, Grant OC, Yang Z, Nilsson UJ, Boos I, et al. Galectin binding to cells and glycoproteins with genetically modified glycosylation reveals galectin-glycan specificities in a natural context. J Biol Chem (2018) 293(52):20249–62. doi: 10.1074/jbc.RA118.004636
46
Cagnoni AJ, Troncoso MF, Rabinovich GA, Mariño KV, Elola MT. Full-length galectin-8 and separate carbohydrate recognition domains: the whole is greater than the sum of its parts? Biochem Soc Trans (2020) 48(3):1255–68. doi: 10.1042/BST20200311
47
Varki A. Essentials of glycobiology. 4th edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press (2022). Available at: https://www.ncbi.nlm.nih.gov/books/NBK579918/.
48
Harduin-Lepers A. The vertebrate sialylation machinery: structure-function and molecular evolution of GT-29 sialyltransferases. Glycoconj J (2023). doi: 10.1007/s10719-023-10123-w
49
Teppa RE, Petit D, Plechakova O, Cogez V, Harduin-Lepers A. Phylogenetic-derived insights into the evolution of sialylation in eukaryotes: comprehensive analysis of vertebrate β-galactoside α2,3/6-sialyltransferases (ST3Gal and ST6Gal). Int J Mol Sci (2016) 17(8). doi: 10.3390/ijms17081286
50
Bornhöfft KF, Goldammer T, Rebl A, Galuska SP. Siglecs: a journey through the evolution of sialic acid-binding immunoglobulin-type lectins. Dev Comp Immunol (2018) 86:219–31. doi: 10.1016/j.dci.2018.05.008
51
Caballero GG, Manning JC, Ludwig A-K, Ruiz FM, Romero A, Kaltner H, et al. Members of the galectin network with deviations from the canonical sequence signature. 1. Galectin-related inter-fiber protein (GRIFIN). Trends Glycosci Glycotechnol (2018) 30(172):SE1–9. doi: 10.4052/tigg.1726.1SE
52
Kim J, Lee C, Ko BJ, Yoo DA, Won S, Phillippy AM, et al. False gene and chromosome losses in genome assemblies caused by GC content variation and repeats. Genome Biol (2022) 23(1):204. doi: 10.1186/s13059-022-02765-0
53
Herrmann T, Karunakaran MM, Fichtner AS. A glance over the fence: using phylogeny and species comparison for a better understanding of antigen recognition by human γδ T-cells. Immunol Rev (2020) 298(1):218–36. doi: 10.1111/imr.12919
54
Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol (2019) 37(4):420–3. doi: 10.1038/s41587-019-0036-z
Summary
Keywords
galectin, lectins, glycans, evolution, vertebrate, galactose binding lectin, sialic acid binding, galectin-8
Citation
Günther J and Galuska SP (2023) A brief history of galectin evolution. Front. Immunol. 14:1147356. doi: 10.3389/fimmu.2023.1147356
Received
18 January 2023
Accepted
13 June 2023
Published
29 June 2023
Volume
14 - 2023
Edited by
Stevan Springer, University of Prince Edward Island, Canada
Reviewed by
Herbert Kaltner, Ludwig Maximilian University of Munich, Germany; Yongbo Bao, Zhejiang Wanli University, China
Copyright
© 2023 Günther and Galuska.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Juliane Günther, guenther.juliane@fbn-dummerstorf.de