Analysis of α-Dystroglycan/LG Domain Binding Modes: Investigating Protein Motifs That Regulate the Affinity of Isolated LG Domains

Dystroglycan (DG) is an adhesion complex that links the cytoskeleton to the surrounding extracellular matrix in skeletal muscle and a wide variety of other tissues. It is composed of a highly glycosylated extracellular α-DG associated noncovalently with a transmembrane β-DG whose cytodomain interacts with dystrophin and its isoforms. Alpha-dystroglycan (α-DG) binds tightly and in a calcium-dependent fashion to multiple extracellular proteins and proteoglycans, each of which harbors at least one, or, more frequently, tandem arrays of laminin-globular (LG) domains. Considerable biochemical and structural work has accumulated on the α-DG-binding LG domains, highlighting a significant heterogeneity in ligand-binding properties of domains from different proteins as well as between single and multiple LG domains within the same protein. Here we review biochemical, structural, and functional information on the LG domains reported to bind α-dystroglycan. In addition, we have incorporated bioinformatics and modeling to explore whether specific motifs responsible for α-dystroglycan recognition can be identified within isolated LG domains. In particular, we analyzed the LG domains of slits and agrin as well as those of paradigmatic α-DG non-binders such as laminin-α3. While some stretches of basic residues may be important, no universally conserved motifs could be identified. However, the data confirm that the coordinated calcium atom within the LG domain is needed to establish an interaction with the sugars of α-DG, although it appears that this alone is insufficient to mediate significant α-DG binding. We develop a scenario involving different binding modes of a single LG domain unit, or tandemly repeated units, with α-DG. A variability of binding modes might be important to generate a range of affinities to allow physiological regulation of this interaction, reflecting its crucial biological importance.


INTRODUCTION
In the tissues of multicellular animals, different cell types establish intercellular molecular contacts (junctions) as well as cellular-extracellular ones. Accordingly, connections established between the extracellular matrix (ECM) and appropriate adhesion complexes at the cell surface support the functional morphology and physiology of different tissues during development and in adult animals. A well-conserved array of ECM proteins, cell adhesion receptors and ECM proteases has been identified in animals (Özbek et al., 2010;Hynes, 2012). In this context, dystroglycan is probably the major non-integrin adhesion complex for the formation of molecular contacts that stabilize the interface between cell membranes in skeletal muscle and the specialized ECM surrounding them (i.e., basement membranes) (Adams and Brancaccio, 2015) (see Figure 1A).
Dystroglycan (DG) is composed of two subunits, α-DG which is extracellular and highly glycosylated, and the transmembrane β-DG, whose cytoplasmic domain links the whole adhesion complex to dystrophin and the actin cytoskeleton in skeletal muscle. In addition to skeletal and cardiac muscles, DG has a wide tissue distribution and interacts with utrophin and dystrophin isoforms within the cytoplasm in a variety of tissues, such as the central and peripheral nervous system, lung and kidney (Barresi and Campbell, 2006).
In striking contrast with the integrin family of cell-ECM adhesion receptors, there is only one DG gene (DAG1), and no physiologically relevant alternative splicing has been reported so far. Therefore, different α-DG binding affinities for various extracellular ligands seem to depend on the degree of α-DG glycosylation, which might vary in different tissues (McDearmon et al., 2006;Sciandra et al., 2013). In fact, α-DG interacts with members of the laminin family of ECM glycoproteins and several other ECM-associated binding partners through its ability to recognize the so-called laminin-globular (LG) domains (see Figure 1B for a summary of the domain architecture of the binding proteins discussed here). LG domains were originally identified within laminins (Beck et al., 1990) but can be found in several proteins and proteoglycans located in the ECM as well as in a variety of otherwise diverse proteins (Talts et al., 1998). The LG domains that bind α-DG likely represent a distinct subset within the entire LG-domain family and are expected to share common structural features that allow them to recognize α-DG. However, sequence analysis demonstrates a limited degree of homology amongst different LG domains, and it is possible that other members of the LG domain family remain to be identified (Rudenko et al., 2001).
Although other proteins and proteoglycans harboring LG domains (for example, thrombospondins, Adams and Lawler, 2011) might be additional α-DG binders, the focus of this Review is on the established and physiologically relevant molecular partners whose binding behaviors have been tested both in vivo and in vitro. We examine LG domains demonstrated experimentally to bind α-DG for potential common structural features and compare these domains with LG domains that do not bind α-DG in vitro.
Laminins and certain other LG-containing binding partners have high, or very high (Kd values within the low nanomolar range), binding affinities for α-DG (Sciandra et al., 2013). The interaction depends on recognition of O-linked sugar moieties that protrude from the central mucin-like domain of α-DG. Multiple analyses from different groups have led to an increased understanding of the glycosylation moieties of O-mannosylated α-dystroglycan, whose complex structure also includes ribitol-phosphate, a newly identified glycosylation unit in mammals (for further details see recent reviews by Manya and Endo, 2017; Sheikh et al., 2017; Kanagawa and Toda, 2018).
Like legume lectins as well as animal galectins, with which they share a similar fold (Rudenko et al., 2001), the LG domains establish crucial protein-sugar interactions to stabilize sarcolemma and other plasma membranes. In this regard, it has recently been shown (Briggs et al., 2016) that the recombinant LG4-LG5 pair of murine laminin-α2 can be crystallized in combination with a polysaccharide that is found in α-DG and belongs to its glycan-repeated scaffold (Willer et al., 2014); this important study also demonstrates the crucial role of a coordinated calcium cation for binding. The disaccharide portion of the sugar molecule identified in the structure is coordinated only to the LG4 domain (Briggs et al., 2016), and any additional contribution of protein-protein interactions in the DG/LG binding, possibly involving portions of the core protein of DG, has not so far been confirmed (Bozic et al., 2004). A point of curiosity is that, while many LG-containing proteins include multiple LG domains, a few, such as the slits, contain only a single domain. It is not clear whether this distinction reflects a different mode of binding.
Although in the past 20 years considerable biochemical and structural data (see Table 1) have been collected on laminins (the prototypical LG-containing DG binding partners) and on several other DG binding partners, several key questions concerning dystroglycan-LG binding remain unanswered. For example, is it possible to identify common molecular characteristics (ideally short linear motifs) that would define the propensity to bind α-DG in this domain family? On the other hand, are there some identifiable structural features shared by non-binding LG domains? These questions could have important biological implications, and to address them we have combined two separate lines of analysis. Firstly, we consider the general features of LG domains and re-evaluate the available information on the interaction of α-dystroglycan with its binding partners that share LG domains; secondly, we have carried out homology modeling of LG domains of unknown structure, with a specific focus on two paradigmatic cases, namely the LG domains of human slit and agrin. The modeling results are reported in the paragraphs corresponding to the different binding partners of DG in the sections below, and the methods employed are described in the section "Methods for sequence analysis and homology modeling" of the Supplementary Material.

GENERAL FEATURES OF LG DOMAINS
FIGURE 1 | (A) The dystrophin-glycoprotein complex (DGC). Dystroglycan is embedded within the membrane as a pivotal member of the complex establishing multiple contacts with extracellular, transmembrane, and intracellular partners. In skeletal muscle it creates a "molecular bridge" between cells and the surrounding tissues offering stability upon muscle contraction. (B) Domain architecture of representative α-dystroglycan binding partners. LG domains representing α-DG-binding sites are boxed in green. Each domain has been assessed using InterProScan 5.2 (https://www.ebi.ac.uk/interpro/) database as described in the Supplementary section. Codes: LN-N-term. (Laminin, N-terminal), LN-EGF (Laminin-type epidermal growth factor-like), LN IV (Laminin IV domain), LNαDI (Laminin alpha, domain I), LNDII (Laminin Domain II), NtA (N-terminal of agrin), SEA (Sperm protein, Enterokinase and Agrin), EGF-like (Epidermal growth factor-like), IG-like (Immunoglobulin-like), Leu-r (leucine-rich), Cys-r (cysteine-rich). Not to scale.
Table footnotes: LG1 might represent the best binder in the LG1-2 tandem. Affinity is influenced by a splice site (A: KSRK) on LG2. The variant with KSRK binds less strongly. The KVR motif is conserved. The list includes the laminin-α3 chain (reported not to bind DG). α-DG binding has not yet been confirmed for the two paralogs of Slit-2. In bold: LG domains which are able to bind α-DG in an isolated fashion, and potential templates used for computational modeling. Laminin-α3 chain sequence refers to the isoform 3A.
LG domains have a globular/spherical shape and are commonly composed of 160-180 amino acids, mostly organized as a jellyroll sandwich with two antiparallel seven-stranded β sheets (Le et al., 2011). For example, the structure of human perlecan LG3 has been solved both in calcium-bound (PDB: 3SH5) and apo form (PDB: 3SH4) (Le et al., 2011), and comprises 14 β-strands (from A to N) and 2 α-helices (Figure 2). The specific β-strand order, as found in perlecan LG3, JIHCLAN and GFEDKBM (in the two opposing β-sheets, respectively, see Figure 2) applies broadly to all the LG domains whose 3D structures have been resolved, although in some cases the strands at the edges of the sheets are not well-defined. Thus, the human perlecan LG3 structure is used as a reference in this Review, which mainly focuses on human LG domains. It should be noted that the two α-helices following the B and K strands in perlecan LG3 are not uniformly found in other LG domains of known structure. The conserved Ca2+ coordination site in LG domains of known structure comprises side chain (acidic or amide) and backbone carbonyl groups on three loops that come together on one edge of the "clam"-like LG domain, namely the loops between β-strands D-E, F-G, and J-K (see Figures 2, 3). In more detail these are (i) a highly conserved acidic side chain (usually Asp but occasionally Glu) on the D-E loop, (ii) a backbone carbonyl group on the F-G loop, and (iii) a backbone carbonyl and a side chain acidic or amide group on the long J-K loop. Water molecules likely provide two hydrogen bonds to fully coordinate the Ca2+ ion, and the recently resolved crystal structure of laminin-α2 LG4 in the presence of a co-crystallized α-DG polysaccharide (Briggs et al., 2016) indicates that these waters are displaced by Ca2+-coordinating oxygen atoms on the carbohydrate, presumably when an LG domain undergoes Ca2+-dependent binding. The structure reported by Briggs et al. also shows that, in addition to coordinating the LG domain Ca2+ ion, the disaccharide moiety interacts directly with the laminin α2 LG4 domain through an interesting stacking interaction between R2803 and the GlcA1 ring, and via hydrogen bonds between backbone amide NH groups (D2873 and I2874) and the GlcA3 ring carboxyl group and between the backbone carbonyl of G2826 and a GlcA1 ring hydroxyl group (Figure 3A).
Although the carbohydrate-stacking arginine of laminin α2 LG4 is not conserved amongst LG domains (see Figure S1), backbone amide groups in equivalent hydrogen-bonding positions, and especially a highly conserved glycine equivalent to G2826 (see Figure 3B and Table 2), may provide common interaction points for LG domains that bind α-dystroglycan.
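The strand order and Ca2+ coordination loops described in this section can be written down as a small data structure. The following Python sketch is purely illustrative and is not part of the analysis reported here; the names `SHEETS`, `CA_LOOPS`, `sheet_of`, and `describe_loop` are hypothetical. It encodes the two sheets as given for perlecan LG3 (JIHCLAN and GFEDKBM) and reports, for each Ca2+-coordinating loop, whether it joins strands within one sheet or crosses between the two sheets.

```python
import string

# Strand order of the two opposing beta-sheets, as quoted for perlecan LG3.
SHEETS = {
    "sheet_1": list("JIHCLAN"),
    "sheet_2": list("GFEDKBM"),
}

# Loops that contribute ligands to the conserved Ca2+ site (D-E, F-G, J-K).
CA_LOOPS = [("D", "E"), ("F", "G"), ("J", "K")]


def sheet_of(strand):
    """Return the name of the sheet that contains the given strand letter."""
    for name, strands in SHEETS.items():
        if strand in strands:
            return name
    raise KeyError(f"unknown strand: {strand}")


def describe_loop(start, end):
    """Say whether a loop joins strands of one sheet or crosses between sheets."""
    a, b = sheet_of(start), sheet_of(end)
    kind = f"within {a}" if a == b else f"crossing {a} -> {b}"
    return f"{start}-{end} loop: {kind}"


def all_strands():
    """All strand letters from both sheets, sorted alphabetically."""
    return sorted(s for strands in SHEETS.values() for s in strands)


if __name__ == "__main__":
    # Sanity check: the two sheets together contain strands A to N exactly once.
    assert all_strands() == list(string.ascii_uppercase[:14])
    for loop in CA_LOOPS:
        print(describe_loop(*loop))
```

In this encoding the D-E and F-G loops connect neighboring strands of the same sheet, whereas the long J-K loop connects strands that belong to different sheets.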

STRUCTURAL ANALYSIS AND MODELING OF DIFFERENT α-DG BINDING LG DOMAINS
Table 1 summarizes key properties of LG domains present in proteins and proteoglycans that are known to bind α-DG and which are discussed in detail below. Particular emphasis is given to the LG domains that bind as isolated single domains or as a tandem array. In some cases (e.g., laminin-α3-containing laminin isoforms), the possibility of α-DG binding has not been tested yet, although sequence homologies with domains having known structures allow these to be modeled and inferences made about their likely Ca2+ and α-DG binding.

Laminins
Laminins are hetero-trimers composed of single α, β, and γ chains. They represent the most extensively biochemically characterized family of α-DG binding partners. In fact, laminin-globular (LG) domains are named after the five domains commonly found at the C-terminal portion of all laminin α chains, typically separated by short stretches of linking amino acids (Beck et al., 1990; Timpl et al., 2000). Extensive binding analysis (Talts et al., 2000), as well as crystallization and X-ray structural analysis carried out by Hohenester and colleagues (Hohenester et al., 1999; Tisi et al., 2000; Harrison et al., 2007; Carafoli et al., 2009) on several laminin LG domains expressed recombinantly, suggests the presence of a hierarchy of binding modes and affinities (see Table 1). Affinity studies have shown that tandem arrays seem to be required for efficient binding to DG (e.g., two domains as in LG4-5, or three as in LG1-3); in only a few cases can binding be supported by an isolated LG domain. Examples of the latter are LG4 of the laminin α1 chain (probably the most efficient binder) (Durbeej et al., 2001) or, with reduced affinity, LG3 of α2 or LG4 of α5 (see Figure 4).
From the viewpoint of function, the terminal LG domains seem to represent a binding hot-spot. For example, it has been observed that binding of LG4-5 of laminin α1 induces tyrosine phosphorylation of syntrophin, initiating a signaling pathway (Zhou et al., 2006). In a similar line of enquiry, the functional role of a specific stretch of residues, which spans the loop connecting the adjacent E and F β-strands of the LG4 module and is important for syndecan and heparin binding in laminin-332 (Utani et al., 2001), has been analyzed in all five isoforms of murine laminin α via a panel of synthetic peptides (Suzuki et al., 2003). The analysis suggested that this region is involved in several biological activities, including cell attachment and neurite outgrowth, although α-DG binding was not tested (Suzuki et al., 2003). The latter observation is consistent with the position of this loop at the opposite side of the LG domains from the Ca2+ and carbohydrate binding regions that are involved in α-DG binding (see Figure 2).
As far as calcium is concerned, it is generally accepted that this metal is required to achieve tight α-DG binding (Ervasti and Campbell, 1993; Brancaccio et al., 1995). In fact, a calcium ion must be bound to LG4, but not LG5, for efficient binding of murine laminin-α2 to α-DG (Wizemann et al., 2003), and, more recently, the importance of Ca2+ in LG4 of murine laminin α2 for chelating disaccharide units present on α-DG has been demonstrated (Briggs et al., 2016). Underscoring the importance of Ca2+ for α-DG binding, crystal structures of a number of LG domains have been solved that contain a Ca2+ (or Mg2+) ion within a structurally homologous binding site. These include LG domains of agrin, laminins, neurexins, and perlecan (Figure 5A).
The Ca2+ site of LG domains is a weak site with just two (laminins; agrin) or one (neurexins; perlecan; slits) negatively charged side chains involved in chelating the metal ion (Table 2). The relatively weak binding affinity of LG domains for Ca2+ is consistent with a matching of Ca2+-binding affinity to the physiological calcium concentration in the extracellular milieu, which is in the low millimolar range (Hofer and Brown, 2003).
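To make the argument about matching Ca2+ affinity to the extracellular calcium concentration concrete, the following Python sketch estimates the fractional occupancy of an LG domain Ca2+ site. It is a back-of-the-envelope illustration, not a calculation from the studies cited: it assumes a single independent site obeying a simple 1:1 binding isotherm, and it assumes a free extracellular Ca2+ concentration of 1.2 mM as one example of a "low millimolar" value. The Kd values are those quoted elsewhere in this Review for perlecan LG3 (≈100 µM; Le et al., 2011) and neurexin1α LG2 (≈400 µM; Sheckler et al., 2006); the function and variable names are hypothetical.

```python
def fraction_occupied(ligand_conc_uM, kd_uM):
    """Fractional occupancy of one independent site: theta = [L] / (Kd + [L])."""
    if ligand_conc_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_uM / (kd_uM + ligand_conc_uM)


# Assumed free extracellular Ca2+ (illustrative; the text only says "low millimolar").
EXTRACELLULAR_CA_UM = 1200.0

# Ca2+ dissociation constants quoted in this Review (micromolar).
CA_KD_UM = {
    "perlecan LG3 (Le et al., 2011)": 100.0,
    "neurexin1α LG2 (Sheckler et al., 2006)": 400.0,
}

if __name__ == "__main__":
    for domain, kd in CA_KD_UM.items():
        theta = fraction_occupied(EXTRACELLULAR_CA_UM, kd)
        print(f"{domain}: Kd = {kd:.0f} µM, occupancy = {theta:.2f}")
```

Under these assumptions the sites would be largely, but not fully, occupied (about 0.92 and 0.75, respectively), which is what one would expect for a weak site tuned to the prevailing extracellular calcium level.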
Based on sequence analysis, it has been suggested that laminin α3, α4, and α5 chains would not bind calcium. It has been shown, however, that EDTA does abolish binding of α-DG to both a laminin-α5 LG domain fragment (including LG1 to LG5) expressed in bacteria and labeled with biotin (Shimizu et al., 1999), and to α5-containing laminin or laminin-α5 and specific deletion mutants recombinantly expressed in eukaryotic cells (Ido et al., 2004). Homology models obtained here suggest that weak Ca2+ binding sites might be formed from non-canonical arrangements of candidate Ca2+ ligands (Figure 4), potentially explaining the effects of EDTA on α-DG binding to laminin-α5 LG domain fragments. Laminin-α3 is likely to represent a paradigmatic α-DG non-binder (Ferletta et al., 2003; Kikkawa et al., 2004). In fact, the nature of potential cell-surface laminin-α3 binders with roles in cell adhesion is unknown. A synthetic peptide based on the murine laminin-α3 sequence that spans the loop connecting the two adjacent E and F β-strands (within its LG4 module) does not affect cell attachment (Suzuki et al., 2003), whereas the human-based peptide does (Yokoyama et al., 2005). However, the terminal LG4-5 domains of laminin-α3 can bind syndecan 1 and 4 (Utani et al., 2001; Carulli et al., 2012). Homology modeling of laminin-α3 LG domains presented here (Figure 5B) supports the conclusion that these domains are unlikely to bind Ca2+. We propose that this may contribute to low α-DG affinities.
FIGURE 2 | Crystal structure of perlecan LG3 domain. β-strands are annotated according to the scheme of Le et al. (2011). In this representation (A) strands M and N are "fused" and a short sequence in the D-E loop is missing density in the crystal structure. The Ca2+ ion in the perlecan LG3 Ca2+ binding site (ligand residues are highlighted) is shown as a sphere. (B) Homology model of a slit-2 LG domain constructed on the perlecan LG3 structure template. The slit-2 model produces a potential Ca2+-binding domain with a very similar arrangement of backbone and side chain ligands for Ca2+ as in the LG3 domain, suggesting that the slit-2 LG domain is likely to bind Ca2+. See Figure 8 for additional interpretation of slit LG domain homology models.
FIGURE 3 | [...] LG domain side chain and backbone carbonyl ligands coordinating the Ca2+ ion are identified with black bold and italic annotations, respectively. LG domain groups making direct interaction with the carbohydrate are identified with blue annotations; these are: R2803 side chain stacking with the Xyl2 ring; G2826 backbone carbonyl with a GlcA1 ring hydroxyl group and D2783 and I2874 backbone amide NH groups with the GlcA3 carboxyl. The two carbohydrate oxygen atoms coordinating the Ca2+ ion are identified with red stars (Briggs et al., 2016), PDB:5IK5. (B) Structure of perlecan overlaid with the disaccharide unit shown in (A). To construct the model in (B) the perlecan LG (PDB:3SH5) and laminin-α2 LG4 domains were superimposed, and the disaccharide from laminin-α2 directly extracted onto perlecan. Apart from a small difference in the positions of the Ca2+ ions in the two structures, the disaccharide "fits" into the perlecan structure and provides equivalent Ca2+ coordinating ligands as in the laminin-α2 LG4 structure. Equivalent groups providing potential hydrogen bonds with carbohydrate (G4276; V4324; A4325) are annotated in blue. The similarities suggest that perlecan should coordinate carbohydrate units of α-DG in a manner similar to that of laminin-α2 LG4.

Agrin
Neuronal agrin is a heparan sulfate proteoglycan secreted presynaptically which plays an important role in the maturation and stability of the postsynaptic element at the neuromuscular junction (NMJ) (Ruegg and Bixby, 1998). It harbors three LG domains within its C-terminal region, and analysis of isolated domains produced recombinantly showed that the first two (LG1 and LG2) are sufficient for DG binding whereas the last one, LG3 (whose structure has been solved, Stetefeld et al., 2004), lacks or shows very weak binding to α-DG yet is sufficient in isolation for the maturation of the NMJ (Gesemann et al., 1996). In addition, NMR studies have shown that the LG3 domain of agrin binds sialic acid in a Ca2+-dependent manner, whilst it binds the glycosaminoglycans heparin and heparan sulfate independently of Ca2+. It remains unclear whether these observations may be relevant for α-dystroglycan binding to agrin (Sallum et al., 2007). The LG domains of agrin are separated by EGF-like modules. Interestingly, a splice variant that includes an inserted basic stretch of residues (site A: KSRK) in the loop between the J and K strands, which introduces a heparin binding site within its LG2 domain, binds α-DG less efficiently (Campanelli et al., 1996; Gesemann et al., 1996; O'Toole et al., 1996). The homology models of agrin LG2 in Figure 6 show that the basic site A sequence (KSRK) likely lies adjacent to a putative Ca2+ binding site spatially homologous to the Ca2+ sites in LG domains of known structure, such as perlecan (see Figures 2, 6). Additional agrin splicing isoforms of different lengths (0, 8, 11, or 19 a.a.) involve a site (B) located between the last EGF-like module and the final LG3 domain. The agrin isoform A0B0 (expressed by skeletal muscle cells) shows the tightest binding to α-DG (Gesemann et al., 1996, 1998). Several missense mutations (reported in green in Figure S1), mostly homozygous, have been identified in LG domains of agrin (Huzé et al., 2009; Maselli et al., 2012; Karakaya et al., 2017; Xi et al., 2017; Zhang et al., 2017); these are located in β-strands or inter-strand loops that may be involved in α-DG binding. The missense mutations that cause congenital myasthenic syndrome (CMS) affect the agrin LG2 domain (namely, G1675S, R1698C, G1709R, V1727F, and A1768P), which is known to be part of the α-DG binding site together with LG1 (Gesemann et al., 1996). The extent to which these mutations might affect α-DG binding and how the agrin/α-DG connection would relate to the range of CMS phenotypes have yet to be determined. For example, an instability of agrin and impaired clustering of the acetylcholine receptor (AChR) has been observed in cells containing the R1698C mutant (Xi et al., 2017), which, according to our model, is expected to lie on the E-F loop on the opposite side of the domain to the Ca2+ coordination site. The neighboring G1709R that lies on the F-G loop near the Ca2+ site was not reported to affect α-DG binding (Huzé et al., 2009). The structural context of these mutations is illustrated on the model of human agrin LG2 in Figure 7.
FIGURE 4 | Homology models of laminin LG domains. Laminin-α5 LG3 (A) and laminin-α5 LG4 (C) built on the crystal structure template of murine laminin-α1 LG4 (B, PDB:2JD4, NCBI Code: P19137). Neither the LG3 nor the LG4 domain of laminin-α5 has a canonical Ca2+ binding site. However, each of the laminin-α5 LG domain models places potential Ca2+ chelating ligands near the expected position of a Ca2+ ion. Conformational flexibility in the loops on the edge of these domains may allow weak Ca2+ binding that could explain observations of EDTA attenuation of α-DG binding by laminin-α5. The laminin-α5 LG4 model has a basic patch (RHR) in a topologically-equivalent location on the B-C loop as other basic patches of other known α-DG-binding LG domains including laminin-α1 LG4 (middle). Note that 2JD4 is the murine laminin-α1 LG4 structure; the human laminin-α1 LG4 equivalent has RKK rather than RKR in this basic patch motif. In red: residues involved in Ca2+ binding (italic numbers denote backbone carbonyl ligands), in purple: Arg residues of the basic patch.

Perlecan
Another heparan sulfate proteoglycan which binds α-DG tightly (Friedrich et al., 1999; i.e., with Kd within the low nanomolar range) is perlecan. The C-terminal portion of perlecan is termed domain V and includes three LG domains and several EGF-like domains organized in the following order: LG1-EG1-EG2-LG2-EG3-EG4-LG3. The recombinant domain V, which includes all the LG domains (LG1-3), shows the tightest binding to α-DG, but LG1-2 and LG2-3 pairs can also bind; conversely, isolated perlecan LG domains did not bind in the range of ligand concentrations explored (i.e., up to 500 nM) (Friedrich et al., 1999). Perlecan domain V, renamed "endorepellin" due to its angiostatic activity (Gonzalez et al., 2005), has been implicated in multiple biological functions related to its ability to modulate cytoskeletal dynamics in a calcium-dependent manner (Le et al., 2011). The high-resolution crystal structure of human perlecan LG3 has been solved (see Figures 2, 4A) and the presence of a Ca2+ ion confirmed, highlighting some structural similarities with other LG domains (namely, LG3 of agrin, LG2 of neurexin1α, LG5 of laminin-α2 and LG1 of neurexin1β) (Le et al., 2011). A superposition of the perlecan LG domain structure with the structure of laminin-α2 LG4 that contains a co-crystallized polysaccharide unit (Briggs et al., 2016) indicates that the α-DG binding edge of the perlecan LG domain is likely to bind disaccharide units in a similar fashion to that of laminin LG4 (Figure 3).

Pikachurin
Pikachurin is a retinal ECM protein localized to the synaptic cleft in the photoreceptor ribbon synapse of the central nervous system, which binds α-DG in a Ca2+-dependent fashion (Sato et al., 2008). Within its C-terminal portion, pikachurin harbors three LG domains separated by single EGF-like domains. Binding experiments on recombinantly produced isolated domains have shown that LG2-3 are sufficient for binding with an affinity similar to LG1-3 (Kanagawa et al., 2010). This implicates pikachurin LG1 as another LG domain that could be included within the group of "non-binders" (see below). LG3 alone is unable to induce DG clustering (Omori et al., 2012), indicating that a tandem array (doublet) of LG domains is required for binding.
FIGURE 5 | [...] and Ca2+-free (blue) perlecan. Ca2+-bound perlecan PDB:3SH5; Ca2+-free perlecan PDB:3SH4. The structure of the binding site is maintained in the absence of Ca2+. Similar observations have been made with agrin in its free and Ca2+-bound states (Stetefeld et al., 2004). The Ca2+ ligands are highlighted; italic type denotes backbone carbonyl ligands. (B) Homology model of laminin-α3 LG4. The model was built on the crystal structure template of laminin-α1 LG4 (PDB:2JD4). This domain lacks potential Ca2+-chelating acidic or amide groups on the D-E and J-K loops (see residues labeled in red); the side chain carboxyl (D3051) on the F-G loop is poorly positioned to chelate Ca2+. This domain is predicted to have a very weak Ca2+ binding site.

Neurexins
Neurexins are neuron-specific cell surface proteins expressed at the presynaptic terminal that bind neuronal α-DG (Sugita et al., 2001; Südhof, 2008). For an overview of the role of neurexins in synaptic organization the reader is referred to a recent review (Rudenko, 2017). In humans, there are three paralogous genes coding for neurexins 1, 2, and 3. Most of the work concerning the neurexin-DG axis has been carried out on neurexin-1. Typically, each gene has two promoters, α (upstream) and β (downstream), encoding longer neurexins-α and shorter neurexins-β, each harboring LG domains within their extracellular portions. In neurexin-1, the LG domains of neurexin-α are organized with interspersed EGF-like domains as follows: LG1-EGF1-LG2-LG3-EGF2-LG4-LG5-EGF3-LG6 (see Figure 1B). Neurexin-β features only one LG domain. Interestingly, an extensive analysis of recombinant isolated domains showed that only LG2 and LG6 of neurexin1α and LG1 of neurexin1β (identical to LG6 of neurexins-α) could bind α-DG tightly and, most relevantly, in an isolated fashion. There is a plethora of neurexin splicing isoforms (Missler et al., 1998), and the forms with no inserts appear to bind better to α-DG (Sugita et al., 2001). A large body of structural work on neurexin-1 has been carried out by Rudenko and coworkers (Rudenko et al., 1999; Sheckler et al., 2006; Shen et al., 2008; Chen et al., 2011), showing that its LG domains conform to the typical jelly-roll fold motif and revealing unexpected structural similarity to legume lectins and human galectins (Rudenko et al., 1999). In fact, it has been suggested that neurexins might bind carbohydrates, and that this could have a role in the interaction with DG. In addition, neurexins bind tightly to other protein partners such as α-latrotoxin and neuroligin, and alternative splicing also modulates such binding activities (Rudenko et al., 1999).
It has been shown that LG2 of neurexin1α binds calcium with Kd ≈ 400 µM, and that splicing may influence the affinity (Sheckler et al., 2006). Similarly, splicing was also shown to modulate the affinity for calcium of the LG domain of neurexin1β (Shen et al., 2008). However, it remains unclear how this effect may relate to α-DG binding. Interestingly, binding of α-DG and of neurexophilin-1 to neurexin1α are mutually exclusive, suggesting overlapping binding epitopes, and the binding of α-DG to neurexin can also influence the formation of trans-synaptic neurexin-neuroligin complexes (Reissner et al., 2014).

Slits
Slit-2 is another neuronal protein that acts as a secreted axon guidance cue and is reported to bind α-DG (Wright et al., 2012). Slit-2 is the only secreted binding partner of α-DG that has a single LG domain (the other single-LG partner, neurexin1β, being a cell surface receptor; see above). Although the binding constant has not been determined, it would be predicted to bind α-DG tightly. It has been shown that a pair of basic residues (in the sequence KVR) is important for DG binding, as is the presence of a coordinated calcium ion (Wright et al., 2012). There are two additional highly homologous slit paralogs, known as slit-1 (in which the KVR is not conserved) and slit-3 (in which it is); however, there are no reports on their potential α-DG binding properties as yet.
Homology models of the LG domains of the slits (Figure 8) support the presence of a canonical Ca2+ site in slit-2, with an arrangement of potential Ca2+-chelating groups that is similar to that of the structurally-defined Ca2+ site of perlecan (see Figure 2, with the model of slit-2 on the right for comparison); the latter binds Ca2+ with a Kd ≈ 100 µM (Le et al., 2011).
The KVR sequence required for tight binding of α-DG in slit-2 lies on the same edge of the domain as the Ca2+ site (Figure 8) and likely occupies a position on the B-C loop structurally equivalent to that of the corresponding RKK sequence of human laminin-α1 LG4 (RKR in murine laminin-α1 LG4; Figure 4) and the RHR sequence of laminin-α5 LG4 (Figure 4). We speculate that this basic "patch" may serve as an additional site for the recognition of another disaccharide unit, thus strengthening the interaction between α-DG and these LG domains including slit-2 (see below); the stacking interaction between R2803 and the carbohydrate Xyl2 sugar ring in the laminin-α2 LG4 structure co-crystallized with a LARGE polysaccharide unit (Briggs et al., 2016) provides a possible context for this type of interaction (see Figure 3).
FIGURE 6 | Homology models of agrin LG2 KSRK-containing splice variant. Models were obtained using as template: (A) human perlecan LG3 (3SH5), (B) chicken agrin LG3 (1PZ7) and (C) murine laminin-α1 LG4 (2JD4). The potential Ca2+ binding site is relatively well defined and independent of the structural template. The Ca2+ ions modeled into the putative Ca2+ binding sites are shown as spheres. The position of the KSRK sequence (green) that lies in the loop between the J and K strands (see text) is not well-constrained in the models. These models illustrate the variability in structures of modeled LG domains obtained when using a range of different structural templates. Note that the strongest sequence homology (≈35% identity) is between human agrin LG2 and perlecan LG3.
The proposed structural basis for binding of slit-2 to α-DG is additionally supported by the observation that the Ca²⁺-binding edge of the domain can be modeled to interact with a disaccharide moiety in a manner similar to that identified in the structure of laminin-α2 LG4 co-crystallized with a polysaccharide moiety (Briggs et al., 2016) (i.e., very similar to the structure of perlecan LG overlaid with the disaccharide shown in Figure 3). Comparison of the equivalent LG models of slit-1 and slit-3 (Figure 8) indicates (a) that slit-3 is predicted to have a "weaker" binding site for Ca²⁺, although it has a KVR sequence in the same location as that of slit-2, and (b) that slit-1 lacks the KVR sequence and thus may have a weaker interaction with DG compared to slit-2. In addition, slit-3 lacks the highly conserved glycine residue (G1222 in slit-2; G2826 in murine laminin α2 LG4; G4276 in perlecan, see Figure 3; this Gly residue is underlined in Table 2) that may be important for maintaining structure near the Ca²⁺ binding site.

A VISUALIZATION CODE FOR BINDING AFFINITIES OF LG DOMAINS TOWARD α-DG
The binding affinities of recombinantly expressed LG domains from several DG binding partners have classically been measured or estimated by semi-quantitative solid-phase binding assays or by more quantitative surface plasmon resonance. In some cases, very high apparent affinities (Kd values within the range 0.1-1 nM) have been reported (for a comprehensive review, see Sciandra et al., 2013). Data from these studies have allowed us to construct a scheme to characterize α-DG binding of LG domains according to the affinity and requirements for multiple LG domains (Figure 9).

FIGURE 8 | [...] Figure 2). Slit-2 has a similar arrangement of potential Ca²⁺-chelating groups (red) as perlecan (see Figure 2). In isoforms 2 and 3 the KVR sequence (basic residues in purple) that supports α-DG binding is adjacent to the Ca²⁺ site; for comparison, the basic KSRK sequence that hinders α-DG binding in the agrin LG2 splice variant (see Figure 6) would lie to the right of the Ca²⁺ binding site in this view.

FIGURE 9 | A color code for LG domain-α-dystroglycan binding affinities. Based on available biochemical information, the α-DG binding partners are classified by the binding affinities of their multiple LG domains, represented using the color code reported in the figure. The affinity scale has been estimated semi-quantitatively based on available binding data (see Sciandra et al., 2013): tighter affinity refers to the lower nanomolar range (0.1-1 nM), medium affinity to the higher nanomolar range (10-100 nM), and weak or very weak binding to the micromolar range. The laminin (LN) sub-family (referring to the five different α chains, A1 to A5) is on the left whilst agrin (AGR), perlecan (PRL) and other binding partners (PKC: pikachurin, NRX: neurexin, SLT2: Slit-2) are on the right. The drawings are not to scale. Numbers between domains indicate the number of amino acids in the linkers separating them.
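The semi-quantitative affinity scale of Figure 9 can be written down as a small lookup, which makes explicit that the quoted bands do not tile the whole Kd axis. The following is only a sketch: the band edges come from the figure caption, whereas the function name, the class labels, and the choice of 1 µM as the lower edge of the "weak" class are assumptions made for illustration.

```python
from typing import Optional

# Affinity bands quoted in the Figure 9 caption, expressed as Kd in nM.
TIGHT_NM = (0.1, 1.0)      # lower nanomolar range
MEDIUM_NM = (10.0, 100.0)  # higher nanomolar range
WEAK_FROM_NM = 1000.0      # assumption: "micromolar range" taken as >= 1 µM


def classify_affinity(kd_nm: float) -> Optional[str]:
    """Map a Kd (in nM) onto the illustrative classes 'tight', 'medium', 'weak'.

    Returns None when the value falls between the quoted bands, because the
    semi-quantitative scale does not assign those ranges to any class.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    if TIGHT_NM[0] <= kd_nm <= TIGHT_NM[1]:
        return "tight"
    if MEDIUM_NM[0] <= kd_nm <= MEDIUM_NM[1]:
        return "medium"
    if kd_nm >= WEAK_FROM_NM:
        return "weak"
    return None
```

Returning `None` for intermediate values (for example 5 nM) is a deliberate choice: it avoids inventing class boundaries that the published scale does not define.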
It is evident that a tandem array of 2 or 3 LG domains in a row is often required for strong α-DG binding (brown in Figure 9), and these combinations (a doublet in laminin-α2; a triplet in perlecan) constitute the tightest binding detected to date. Orange and red both indicate a reduced but still significant affinity. Importantly, the few domains that apparently bind α-DG in an isolated fashion do so with relatively high affinity (see also below). However, these are not universal properties: LG3 of laminin-α2 can bind but with a weaker affinity, and tandem combinations of laminin-α4 (LG1-3 and LG4-5) and laminin-α5 (LG4-5) show a weaker binding than the tandem LGs noted above. Amongst the laminins, laminin-α3 LG domains stand out as having negligible affinity for α-DG.
Overall, it appears that the LG assembly that most favors α-DG binding is one formed by at least two sequential LG domains. Such an assembly can be found in the LG4-5 of laminins α1, α2, α4, and α5, in agrin (LG1-2) and pikachurin (LG2-3); tight or moderate binding by an array of three LG domains is found in laminin-α2 and α4 as well as in perlecan (LG1-3). The tandem arrays found in laminin-α2 (skeletal muscle isoform) and perlecan (expressed at the neuromuscular junction, NMJ) show the tightest binding to dystroglycan. This is in accordance with the biological importance of the DG adhesion complex for the stability of adult skeletal muscle and of the post-synaptic element within the peripheral nervous system (i.e., at the NMJ). Only in a limited number of cases (reported in red in Figure 9) does an isolated LG bind strongly. These include LG4 of laminin-α1, LG4 of laminin-α5, LG2 and LG6 of neurexin1α, LG1 of neurexin1β (which has an identical sequence to LG6 of neurexin1α) and LG1 of slit-2.

Ca²⁺ Coordination
As described above, the typical LG module comprises a compact sandwich in which 2 β-sheets (formed by 7 β-strands each) pack against each other, and all LG domains of known structure contain a conserved Ca²⁺ coordination site. Since LG domains function in the relatively high Ca²⁺ concentration of the extracellular matrix, their Ca²⁺ binding affinity is very weak in comparison with intracellular proteins that chelate Ca²⁺ strongly, such as calmodulin (4 acidic Ca²⁺-coordinating side chains, with Kd values within the range of 0.01-1 µM; Linse et al., 1991), and is likely to be more similar to that of the sarcoplasmic reticulum Ca²⁺ binder calsequestrin, for example, whose Ca²⁺-binding sites contain only one or two acidic groups and whose Kd is in the millimolar range (Sanchez et al., 2012).
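Why a weak Ca²⁺ site is still adequate outside the cell can be illustrated with the standard single-site binding isotherm, occupancy = [Ca²⁺] / ([Ca²⁺] + Kd). The sketch below uses the Kd of about 100 µM reported for perlecan (Le et al., 2011); the free Ca²⁺ concentrations (about 1.2 mM extracellular, about 100 nM in resting cytosol) are typical textbook values assumed here for illustration and are not taken from the studies reviewed.

```python
def fractional_occupancy(ca_molar: float, kd_molar: float) -> float:
    """Fraction of sites occupied for one independent, non-cooperative site."""
    if ca_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ca_molar / (ca_molar + kd_molar)


KD_PERLECAN_LG = 100e-6    # ~100 µM (Le et al., 2011)
EXTRACELLULAR_CA = 1.2e-3  # assumption: ~1.2 mM free Ca2+ outside the cell
CYTOSOLIC_CA = 1e-7        # assumption: ~100 nM free Ca2+ in resting cytosol

outside = fractional_occupancy(EXTRACELLULAR_CA, KD_PERLECAN_LG)
inside = fractional_occupancy(CYTOSOLIC_CA, KD_PERLECAN_LG)
print(f"extracellular occupancy: {outside:.3f}")  # close to saturation
print(f"cytosolic occupancy:     {inside:.4f}")   # essentially empty
```

Under these assumptions a site with Kd ≈ 100 µM is mostly occupied in the extracellular matrix, whereas the same site would be essentially empty at cytosolic Ca²⁺ levels. The single-site model ignores cooperativity and competing ions, so the numbers are indicative only.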
In the two examples of LG domains for which structures with and without Ca²⁺ are known (agrin and perlecan), the Ca²⁺ binding site is essentially pre-formed (Figure 5A), indicating that Ca²⁺ binding is not required for the overall fold of the LG domain. This contrasts with very tight Ca²⁺ binders involved in Ca²⁺-induced allosteric regulation, such as calmodulin, in which structure is induced in a disordered Ca²⁺-free binding site when Ca²⁺ binds. However, NMR studies on perlecan LG domains indicate that Ca²⁺ binding, although not stringently required for correct folding, stabilizes the LG domain structure (Le et al., 2011). Although the Ca²⁺ site in all LG domains is at best a weak binding site and includes only one or two acidic side chain calcium-chelating ligands (see Table 2), LG domains known to lack α-DG binding activity may be especially poor Ca²⁺ binders. Homology models of the laminin-α3 LG domains, for example, indicate that the groups putatively responsible for Ca²⁺ coordination, located at the edge of the LG domain, constitute a far from ideal set of potential Ca²⁺-chelators (Figure 5B).

Linear α-DG-Binding Module(s)
It is expected that those LG domains able to bind α-DG as single domains may display some relevant sequence features that would be necessary for α-DG binding, and likewise, that useful information might be retrieved by comparative analysis of those domains which are reported not to bind α-DG. For this reason, we performed a series of alignments of selected protein sequences, with reference to human sequences. The alignment of so-called "isolated binders" is reported in Figure 10, and the LG domains of laminin-α3, which does not bind α-DG, are examined in Figure 11. An alignment of all the LG sequences under consideration was also made (Figure S1). Although perlecan domains have not been reported to bind α-DG in isolation (Friedrich et al., 1999), the sequence of the human perlecan LG3 domain has also been included in the alignments in order to compare all the human sequences within a typical LG-domain secondary structure framework. Based on the alignment in Figure 10, no universal short linear DG-binding motif(s) can be identified, and it is likely that a combination of sequence and structural features is required. The laminin α2 LG4 LARGE carbohydrate crystal structure (Briggs et al., 2016) shows that hydrogen bond interactions between the LG domain and carbohydrate sugar rings involve backbone amide groups (Figure 3); these potentially important interactions may be rather relaxed with respect to amino acid sequence requirements and therefore not apparent in searches for sequence homologies. It is notable that the J strand is less conserved in the binders (Figure 10) compared to non-binders (Figure 11) and, overall, the J strand appears to be the least conserved within the entire LG domain β-strand scaffold. In contrast to other strands, none of the residues of the J strand are highlighted in black or gray, indicating a high degree of amino acid variability (see Figure S1). This variability could play a functional role, for example in modulating α-DG recognition.
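The notion of a strand being "less conserved" can be made concrete with a per-column identity score over an alignment. The sketch below is a simplified stand-in for alignment shading, not the procedure used to produce Figures 10 and 11: the function name is hypothetical, the scoring (fraction of sequences sharing the most common residue in a column) is one of several possible conservation measures, and the toy sequences are invented and do not correspond to any real LG domain.

```python
from collections import Counter
from typing import List


def column_conservation(aligned: List[str], gap: str = "-") -> List[float]:
    """Per-column fraction of sequences that carry the most common residue.

    Gaps never count as the shared residue, but they do count in the
    denominator, so gapped columns score lower.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment (invented sequences, for illustration only).
TOY_ALIGNMENT = ["GKVRA", "GKVRS", "GRHRT", "G-VKA"]
print(column_conservation(TOY_ALIGNMENT))
```

Averaging such scores over the columns that make up a given β-strand would give a single number per strand, allowing, for instance, the J strand to be compared with the remainder of the scaffold.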

Recurring Basic Residues Within Specific Structural Elements
The presence of scattered patches of basic residues in and around the Ca²⁺ cleft of LG domains has been suggested to represent an important requirement for α-DG binding (Harrison et al., 2007). Mutation to alanine of the two basic residues within the KVR sequence on the B-C loop of slit-2 (Wright et al., 2012) and the topologically-equivalent RKR (to AKA) in laminin-α1 LG4 (Harrison et al., 2007) abolishes α-DG binding. These observations are consistent with the expectation that basic residues facilitate the binding of LG domains to the negatively charged carbohydrate groups on α-DG. Although there were some notable discrepancies between the data collected by Harrison et al. (2007) and earlier mutational analysis from Andac et al. (1999), which might be ascribed to differences in the experimental conditions used and/or source of α-dystroglycan samples (see Harrison et al., 2007), the emerging scenario indicates that the basic residues whose mutation most perturbs α-DG binding fall on the edge of the domain that contains the Ca²⁺-binding site (Figure 2), as expected for a role in binding to carbohydrate groups on α-DG. This observation, however, is not easily generalized to the whole family of LG domains which, whether binders or non-binders, have little overall net charge at physiological pH (see Table S1).

FIGURE 10 | Multiple sequence alignment of selected isolated LG domains that are binders of α-DG. The sequences refer to all the strong binders identified and include also the LG3 module of the laminin α2 chain (reported to be a weaker binder). Secondary structure elements, as retrieved from the available 3D structure of human perlecan (3SH4/3SH5), are rendered as follows: β-strands, from A to N (red), α-helices (purple), calcium-coordinating amino acid positions (orange and indicated by an asterisk). Sequences were aligned in MUSCLE 3.8 and the alignment presented in Boxshade. Basic residues belonging to the loops neighboring the coordinated Ca²⁺ are reported in blue, the ones belonging to the opposite side of the domain are turquoise. The specific LG domain sequences in the alignment are identified as follows: NRX1A/1B: neurexin α1/β1, LNA1/2/5: laminin α1/α2/α5, SLT2: slit-2, PRL: perlecan.
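Both ideas discussed in this section, short basic patches of the KVR/RKR/RHR type and the overall net charge of a domain, are easy to screen for in a sequence. The sketch below is illustrative only: the function names are hypothetical, the pattern "basic, any residue, basic" is an assumed generalization of the motifs named in the text, and the charge estimate simply counts Lys/Arg against Asp/Glu (histidine and the chain termini are ignored), which is a crude approximation at physiological pH.

```python
import re
from typing import List, Tuple

BASIC = "KR"
ACIDIC = "DE"

# Two basic residues separated by any single residue (e.g. KVR, RKR, RHR).
# The lookahead allows overlapping hits to be reported.
BASIC_X_BASIC = re.compile(r"(?=([KR].[KR]))")


def find_basic_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, tripeptide) for every basic-x-basic hit."""
    return [(m.start(), m.group(1)) for m in BASIC_X_BASIC.finditer(seq.upper())]


def approx_net_charge(seq: str) -> int:
    """Crude net charge: (Lys + Arg) minus (Asp + Glu)."""
    seq = seq.upper()
    positive = sum(seq.count(r) for r in BASIC)
    negative = sum(seq.count(r) for r in ACIDIC)
    return positive - negative


# Hypothetical toy peptide (invented, not a real LG domain sequence).
toy = "AGKVRDLE"
print(find_basic_motifs(toy))   # one KVR hit
print(approx_net_charge(toy))   # basic and acidic residues cancel out
```

The toy peptide shows how a local basic patch can coexist with a near-zero net charge, which is the situation described for LG domains as a whole. A sequence-only scan cannot tell whether a hit lies on the Ca²⁺-binding edge of the domain; that requires mapping positions onto a structure or homology model.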
The sequence relationships of LG domains were also investigated by molecular phylogeny. The phylogenetic diagram resulting from the multiple sequence alignment showed the LN3-LG1 domain to be distinct from the other sequences analyzed, forming a separate branch in the tree. Within the other clades and sub-clades, there was no discrete segregation of known α-DG-binding LG domains from nonbinders (Figure 12, asterisks indicate known α-DG-binding LG domains). Although this analysis is limited to the subset of experimentally-tested LG domains, the results indicate that there are no clear sequence features that distinguish α-DG binding domains from non-binders in this set, pointing to an apparent paradox intrinsic to the very nature of LG domains.
Thus, the identified features that modulate DG binding affinity, namely the presence and "strength" of the Ca²⁺ binding site and the presence of KVR-like basic patches, despite being predictive of a possible interaction, cannot alone distinguish unequivocally between strong, weak, or non-binders of α-DG.

FIGURE 11 | Multiple sequence alignment of selected isolated LG domains that are non-binders of α-DG. Laminin α3 has been reported not to bind α-DG. Therefore, it is assumed that none of its tandem LG domains would be able to bind α-DG. Secondary structure elements as retrieved from the available 3D structure of human perlecan (3SH4/3SH5): β-strands, from A to N (red), α-helices (purple), calcium-coordinating amino acid positions (orange and indicated by an asterisk). Basic residues belonging to the loops neighboring the coordinated Ca²⁺ are reported in blue, whereas those belonging to the opposite side of the domain are in turquoise. Code: LNA3_LG1 to LNA3_LG5 (1st to 5th module from laminin α3).

IS AFFINITY REGULATED BY MODULARITY?
Extending the analysis beyond single LG domains, tandem arrays of LG domains might constitute a way to modulate binding affinity toward α-DG through modular binding to its glycan scaffold. Indeed, the glycan polymer of α-DG has been defined as a tunable extracellular matrix protein scaffold for which increasing chain length during myogenesis enhances ligand-binding capacity (Goddeeris et al., 2013; Yoshida-Moriguchi and Campbell, 2015). Are multiple, tandem LG domains required for binding partners to recognize a series of disaccharide units on α-DG (termed matriglycan) (Willer et al., 2014)? Such a multiple binding mode could be a mechanism to increase the overall affinity between α-DG and LG domains in vivo (Figure 13). Indeed, regulation of affinity through multivalency is a widely accepted concept in the lectin-carbohydrate field (Raman et al., 2005).
It is interesting to note that two LG domains that can act as isolated binders (in red in the inset to Figure 13) have not been found next to each other in any of the DG-binding proteins. It is tempting to propose that the assembly of repeated tight DG-binding LG domains would make the binding too strong to be compatible with normal physiology. Instead, the tandem assembly as well as the modularity of LG domains of different binding affinities could be a way to tune the interaction between α-DG and its binding partners (Figure 13). This notion could be helpful for the design of therapeutic laminin- or agrin-based molecules.
It is also unclear whether the presence of tandem arrays of LG domains may be important to favor their folding and stability or is strictly necessary to achieve a physiologically relevant affinity toward α-DG. This is an important question and further work will be required in order to address it properly.

FIGURE 12 | Unrooted phylogenetic diagram of the protein sequence relationships of the LG domains analyzed in this study. The alignment of 246 positions was prepared in MAFFT and the Newick output rendered in iTOL. Codenames as in Table 1. Asterisks indicate known α-DG-binding LG domains.
The presence of short basic motifs (KVR or similar) in the loop between the B and C strands of LG domains could be a factor to strengthen the affinity toward α-DG in binding partners possessing only one LG domain, such as slit-2. Another protein with only one LG domain is neurexin1β, which, as a presynaptic protein, might require a relatively tight binding affinity to recognize carbohydrate moieties protruding into the synaptic cleft from the postsynaptically-located α-DG. In this regard, it should be noted that the longer neurexins-α have LG domains which are able to bind DG in an isolated fashion (LG2 and LG6). However, the KVR motif does not appear to be conserved in neurexins (Figure 10), thus other factors could be involved. The fine regulation of the affinities between DG and these neuronal proteins may make important contributions in the central nervous system for the stability of synaptic elements (Hunter et al., 2017) and/or for neuronal pathfinding (Wright et al., 2012).
The polysaccharide polymer that protrudes from α-DG represents a scaffold with multiple attachment sites for ECM binding partners. In skeletal muscle, where α-DG is extensively glycosylated, it has been suggested that this glycan scaffold would act to prevent muscular dystrophy (Goddeeris et al., 2013; Yoshida-Moriguchi and Campbell, 2015). Nevertheless, it is tempting to speculate that too tight an interaction between α-DG and laminin-α2 could also be potentially dangerous for skeletal muscle physiology as it could impair some necessary conformational plasticity (implying a dynamic behavior based on cycles of fast attachment/detachment/re-attachment) at the interface between the sarcolemma of muscle fibers and the surrounding basement membranes. The presence of a limited number of LG domains with a distinctively tight binding affinity for α-DG could favor such modulation, tuning the affinity of laminin for α-DG to an optimal degree for muscle physiology.

FIGURE 13 | LG-domain binding modes toward α-DG. Model of how different tandem assembly combinations might be required for fine modulation of α-DG binding affinity. The distance between the two disaccharide units that interact with two adjacent LG domains has not been determined, and the cartoon only depicts an example scenario. Because of their size, tandem LG domains cannot bind to adjacent GlcA-Xyl disaccharide units. A disaccharide unit is indicated, with Xyl: xylose and GlcA: glucuronic acid. The grouping on the right illustrates that LG domains that are able to bind α-DG as isolated units (in red) have not been found as neighbors. For color code, see Figure 9.

FUTURE PERSPECTIVE: BIOMEDICAL SIGNIFICANCE OF DIFFERENTIAL LG DOMAIN AFFINITIES TOWARD α-DYSTROGLYCAN
The importance of studying the molecular mechanism(s) that regulate the affinity between α-DG and its binding partners is emphasized by the frequent use of laminin isoforms in gene-therapy approaches. For example, laminin-111 (containing the α1 chain) (Gawlik et al., 2010) as well as a miniaturized version of agrin (the so-called mini-agrin) (Moll et al., 2001) have been proposed as candidates for rescuing the severe congenital muscular dystrophy phenotypes in which laminin-211 is absent. Both of these contain LG domains known to interact strongly with α-DG. Recently, the crucial role played by the affinity of different binding partners toward α-DG has been highlighted by the finding that within the embryonic heart, the agrin muscle isoform (A0B0), instead of laminin-2, is the preferential α-DG binder. During development, agrin binding to α-DG promotes the release of the transcription factor YAP (yes-associated protein 1) from sequestration by the dystrophin-glycoprotein complex (Morikawa et al., 2017), eventually allowing YAP to reach the nucleus and trigger the regeneration of cardiac myofibers (Bassat et al., 2017). Therefore, through a deeper knowledge of the molecular basis of LG domain affinity toward α-DG, strategies to modulate these interactions may be developed, and further therapeutic avenues for the treatment of severe neuromuscular disorders or for triggering regeneration procedures in the adult myocardium could be explored (Eroglu and Chien, 2017).

DATA AVAILABILITY
The authors declare that all data generated for this study are included in the manuscript and the Supplementary Files.

AUTHOR CONTRIBUTIONS
MGB and AB conceived the project. CD performed the modeling. JA performed the tree analysis. AB carried out the sequence alignment analysis. CD, MGB, JA, and AB analyzed the data. CD and AB wrote the paper. All authors reviewed and contributed to the various draft versions of the manuscript. All authors read and approved the final manuscript.