Towards Understanding Afghanistan Pea Symbiotic Phenotype Through the Molecular Modeling of the Interaction Between LykX-Sym10 Receptor Heterodimer and Nod Factors

The difference in symbiotic specificity between peas of Afghanistan and European phenotypes was investigated using molecular modeling. Considering segregating amino acid polymorphism, we examined interactions of pea LykX-Sym10 receptor heterodimers with four forms of Nodulation factor (NF) that varied in natural decorations (acetylation and length of the glucosamine chain). First, we showed the stability of the LykX-Sym10 dimer during molecular dynamics (MD) in solvent and in the presence of a membrane. Then, four NFs were separately docked to one European and two Afghanistan dimers, and the results of these interactions were in line with corresponding pea symbiotic phenotypes. The European variant of the LykX-Sym10 dimer effectively interacts with both acetylated and non-acetylated forms of NF, while the Afghanistan variants successfully interact with the acetylated form only. We additionally demonstrated that the length of the NF glucosamine chain contributes to controlling the effectiveness of the symbiotic interaction. The obtained results support a recent hypothesis that the LykX gene is a suitable candidate for the unidentified Sym2 allele, the determinant of pea specificity toward Rhizobium leguminosarum bv. viciae strains producing NFs with or without an acetylation decoration. The developed modeling methodology demonstrated its power in multiple searches for genetic determinants, when experimental detection of such determinants has proven extremely difficult.


INTRODUCTION
The symbiosis between leguminous plants (Fabaceae) and nodule bacteria (collectively called rhizobia) demonstrates an extremely high specificity. When the host plant interacts with a large number of soil microorganisms looking for an appropriate partner, it should be selective enough to avoid penetration of any pathogens, discriminate between different types of rhizobia, and allow symbiotic interaction with the most effective partner. Interactions between partners are carried out by a complicated interplay of signaling; however, several essential details of these molecular mechanisms are still under question (Denarie, 1996;Oldroyd, 2013;Walker et al., 2020).
The molecular crosstalk between partners starts at the early stages of the plant-rhizobia interaction and results in root nodule formation (Figure 1A, photos). Bacteria secrete molecules called Nodulation factors (Nod factors, NFs), which serve as the primary recognition "key" signal for a plant "lock. " The structure of NFs was discovered more than 30 years ago; this molecule generally consists of 3, 4, or 5 N-acelylglucosamines linearly oligomerized through (1,4)-β-linkages and decorated with a fatty acid residue and various small substitutions that play a crucial role in partner recognition (Denarie, 1996;Ritsema et al., 1996). All these decorations are genetically controlled and determine NF shapes, which are distinctive for different rhizobia species (or strains). Each rhizobial strain produces several slightly different NF molecules that may allow the strain to extend the spectrum of its potential hosts (Mergaert and Van Montagu, 1997). In contrast, the beneficial strategy for plants is to narrow their symbiotic specificity and interact with reliable symbiotic partners only. The co-evolution of these strategies results in cross-inoculation groups (CIGs; Baldwin et al., 1927;Sears and Carroll, 1927). This concept describes distinct groups, each containing both legumes and rhizobia strains, so that all plant-bacteria combinations within a group form effective and highly specific symbiotic relationships. Originally, CIGs were thought to be non-overlapped; each legume species and rhizobia strain was considered to belong to only one group Frontiers in Plant Science | www.frontiersin.org (Baldwin et al., 1927;Sears and Carroll, 1927). However, further studies demonstrated that real boundaries between CIGs are blurred, and many cases of non-canonical interactions between legumes and rhizobia from different groups exist (Provorov, 1994;Viprey et al., 2000). Even within a particular CIG, one can find variations in the symbiotic efficiency (Provorov, 1994). This variation within a CIG was demonstrated for the garden pea (Pisum sativum L.); Afghanistan and European pea landraces show different symbiotic success with Rhizobium leguminosarum bv. viciae depending on presence of the nodX gene in rhizobia (Firmin et al., 1993;Ovtsyna et al., 1999). This gene encodes an acetyltransferase that attaches a small acetylation decoration to NF ( Figure 1B) (Lie, 1978;Davis et al., 1988;Ovtsyna et al., 1999). The Afghanistan pea landraces can interact only with R. leguminosarum bv. viciae strains, which have the nodX gene, while European pea lines display relaxed specificity and can also be nodulated by strains lacking this gene ( Figure 1A). The reason for this variation relates to the difference in the NF receptors of pea subtypes. However, the particular pea receptors that interact with NFs are not known.
Plant receptors for NF perception belong to a family of LysM domain-containing receptor-like kinases (LysM-RLKs), and contain three LysM domains in the extracellular part of the protein, a transmembrane domain, and an intracellular kinase domain (Bensmihen et al., 2011;Buendia et al., 2018). All plant LysM-RLKs fall into two classes: LYK (with active kinase) and LYR (with non-active pseudo-kinase; Arrighi, 2006). A heterodimeric receptor of two LysM-RLKs from different classes is required for NF perception Arrighi, 2006;Zipfel and Oldroyd, 2017). The differences in receptor protein sequences, in theory, should reflect the difference in NF structure. However, in practice, different leguminous species with a relatively high level of between-species amino acid polymorphism in receptor genes effectively interact with the same rhizobial strains. Interestingly, a single amino acid variation in a particular LysM domain of the LysM-RLK NFR5 (Nod-factor receptor 5) has been shown to dramatically change the recognition of NFs by Lotus species (Radutoiu et al., 2007). The latter example resembles the case of Afghanistan and European pea cultivars demonstrating a relatively low level of genetic divergence and, at the same time, clearly different symbiotic phenotypes ( Figure 1A). To understand the molecular mechanisms of these different interactions, one needs to study NF-receptor interaction and the meaning of amino acid substitutions in this process. It is first necessary to investigate what is known about the interactions between pea receptors and NF. Expectedly, NF receptors should be homologs of the previously identified legume LysM-RLKs, but in practice, it is almost impossible to match known pea LysM-RLKs with their homologs in other leguminous plants, as LysM-RLKs families consist of a number of closely related receptors.
The existence of the specific pea receptor for acetylated NF was shown with a classic genetic approach more than 20 years ago. It was well-proven that the difference in interaction exhibited by acetylated and non-acetylated NF is controlled by a single plant gene called Sym2 (Kozik et al., 1995(Kozik et al., , 1996. Despite being described long ago, Sym2 has not yet been cloned, and the position of this gene in the pea genome ("the Sym2 region") was determined only in genetic mapping experiments (Kozik et al., 1996). The main candidates for Sym2 are Sym37, K1, and LykX genes that are located in the Sym2 region (Sulima et al., 2017). However, based on sequence differences between Afghanistan and European pea subtypes, LykX has amino acid segregating polymorphism, while Sym37 and K1 were identical between subtypes (Zhukov et al., 2008). Therefore, only LykX out of three candidates has the potential to be responsible for the phenotypic difference between Afghanistan and European peas. Since NF perception requires a heterodimeric receptor with LYR and LYK subunits, we considered the LykX gene as encoding the LYR subunit, and the conventional Sym10 gene as encoding the LYK subunit. The Sym10 gene was demonstrated to be required for the legume-rhizobia symbiosis development, and the possibility of Sym10-Sym37 and Sym10-K1 complexes formation was also shown (Kirienko et al., 2019).
The most obvious way to prove the action of LykX as the NF receptor is by investigating the direct binding of the isolated receptor with NF. However, the evidence for direct binding of acetylated and non-acetylated NFs to LykX protein is still lacking. Only several attempts to show such binding have been successful to date (Broghammer et al., 2012;Kirienko et al., 2018). The difficulty of this detection may be caused by the fact that legume NF receptors work in heterodimers (Xiao et al., 2014;Igolkina et al., 2018;Kirienko et al., 2019), whereas modern technologies are aimed to test the "one receptor-one ligand" model. Additionally, NF isolation and purification are very time-and resourceconsuming procedures, while the chemical synthesis of NF is extremely difficult due to the high complexity of its structure.
Therefore, as long as the demonstration of the physical interaction between receptor heterodimers and NFs remains challenging, molecular modeling is a suitable and appealing alternative to direct experimental approaches. Recently, based on this method, probable complexes of plant heterodimeric receptors with NF have been proposed, and the overlap between population polymorphisms in the contact zone between receptors in the complex has been analyzed (Igolkina et al., 2018). To date, molecular modeling provides the opportunity for the in silico mass-testing of interactions between NFs and plant receptors to highlight and support further targeted validation in experiments.
In this study, we exploited the merits of molecular modeling to analyze the effect of specific chemical modifications in NF produced by R. leguminosarum bv. viciae on its interaction with the LykX-Sym10 receptor. For all possible LykX-Sym10-NF complexes, we performed two independent in silico validationscanonical MD and an original thermochemical-based pipelineand obtained stable configurations in line with the observed pea phenotypes. The modeling analysis supported the LykX gene as a suitable determinant for Sym2.

Sequence Analysis
We analyzed 95 amino acid sequences of PsLykX receptor (MF155382-MF155469; MN187362-MN187364, and MN200353-MN200358; Sulima et al., 2017Sulima et al., , 2019 and eight Frontiers in Plant Science | www.frontiersin.org amino acid sequences of PsSym10 receptor (MN727808, MN727809, MN727810, and MN727811 -this work; AJ575250, AJ575251, AJ575252, and AJ575253 - Madsen et al., 2003). Among the LykX sequences used in this study, seven represented the Afghanistan pea phenotype and 85 represented the European pea phenotype. According to the Sulima et al. (2019), the Afghanistan pea phenotype has two alleles, Afghan and Tajik, which were present in our dataset in five and two variants, respectively. Therefore, we separately analyzed three subsets of PsLykX sequences based on the following pea alleles: Afghan, Tajik, and European. It should be noted that the names of alleles do not reflect the geographical origin of the corresponding pea samples. Within eight sequences of the PsSym10 receptor gene, three belonged to Afghanistan phenotype, and five to the European phenotype (Supplementary Table 1).
Most of the analyzed sequences were obtained from previous studies, but four alleles of PsSym10 (MN727808, MN727809, MN727810, and MN727811) were sequenced in the current study. DNA was extracted from young leaves (top or secondfrom-top node). Extraction was conducted according to the previously described CTAB protocol (Rogers and Bendich, 1985;Sulima et al., 2017). PCR was performed in 0.5 ml eppendorftype microcentrifuge tubes on an iCycler (Bio-Rad, Hercules, CA, United States) or Dyad (Bio-Rad) thermocycler using the ScreenMix-HS kit (Evrogen, Moscow, Russia). The PCR cycling conditions were as follows: 95°C (5 min), 35 cycles [95°C (30 s), Tm (varying depending on primers; 30 s), 72°C (1 min)], and 72°C (5 min). The PCR fragments were sequenced using the ABI Prism 3500xL system (Applied Biosystems, Palo Alto, CA, United States) at the Genomic Technologies, Proteomics, and Cell Biology Core Center of the ARRIAM (St. Petersburg, Russia). PCR primer sequences are listed in Supplementary Table 2. The resulting sequences have been deposited into the NCBI database (see accession in Supplementary Table 1).
Extra-membrane domains of LykX and Sym10 receptors contain 212 and 226 amino acids, respectively. Sequences for LykX and Sym10 were aligned separately in Mega X using the MUSCLE algorithm (Kumar et al., 2018). In each alignment, we analyzed polymorphic positions within and between allele subsets and focused only on polymorphic positions found between subsets. We hypothesized that the polymorphisms found between subsets are responsible for the difference in pea phenotypes.

Protein Homology Modeling
Three LysM-containing kinase crystals were used as templates for the homology modeling of extracellular domains of the receptors: 5JCD of the Oryza sativa OsCEBiP chitin receptor (Liu et al., 2016), 4EBZ of the Arabidopsis thaliana AtCERK1 chitin elicitor receptor kinase (Liu et al., 2012), and 5LS2 of the Lotus japonicus LysM-containing protein (Bozsoki et al., 2017).
For European and Afghan alleles of the LykX and Sym10 proteins, the extracellular and transmembrane domains were predicted with three separate methods: SWISS-MODEL (Waterhouse et al., 2018), Iterative Threading ASSEmbly Refinement (I-Tasser) (Zhang, 2009), and Phyre2 (Kelley et al., 2015) algorithms. The structure of the Tajik allele of LykX was obtained by a single amino acid mutation (ARG75PRO) in the European model and then relaxed using the MM approach. For all five obtained models (3 for LykX and 2 for Sym10), we analyzed Ramachandran plots and RMSD deviations from the corresponding template geometry, and filtered out models with defective secondary structures of the LysM domains. Final models were prepared using Protein Preparation Wizard (Sastry et al., 2013;Schrödinger, 2021) and minimized in Maestro using OPLS3ext force field (Roos et al., 2019).

Protein-Protein Docking and Clustering of Dimers
For LykX-Sym10 pairs, we performed the protein-protein docking assay in the Piper package (Kozakov et al., 2006) and obtained 30 docking poses of dimers for each of the three pea alleles (European, Afghan, and Tajik).
Then, we clustered 90 obtained dimers (30 × 3) in the following way: using the Procrustes analysis (rotation and shift, without scaling; Rudemo, 2000), we placed all dimers into the comparable coordinates, so that LykX subunits of all dimers were matched in 3D. Positions of the remaining Sym10 subunits were different from each other, and we utilized this difference for clustering. For each pair of dimers, we calculated two measures of dissimilarity. For two dimers, the first measure was the angle formed by two lines drawn from the center of LykX toward two centers of Sym10. The second measure was the minimal correlation between X, Y, and Z coordinates of two Sym10 subunits. For each measure, we introduced a threshold: dimers were considered similar if the angle was lower than 45 and the minimal correlation was higher than 0.5. Then, we performed clustering based on the binary similarity matrix and found groups containing at least four pairwise similar dimers. Some of these groups had a non-empty intersection; hence, we merged them until the independent clusters were obtained. As was expected, several dimers were not clustered. For each cluster, we randomly chose a typical dimer and checked whether the mutual orientation of LykX and Sym10 domains and possible orientation of dimers to the cytoplasmic membrane were biologically justified.
Typical dimers for clusters were relaxed in an orthorhombic box for 50 ns in a TIP4P solvent by use of MD, Desmond-v5.4 package (Bowers et al., 2006) of Schrödinger 2018-2. The minimal initial distance between protein and simulation box border was 20 Å. Protein dimer poses obtained after trajectory clustering were employed for ligand docking assays. To identify binding energy between LykX and Sym10 proteins, the MM-GBSA approach was applied (Schrodinger, 2013).

Molecular Dynamics Parameters
Molecular dynamics was carried out in the Desmond-v5.4 package of Schrödinger 2018-2, using the TIP4P water solvent model; systems were neutralized by adding single-charged ions (Na + or Cl − ). In addition, 0.15 M of NaCl was added to emulate standard cytoplasmic ion concentration. MD properties were the following: ensemble class: NPT, thermostat method Nose-Hoover chain, barostat method Martyna-Tobias-Klein, relaxation Frontiers in Plant Science | www.frontiersin.org time 5 ps, temperature 300 K, pressure 1,013 bar, and interaction cutoff radius 9 Å. MD trajectories were clustered by the Desmond Trajectory Clustering package.

Stability of the Protein Dimers in the Cytoplasmic Membrane Model
As all three LykX-Sym10 dimers were stable in the solvent, we randomly selected one (the Afghan variant) to test its conformation stability in solvent together with the membrane. To assemble the membrane, transmembrane domains, and dimer into one complex, we performed the following steps. First, we modeled hydrophobic α-helices of LykX and Sym10 transmembrane domains and docked them to imitate the interaction between transmembrane domains in the LykX-Sym10 heterodimer. Second, we sewed subunits of LykX-Sym10 dimer to corresponding transmembrane domains via native peptide linkers through peptide bonds. Lastly, we applied MD in the solvent with the presence of the 1,2-dimyristoyl-sn-glycero-3phosphorylcholine (DMPC) full-atom membrane model for 100 ns. The protein-lipid complex was placed in an orthorhombic box with the minimal distance between protein and simulation box borders equal to 30 Å.

Ligand Structure Preparation
We considered four types of NF structures according to the possible repertoire produced by R. leguminosarum bv. viciae. The analysis of NF conformer structures and their stability was carried out hierarchically. First, molecular mechanics (MM) was employed to perform the initial screening of NF conformers using the MMFF94 force field (Halgren, 1996) as implemented in the Frog2 program package (Miteva et al., 2010), and the 100 lowest-energy conformers were selected for each type of NF molecule. Second, an initial assessment of each conformer's stability in aqueous medium was carried out using the primitive implicit solvent model (COSMO). Third, based on the assessments, each NF conformer was refined by semi-empirical electronic structure calculations at the PM6-DH2X level (Řezáč and Hobza, 2011) using the MOPAC2016 program package (Stewart, 2016). For each NF, the 15 most stable conformers were then used in the docking procedure and in thermochemical analysis.

Ligand Docking
For ligand docking, the Glide-v7.9 package of Schrödinger 2018-2 was applied (Friesner et al., 2004). We ran two flexible XP dockings for each previously obtained typical dimer; grid boxes were centered on #44 and #76 polymorphic amino acids of the LykX protein. The internal size of each grid box was 30x30x30 Å, and the external size was 50x50x50 Å. Intramolecular hydrogen bonds were rewarded, and stereochemical transitions in ligand rings were strictly forbidden. Docking results contained the 10 best docking poses for each ligand. Docking poses were clusterized using the interaction fingerprints method, and the best pose for each ligand was selected by Glide Emodel criteria. Protein-ligand complexes with the best docking energy (Glide E) were chosen for each protein dimer pose.

Relaxation and Analysis of Protein-Ligand Complexes
Sym10-LykX-NF complexes were placed in an orthorhombic box and relaxed for 300 ns using the MD approach. The minimal distance between protein and simulation box borders was 20 Å. Obtained MD trajectories were clusterized by the Desmond Trajectory Clustering method. The MM-GBSA method was applied before and after MD to identify binding energy between (i) LykX and Sym10 proteins and (ii) Sym10-LykX dimers and NF ligands.

Quantum Mechanical Parameters
Quantum mechanical gas-phase calculations were carried out using the ORCA program package (ver. 4.0.1; Neese, 2018). All structures were re-optimized using the higher-level composite RI-B97-3c method (Brandenburg et al., 2018). The energies of solvated structures were estimated by applying the hybrid COSMO-RS solvation model (Klamt and Eckert, 2004) using the CosmoTherm 16 program (Grimme, 2012) The COSMO-RS correction was applied to the DFT results obtained at the BP86/TZVP level of theory (Becke, 1988;Weigend and Ahlrichs, 2005) as a recommended procedure for the COSMO-RS calculations (Hellweg and Eckert, 2017).

Sequence Analysis
Ninety-five sequences of P. sativum LykX gene contained 23 polymorphic sites; the difference between two sequences was 4.7 amino acid positions on average. Among 23 sites, only four demonstrated polymorphisms between pea alleles: amino acids #44, #45, #75, and #76 ( Figure 1A). Some of these four positions may potentially be responsible for the difference in European and Afghan pea phenotypes. It should be noted that European and Tajik LykX alleles are more similar to each other than to the Afghan allele variant. For modeling, we took the most frequent LykX sequence within each of the three groups of alleles.
Analysis of Sym10 sequences revealed only one valineisoleucine polymorphism at #221 position, which was not specific for any pea allele, and valine-isoleucine substitution that was almost insignificant at both structural and chemical levels. This allowed us to conclude that the Sym10 gene sequence is universal for Afghan, Tajik, and European alleles, and guided us to use one Sym10 protein model in all further experiments.
A comparative table of polymorphisms in LykX and Sym10 amino acid sequences is presented in Supplementary Table S3.

Protein Model Reconstruction
As templates for modeling, we utilized three crystal structures of three plant chitin receptors from the PDB database: 5JCD, 4EBZ, and 5LS2. The similarity values between target protein sequences (LykX or Sym10) and all templates were higher than 18% (Supplementary Table 4), but the highest values Frontiers in Plant Science | www.frontiersin.org were observed for the 5LS2 template in all cases. Therefore, we used this template to model both LykX and Sym10. We performed modeling by three different servers: the I-Tasser method (threading), the SWISS-MODEL pipeline (homology modeling), and the Phyre2 algorithm (homology modeling and threading). Analysis of Ramachandran plots and βααβ motifs of LysM domains revealed that the I-Tasser server predicted the most appropriate structures of both LykX and Sym10 receptors. After fold recognition, all models were relaxed by the energy minimization approach in Schrödinger using the implicit GB/SA water solvent model.
We constructed LykX models of European, Afghan, and Tajik alleles separately and observed that amino acids polymorphisms in the variants had no significant impact on both secondary and 3D structures of LysM domains ( Figure 1C).

LykX-Sym10 Dimer Assembly
We performed protein-protein docking of LykX and Sym10 proteins for each of three P. sativum alleles; in each case, we obtained 30 configurations of LykX-Sym10 complexes. Then, 90 models (3 × 30) were clustered and four distinct clusters consisting of 39 dimers in total were found (Supplementary Figure 1). After that, clusters with biologically inadequate dimer subunit orientations were filtered out. Only one cluster (Supplementary Figure 1) had dimers that met all necessary conditions. This cluster contained dimers of all three pea alleles.
From the remaining cluster, we randomly chose dimers corresponding to each pea subpopulation (IDs: A01, T00, and E25 in Supplementary Data 1) and relaxed these variants in water using MD for 50 ns. Analysis of trajectories and Gibbs energy demonstrated the stability of the dimer that allowed us to consider this dimer as potentially existing in the solvent.

Membrane Model Emulation
The simulation trajectory contained two clusters. For both clusters, the RMSD from the initial geometry was around 6 Å, with flexible linker fragments providing the main contribution. The closest distance between transmembrane domains was 10.8 Å, which corresponds to the semi-bounded condition for transmembrane domains (Polyansky et al., 2012). The structure of the dimer was stable during the simulation, and moved closer to the membrane after the relaxation of linker fragments ( Figure 2B). Therefore, we can conclude that the obtained orientation of subunits in the LykX-Sym10 dimer may exist.

Protein-Ligand Docking
To determine the mechanism of NF reception, all four NF types were separately docked into three dimers corresponding to three P. sativum alleles. We considered a dimer to recognize a NF if the fatty acid was caught in the hydrophobic pocket (cave) between the LykX and Sym10 proteins. Otherwise, plant root cells could not distinguish NFs from chitin and chitosan oligomers and trigger the early symbiotic cascades instead of an immune response. We propose that the hydrophobic recognition of the NF fatty tail by the pocket occurs at the structural level. Therefore, any structural variability in the NF tail (i.e., chain length or double bond positions) can potentially have an impact on NF-dimer specificity. This conclusion is in line with previous results, where the correspondence between population diversities of the rhizobial nodA gene (coding for the acyltransferase involved in the fatty acid tail decoration of the NF) and plant heterodimeric receptor genes was demonstrated (Igolkina et al., 2019). Plausible mechanisms of this linkage, namely coordinated variability of amino acids in contact with the surface of the receptor dimer adjacent to the NF fatty tail, were proposed (Igolkina et al., 2018). Other earlier works (Debellé et al., 1996;Ritsema et al., 1996;Moulin et al., 2004;Wu et al., 2011) also showed that nodA gene variability relates to the bacterial host range specificity. Summarizing the above, we can assume that coordinated variability of the fatty tail structure and receptor regions interacting with the fatty tail is one of the possible ways to fine-tune host specificity in legume-rhizobial symbiosis.
Based on protein-ligand docking results, NF5Ac and NF4Ac were recognized by all three dimers; NF5NonAc and NF4NonAc were recognized only by Tajik and European dimers, respectively, while Afghan dimers did not recognize any non-acetylated NFs (Figures 2A,C). The total number of successful NF binding is eight. In each of these complexes, at least four NF-protein hydrogen bonds were formed, except T-NF5NonAc, in which only three bonds were detected (Supplementary Figure 2). LykX protein played the primary role in NF recognition, while Sym10 was involved in ligand reception only in T-NF(4,5)Ac and E-NF4NonAc complexes and seemed to act as a supporter kinase. We analyzed common amino acid residues contacting the NF and observed that Asn48 of the LykX protein was involved in NF recognition in at least five out of eight poses. At the same time, between-allele polymorphic positions in LykX -#44 and #45 -interacted with NF in the most poses. Polymorphic LykX residue #75 partly interacted with NF5Ac in Afghan and European dimers. This residue was also located in a close proximity to NFs in all other docking poses, influencing NF recognition indirectly. Some LykX amino acids were principal for the interaction with specific NF types, e.g., Asn59 took part in NF5Ac reception by both Afghan and Tajik dimers, and Thr47 was essential for NF4Ac recognition by European and Afghan dimers. The hydrophobic pocket for NF fatty tail perception was found to be formed mostly by Val37, Met38, Pro39, Ala40, Phe41, Leu42, Leu43, Tyr119, and Ala121 residues of LykX and Val217 and Phe218 residues of the Sym10 protein.

MD Relaxation of Protein-Ligand Complexes
Along the MD trajectories, all NF-LykX-Sym10 complexes stably interacted with NF in the following manner: the NF fatty acyl tail was located in a hydrophobic pocket between dimer subunits (except T-NF5NonAc), and the NF chitin core was abundantly hydrated, forming many water bridges to dimer residues ( Figure 2D).
The principal residues involved in NF reception before and after MD were similar (Supplementary Table 5). The Asn48 of the LykX protein was principal for NF recognition in all performed MD, hence we can consider it as one of the key residues in NF reception, along with #44 and #45 polymorphic residues. The role of #75 and #76 polymorphic residues in the LykX protein is less evident; they do not form contacts with NFs during MD, but could be critical in early stages of NF recognition, forming the steric landscape on the dimer surface. The Phe41 residue of LykX interacted with the fatty acyl group in all MD. The contribution of other amino acids (Tyr35, Met38, and Leu42) from the hydrophobic pocket was less pronounced and could vary.
To additionally prove the relevance of predicted NF-dimer complexes, we examined the possible mechanism of NF reception directly from the solvent. As previously demonstrated, the NF-dimer contact zone is composed of two different parts: hydrophilic contact with the NF core and hydrophobic contact with a fatty tail. However, the question remained as to what occurs first. We supposed that the primary contacts between NF and dimer are hydrophilic contacts, as they are (i) more probable in solvent and (ii) localized on the dimer surface. After the hydrophilic contacts are established, the NF fatty tail penetrates the hydrophobic pocket if their structures fit together. For this experiment, we placed NF5Ac close to the Tajik dimer surface at the previously discovered docking site nearby and MD was performed for 500 ns. We observed T-NF5Ac complex formation within 20 ns. During the first stage, the NF hydrophilic core was caught by a few separate non-stable hydrogen bonds and the fatty acid was pushed by the water molecules into the hydrophobic cave formed between both proteins. This led to the small rearrangement of LykX-Sym10 complex geometry and the effective recognition of the NF hydrophilic region by protein dimer (Supplementary Video). Thus, the principal role of LykX-Sym10 amino acid interface and the fatty acid in NF reception was proven.

Thermochemical Verification of Ligand-Receptor Complex Formation
During the formation of a ligand-receptor complex, NF changes its conformation in solvent from the free state into the binding pose. To estimate the thermodynamic benefit and principal opportunity of this transition, we computed the lowest energies of NF geometries in several states using the quantum mechanical calculations (Figure 3). The "stable" state describes the lowest energy NF conformation in gas (vacuum). The "active" NF state is obtained after docking with a dimer with further re-optimization in vacuum. For both states, we added solvent corrections, re-optimized NF conformations, and obtained "stable solvated" and "active solvated" states, respectively. We also introduced "bonded" energy, representing "active solvated" energy accounting for the docking energy (Figure 3). For each state, we calculated ∆G energy, set the "stable solvated" state as the reference point, and compared all other states with this reference point. As a result, we obtained ∆∆G energies reflecting energies of states subtracting ∆G ("stable solvated"). We then applied the following filtration criteria: negative "bonded" energy corresponds to potentially active NF molecules, while its positive value indicates impossible states. Based on these criteria, six out of eight successful docking poses were considered to be thermodynamically advantageous for forming ligand-receptor complexes. Two complexes containing acetylated NF with four Frontiers in Plant Science | www.frontiersin.org chitins in the backbone -E-NF4Ac and A-NF4Ac -were filtered out (Supplementary Figure 3).

DISCUSSION
We studied a possible source of variability in pea symbiotic phenotypes while interacting with rhizobia. In brief, (i) in addition to the European phenotype, the non-classical Afghanistan symbiotic pea phenotype was identified in 1978 (Lie, 1978); (ii) in 2003, it was first shown that legume receptors recognize rhizobial NFs working in pairs , and (iii) after a relatively long lag, a putative pair of NF receptors in pea -LykX and Sym10 -was suggested based on sequence comparative analysis (Sulima et al., 2017).
The proposed pair of receptors requires experimental confirmation, for example, by studying mutants of LykX (currently in progress by Zhukov V.) in interactions with Sym10 in the yeast two-hybrid system. However, this approach may only justify the involvement of the target gene in controlling the Afghanistan phenotype, but may not validate the direct interaction between the mutant LykX-Sym10 receptor and NFs. Experimental detection of the NF-LykX-Sym10 interaction is difficult because of challenges in isolating the receptor heterodimer in its natural form.
Due to the complexity of the system under study, molecular modeling is a promising way to test whether the LykX-Sym10 heterodimer is the NF receptor responsible for European/ Afghanistan symbiotic phenotypes. We considered four stages in the assembly of the NF-LykX-Sym10 complex and performed molecular modeling of each of them (Figure 3 We not only demonstrated that LykX-Sym10 is stable in solvent in the presence of a membrane, but also found that LykX-Sym10 heterodimers of both European and Afghanistan phenotypes have the same 3D configuration. The natural and non-deleterious amino acid polymorphisms in receptors should not significantly impact their mutual disposition in a correctly working dimer. Therefore, the observed match in dimer structures likely indicates that the obtained LykX-Sym10 configuration is correct. Moreover, this configuration corresponds to the previously proposed stable "sandwich-like" configuration (Igolkina et al., 2018), where a LysM domain of one subunit binds between LysM2 and LysM3 domains of another subunit. This 3D conservatism of the LykX-Sym10 structure could be a criterion for filtering legume receptor dimers in other modeling studies.
Analysis of NF docking poses in LykX-Sym10 revealed that in most cases (8 out of 12); the NF fatty acid tail was located in the hydrophobic pocket formed by the contact zone of two receptor subunits. This result is also in line with the previously elucidated stable configuration (Igolkina et al., 2018). Therefore, we considered these eight poses to be biologically justified. It should be noted that only non-acetylated NFs were involved in four unsuccessful docking poses.
For each of eight NF-LykX-Sym10 complexes we applied both MD and thermodynamic analysis, each playing the filtration role. It is important to emphasize that these steps are independent and answer different questions: (i) whether the complex is stable and (ii) whether the complex is possible, respectively. Sets of NF-LykX-Sym10 complexes, which were filtered out by each procedure, were not overlapped, which demonstrates that MD and thermodynamic calculations cannot be interchangeable steps.
During the thermodynamic calculations, we compared ΔG energies of NF in different states assuming that biologically active NF geometries correspond to stable states. The chemical structure of NF makes it very flexible, which leads to the plurality of conformations. We assumed that NF stable states could be biologically active if their free energies allow them to interact with the dimer. This assumption is in line with our observation about the conservatism of LykX-Sym10 configurations. Summarizing all together, we can speculate that the conservatism or stability of molecular structures is the important property for biological functioning, and should be taken into account during filtration of biologically justifying models.
Only five out of 12 NF-LykX-Sym10 complexes successfully passed all steps of the analysis (Figure 2C), and they precisely match the known pea symbiotic differences: peas of European phenotype can form a symbiosis with R. leguminosarum bv. viciae, producing both acetylated and non-acetylated NFs, while peas of Afghanistan phenotype react on only acetylated NFs. It is important to note that all the LykX-Sym10 heterodimer complexes considered here formed successful complexes with NF5Ac. Based on our analysis and match between estimated and observed pea phenotypes, we can conclude that the LykX gene is a suitable candidate gene for Sym2.
Our results are in line with the recent experimental study demonstrating the LysM1 domain in plant LysM-RLKs as a determinant in specific recognition and discrimination of NFs and chitin ligands (Bozsoki et al., 2017). Authors constructed chimeric Lotus receptors combining segments of NFR1 and CERK6 genes and tested nodulation activity of nfr1 loss-offunction mutants transformed with chimeric vectors. The presence of NFR1 LysM1 domain in a chimeric receptor induced nodule formation, and, additionally, the point mutations in the II and IV regions of the LysM1 domain caused the absence of symbiotic signaling both in native and chimeric proteins. In our study, all mutations in the LykX receptor segregating Afghanistan and European peas are also located in the II and in the IV regions of LysM1 domain (44, 45 and 75, 76 positions, correspondingly). Based on the modeling, the LysM1 domain of LykX receptor plays a premier role in the NF perception.
Our molecular modeling indicated some properties of the NF-LykX-Sym10 complex that allow us to hypothesize a reasonable idea of NF's role in initiating intracellular Frontiers in Plant Science | www.frontiersin.org phosphorylation cascade. LykX and Sym10 receptors belong to LysM-RLKs, which have three extracellular LysM domains, a transmembrane domain, an intracellular protein kinase domain, and, what is often without due attention, a linker sequence connecting LysMs and the transmembrane domain. Linkers in pea LYK and LYR receptors are unstructured (i.e., flexible) and quite long, and LykX and Sym10 are no exception. We performed the molecular dynamics (MD) of LykX and Sym10 with membrane and demonstrated that, due to of the linkers' structure, trajectories of the extracellular parts faintly relate to movements of transmembrane domains. Hence, a signal from the extracellular domains to intracellular ones is likely not transmitted mechanically. Therefore, initiation of the intracellular phosphorylation cascade probably occurs after the convergence and subsequent interaction of kinase domains. However, the LysM-RLKs receptors work in pairs and form dimers before NFs perception, as was demonstrated both experimentally (Kirienko et al., 2018) and in silico (Igolkina et al., 2018). The lifetime of dimers should be long enough to catch NFs, but short enough to prevent the false initiation. Thus, we hypothesize that the role of NFs is to specifically stabilize receptor dimers, providing enough time for interaction of kinase domains to run the downstream phosphorylation cascade. To extend this assumption to other legume species, we considered candidate genes to form the NF receptor heterodimer in Medicago truncatula (MtLYK3 and MtNFP genes) and L. japonicus (LjNFR1 and LjNFR5 genes). Then, we estimated their secondary structure with Jpred (Drozdetskiy et al., 2015) and assess the structure of linkers between LysM3 domains and transmembrane domains. The average length was 20aa, and in all cases, it was unstructured, which makes our proposal extendable to other species (the estimated secondary structures are in Supplementary Data).
In sum, molecular modeling has shown itself to be an effective way of functional analysis complementary to traditional approaches, and its results provide appealing directions for further studies in this area, such as determining functional meaning of minor components of NF mixtures produced by rhizobia, or reconstruction of the evolutionary history of signal systems in plant-microbe interactions.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.