A Comprehensive Analysis of Calmodulin-Like Proteins of Glycine max Indicates Their Role in Calcium Signaling and Plant Defense Against Insect Attack

The calcium (Ca2+) signaling is a crucial event during plant-herbivore interaction, which involves a transient change in cytosolic Ca2+ concentration, which is sensed by Ca2+-sensors, and the received message is transduced to downstream target proteins leading to appropriate defense response. Calmodulin-like proteins (CMLs) are calcium-sensing plant-specific proteins. Although CMLs have been identified in a few plants, they remained uncharacterized in leguminous crop plants. Therefore, a wide-range analysis of CMLs of soybean was performed, which identified 41 true CMLs with greater than 50% similarity with Arabidopsis CMLs. The phylogenetic study revealed their evolutionary relatedness with known CMLs. Further, the identification of conserved motifs, gene structure analysis, and identification of cis-acting elements strongly supported their identity as members of this family and their involvement in stress responses. Only a few Glycine max CMLs (GmCMLs) exhibited differential expression in different tissue types, and rest of them had minimal expression. Additionally, differential expression patterns of GmCMLs were observed during Spodoptera litura-feeding, wounding, and signaling compound treatments, indicating their role in plant defense. The three-dimensional structure prediction, identification of interactive domains, and docking with Ca2+ ions of S. litura-inducible GmCMLs, indicated their identity as calcium sensors. This study on the characterization of GmCMLs provided insights into their roles in calcium signaling and plant defense during herbivory.


INTRODUCTION
Plants being sessile are constantly attacked by various environmental cues that can cause abiotic or biotic stresses (Stotz et al., 2000). Among all, insect attack is crucial biotic stress, affecting the plants' fitness, performance, and productivity (Stotz et al., 2000;Conrath et al., 2002;Dodd et al., 2010;Hancock et al., 2015;Singh et al., 2018Singh et al., , 2020aSingh et al., ,b, 2021cChen and Mao, 2020;Keshan et al., 2021;Singh and Singh, 2021). To protect themselves, plants activate their defense mechanism and produce different defensive compounds like secondary metabolites, volatile organic compounds, protease inhibitors, and other antiherbivore chemicals (Gatehouse, 2002;Halitschke and Baldwin, 2004;Kliebenstein, 2004;Schuler, 2011;Vadassery et al., 2012b;Lortzing and Steppuhn, 2016;Singh et al., 2021a,c). This activation of biosynthesis of defensive compounds is governed by certain early events that facilitate the detection of herbivory and signal transduction. However, stimulus perception and signal transduction during herbivory is not understood well. Moreover, plants must discriminate among various environmental stimuli to trigger a specific response by activating downstream cellular signaling. Therefore, it is mandatory to determine the perception molecules, crucial players of signal transduction, and molecular mechanism behind plant defense against herbivory (Kong et al., 2013;Singh et al., 2016Singh et al., , 2020cKumar et al., 2020).
Plants recognize herbivore attack by wounding patterns and interaction of herbivore/damage-associated molecular patterns (HAMPs/DAMPs) with pattern recognition receptors (PRRs) (Mithöfer and Boland, 2008). This interaction activates appropriate downstream signal transduction pathways for achieving stimulus-specific responses. The process of coupling the perception of feeding herbivores to plant adaptive response involves transient changes in the intracellular Ca 2+ concentration (Maffei et al., 2004;Thor and Peiter, 2014). During the process, an increase in Ca 2+ levels occurs due to the transport of Ca 2+ across the membrane or from subcellular organelles, which is facilitated by plasma membrane Ca 2+ -permeable channels, such as cyclic nucleotide-gated channels (CNGCs), glutamate receptor-type cation channel (GLRs), stretch-activated Ca 2+ channel (OSCAs), and MID1-complementing activity (MCA) families and vacuolar channel TWO-PORE CHANNEL1 (TPC1) (Maffei et al., 2004;Dodd et al., 2010;Vadassery et al., 2012a;Kiep et al., 2015;DeFalco et al., 2016;Vincent et al., 2017;Meena et al., 2019;DeFalco and Zipfel, 2021). This transport of Ca 2+ causes spatial and temporal variations in cellular distribution, frequency, amplitude, kinetics of Ca 2+ , and intracellular Ca 2+ level leading to the generation of calcium signature, which encrypts information from primary stimuli to encode specific intracellular responses (Ranty et al., 2016). These calcium signals are decoded further by various calcium sensor proteins that consist of canonical Ca 2+ binding EF-hand motif with conserved helix-loop-helix structure coordinating one Ca 2+ (Lecourieux et al., 2002;Gifford et al., 2007). These proteins perceive and interpret Ca 2+ signals through binding to Ca 2+ ions leading to a change in their conformation and functions (Aldon et al., 2018). In plants, Ca 2+ binding proteins are represented by complex families. They can be either Ca 2+ responders, which convey the signal via enzymatic reactions, such as Ca 2+ -dependent protein kinases (CDPKs), or the non-catalytic sensor relay proteins, such as Calmodulins (CaMs), Calmodulin-like proteins (CMLs), and calcineurin B-like proteins (CBLs), which are activated upon binding to Ca 2+ , which promote downstream signaling (DeFalco et al., 2010).
Calmodulin like proteins are principal representative of Ca 2+ sensor proteins (DeFalco et al., 2010;Kumar et al., 2020) and are characterized by two to six EF-hand motifs and are known to evolve from ancestor CaM, sharing at least 15% amino acid identity (Perochon et al., 2011;Bender and Snedden, 2013). So far, CMLs are pointed out to be involved in the developmental process, plant immunity, and stress responses by targeting many kinases, ion transporters, metabolic enzymes, phosphatases, and transcription factors (Reddy et al., 2011). The EF-hand motifs show a high affinity toward cooperative binding with Ca 2+ that expose the hydrophobic surface, which allows them to interact with downstream target proteins (Vadassery et al., 2012a). CML8, CML37, CML38, and CML39 are responsive to drought, salinity, and hormonal treatment in Arabidopsis (Vanderbeld and Snedden, 2007;Park et al., 2010). CML9 is regulated by abscisic acid (ABA) and Pseudomonas syringae infection (Magnan et al., 2008;Leba et al., 2012). CML42 is known to cause aberration in trichomes (Dobney et al., 2009), and CML24 is involved in alterations in flowering time and ion stress (Delk et al., 2005;Hubbard et al., 2008). Plants possess a repertoire of CMLs, which are unique to them with 50 members in Arabidopsis (McCormack et al., 2005), 52 in Solanum lycopersicum (Munir et al., 2016), 19 in Lotus japonicas (Liao et al., 2017), and 62 in Vitis vinifera (Vandelle et al., 2018). Few CMLs of Arabidopsis thaliana (CML9,11,12,16,17,23,and 42) are upregulated upon employing oral secretion of Spodoptera littoralis (Vadassery et al., 2012a,b). CML37 and CML42 act antagonistically to regulate stress responses in Arabidopsis by altering phytohormone signals (Heyer et al., 2021). However, the involvement of these Ca 2+ sensors has not been investigated much in crop plants. To achieve more insights into the function of CMLs and recognize their involvement in plant defense against herbivory, it would be crucial to investigate them during soybean-Spodoptera litura interaction.
Glycine max, commonly known as soybean, is a dietary staple crop of Asian countries and a native legume plant of Southeastern Asia (Soyastats, 2010;Badole and Bodhankar, 2012;Badole and Mahamuni, 2013;Naresh et al., 2019). It consists of a balanced proportion of amino acids essential for the growth of the human body (Naresh et al., 2019). The genome of soybean is fully sequenced, and the processes of gene regulation, disease resistance, and nodulation are well defined (Gresshoff and Ferguson, 2017). However, insect pests can adversely affect the yield and quality of soybean (Cui et al., 1997). S. litura (common cutworm), a polyphagous insect, is one of the major destructive and widespread soybean pests throughout the Asia-Pacific region. Plants have employed various constitutive and induced defense strategies along with sophisticated signaling networks to combat predation by herbivores (Singh et al., 2021b). The induced responses are more effective for plants and lead to increased tolerance against herbivores (Stotz et al., 2000;Zhang and Yang, 2000;Fan et al., 2012). S. litura, a nocturnal moth and a serious polyphagous defoliator, infests soybean plants through the vigorous eating pattern of larvae (Singh et al., 2021b). The proteolytic activity of the gut serine protease of the larvae is mainly responsible for the significant damage to crops (Vasudev and Sohal, 2016). Therefore, it is essential to elucidate how S. litura-infestation induces defense signaling in G. max evoking the plant defense mechanism.
In the present study, a comprehensive investigation of CMLs of G. max was executed across the genome. We studied the genomic organization, evolutionary and phylogenetic relationships, gene structure analysis, and identification of EFhand motifs of Glycine max CMLs (GmCMLs). We also evaluated the mRNA levels of GmCMLs in different tissue types. The investigation on change in mRNA levels of GmCMLs during S. litura-infestation revealed the involvement of 36 GmCMLs in plant defense against herbivory. We also screened the promoter sequences of GmCMLs to identify cis-acting elements that revealed the presence of Jasmonic acid (JA)-response elements and salicylic acid (SA)-related elements. Further, transcript profiling of S. litura-inducible CMLs upon defense-signaling compound treatments was also executed, which revealed their differential expression. We also predicted the three-dimensional structure of S. litura-inducible GmCMLs and its interaction with Ca 2+ , which confirmed their identity and involvement in Ca 2+ signaling. This analysis has offered promising candidates that can function as an essential component of Ca 2+ signaling during G. max-S. litura interaction and can be valuable for deciphering the early events involved in the upregulation of plant defense in soybean and beyond.

Sequence Retrieval, Phylogenetic Analysis, and Similarity Search of Calmodulin-Like Proteins of Glycine max
The protein and gene sequences of CMLs of G. max were identified and retrieved from the soybean database 1 and phytozome database 2 . The DNA and peptide sequences of GmCMLs were evaluated manually utilizing BLASTN and BLASTP 3 . The base pair lengths and number of amino acids for each GmCMLs were retrieved from NCBI 4 .
To study the evolutionary relationships among model plants, monocot, and leguminous dicot, amino acid sequences of homologous GmCMLs from Medicago truncatula, Arabidopsis thaliana, and Oryza sativa were retrieved from Phytozome (see text footnote 2), TAIR 5 , and Oryzabase 6 , respectively. The amino acid sequences were then submitted to multiple sequence alignment using Clustal Omega 7 (Madeira et al., 2019). The generated multiple sequence alignment was subjected to "construct" phylogenetic tree. The details of constructed phylogenetic tree were downloaded using "Download" phylogenetic data and saved in newick format. For better presentation of the phylogenetic tree, the saved phylogenetic tree was uploaded on iTOL 8 (Letunic and Bork, 2021).
The identity of GmCML peptide sequences was compared with the protein sequence of CMLs of Arabidopsis thaliana using the BLAST engine of the TAIR database 9 .
Expression Analysis of Glycine max CMLs During Spodoptera litura-Infestation

Plant Growth
The soybean seeds (Pusa 9712) were washed, soaked for 2 h, and kept at 26−27 • C in a dark chamber between 2 layers of tissue paper, with continuous moistening with distilled water for germination. After 3-4 days, the germinated seedlings were transferred to the sterilized soilrite containing pots. The seedlings were grown in a plant growth chamber under controlled conditions (photoperiod: 16 h light and 8 h dark; temperature: 26−27 • C; humidity: 55-60%) and watered regularly.

Rearing and Maintenance of Spodoptera litura
The larvae of S. litura were procured from ICAR -National Bureau Of Agricultural Insect Resources, Bangalore, India, and reared in the laboratory using standard protocols at 27 • C and 65-70% relative humidity on a 14/10 h light/dark cycle (Singh et al., 2021b). The freshly molted fourth instar larvae were starved for 12 h before releasing them on plants for experimentation and bioassays.

Spodoptera litura-Infestation and Mechanical Wounding on Soybean Plants and Sample Collection
For S. litura-infestation, one-month-old soybean plants were exposed to one larva per plant. Plants without any treatments were served as control. Mechanical wounding was performed according to Singh et al. (2008). The entire shoot of treated (sample) and untreated (control) plants, after 1 h of infestation/wounding from three independent experiments, was harvested and immediately snap-frozen at −80 • C for further analysis.

RNA Isolation and cDNA Synthesis
Trizol (TRI) reagent (Sigma Aldrich, Spruce Street, St. Louis, MO, United States) was used to isolate RNA as per the given protocol (Singh et al., 2018(Singh et al., , 2021bKeshan et al., 2021). The RNA concentrations were measured on a spectrophotometer, and RNA integrity was checked by agarose gel electrophoresis. The isolated RNA was used to synthesize first-strand cDNA using an iScript cDNA synthesis kit (Bio-Rad, Hercules, CA, United States).

Primer Design and Gene Expression Analysis Using qPCR
Primer 3 (v.0.4.0) 10 tool was used to design real-time (RT) PCR primers. Applied Biosystems TM 7500 RT PCR System (Thermo Fisher Scientific, Waltham, MA, United States) was used to perform RT-qPCR. Each 10 µl reaction mix contained 1 µl of cDNA samples, 0.5 µl of 10 µM gene-specific primers, and 5 µl of SYBR green supermix (Bio-Rad, Hercules, CA, United States). The soybean elongation factor gene was used as an internal control. The qPCR data were analyzed utilizing the 2 − CT method. The 2 − CT values obtained by qPCR were transformed to log 2 to lessen noise level. For statistical analysis, ANOVA and Tukey's test (P < 0.05) were applied and a criterion of greater than two-fold induction/reduction level was taken into consideration to select differentially expressing genes according to a previous report (Singh et al., 2008). The heat map was generated by Graphpad Prism Software 11 using qPCR data.

Expression Analysis of Glycine max CMLs Upon Treatment With Signaling Compounds
For treatment with signaling compounds, an equal volume of 100 µM of methyl-jasmonate, 50 µM ethephon, 5 µM salicylic acid, and only water/0.5% (v/v) ethanol in water (control) were sprayed on healthy soybean plants according to the previous studies (Stotz et al., 2000;Singh et al., 2008). The plants were maintained under the same conditions but in individual enclosures. The leaves of treated (sample) and untreated (control) plants were harvested after 1 h of treatment in biological triplicates and immediately snap-frozen at −80 • C for further analysis. For gene expression analysis, qPCR for 36 GmCMLs (differentially expressed on S. litura-infestation) was performed, and the data were analyzed using the 2 − CT method. For statistical analysis, ANOVA and Tukey's test (P < 0.05) were applied and a criterion of greater than two-fold induction/reduction level was taken into consideration to select differentially expressing genes. The heat map was generated by Graphpad Prism Software (see text footnote 11) using qPCR data.

In silico Analysis of True Glycine max CMLs
The subcellular localization of different GmCMLs was predicted using WoLF PSORT 12 . The molecular weight and isoelectric point of each GmCMLs were computed using the "Compute pI/MW tool" of ExPASy 13 . The presence of EF-hands was checked using the InterPro database 14 . The MEME software 15 was used to detect and generate logo plots of the EF-hands motifs.

Exon-Intron Determination
The organization of exons and introns in GmCMLs was deciphered using Gene Structure Display server GSDS2.0 16 (Hu et al., 2015).

Tissue-Specific Gene Expression Profiling
The expression profile values [Reads/Kb/Million (RPKM) normalized data] for different tissues, that is, flower, young leaf, green pods, root, nodule, stem, and seed, were retrieved from SoyBase Expression Explorer 17 (Waese et al., 2017). The heat map was illustrated using RPKM values for tissue-specific expression profiling of GmCMLs on Graphpad Prism Software (see text footnote 11).

Identification of microRNA (miRNA) Targets of Glycine max CMLs
Plant small-RNA target (psRNATarget) analysis server was used to predict miRNA target sites in GmCMLs transcripts with default parameters. G. max-specific mature miRNA sequences were downloaded from the miRNA database (miRBase) 18 (Bhatia et al., 2019).

Interaction of Glycine max CMLs With Other Proteins
To deduce the direct and indirect interaction of GmCMLs with other proteins, the sequences of all GmCMLs were subjected to the String database 19 (Szklarczyk et al., 2021). The GmCML protein sequences were submitted as multiple proteins with organisms selected as G. max. The submission of sequences redirected to a new window, where proteins matching with input sequences along with identity, bitscore, and e-value were listed. Protein interaction list was saved by clicking on "Mapping", and further to find interaction networks proceeded with "continue."

Detection of Regulatory Elements in the Promoter Region of Glycine max CMLs
To detect the regulatory elements present in the 5 upstream region of GmCMLs, 1 kb of the genomic DNA sequences before the initiation codon (ATG) of each GmCMLs were saved from NCBI and subjected to Plant Care analysis 20 (Lescot et al., 2002).

Three-Dimensional Structure Prediction of Selected Glycine max CMLs by Homology Modeling
The homology modeling of S. litura-inducible GmCMLs was conducted to predict their three-dimensional structure. The selected S. litura-inducible GmCMLs were used as query sequences for homology modeling and searched against the Protein Data Bank (PDB) to identify the closely related known protein sequence. The data were submitted to the Phyre 2 Protein Homology/Analogy recognition Engine http: //www.sbg.bio.ic.ac.uk/$\sim$phyre2/html/page.cgi?id=index to construct the protein 3-D structure by homology modeling under intensive modeling mode (Kelley et al., 2015). Further, to check the stereochemical quality of protein tertiary structures, the PROCHECK server was used http://saves.mbi.ucla.edu/ (Laskowski et al., 1996).

Interaction of Selected Glycine max CMLs With Ca 2+
To show the interaction between Ca 2+ and GmCMLs, the InChI format of Ca 2+ was derived from PubChem 21 (Kim et al., 2019). For docking studies, the InChI format was converted to.pdb using OpenBabel-Chemical file format converter 22 (O'Boyle et al., 2011). The interaction studies were performed for true GmCMLs that were differentially regulated by S. litura-infestation using PATCHDOCK https://bioinfo3d.cs.tau.ac.il/PatchDock/php.php (Schneidman-Duhovny et al., 2005). Different parameters used for docking studies were clustering root-mean-square deviation (RMSD) = 4, and the complex type is protein and small ligand. The top ten Patchdock docking results were further refined using FireDock 23 (Mashiach et al., 2008

RESULTS
Genome-Wide Survey, Phylogenetic Study, and Similarity Search Identified True Glycine max CMLs in Glycine max Genome A total of 144 CMLs were predicted in the soybean genome by Zeng et al. (2017) in a preliminary study. The protein sequences of these predicted CMLs were procured and named as GmCMLs. Phylogenetic analysis of GmCMLs with CMLs of other plants, including monocots and leguminous dicots was executed, which showed evolutionary relatedness among them. This analysis revealed that GmCML101, 118, 98, and 100 shared ancestry. GmCML113, 167, 71, and CML25 from M. truncatula and CML28 and CML29 from A. thaliana diverged recently FIGURE 1 | The phylogenetic analysis of CMLs of Glycine max, Medicago truncatula, Oryza sativa, and Arabidopsis thaliana. The alignment for phylogenetic tree was performed with Clustal Omega using full-length protein sequences. The phylogenetic tree was constructed using iTOL. All the Glycine max CMLs (GmCMLs) are presented with black, Medicago truncatula CMLs are presented with green, Oryza sativa CMLs are presented with blue, and Arabidopsis thaliana CMLs are presented with red. The bootstrap values are provided in the mid of the nodes. (Figure 1). Further, the GmCMLs were compared with CMLs of A. thaliana, revealing that 41 CMLs showed 50% or more sequence similarity with CMLs of A. thaliana. Sixteen of the GmCMLs showed higher sequence similarity (85-70%), whereas 25 GmCMLs showed more than 50% sequence similarity with CMLs of A. thaliana. The rest of the CMLs (103 GmCMLs) exhibited less than 50% homology, and 3 of them did not match with any of the CMLs of A. thaliana. With an identity of 50% as a FIGURE 2 | Heat map showing expression profile of GmCMLs upon S. litura-infestation and wounding. The expression profile of GmCMLs upon S. litura-infestation and wounding is represented as heat map, generated with the help of Graphpad Prism software using log 2 values of expression data obtained from qPCR. The qPCR data were analyzed utilizing 2 − CT method and statistical analysis was performed using ANOVA and Tukey's test (P < 0.05).

Differential Expression of Glycine max CMLs During Spodoptera litura-Infestation and Wounding
To gain information on the involvement of GmCMLs in plant defense against S. litura-infestation, the expressions of true 41 GmCMLs were checked using qPCR with gene-specific primers (Supplementary Table 2), and the results indicated that 34 FIGURE 3 | Heat map showing expression profile of GmCMLs on application of JA, SA, and ET. The expression profile of GmCMLs on application of JA, SA, and ET is represented as a heat map, generated with the help of Graphpad Prism software using log 2 values of expression data obtained from qPCR. The qPCR data were analyzed utilizing the 2 − CT method and statistical analysis was performed using ANOVA and Tukey's test (P < 0.05).

Differential Expression of Glycine max CMLs Upon Foliar Application of Signaling Compounds
To recognize the role of defense-related signaling compounds in the regulation of S. litura inducible-GmCMLs, expression analysis was performed with plants treated with JA, SA, and ethylene (ET) by qPCR. GmCMLs 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, and 12 showed induced expression in all four treatments. The JA application also induced the expression of GmCMLs 14, 17, and 29. The ethylene upregulated the expression of 13 of the GmCMLs, and eight of them were commonly upregulated by JA and ethylene.
In addition, eleven of the GmCMLs were upregulated by SA (Figure 3; Supplementary Figure 2).

Subcellular Localization, Molecular Weight, Isoelectric Point, and EF-Hand Analysis of Glycine max CMLs
The subcellular localization study using in silico tools predicted GmCMLs to be located mainly in the nucleus, cytosol, or chloroplast. Only a few of them were predicted to be located on the plasma membrane. The GmCMLs ranged from 229 (GmCML22) to 102 (GmCML38) amino acids, and their molecular weight ranged between 11.26 and 25.85 KDa. GmCML38, 32, and 11 were smallest by molecular weights (11.26 KDa), whereas GmCMLs 22 and 25 were of the largest molecular weight (25.85 KDa). The isoelectric point values of the GmCMLs ranged from pI 3.94 to 6.84, where GmCML2 had a minimum isoelectric point (pI 3.94) and GmCML80 had a maximum isoelectric point (pI 6.84) ( Table 1).

GmCMLs Are Largely Intronless/Carry 1-3 Introns and Share EF-Hand Signature Sequence and Other Common Motifs
The intron and exon display analysis of GmCMLs revealed that a larger number of these CMLs were intronless. Among the studied 36 GmCMLs, 22 GmCMLs coding genes were intronless, whereas the rest of the GmCMLs had one to three introns. GmCMLs 1, 2, 3, 4, 7, 8, 9, 10, 14, 15, and 16 have four exons and three introns (Figure 4). Additionally, prediction of the conserved domain revealed that a couple of motifs were conserved among the members of GmCML family proteins. Many of the GmCMLs (30 GmCMLs) had four EF-hands but GmCML 32, 76, and 117 had three EF-hands and GmCML11, 38, and 137 had only two EF-hands ( Figure 5). All GmCML proteins contained the EELKEAFKVFDKDGBGYISASELRHVMRSLGEKLTDEEVEQ, EAFSLFDKBGDGCITVEELATIJRSLGQNPTEEE, and MIKEA DLDGDGQVBYEEFVKM motif confirming the presence of EF-hand signature sequence.

Members of Glycine max CMLs Family Differ in Their Tissue-Specific Expression Patterns
To identify the genes involved in distinct regulatory functions controlled by specific tissue, tissue-specific gene expression data were procured. The heat map was generated using RNA-seq data to examine the tissue-specific expression profile at the basal level (Figure 6; Supplementary Table 3). This analysis revealed that the relative expression level of GmCMLs 5, 6, and 40 was higher in all tissue types. GmCMLs 6, 5, 40, 28, 17, 80, 1, 4, and 29 showed their higher expression in stem. GmCMLs 5, 6, and 17 showed expression in seed and GmCMLs 6, 5, 40, The heat map has been generated using Graphpad Prism software. Expression data of GmCML011, GmCML018, GmCML025, GmCML038, and GmCML077 were not available.

Members of Glycine max CML Family Are Putative Targets of Known microRNAs
The miRNA target analysis (using psRNATarget) revealed 33 GmCML transcripts as putative targets of 125 miRNAs ( Table 2).

Members of Glycine max CMLs Family Have an Overlapping Protein Network
To deduce the complex interplay of GmCMLs with other proteins, network and enrichment analysis of 36 selected GmCMLs was performed using the STRING database. The studies revealed that each GmCML corresponded to a particular string Id. Based on the STRING analysis, a protein protein interaction (PPI) network complex was constructed. The expected number of edges of the interaction network was 6, the average node degree of the network was 4, the average local clustering coefficient was 0.986, and the protein enrichment p-value was < 1.0e-16. The predicted results showed 40 nodes and 80 edges. Out of 36, 25 of the GmCMLs had shown their involvement as calcium-binding protein having EF-hand, where GmCML1, GmCML2, and GmCML15 had shown very high similarity with SCAM-4, SCAM-5, and Calmodulin, respectively, which were reported to be involved in plant-defense against pathogen attack (Park et al., 2004). Interestingly, the studies have shown the involvement of GmCML5 and GmCML6 in nodulation of soybean roots. Most of the GmCMLs showed to interact via GLYMA07G38930.2 (spindle-like microcephalyassociated protein isoform) and GLYMA17G01810.2 (spindle-like microcephalyassociated protein isoform). Overall, it was predicted that 24 GmCMLs have their role in plant-pathogen defense signaling and 5 of them showed their role in the phosphatidylinositol signaling and MAPK pathway (Figure 7).

In silico Promoter Analysis Identified Multiple Hormones and Stress-Responsive Cis-Elements in Promoter Sequences of Glycine max CMLs
A broad range analysis to decipher which cis-elements were present in the promoter regions of true GmCMLs, 1 kb 5 UTR regions of each of the GmCMLs were subjected to Plant care analysis that predicted the presence of 16 different cisregulatory elements involved in biotic or abiotic stresses. Most of the GmCMLs promoters were predicted to harbor more than one phytohormone-related cis-regulatory elements, such as ABA (ABRE), JA (G-box, Myc binding JA response element), SA (TCA-element), Ethylene (ERE), GA (P-Box and GARE-motifgibberellin responsiveness), and biotic and abiotic stress-related cis-elements like ARE (Anaerobic induction), wounding, Mybbinding site involved in drought-inducibility (MBS), W-boxstress inducible and LTR (a low temperature)-responsive element, TC-rich repeats -defense and stress response (Figure 8).

Molecular Docking Indicates That Members of Glycine max CML Family Proteins Interact With Ca 2+
The study of the interaction of Ca 2+ with GmCMLs could be crucial for understanding their mechanism of action. The tertiary structure of GmCMLs was modeled using Phyre 2 , and the structures were validated by keeping a threshold value of >90% region of protein being present in the allowed region of the Ramachandran plot. The confidence level of modeled structures was more than 99%. The classic EF-hands consisting of a typical helix-loop-helix domain were detected and highlighted (Figure 9).

DISCUSSION
Plants have to encounter numerous challenges ranging from biotic to abiotic factors during their lifespan. Therefore, they have evolved sophisticated defense mechanisms regulated by recognizing the specific type of stress and stimulation of certain early signaling events to defend themselves. One such crucial early event is the generation of Ca 2+ ion pulses due to the elevation of influx and efflux, which acts as a cellular signal in response to external stimulus (Vadassery et al., 2012a,b). Calcium sensor proteins, such as CMLs, are essential components of Ca 2+ signaling (DeFalco et al., 2010). We have only a little information about the involvement of CMLs in plant defense against herbivory in crop plants. Although fewer studies on the EF-hand possessing protein families, CaM, CDPK, CMLs, and CBLs (Weinl and Kudla, 2009;DeFalco et al., 2010;Poovaiah et al., 2013) have been taken into consideration in a couple of plants, such as Arabidopsis thaliana, Oryza sativa, Grapevine, Nicotiana benthamiana, and Solanum lycopersicum but significantly less information are available about CMLs in legumes (McCormack and Braam, 2003;Asano et al., 2005;Boonburapong and Buaboocha, 2007;Chen et al., 2013;Kong et al., 2013;Kleist et al., 2014). Therefore, to characterize the CMLs of soybean and figure out their role in plant defense, the GmCMLs were subjected to phylogenetic analysis with CMLs of Medicago truncatula, Arabidopsis thaliana, and Oryza sativa, which revealed that many of the GmCMLs were grouped with CMLs of Arabidopsis and Medicago. They shared ancestry, which confirmed their evolutionary relationship. However, when GmCMLs were subjected to the BLAST search engine of TAIR against the known EF-hand protein of A. thaliana, only 41 GmCMLs had more than 50% sequence similarity. Expression analysis of true GmCMLs during S. litura-infestation was investigated, and the results demonstrated that many of these GmCMLs (36) were responsive to herbivory. Out of 41 genes, 34 of the GmCMLs were upregulated, while 2 of the GmCMLs were downregulated. Previous studies have reported that CMLs control signaling FIGURE 7 | Protein-protein interaction network analysis. Protein-protein interaction network and enrichment analysis showing the complex interplay among GmCMLs as predicted from the STRING database. The proteins colored in red are showing their potential involvement in plant-pathogen interaction, the proteins in blue color are predicted to be involved in phosphatidylinositol signaling, and the proteins in green color are predicted to be involved in MAPK signaling pathways. events upregulated during mechanical wounding and insect attack. It was observed that in A. thaliana, CML9,11,12,16,17,23, and 42 genes were upregulated upon treatments with the oral secretion of larvae of the herbivorous insect S. littoralis (Vadassery et al., 2012a). In comparison to herbivory, 24 GmCMLs were upregulated 5 GmCMLs were downregulated by mechanical damage, and 8 of the GmCMLs showed induced expression exclusively during S. litura-infestation. These data suggested that wounding and herbivory had a different transcript pattern on account of the presence of insect-derived elicitors in the oral secretion during herbivory. The distinction in the transcript patterns during mechanical damage and insect attack has been detected in previous studies (Reymond et al., 2000(Reymond et al., , 2004Singh et al., 2008). Wounding stimulated plant defense similar to herbivory; however, mechanical damage-induced responses were not the same as those activated by herbivory.
Signaling compounds, such as JA, ET, and SA, function as essential plant defense compounds against herbivory. Most of the herbivores trigger the jasmonate/ethylene pathway and the salicylate pathway (Hammerschmidt and Smith-Becker, 1999;Reymond et al., 2004). To recognize the role of defense regulators, change in the expression of GmCMLs was checked upon treatments with JA, SA, and ET. JA changed the mRNA levels of 14 of the S. litura-inducible GmCMLs, demonstrating the involvement of JA in the regulation of expression of GmCMLs.
JA-responsive cis-regulatory elements, TGACG and CGTCA, were observed in the promoter regions of GmCMLs 1, 3, 9, 14, 7 and 1, 3, 9, 14, 17, respectively, which further confirmed their regulation by JA. Ethylene upregulated the expression of 13 of the GmCMLs, and eight of them were commonly upregulated by JA. ET and JA were reported to induce defense genes synergistically in Arabidopsis (Kessler and Baldwin, 2002;Singh et al., 2008). In addition, eleven of the GmCMLs were upregulated by SA, indicating the role of SA in regulation of expression of GmCMLs either directly or through a crosstalk. The presence of SAresponsive cis-regulatory element in the promoter of GmCML 8 further confirmed their regulation by SA.
The exon-intron distribution study predicted the presence of zero to three introns in GmCMLs, indicating critical evolutionary changes in the G. max genome. There were only 11 GmCMLs that carried 1-3 introns, and 22 GmCMLs were intronless. The presence of few/no introns indicated their ability to get transcribed quickly to facilitate early defense response in the host plant during stress (Keshan et al., 2021).
The subcellular localization studies predicted the presence of CMLs in the nucleus, cytosol, and chloroplast mainly. Previous studies on CMLs had reported their localization in the nucleus and cytosol in Arabidopsis (Inzé et al., 2012;Vadassery et al., 2012a). The lower values of isoelectric points revealed that GmCMLs could be acidic, making them highly hydrophilic. The motif analysis exhibited 1 to 4 EF-hand potential motifs in most of the GmCMLs, indicating their potential identity as calciumbinding proteins (Boonburapong and Buaboocha, 2007;Kong et al., 2013;Kleist et al., 2014;Munir et al., 2016). Further, identifying potential miRNA target sites in GmCMLs divulged the dynamic roles of miRNA in post-transcriptional regulation of these genes in response to normal or stressed conditions (Bhatia et al., 2019).
The promoter of every gene possesses regulatory elements that act as binding sites for transcription factors. These promoter elements govern the differential expression of a gene in a tissue-specific manner at different growth stages and during environmental stresses. Several studies have divulged a direct connection between gene expression and promoter elements present in their 5 upstream region (Lata et al., 2014). Therefore, cis-elements present in the promoter regions of all true GmCMLs were investigated using in silico tool, which predicted the presence of 16 different cis-regulatory elements involved in biotic or abiotic stresses, indicating their role in plant responses during stress and their involvement in signal transduction via ABA, SA, and JA.
The analysis of measuring mRNA abundance of genes belonging to a family in a tissue-specific manner will allow in identifying the genes involved in the development or regulatory pathways associated with a particular tissue type. The expression data of the GmCML gene family in different organs exhibited divergence in their function. The transcript profiling of GmCMLs during several developmental stages in different organs revealed that most GmCMLs do not express vegetatively. They are inducible to exhibit a particular response, whereas some GmCMLs, such as GmCML5, GmCML6, and GmCML40, expressed highly in flower, root, pod, pod shell, young leaf, and seed, depicting their role in growth and development. Nine of the GmCMLs exhibited higher expression in the stem, illustrating their potential role in shoot development. Three of the GmCMLs showed their expression in seed, and six showed their expression in the nodule, indicating their prospective role in seed and nodule formation. Twelve GmCMLs showed higher expression in root, signifying their possible function in root development and plant-microbe interaction in soil. Seven of the GmCMLs exhibited higher expression in the leaf, specifying their role in leaf development and plant defense against pests and pathogens. Six of the GmCMLs were expressed abundantly in flower and pod, depicting their role in reproduction.
To deduce the complex interplay of GmCMLs with other proteins, network and enrichment analysis of S. liturainducible 36 GmCMLs were performed with the STRING database. The PPI network complex indicated that out of 36 GmCMLs, 24 GmCMLs were predicted to play a role in plantpathogen defense signaling; 5 of the GmCMLs (GmCML16, GmCML8, GmCML15, GmCML116, and GmCML117) could be involved in the phosphatidylinositol signaling system FIGURE 9 | Interaction of GmCMLs with Ca 2+ using molecular docking and their comparison with interaction of Calmodulin 2 (A. thaliana) with Ca 2+ . The figure shows the modeled tertiary structures of GmCMLs and their interaction with Ca 2+ , wherein the yellow balls indicate Ca 2+ and the yellow-colored segments in the model represent EF-hands. The number of Ca 2+ showing interaction with GmCMLs varied according to the number of EF-hands. The global docking energy score of interaction between 26 GmCMLs with Ca 2+ is shown to have near to global docking energy score of interaction between Calmodulin 2 of A. thaliana with Ca 2+ . The tertiary structure of A. thaliana is shown in gray color and the green balls indicate Ca 2+ . and MAPK signaling system. The plant defense signaling, phosphatidylinositol signaling system, MAPK pathway, and the components of these pathways, such as SCAM-4, SCAM-5, CaMEKK, CaZIK, CaRAF, PLA, PLC, and PLD had been reported to be involved in plant defense against biotic stresses (Park et al., 2004;Hung et al., 2014;Keshan et al., 2021). The above results are FIGURE 10 | Schematic representation of herbivory-induced early signaling events (calcium signaling, ROS generation, activation of MAPK Pathway) and involvement of CMLs in plant defense response against insect attack. Upon herbivory, plants perceive stimuli and instigate a couple of early events, such as activation of calcium signaling, induction of mitogen-activated protein kinase pathway, and generation of reactive oxygen species, which transduce the signal and activate the biosynthesis of other signaling and defensive compounds. In soybean, the expression of few GmCMLs, one of the crucial Ca 2+ sensors, is upregulated during S. litura-infestation. GmCMLs have Ca-EF hands, which allow them to sense calcium signature and bind with Ca 2+ , which can further regulate downstream signaling and activate plant defense response.
prediction-based and should be perceived with caution. Further empirical analyses and experimentation would be required to assess whether these interactions are physiologically relevant.
The interaction of CMLs with Ca 2+ is crucial for their biological activity. To examine the interaction between Ca 2+ and GmCMLs, the tertiary structure of S. litura-inducible GmCMLs was predicted, which demonstrated a difference in the modeled structures of GmCMLs, mainly due to the change in their amino acid sequences. This diversity in their structures could allow them to perform a specific function. However, all the modeled structures of GmCMLs had shown the EF-hand helix-loop-helix motif as described in Calmodulin of A. thaliana, which could facilitate their interaction with Ca 2+ . The molecular docking was performed to investigate the interaction of GmCMLs with Ca 2+ and their global energy score of interaction was compared with the global energy score of interaction between AtCAM2 with Ca 2+ . The results revealed that the global energy score for interaction between GmCMLs with Ca 2+ ranged from −2.5 to −4.5 (for 26 of the GmCMLs), which is comparable to AtCAM2 with Ca 2+ interaction. Thus, in silico interaction study predicted that calcium ions interacted with GmCMLs, particularly with EF-hand motifs, indicating that GmCMLs could be Ca 2+binding proteins and could be involved in Ca 2+ signaling. Further experimentation would be required to confirm whether CMLs are undergoing any conformational change to induce downstream signaling.

CONCLUSION
Plants encounter various environmental stresses, including insect attack during their life span. To defend themselves, plants have developed sophisticated signaling pathways that activate the biosynthesis of defensive proteins and metabolites. Among a couple of early events that instigate defense signaling, a transient change in cytosolic Ca 2+ concentration is observed during herbivory. This transient change in Ca 2+ concentration is identified by Ca 2+ sensors, which transduce the information to downstream target proteins leading to appropriate defense response. The role of Ca 2+ sensors during herbivory has been studied in model plants, but not in leguminous plants, such as G. max. This study identified 41 true GmCMLs, an important Ca 2+ -sensor, in soybean. The phylogenetic analysis revealed their similarity with already known CMLs from model plants. The presence of potential EF-hands further supported their identification as Ca 2+ -sensors. The gene structure determination of GmCMLs indicated the presence of no/few introns revealing their capability to change their expression quickly during recognition of a stimulus. The differential expression of GmCMLs in various tissues signified their involvement in growth and development. Alteration in the transcript patterns of the novel GmCMLs during S. liturainfestation suggested their significant involvement in plant defense signaling during herbivory (Figure 10). Comparative expression profiling during mechanical damage showed that wounding-regulated responses are not the same as observed during herbivory. Signaling compounds, such as JA, ET, and SA, showed their involvement in regulating the expression of S. litura-inducible GmCMLs. Network and enrichment analysis of GmCMLs with other proteins indicated their role in plant defense, phosphatidylinositol signaling system, and MAPK signaling system. The study on the interaction between calcium and GmCMLs showed their identity as Ca 2+ -binding proteins and their participation in signal transduction during Ca 2+ signaling (Figure 10). The results of this study can be extended to confirm the role of calcium signaling in triggering the plant defense mechanism against S. litura-infestation. S. liturainducible GmCMLs are prospective candidates for being critical components of calcium signaling during pest attacks. Thus, the structural and functional characterization of S. liturainducible GmCMLs utilizing contemporary techniques like RNAi, overexpression, and/or gene editing is highly recommended to delineate the calcium signaling pathway and its role in plant defense during soybean-S. litura interaction.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.