Molecular Characterization of Magnesium Chelatase in Soybean [Glycine max (L.) Merr.]

Soybean (Glycine max) seed yield depends on photosynthetic efficiency, yet the molecular basis of photosynthesis in soybean is poorly understood. Chlorophyll, the major light-harvesting pigment, is crucial for chloroplast biogenesis and photosynthesis. Magnesium chelatase (Mg-chelatase) catalyzes the insertion of Mg2+ into protoporphyrin IX, the first committed and a key regulatory step of chlorophyll biosynthesis. It consists of three types of subunits: ChlI, ChlD, and ChlH. To better understand chlorophyll biosynthesis in soybean, we analyzed the soybean Mg-chelatase subunits and the genes encoding them. The soybean genome harbors four GmChlI genes, two GmChlD genes, and three GmChlH genes, which likely arose through two rounds of gene duplication. qRT-PCR analysis revealed that the GmChlI, GmChlD, and GmChlH genes are predominantly expressed in photosynthetic tissues, but expression levels differ among paralogs. In silico promoter analyses revealed that these genes harbor different cis-regulatory elements in their promoter regions, suggesting that they may respond differentially to various environmental and developmental signals. Subcellular localization analyses showed that all GmChlI, GmChlD, and GmChlH isoforms are localized in chloroplasts, consistent with their functions. Yeast two-hybrid and bimolecular fluorescence complementation (BiFC) assays showed that each isoform has the potential to be assembled into the Mg-chelatase holocomplex. We expressed each GmChlI, GmChlD, and GmChlH isoform in the corresponding Arabidopsis mutants; all four GmChlI isoforms, both GmChlD isoforms, and GmChlH1 rescued the severe mutant phenotypes, indicating that they retain normal biochemical function in vivo. However, GmChlH2 and GmChlH3 could not completely rescue the chlorotic phenotype of the Arabidopsis gun5-2 mutant, suggesting that the functions of these two proteins may differ from that of GmChlH1.
Considering the differences in primary sequence, biochemical function, and gene expression profiles, we conclude that the paralogs of each soybean Mg-chelatase subunit have diverged to varying degrees during evolution. Soybean may have developed a complex regulatory mechanism to control chlorophyll content in response to different developmental and environmental conditions.


INTRODUCTION
Soybean [Glycine max (L.) Merr.] is one of the most economically important crops and a main source of plant protein and vegetable oil (Dornbos and Mullen, 1992; Ainsworth et al., 2012). Global soybean yield has increased steadily over the past century owing to improved cultivars and agronomic management, but average yield has not yet reached a plateau (Egli, 2008; Masuda and Goldsmith, 2009). To meet the growing demand of a rapidly expanding population with limited agricultural land, soybean yield and quality must improve faster than before. Multiple approaches can be exploited to achieve this goal, such as improving photosynthetic efficiency, optimizing carbon utilization, increasing the efficiency of nitrogen fixation, and adjusting developmental processes (Ainsworth et al., 2012; Natarajan et al., 2013; Koester et al., 2014).
Previous reports have shown that soybean seed yield is positively correlated with increases in light interception, energy conversion, and partitioning efficiencies (Zhu et al., 2010; Koester et al., 2014), suggesting that improving photosynthetic efficiency could be a promising way to raise soybean production. However, little is known to date about the molecular basis of photosynthesis in soybean, because soybean is not an ideal system for physiological studies at the molecular level. Soybean is a diploid (2n = 40) crop that evolved from a recent tetraploid ancestor; it has a 1.1-gigabase genome with approximately 50,000 genes, 75% of which are duplicated (Shoemaker et al., 2006; Schlueter et al., 2007). The complexity of the genome and the shortage of molecular biology tools are major obstacles to dissecting soybean photosynthesis in detail.
The chlorophyll content of leaves is the major limiting factor for photosynthetic efficiency (Chen and Blankenship, 2011). Chlorophyll is the main light-harvesting and energy-converting pigment of photosynthesis (Croce and van Amerongen, 2014). It is composed of a chlorophyllide moiety and an isoprenoid phytol tail, which are generated through the tetrapyrrole biosynthetic pathway and the methylerythritol phosphate (MEP) pathway, respectively (Masuda and Fujita, 2008; Kim et al., 2013; Croce and van Amerongen, 2014). The genes and corresponding enzymes involved in chlorophyll biosynthesis have been well characterized in model photosynthetic organisms; however, the regulatory mechanisms of the pathway have only recently been studied and are not fully understood (Masuda and Fujita, 2008; Brzezowski et al., 2015).
In the first committed step of chlorophyll biosynthesis, magnesium chelatase (EC 6.6.1.1, Mg-chelatase) inserts a magnesium ion (Mg2+) into protoporphyrin IX to generate Mg-protoporphyrin IX (Masuda, 2008). Mg-chelatase is a highly conserved multimeric enzyme composed of three distinct subunits, ChlI, ChlD, and ChlH, with approximate molecular weights of 40, 70, and 140 kDa, respectively (Walker and Willows, 1997; Sirijovski et al., 2006). The ChlI subunit belongs to the large AAA+ family (ATPases Associated with various cellular Activities), contains typical ATP-binding motifs such as Walker A and Walker B, and is responsible for ATP hydrolysis (Hansson et al., 2002; Lake et al., 2004). The ChlD subunit has an AAA+ module at its N-terminus, an integrin I domain at its C-terminus, and an acidic proline-rich region in between; however, no ATPase activity has been detected for ChlD (Gräfe et al., 1999; Fodje et al., 2001). The ChlH subunit is the porphyrin-binding, catalytic subunit responsible for inserting Mg2+ into protoporphyrin IX (Jensen et al., 1998; Karger et al., 2001). It is composed of six domains (I-VI), and an internal protoporphyrin (Proto)-binding pocket located at the interface between domains III and V may engulf the tetrapyrrole ligand (Chen et al., 2015).
The magnesium chelation reaction has been postulated to proceed in two steps. In the initial activation step, six ChlI and six ChlD subunits assemble into a two-tiered hexameric ring in the presence of ATP, while ChlH is activated by binding Mg2+ and protoporphyrin IX. Next, activated ChlH docks onto the ATP-I-D complex to form the Mg-chelatase holoenzyme, which catalyzes the insertion of Mg2+ into protoporphyrin IX in an ATP hydrolysis-dependent manner (Sirijovski et al., 2006; Zhang et al., 2006; Masuda, 2008).
Chlorophyll biosynthesis must be tightly controlled to meet the requirements of chloroplast biogenesis and to maintain proper function of the photosynthetic machinery. Most chlorophyll intermediates are strong photosensitizers; if the regulation of chlorophyll biosynthesis is impaired, they accumulate and produce reactive oxygen species that damage cells (Masuda and Fujita, 2008; Stephenson and Terry, 2008). Magnesium chelation, the branch point between the heme and chlorophyll biosynthetic pathways, is the major regulatory point of chlorophyll biosynthesis. Mg-chelatase is well known to be tightly controlled at the transcriptional level by the light signaling pathway during thylakoid biogenesis in young chloroplasts, and it is also regulated by the diurnal cycle and photosynthetic electron transport at both transcriptional and post-transcriptional levels in mature chloroplasts (Masuda, 2008).
Several Mg-chelatase-impaired mutants have been identified in soybean; they are chlorophyll-deficient in both heterozygous and homozygous plants (Palmer et al., 1989; Campbell et al., 2015). In one such mutant, the chlorophyll content of heterozygous plants is reduced to 50% of the wild-type level, yet yield remains similar to that of wild-type plants (Walker et al., 2017) or is even higher (Pettigrew et al., 1989). This suggests that photosynthetic capacity and seed yield in soybean might be improved by manipulating chlorophyll content. As the key enzyme and major regulatory point of chlorophyll biosynthesis, Mg-chelatase is a prime candidate for engineering at the molecular level.
One hurdle to regulating Mg-chelatase activity with molecular tools is that the molecular features and regulatory mechanisms of this enzyme are not well understood in soybean. The complete soybean genome sequence was released in 2010 (Schmutz et al., 2010). Together with improved gene transformation tools for soybean, it provides an opportunity to dissect photosynthesis in detail and to manipulate key components to improve photosynthetic efficiency. To better understand chlorophyll biosynthesis in soybean, we used soybean genome data and molecular biology tools to examine all Mg-chelatase subunits at the genomic, transcriptional, and protein levels. We expect this knowledge to improve our understanding of this key enzyme in the soybean chlorophyll biosynthesis pathway and to lay a foundation for manipulating soybean Mg-chelatase to further improve photosynthetic efficiency.

Primers
All primers used in the present study are listed in Supplementary Table S1.

Plant Materials and Growth Conditions
Soybean cultivar Williams 82 (Wm82) and Nicotiana benthamiana plants were grown in a greenhouse at 26 °C under a 16 h light/8 h dark photoperiod at a photosynthetic photon flux of 140 µmol m⁻² s⁻¹.

Sequence Retrieval and Bioinformatics Analysis
A search for soybean genes encoding the Mg-chelatase I subunit (GmChlI), D subunit (GmChlD), and H subunit (GmChlH) was performed with BLASTN against the soybean genome (https://soybase.org), using the coding sequences (CDSs) of the Arabidopsis Mg-chelatase subunits I1 (AtChlI1), D (AtChlD), and H (AtChlH) as queries (Mochizuki et al., 2001; Huang and Li, 2009; Du et al., 2012). The cDNA fragments of the retrieved GmChlIs, GmChlDs, and GmChlHs were amplified by RT-PCR and sequenced. A preliminary comparison between the deduced amino acid sequences of each subunit and their homologs from Arabidopsis and Synechocystis sp. PCC6803 (Ssp. PCC6803) was performed with the multiple sequence alignment program ClustalW2.
To investigate potential regulation of gene expression, the 1,500 bp sequences immediately upstream of the translation start codons of the GmChlI, GmChlD, and GmChlH genes were analyzed for putative cis-acting regulatory elements with the PlantCARE online program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; Lescot et al., 2002).
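The promoter-extraction step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes the chromosome sequence is already loaded as a plain string, that the start codon is given as a 0-based, half-open interval in forward-strand coordinates, and that minus-strand genes are reverse-complemented so the region is reported 5'→3' relative to the gene. The helper names `reverse_complement` and `upstream_region` are hypothetical.

```python
# Minimal sketch (hypothetical helpers): extract the N bp immediately upstream
# of a start codon, reported 5'->3' on the gene's own strand.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, codon_start: int, codon_end: int,
                    strand: str, length: int = 1500) -> str:
    """Return up to `length` bp upstream of the ATG.

    codon_start/codon_end: 0-based, half-open forward-strand coordinates of
    the start codon. The region is truncated at the chromosome ends.
    """
    if strand == "+":
        codon = chrom[codon_start:codon_end]
        region = chrom[max(0, codon_start - length):codon_start]
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        codon = reverse_complement(chrom[codon_start:codon_end])
        region = reverse_complement(chrom[codon_end:codon_end + length])
    else:
        raise ValueError("strand must be '+' or '-'")
    if codon.upper() != "ATG":
        raise ValueError(f"expected ATG at start codon, found {codon!r}")
    return region


# Toy example: a plus-strand gene and a minus-strand gene (4 bp windows).
print(upstream_region("AACCGGATGTT", 6, 9, "+", length=4))  # CCGG
print(upstream_region("AACATGACCTT", 2, 5, "-", length=4))  # GGTC
```

In practice, a FASTA reader and gene-model coordinates (for example from a GFF3 annotation) would supply `chrom`, the start-codon interval, and the strand.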

RNA Extraction, RT-PCR, and Real-Time qRT-PCR
Total RNA was extracted from the desired soybean tissues with the RNAPrep pure plant kit (TIANGEN, China) following the manufacturer's instructions. For regular RT-PCR, RNA from leaf tissue was used. For qRT-PCR analysis, RNA was extracted from tissues at different developmental stages. Roots and cotyledons were harvested from 1-week-old seedlings. Stems, young trifoliate leaves, and flowers were sampled from flowering plants at the R2 stage (∼45 days old). Young pods (∼4 cm long) and immature seeds (∼1 cm long) from 4 cm long pods were sampled from plants at the R5 stage (65-70 days old).
Total RNA was treated with the Turbo DNA-free kit (Invitrogen, USA) to remove genomic DNA contamination before reverse transcription. One microgram of DNA-free RNA was reverse transcribed into first-strand cDNA using the PrimeScript II 1st Strand cDNA Synthesis Kit (Takara, Japan) according to the manufacturer's protocol. The full-length coding sequences of GmChlIs, GmChlDs, and GmChlHs were amplified from Wm82 leaf cDNA by PCR with PrimeSTAR Max DNA polymerase (Takara, Japan). PCR was run with an initial denaturation at 98 °C for 30 s, followed by 30 cycles of 98 °C for 15 s, 60 °C for 15 s, and 72 °C for 2 min. The PCR products were cloned into the pMD18-T vector (Takara, Japan) for sequencing.
For qRT-PCR analysis, the first-strand cDNA was diluted 20-fold, and 1 µl was used in each 20 µl reaction. Quantitative PCR was performed on a Bio-Rad CFX96 with 2x SYBR Green qPCR Master Mix (Roche, Switzerland). The program was 95 °C for 10 min, followed by 40 cycles of 95 °C for 10 s and 63 °C for 30 s, and then a ramp to 95 °C in increments of 0.5 °C per min. Cycle threshold (Ct) values were calculated with CFX Manager Software version 3.0 (Bio-Rad, USA). Relative gene expression levels were calculated with the 2^−ΔCt method, with all data normalized against the expression level of the soybean actin gene (Glyma18g290800). Three replicates were performed for each sample.
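The relative quantification described above can be sketched as follows. This is a hedged illustration, not the CFX Manager implementation: it assumes 100% amplification efficiency (one doubling per cycle), which underlies the 2^−ΔCt family of methods, and pairs each target-gene replicate with an actin replicate from the same cDNA. The Ct values in the example are invented. A second helper shows the ΔΔCt variant, in which ΔCt is further referenced to a calibrator sample.

```python
from statistics import mean, stdev


def relative_expression(target_cts, actin_cts):
    """Return (mean, sd) of 2^-dCt, where dCt = Ct(target) - Ct(actin).

    Assumes paired replicates and ~100% PCR efficiency (doubling per cycle).
    """
    if len(target_cts) != len(actin_cts) or len(target_cts) < 2:
        raise ValueError("need >= 2 paired replicates")
    values = [2 ** -(t - a) for t, a in zip(target_cts, actin_cts)]
    return mean(values), stdev(values)


def fold_change_ddct(dct_sample, dct_calibrator):
    """2^-ddCt, with ddCt = dCt(sample) - dCt(calibrator)."""
    return 2 ** -(dct_sample - dct_calibrator)


# Invented Ct values for one gene in one tissue (three replicates).
m, sd = relative_expression([24.0, 24.2, 23.9], [20.0, 20.1, 19.9])
print(f"relative expression = {m:.4f} +/- {sd:.4f}")
# A dCt one cycle lower than the calibrator's means ~2-fold higher expression.
print(fold_change_ddct(3.0, 4.0))  # 2.0
```

The ΔCt form keeps every gene on the same actin-referenced scale, which is what allows expression levels of different paralogs to be compared directly; the ΔΔCt form instead expresses each gene relative to its own level in a chosen calibrator tissue.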

Subcellular Localization and Bimolecular Fluorescence Complementation (BiFC)
Vectors pSPYNE and pSPYCE (Waadt et al., 2008), which carry the CaMV 35S promoter-driven N- and C-terminal halves of YFP (YFPN and YFPC), respectively, were used for the BiFC assay. Vector pSPY-GFP, used for localization, was generated by replacing YFPN in pSPYNE with the GFP gene. Coding sequences of GmChlIs, GmChlDs, and GmChlHs were cloned into pSPY-GFP, pSPYNE, and pSPYCE, fused in frame to the N-terminus of the corresponding tags.
For subcellular localization, Agrobacterium tumefaciens (GV3101) strains carrying vectors expressing GmChlI-GFP, GmChlD-GFP, and GmChlH-GFP were co-infiltrated with the p19 strain into N. benthamiana leaves for transient expression as described in Waadt et al. (2008). Similarly, for BiFC, pairs of YFPN and YFPC fusion proteins were transiently co-expressed in N. benthamiana leaves. GFP and reconstituted YFP fluorescence were observed and imaged 2-3 days after infiltration with an Olympus FluoView FV1000 confocal laser scanning microscope (Olympus, Japan).

Yeast Two Hybrid Assay
The yeast two-hybrid (Y2H) analysis was performed using the Matchmaker GAL4 Two-Hybrid System 3 according to the supplier's instructions (Clontech, USA). Coding sequences for the mature peptides of GmChlIs, GmChlDs, and GmChlHs were cloned into both the bait vector pGBKT7 and the prey vector pGADT7, in frame downstream of the GAL4 DNA-binding domain (BD) and the GAL4 activation domain (AD), respectively. Pairs of the resulting BD and AD constructs were co-transformed into yeast strain AH109 according to the manufacturer's instructions. Transformants were first grown on synthetic dropout medium lacking Leu and Trp, and colonies were then tested for growth on selective medium lacking Leu, Trp, His, and Ade.
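The pairwise Y2H design above can be laid out programmatically. The following sketch is illustrative only: the helper names are hypothetical, and the scoring rule reflects the general logic of GAL4 two-hybrid selection rather than any analysis used in the study. Growth on medium lacking Leu and Trp shows that both plasmids are present, growth on medium also lacking His and Ade indicates reporter activation, and a bait that grows with the empty prey vector is autoactivating and cannot be scored. Because bait and prey roles differ, BD-X/AD-Y and BD-Y/AD-X are counted as separate tests; the GmChlD2 results in this study show why both orientations can matter.

```python
from itertools import product

# The nine soybean Mg-chelatase isoforms named in the text.
ISOFORMS = ["GmChlI1a", "GmChlI1b", "GmChlI2a", "GmChlI2b",
            "GmChlD1", "GmChlD2", "GmChlH1", "GmChlH2", "GmChlH3"]
EMPTY_AD = "pGADT7 (empty)"


def y2h_pairs(isoforms):
    """All (BD bait, AD prey) co-transformations, incl. empty-prey controls."""
    preys = list(isoforms) + [EMPTY_AD]
    return [(bait, prey) for bait, prey in product(isoforms, preys)]


def score(grows_sd_tl, grows_sd_tlha, bait_autoactivates):
    """Hypothetical scoring rule for one co-transformation."""
    if not grows_sd_tl:
        return "co-transformation failed"
    if bait_autoactivates:
        return "not scorable (bait autoactivates)"
    return "interaction" if grows_sd_tlha else "no interaction"


pairs = y2h_pairs(ISOFORMS)
print(len(pairs))  # 90: 9 baits x (9 preys + empty vector)
print(score(True, True, False))  # interaction
```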

Gene Transformation in Arabidopsis and Transgenic Plant Screening
Full-length CDSs of GmChlIs, GmChlDs, and GmChlHs with a C-terminal HA tag were cloned into pSPYCE in place of the YFPC fragment and transformed into the corresponding Arabidopsis mutants (heterozygous chli1/+ and chld-2/+, and homozygous gun5-2) by Agrobacterium-mediated transformation (Clough and Bent, 1998). Transgenic lines were selected on MS plates containing kanamycin. In GmChlI-HA and GmChlD-HA transformants, the genotypes at the chli1 or chld-2 locus were determined by PCR, using genomic DNA extracted from leaves with the NuClean PlantGen DNA Kit (CWBIO, China). Transgenic plants in a homozygous mutant background were advanced to the next generation for further analysis. Three-week-old T3 plants derived from three independent T1 transgenic lines were examined for phenotypes and photographed. Lines that were not fully complemented by the transgene were further examined for protein and chlorophyll content. Protein levels in these lines were examined by western blot analysis with an anti-HA monoclonal antibody (H9658, Sigma-Aldrich, USA).

Chlorophyll (Chl) Content Measurement
Chlorophyll was extracted by immersing leaves in a 4.5:4.5:1 (v/v/v) mixture of acetone, ethanol, and distilled water for 12 h in the dark. After extraction, absorbance was measured at 663 and 645 nm with a UV-VIS double-beam spectrophotometer (HALO DB-30, Dynamica, UK). Chlorophyll contents (mg per gram of fresh weight, mg/g) were calculated from these absorbances, with Total Chl (mg/g) = Chl a (mg/g) + Chl b (mg/g).
Results for each line are expressed as mean ± standard deviation (sd) of three replicates. The statistical significance of differences between transgenic lines and the corresponding wild-type or mutant lines was tested with a two-tailed Student's t-test; P < 0.05 was regarded as significant.
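The chlorophyll calculation and significance test above can be sketched as follows. The text does not give the coefficients used for Chl a and Chl b, so this sketch assumes the widely used Arnon (1949) equations for A663 and A645 (giving mg per liter of extract); those coefficients were derived for 80% acetone and may not be the ones appropriate for the acetone/ethanol/water mixture used here. Extract volume, fresh weight, and absorbance values are invented. The t statistic uses the pooled-variance (Student's) form, and the critical value 2.776 applies only to a two-tailed test at P = 0.05 with 4 degrees of freedom (three replicates per group).

```python
import math
from statistics import mean, stdev

T_CRIT_DF4 = 2.776  # two-tailed, alpha = 0.05, df = 4 (n = 3 vs n = 3)


def chlorophyll_mg_per_g(a663, a645, extract_ml, fresh_weight_g):
    """Return (Chl a, Chl b, total Chl) in mg per g fresh weight.

    ASSUMPTION: Arnon (1949) coefficients, which give mg per liter of extract.
    """
    chl_a = 12.7 * a663 - 2.69 * a645
    chl_b = 22.9 * a645 - 4.68 * a663
    factor = (extract_ml / 1000.0) / fresh_weight_g  # mg/L -> mg/g
    return chl_a * factor, chl_b * factor, (chl_a + chl_b) * factor


def student_t(x, y):
    """Two-sample Student's t statistic (equal variances) and its df."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


print(chlorophyll_mg_per_g(a663=1.0, a645=0.5, extract_ml=10, fresh_weight_g=0.1))
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df, abs(t) > T_CRIT_DF4)  # -3.674 4 True
```

If SciPy is available, `scipy.stats.ttest_ind(x, y)` (equal variances by default) returns the same statistic together with the two-sided p-value, which avoids hard-coding a critical value.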

Accession Number
The sequences of all soybean Mg-chelatase genes reported in this article have been deposited in the GenBank/EMBL databases under the following accession numbers:
The cDNA of each gene was amplified by RT-PCR from Wm82 leaf tissue and sequenced to determine the correct splice forms. According to the sequencing results, GmChlI1a, GmChlI1b, and GmChlI2b each encode a ∼40 kDa polypeptide of 433 amino acids (aa), while GmChlI2a encodes a 415-aa polypeptide. The proteins encoded by GmChlD1 and GmChlD2 are 751 aa long, with a molecular weight of ∼70 kDa. GmChlH1 and GmChlH3 both encode 1,385-aa polypeptides of ∼140 kDa, while GmChlH2 encodes a 1,384-aa polypeptide. All soybean ChlI, ChlD, and ChlH subunits are similar in molecular mass to their homologs identified in other species (Figure S1; Sawers et al., 2006; Zhang et al., 2006; Du et al., 2012; Muller et al., 2014).
According to TargetP prediction, each soybean Mg-chelatase subunit contains a chloroplast transit peptide (CTP) at the N-terminus, as expected. Comparison of GmChlIs, GmChlDs, and GmChlHs with their orthologs from A. thaliana and Ssp. PCC6803 revealed that they are all well conserved (Figures S1-S3). The four GmChlIs are highly similar, sharing at least 90% amino acid identity, and each contains the full set of characteristic ChlI motifs, including Walker A, Walker B, Sensor I, the arginine finger (R finger), and Sensor II (Figure S1). GmChlD1 and GmChlD2 share 97% amino acid identity (Figure S2). All three structural domains of ChlD are highly conserved in the GmChlD proteins: an AAA+-like module at the N-terminus, a proline-rich linker region, and an integrin I domain at the C-terminus (Figure S2). GmChlH1 and GmChlH2 share 99% amino acid identity, and both share 97% identity with GmChlH3 (Figure S3). The three GmChlH sequences are conserved in all six functional domains, especially in domains III, V, and VI, which constitute the putative active center (Chen et al., 2015). The residues surrounding the putative tetrapyrrole-binding pocket are invariant relative to SspChlH. Together, the primary structure information suggests that all isoforms of GmChlI, GmChlD, and GmChlH are functional in vivo.
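Percent identity values such as those above depend on how gaps are counted. As a hedged illustration (not ClustalW2's own scoring), the sketch below computes identity over alignment columns where neither sequence has a gap; other conventions divide by the full alignment length or by the length of the shorter sequence and can give noticeably different numbers for the same pair. The toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity (%) over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("MKT-AL", "MRTQAL"))  # 80.0
```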

Phylogenetic Analysis of Mg-Chelatase Subunits in Soybean
To investigate the evolutionary history of Mg-chelatase, the protein sequences of GmChlI, GmChlD, and GmChlH were subjected to phylogenetic analysis together with their orthologs from 11 species, including six dicotyledons (P. vulgaris, C. cajan, V. vinifera, G. arboreum, P. trichocarpa, A. thaliana), four monocotyledons (B. distachyon, Z. mays, S. bicolor, O. sativa), and one cyanobacterium (Ssp. PCC6803). The ChlI, ChlD, and ChlH sequences cluster into dicot and monocot clades, with Ssp. PCC6803 as an outgroup, and then group by family, with all homologs from the legume family forming a subgroup (Figure 1). Most legume species encode two ChlIs, one ChlD, and two ChlHs, and their ChlI and ChlH subunits are each further divided into two clusters, indicating that these two subunits were duplicated at the origin of the legumes. By comparison, the soybean genome encodes four GmChlIs, two GmChlDs, and three GmChlHs, indicating that the Mg-chelatase genes underwent a second round of duplication at the origin of soybean. Accordingly, the four GmChlIs fall into two groups, each belonging to one of the legume ChlI clusters: one contains GmChlI1a and GmChlI1b, and the other contains GmChlI2a and GmChlI2b (Figure 1A). Similarly, the three GmChlH isoforms fall into two groups, with GmChlH1 and GmChlH2 in one group and GmChlH3 in the other (Figure 1C). The duplicate copy of GmChlH3 was likely lost after the gene duplication event.
In addition, the evolution of the ChlI subunit shows an interesting pattern (Figure 1A). Most dicotyledons encode two ChlI isoforms, whereas monocotyledons and Ssp. PCC6803 have only one. One might therefore postulate that ChlI was duplicated near the origin of the dicots. However, in the phylogenetic tree, ChlIs from the same dicotyledonous species or family tend to cluster together, suggesting that multiple independent duplications of ChlI occurred during dicot evolution (Figure 1A).
Frontiers in Plant Science | www.frontiersin.org
FIGURE 1 | and ChlH subunits from A. thaliana (Du et al., 2012), O. sativa (Zhang et al., 2006; Muller et al., 2014), Ssp. PCC6803, and the ChlI subunit from Z. mays (Sawers et al., 2006) were published previously. The unrooted phylogenetic trees were constructed using the Neighbor-Joining method with 1,000 bootstrap replicates. Bootstrap values >70% are shown next to the branches. Evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site. Positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted in MEGA6.
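The Poisson-corrected distance named in the legend has a simple closed form: if p is the proportion of compared amino acid sites that differ between two sequences, the distance is d = −ln(1 − p) substitutions per site. The sketch below is a minimal illustration of that formula combined with complete deletion (dropping every column that contains a gap or missing data in any sequence), as the legend describes; it is not MEGA6 itself, it does not build the Neighbor-Joining tree or bootstrap replicates, and the toy alignment is invented.

```python
import math
from itertools import combinations


def complete_deletion(alignment: dict, missing: str = "-?X") -> dict:
    """Drop every column that has a gap/missing character in any sequence."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(s) != length for s in alignment.values()):
        raise ValueError("all aligned sequences must have equal length")
    keep = [i for i in range(length)
            if all(alignment[n][i] not in missing for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def poisson_distances(alignment: dict) -> dict:
    """Pairwise Poisson-corrected distances d = -ln(1 - p)."""
    trimmed = complete_deletion(alignment)
    dist = {}
    for a, b in combinations(trimmed, 2):
        sa, sb = trimmed[a], trimmed[b]
        p = sum(x != y for x, y in zip(sa, sb)) / len(sa)
        if p >= 1.0:
            raise ValueError(f"distance undefined for {a}/{b} (p >= 1)")
        dist[(a, b)] = -math.log(1.0 - p)
    return dist


toy = {"a": "MKTAL-", "b": "MRTALQ", "c": "MKTAVQ"}
for pair, d in poisson_distances(toy).items():
    print(pair, round(d, 4))
```

The correction matters more as sequences diverge: for small p, d is close to p, but it grows faster than p to account for multiple substitutions at the same site.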

Tissue Specific Expression Profiles of GmChlIs, GmChlDs, and GmChlHs
To determine whether the paralogs of GmChlI, GmChlD, and GmChlH have diverged in biological function, we analyzed the tissue-specific expression profile of each Mg-chelatase gene by real-time quantitative RT-PCR. Transcript levels of GmChlIs, GmChlDs, and GmChlHs were examined in roots, stems, cotyledons, leaves, flowers, pods, and immature seeds. The tissue expression patterns of all genes were broadly similar: highest in leaves, followed by cotyledons, relatively lower in stems, flowers, and pods, and almost negligible in roots. Expression levels differ between paralogs, but the differences between closely related paralogs are smaller than those between distantly related ones (Figures 1, 2). For example, GmChlI1a and GmChlI1b are expressed at much higher levels than GmChlI2a and GmChlI2b, which are barely expressed in any of the examined tissues (Figure 2A). Meanwhile, in most tissues, the expression levels of GmChlI1a and GmChlI2a are about 2-fold higher than those of GmChlI1b and GmChlI2b, respectively. These data imply that GmChlI1a and GmChlI1b are the major functional Mg-chelatase I subunits in soybean. The two GmChlD genes were expressed at comparable levels in all tissues, with GmChlD2 slightly higher (less than 2-fold) than GmChlD1 (Figure 2B), suggesting that both GmChlD paralogs function in soybean. Among the three GmChlH genes, GmChlH3 expression is generally lower than that of the other two, especially in leaves and cotyledons, and no significant difference was detected between GmChlH1 and GmChlH2 (Figure 2C). Taken together, the transcript levels of GmChlIs, GmChlDs, and GmChlHs correlate positively with tissue photosynthetic activity, consistent with the enzyme's function. However, the paralogs of each subunit have likely diverged functionally, as reflected by their different expression levels.

Promoter Analyses of GmChlIs, GmChlDs and GmChlHs
We next carried out promoter motif analysis to further study the potential regulatory mechanisms of GmChlIs, GmChlDs, and GmChlHs. The 1,500 bp region immediately upstream of the ATG start codon of each gene was analyzed with the online program PlantCARE (Lescot et al., 2002). Multiple CAAT boxes and various cis-acting regulatory motifs were predicted in the promoter regions of these genes.
FIGURE 2 | Expression analysis of GmChlIs, GmChlDs, and GmChlHs in various tissues at different growth stages of soybean. Expression profiles of four GmChlI genes (A), two GmChlD genes (B), and three GmChlH genes (C) in different tissues were obtained by quantitative RT-PCR. Tissue names are indicated below the chart. Root and cotyledon were taken from 7-day-old seedlings. Stems, trifoliolate leaves, and flowers were taken from ∼45-day-old flowering plants. Pods (∼4 cm long) and immature seeds (∼1 cm long) were sampled from 65-70-day-old plants. The expression level of each gene is normalized to that of actin. Error bars indicate standard deviation (sd) from three technical replicates.
The CAAT box is a proximal promoter element recognized by CAAT-box-binding transcription factors and is important for efficient transcription of the downstream gene (Bi et al., 1997). We counted the predicted CAAT boxes in the 400 bp region upstream of each Mg-chelatase gene and found the counts to be consistent with the transcript levels detected by qRT-PCR: in most tissues, paralogs with more CAAT boxes generally showed higher expression than those with fewer (Figure 2 and Table 1).
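The CAAT-box count over the proximal 400 bp can be illustrated with a simple literal motif search. This is a deliberately simplified stand-in for PlantCARE, which recognizes several CAAT-box sequence variants and reports strand and position; a literal search for CAAT (and its reverse complement, ATTG) will not reproduce PlantCARE's counts exactly. The sketch assumes the promoter is given 5'→3' and ends at the base immediately before the ATG, so the proximal window is its last 400 bp; the function names are hypothetical.

```python
def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def proximal_motif_count(promoter: str, motif: str = "CAAT", window: int = 400) -> int:
    """Count motif hits on both strands in the last `window` bp before the ATG."""
    if window <= 0:
        raise ValueError("window must be positive")
    proximal = promoter.upper()[-window:]
    hits = count_overlapping(proximal, motif)
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        hits += count_overlapping(proximal, rc)
    return hits


print(proximal_motif_count("CAATGGATTG"))  # 2 (one CAAT, one ATTG)
```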
Among the cis-acting regulatory motifs, light-responsive elements (LREs) are, not surprisingly, the most abundant in the promoter regions of GmChlIs, GmChlDs, and GmChlHs (Table 1). Light is the main environmental factor regulating chlorophyll biosynthesis and is required for high-level expression of Mg-chelatase genes (Papenbrock et al., 1999; Winter et al., 2007; Stephenson and Terry, 2008). The total number of LREs is similar among the nine promoters, ranging from 12 to 15, and the G-box is commonly present among them (Table 1). The G-box is the binding site of many transcription factors in the light signaling pathway, such as ELONGATED HYPOCOTYL5 (HY5) and the PHYTOCHROME INTERACTING FACTOR proteins (PIFs) (Toledo-Ortiz et al., 2014). Apart from the G-box, the light-responsive motifs differ across promoters, suggesting differences in the fine regulation of soybean Mg-chelatase genes in response to light.
Several reports show that ChlH expression oscillates diurnally when plants are grown under a light-dark regime, whereas ChlI and ChlD expression shows little or no variation under the same conditions (Jensen et al., 1996; Papenbrock et al., 1999).
In soybean, the circadian control element (named "circadian" in PlantCARE) is present in the gene-proximal promoter regions of all three GmChlHs; by contrast, it is either absent from the 1,500 bp promoter regions of GmChlIs and GmChlDs or present only far upstream of the genes (Table 1). These results suggest that the expression patterns of Mg-chelatase genes in soybean are similar to those in other species.
Another major type of element in the promoters of GmChl genes is the hormone-responsive element (HRE). Phytohormones are well known to play important roles in the chlorophyll biosynthesis pathway. For example, seedling greening is orchestrated through a complex network of interactions among auxin, gibberellic acid (GA), cytokinin (CK), ethylene, and light signal transduction pathways (Liu et al., 2017). We found auxin-responsive elements, GA-responsive elements, or both in all GmChl promoters except pGmChlD2 (the promoter of GmChlD2), suggesting that the expression of most GmChl genes may be directly regulated by components of the auxin and/or GA signaling pathways. By contrast, no CK-responsive element was detected in any of the promoters, and an ethylene-responsive element was found only in pGmChlI2a, suggesting that CK and ethylene probably regulate the transcription of GmChlIs, GmChlDs, and GmChlHs indirectly, through interactions with other signal transduction pathways. A previous report showed that cytokinin-mediated root greening in Arabidopsis depends on the transcription factor HY5 (Kobayashi et al., 2012), and ETHYLENE INSENSITIVE 3 (EIN3) can activate PIFs to regulate the expression of Mg-chelatase genes (Zhong et al., 2014).
FIGURE 3 | Subcellular localization of GmChlIs, GmChlDs, and GmChlHs. Free GFP and GmChlIs, GmChlDs, and GmChlHs tagged with a C-terminal GFP were transiently expressed under the control of the 35S promoter in N. benthamiana leaves and observed by confocal microscopy. In each case, images of GFP fluorescence (GFP), chlorophyll autofluorescence (Chl), and merged GFP and chlorophyll fluorescence with bright field (Merged) are shown. Scale bar = 20 µm.
By comparison, abscisic acid (ABA) mainly functions as an inhibitor of chlorophyll biosynthesis. In Arabidopsis, ABSCISIC ACID INSENSITIVE 5 (ABI5) represses cotyledon greening during seed germination under light through an ABA-mediated pathway (Guan et al., 2014), and defects in the transcription factor ABSCISIC ACID INSENSITIVE 3 (ABI3) cause Arabidopsis plants to produce stay-green embryos (Delmas et al., 2013). Consistently, ABA-responsive elements (ABREs) are present in most GmChl promoters, except pGmChlI1b and pGmChlI2b, suggesting that chlorophyll synthesis in soybean may be strongly regulated by ABA.
Salicylic acid (SA) and jasmonates (JA) are important signaling molecules that regulate plant responses to biotic and abiotic stresses, such as pathogen attack, extreme temperatures, salinity, and oxidative conditions (Khan et al., 2010; Ahmad et al., 2012; Wasternack and Hause, 2013). Chlorophyll content is usually reduced under stress (Ramachandra Reddy et al., 2004). SA-responsive elements, including TCA-element motifs and TC-rich repeats, are relatively abundant in pGmChlIs and pGmChlHs but rare in pGmChlDs, whereas a JA-responsive element is found only in pGmChlI1b and pGmChlD2 (Table 2). This suggests that soybean Mg-chelatase genes are regulated differently in response to SA and JA signals. In addition, several abiotic stress-responsive elements are found in the GmChl promoters, including HSE, LTR, and MBS motifs, which respond to heat, low temperature, and drought, respectively (Table 2). HSE and MBS motifs are relatively common in the promoters of the nine GmChl genes, whereas LTR is present only in pGmChlD1 and pGmChlHs at low copy number, implying that GmChl expression is likely more sensitive to heat and drought than to low temperature.
Moreover, tissue-specific cis-elements are present in the promoters of all GmChl genes except GmChlI2a (Table 2). Interestingly, GCN4 and Skn-1 motifs, which are associated with endosperm-specific expression, are common and abundant in GmChl promoters, suggesting that Mg-chelatase may be functionally important in the endosperm. Additionally, two copies of the CAT-box motif, which is related to root meristem development, are found in pGmChlI1a and pGmChlD1, and one copy of HD-Zip 1, which is related to leaf development, is found in pGmChlD2 (Table 2).
Collectively, the regulatory cis-elements responsive to environmental and growth conditions are diverse among the promoters of the different Mg-chelatase genes, indicating that the expression of soybean Mg-chelatase genes is complex and differentially regulated by various factors.

Subcellular Localization of GmChlIs, GmChlDs, and GmChlHs
Sequence alignment analysis showed that GmChlIs, GmChlDs, and GmChlHs all contain a potential CTP at the N-terminus, similar to AtChlI1, AtChlD, and AtChlH (Figures S1-S3). To confirm their subcellular localization, we generated GFP fusion proteins by fusing GFP to the C termini of full-length GmChlIs, GmChlDs, and GmChlHs. The nine fusion proteins were transiently expressed in N. benthamiana leaves by agro-infiltration and observed by confocal microscopy 2 days after infiltration. The GFP-fused GmChlI, GmChlD, and GmChlH subunits were all expressed and localized in chloroplasts, as shown by the co-localization of the GFP signal with chlorophyll autofluorescence (Figure 3), indicating that GmChlIs, GmChlDs, and GmChlHs are chloroplast proteins, in agreement with their expected functions.
FIGURE 4 | Yeast two-hybrid assay between GmChlIs, GmChlDs, and GmChlHs. pGBKT7 vectors expressing BD-fused GmChlIs, GmChlDs, and GmChlHs were co-transformed into yeast strain AH109 together with the empty pGADT7 vector or pGADT7 vectors expressing AD-fused GmChlIs, GmChlDs, and GmChlHs. Transformants were grown (A) on synthetic dextrose (SD) medium lacking Trp and Leu (SD-TL) or (B) on SD medium lacking Trp, Leu, His, and Ade (SD-TLHA).

Interactions Among GmChlIs, GmChlDs, and GmChlHs
It has been established that ChlI, ChlD, and ChlH assemble to form an active Mg-chelatase holocomplex (Adhikari et al., 2011). We performed Y2H and BiFC assays to test the interactions among GmChlIs, GmChlDs, and GmChlHs.
The Y2H assays revealed that each GmChlI isoform can interact with itself and with the other paralogs (Figure 4), suggesting that the four GmChlIs are able to form homo- and hetero-hexameric rings. The GmChlIs also interact with GmChlD1, indicating that they can form oligomers with GmChlD1. Meanwhile, GmChlD1 interacted with itself, suggesting that GmChlD1 can form a hexameric ring like the GmChlIs. Notably, when GmChlD2 was fused to the Gal4 activation domain, it did not interact with any GmChlI or GmChlD protein, but these interactions were observed when GmChlD2 was fused to the Gal4 DNA-binding domain (Figure 4), indicating that the Gal4 activation domain could interfere with GmChlD2 interactions. In addition, no interaction was detected by Y2H between any GmChlH isoform and any other soybean Mg-chelatase subunit (Figure 4).
In BiFC experiments, strong reconstituted YFP fluorescence was observed in chloroplasts co-expressing GmChlIs-YFPN with GmChlIs-YFPC, GmChlDs-YFPC, or GmChlHs-YFPC (Figure S4). This confirmed the interactions between GmChlIs and GmChlDs detected by Y2H and suggested that all four GmChlIs could interact with all GmChlH isoforms. Similar results were obtained by co-expressing GmChlDs-YFPN with GmChlIs-YFPC, GmChlDs-YFPC, or GmChlHs-YFPC, verifying that GmChlDs interact with GmChlIs and GmChlDs and implying that GmChlDs also interact with GmChlHs (Figure S5). In addition, interactions of GmChlHs with GmChlIs or GmChlDs were observed when GmChlHs-YFPN was co-expressed with GmChlIs-YFPC or GmChlDs-YFPC in leaves (Figure S6). The interactions between GmChlH and the other subunits may not have been detected in Y2H because GmChlH possibly binds only to the assembled two-tiered GmChlI/GmChlD ring complex, not to a single GmChlI or GmChlD hexameric ring. As negative controls, no YFP signal was detected in leaves co-transformed with a GmChlI-YFPN, GmChlD-YFPN, or GmChlH-YFPN construct together with an empty YFPC vector. Taken together, these results suggest that all GmChlIs, GmChlDs, and GmChlHs are likely involved in Mg-chelatase activity through physical interactions with each other.
The Arabidopsis gun5-2 mutant, an AtChlH knock-down mutant, harbors a T-DNA insertion in the promoter region of AtChlH.
Error bars indicate SD from three biological replicates. Student's t-test was performed for statistical analysis among all lines; different letters represent statistically significant differences between transgenic plants and wild-type or mutant lines (P < 0.05). (C) Protein levels of the transgenes in the lines shown in (A). Western blot analysis was performed using an anti-HA antibody. Ponceau S staining of the Rubisco large subunit was used as the loading control. Scale bars = 5 mm.
Results from a detailed chlorophyll analysis were consistent with the phenotypes (Figure 6B). The gun5-2 plants expressing different GmChlHs had different chlorophyll levels. Chlorophyll content was highest in gun5-2 35S::GmChlH1-HA plants, similar to that of the wild type, and lowest in gun5-2 35S::GmChlH3-HA plants, where it was only slightly higher than in gun5-2 and not significantly different from it (Figure 6B). The chlorophyll content of gun5-2 35S::GmChlH2-HA plants was lower than that of gun5-2 35S::GmChlH1-HA plants but higher than that of gun5-2 35S::GmChlH3-HA plants (Figure 6B). However, western blot analysis showed that the three GmChlH proteins accumulated to similar levels in the corresponding transgenic plants (Figure 6C), indicating that the failure of GmChlH2-HA and GmChlH3-HA to fully complement the gun5-2 phenotype was not caused by absent or reduced transgene expression. These results imply that GmChlH1 is probably a fully functional ChlH subunit, whereas GmChlH2 and GmChlH3 have lower chelatase activity.

CONCLUSION
In cyanobacteria and monocotyledonous plants, each subunit of Mg-chelatase has only one isoform (Sawers et al., 2006; Zhang et al., 2006; Muller et al., 2014), whereas most dicotyledons encode 2 ChlI, 1 ChlD, and 1 ChlH subunits (Du et al., 2012; Figure 1). The soybean genome harbors 4 GmChlI, 2 GmChlD, and 3 GmChlH genes, whereas other legume species have 2 ChlI, 1 ChlD, and 2 ChlH genes (Figures 1, 2). Phylogenetic analysis reveals two gene duplication events in the history of soybean Mg-chelatase genes: one at the origin of the legume family and the other after the speciation of soybean (Figure 2). The first round of gene duplication seems to apply only to the GmChlI and GmChlH genes. This agrees with the two recent major duplication events in the soybean genome, 58 and 13 million years ago, revealed by the soybean genome sequencing project (Schmutz et al., 2010).
Sequence alignment analysis (Figures S1-S3) and interaction assays (Figure 4; Figures S4-S6) indicate that the isoforms of each soybean Mg-chelatase subunit are likely functional proteins. Ectopic expression of each Mg-chelatase subunit in Arabidopsis further confirms that all GmChlIs and GmChlDs are biochemically functional, because they fully restore chlorophyll biosynthesis in the corresponding Arabidopsis mutants (Figure 5). However, the experiment also reveals that GmChlH2 and GmChlH3 are less active than GmChlH1 in Arabidopsis (Figure 6). Although GmChlH1, GmChlH2, and GmChlH3 are highly similar in primary structure, subtle differences in their sequences could cause substantial differences in enzyme activity.
Judging from their tissue-specific expression patterns and the number of light-responsive elements (LREs) in their promoters, qRT-PCR and in silico promoter analyses indicate that all soybean Mg-chelatase subunit genes are strongly and similarly regulated by light (Tables 1, 2; Figure 2). Nevertheless, expression levels vary among isoforms (Figure 2). In particular, the transcript levels of GmChlI2a and GmChlI2b are extremely low compared with those of GmChlI1a and GmChlI1b in photosynthetic tissues, suggesting that the latter two play the major role during active photosynthesis. Moreover, the promoter elements responsive to environmental and growth conditions other than LREs are diverse across all subunit genes, indicating that these genes are differentially regulated.
Considering sequence, biochemical function, and gene expression and regulation, we conclude that the paralogs of each soybean Mg-chelatase subunit have diverged in biological function. The differentially expressed and functionally distinct isoforms of the Mg-chelatase subunits also suggest that soybean has a more complex regulatory mechanism for controlling chlorophyll content, allowing it to respond to different developmental stages and environmental stresses and to optimize its light-harvesting capacity and photosynthetic efficiency.

AUTHOR CONTRIBUTIONS
MX, AF, and DZ designed research. DZ, EC, XY, YoC, QY, YaC, and XL performed research. DZ, EC, XY, YoC, QY, YW, MX, and AF analyzed data. DZ, MX, and AF wrote the paper with contributions from all authors. All authors read and approved the manuscript.

FUNDING
This work was supported by the National Key R&D Program of China (2016YFD0101500 and 2016YFD0101503) and a Natural Sciences Foundation of China project (31371226) to MX, and a Shaanxi Science and Technology Project (2014KTCL02-03) to AF.

ACKNOWLEDGMENTS
We thank Dr. Hsou-min Li (Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan) and Dr. Fang-Qing Guo (Institute of Plant Physiology & Ecology, Shanghai, China) for kindly providing Arabidopsis mutants.