CONSTANS Polymorphism Modulates Flowering Time and Maturity in Soybean

CONSTANS (CO) plays a critical role in the photoperiodic flowering pathway. However, the function of soybean CO orthologs and the molecular mechanisms in regulating flowering remain largely unknown. This study characterized the natural variations in CO family genes and their association with flowering time and maturity in soybeans. A total of 21 soybean CO family genes (GmCOLs) were cloned and sequenced in 128 varieties covering 14 known maturity groups (MG 0000-MG X from earliest to latest maturity). Regarding the whole genomic region involving these genes, GmCOL1, GmCOL3, GmCOL8, GmCOL9, GmCOL10, and GmCOL13 were conserved, and the remaining 15 genes showed genetic variation that was brought about by mutation, namely, all single-nucleotide polymorphisms (SNPs) and insertions-deletions (InDels). In addition, a few genes showed some strong linkage disequilibrium. Point mutations were found in 15 GmCOL genes, which can lead to changes in the potential protein structure. Early flowering and maturation were related to eight genes (GmCOL1/3/4/8/13/15/16/19). For flowering and maturation, 11 genes (GmCOL2/5/6/14/20/22/23/24/25/26/28) expressed divergent physiognomy. Haplotype analysis indicated that the haplotypes of GmCOL5-Hap2, GmCOL13-Hap2/3, and GmCOL28-Hap2 were associated with flowering dates and soybean maturity. This study helps address the role of GmCOL family genes in adapting to diverse environments, particularly when it is necessary to regulate soybean flowering dates and maturity.


INTRODUCTION
Plants can adapt to different environmental conditions in response to various day lengths (photoperiods). In the photoperiodic flowering pathway, CONSTANS (CO), which is a B-boxcontaining transcription factor Gangappa and Botto, 2014), plays a key role (Putterill et al., 1995;Suarez-Lopez et al., 2001). CO also possesses a CONSTANS, CONSTANS-LIKE, TIMING OF CAB1 (CCT) domain at its C-terminus involved in DNA binding (Strayer et al., 2000;Robson et al., 2001). CYCLING DOF FACTOR (CDF) family proteins bind to the CO promoter to repress its transcription in the morning (Imaizumi et al., 2005;Fornara et al., 2009). FLAVIN-BINDING, KELCH REPEAT, and F-BOX1 (FKF1) interact with GIGANTEA (GI) to degrade the CDF1 protein, which results in the elevation of CO mRNA (Sawa et al., 2007). CO acts in the phloem of leaf vascular tissues to activate FLOWERING LOCUS T (FT) expression, which causes flowering under linkage disequilibrium (LD) conditions (Takada and Goto, 2003;An et al., 2004). SUPPRESSOR OF PHYTOCHROME A-105 (SPA) proteins interact with the CCT domain of CO (Laubinger et al., 2006). CO is ubiquitinated by CONSTITUTIVE PHOTOMORPHOGENIC 1 (COP1) and degraded by the 26S proteasome (Jang et al., 2008;Liu L. J. et al., 2008). HIGH EXPRESSION OF OSMOTICALLY RESPONSIVE GENES1 (HOS1) interacts with CO and participates in the degradation of CO mediated by red light (Lazaro et al., 2012(Lazaro et al., , 2015. Nucleoporin96 (Nup96) interacts with HOS1 to gate CO protein levels . CO protein is stable under the light in the evening but degraded in the morning or in the dark under LD conditions (Valverde et al., 2004). All of the abovementioned comments are based on studies in Arabidopsis.
Natural variation is associated with photoperiodic flowering and adaptation in different species (Alonso-Blanco et al., 2005;Balasubramanian et al., 2006;Rosas et al., 2014;Lu et al., 2017;Bao et al., 2019;Jiang et al., 2019;Wu et al., 2020). In this study, 128 soybean varieties were selected and planted, covering all 14 maturity groups from MG 0000 to MG X with a continuous distribution in maturity groups (Jiang et al., 2019). Due to the short duration of the project, 21 of the 28 soybean COL genes (Fan et al., 2014) were sequenced, and their sequence polymorphisms were analyzed. Furthermore, we studied the haplotypes of these soybean GmCOL genes to discover their natural variations in association with flowering time and maturity. These results suggested that some natural variations in 21 soybean GmCOL genes were associated with flowering date and maturity.

Plant Materials
Soybean varieties covering all 14 maturity groups (MG 0000-MG X) were assessed in this study (Jiang et al., 2019). In this study, 64 Chinese and Russian soybean varieties and 64 North American maturity group standard varieties were included, covering MG 0000-MG X, for a total of 128 accessions analyzed ( Table 1).

Investigation of Flowering and Maturity Dates
The soybeans were planted in soil in 10-L pots and grown under natural conditions in Haidian District, Beijing (39.95 • N,116.32 • E) on May 27, 2015, and May 18, 2016(Jiang et al., 2019. After emergence (VE), seedlings of similar size were selected so that each pot contained five uniform plants. Each variety was planted in three pots. The four developmental stages of soybean, namely, VE, R1, R7, and R8 (Fehr and Caviness, 1977), were investigated, and the average value of three pots for each variety was used for statistical analysis.

Data Mining and Sequence Analysis
The annotated soybean COL gene sequence was downloaded from the Joint Genome Institute (JGI) Phytozome website 1 . The protein sequences of the annotated Arabidopsis genes (TAIR9 release) were downloaded from the Arabidopsis Information Resource (TAIR) website 2 . The CO-like gene sequence data were also collected from the USDA-ARS Soybean Genetics and Genomics Database (SoyBase Database 3 ). The reference sequence of the soybean variety Williams 82 was obtained from the Phytozome (version 12.0) database 4 as a reference design for the amplification and sequencing primers of the COL genes. Sequencing was performed using an ABI3730 sequencer. Multiple sequence alignment, editing, and stitching were carried out using ClustalW in MEGA5 with default parameters. Singlenucleotide polymorphism (SNP) analysis was carried out using TASSEL5 software. The haplotype analysis of the sequencing results was conducted by DNAsp to determine whether the protein coding was affected. LD was also evaluated using TASSEL5 software. The statistical analysis was performed using "R" software.

Diversity of Soybean Varieties in Flowering Time and Maturity
The results of the growth traits are presented in Table 2. The varieties utilized in this experiment exhibited significant diversity in flowering time and maturity (  Table 2). In 2015, the days to R1 from emergence (VE) ranged from 21 (MC82, MG 0) to 121 days (MC72, MG VIII) with a span of 100 days. In 2016, the span time was 96 days between MG 0000 (MC05, 13 days) and MG VIII (MC129, 109 days). In 2016, the days to R7 ranged from 68 (MC05 and MC74, MG 0000) to 144 days (MC42, MG IV), and the days to R8 ranged from 76 (MC05, MG 0000) to 109 days (MC37, MG III). The combination of maturity group information for the days of flowering time from the emergence and the days of beginning maturity from emergence in 2016 is plotted in a box diagram (Figure 1).

GmCOL Gene Sequencing and Mutation Study
A large number of mutated sites were found in GmCOL2, GmCOL5, GmCOL14, GmCOL16, GmCOL20, and GmCOL28 ( Figures 2B1,D1 and Supplementary Figures 1G1,I1, 2B1,H1). A majority of the mutation sites were in gene intron regions, but a few were located in protein-coding regions caused by insertions-deletions (InDels) and substitutions. In addition, GmCOL6, GmCOL22, GmCOL23, GmCOL24, GmCOL25, and GmCOL26 showed high mutation frequencies in the protein-coding regions (Supplementary Figures 1, 2). In contrast, GmCOL4, GmCOL15, and GmCOL19 expressed lower mutational occurrences (Supplementary Figures 1A1,H1, 2A1). The exception was GmCOL3, which did not show mutations ( Figure 2C) in either the intron or exon region of the entire gene. GmCOL3 is a highly conserved gene with no change in sequence. However, the amino acid sequences encoded by GmCOL1, GmCOL8, GmCOL9, GmCOL10, and GmCOL13 were not affected (Figure 2A and Supplementary  Figures 1C1-F1).

Soybean COL Gene Family Exhibited Different Linkage Disequilibrium
Linkage and association mapping were drawn among the SNPs of the gene via TASSEL5. GmCOL5 across the region almost from the starting to the end site presented a strong LD by completing the haplotype block ( Figure 3D). GmCOL20, GmCOL24, and GmCOL25 across the region expressed strong LD as a haplotype block (Figures 3N,Q,R), and the polymorphic sites of GmCOL6, GmCOL8, GmCOL10, GmCOL13, and GmCOL19 showed a strong LD, with almost the entire region being a haplotype block (Figures 3E,F

Haplotype Analysis and Its Association With Maturity Groups of Soybean Varieties
Based on LD and sequence clustering, haplotype definition was analyzed for the 21 soybean GmCOL gene families in    Figures 1, 2), respectively. Supplementary  Figure 1A2). GmCOL4-Hap1 had an earlier flowering date in 2016. GmCOL5-Hap2 was present in 56 accessions, covering all maturity groups (Figure 2D1). GmCOL6-Hap1 was the most abundant in 70 varieties from different maturity groups, including MG 0000-MG V and MG VIII-MG X (Supplementary Figure 1B1). Among the 11 haplotypes, GmCOL9-Hap1 was dominant in 67 varieties from maturity groups MG 0000-MG X (Supplementary Figures 1D1,D2). GmCOL9-Hap3 and GmCOL9-Hap4 were distributed in a few varieties, and the flowering time for both haplotypes was early. GmCOL10-Hap1 was the most abundantly expressed in 91 varieties from all maturity groups except MG X (Supplementary Figures 1E1,E2). GmCOL16-Hap1 was rich in accessions distributed in the maturity groups MG 0000-MG VIII (Supplementary  Figures 1I1,I2). GmCOL20-Hap2 was present in 52 varieties from all maturity groups (MG 0000-MG X) (Supplementary  Figures 2B1,B2). Among all the haplotypes of GmCOL28, GmCOL28-Hap1 was the most common, accounting for 59 accessions, and was distributed in the maturity groups MG 0000, MG 00, MG 0, MG I-MG VIII, and MG X (Supplementary  Figures 2H1,H2). In this experiment, no haplotype variants were found in GmCOL1 and GmCOL3 (Figures 2A,C). The Hap1 series of haplotypes of GmCOL8/13/14/15/19/22/23/24/25/26 was the most abundant in the varieties and was distributed in all 14 maturity groups (Supplementary Figures 1C1,F1,G1,H1,  2A1,C1,D1,E1,F1,G1).

Haplotypes Associated With Flowering Time
The haplotype groups of 21 gene families are presented in Table 3. An analysis of variance (ANOVA) was performed to elucidate the natural variations in flowering time (VE-R1) of the GmCOL gene families based on haplotype groups. In this analysis, eight genes (GmCOL2/5/9/13/15/16/25/28) showed significant results from the first emergence to the first flowering date (VE-R1) in both years, and three genes [GmCOL8 (2015), GmCOL22 (2015), and GmCOL26 (2016)] exhibited significant results in a single year (Supplementary Table 6). The remaining genes (GmCOL4/6/10/14/19/20/23/24) did not show significant results in these years. Notably, GmCOL1 and GmCOL3 were the most conserved, and no polymorphisms in the coding region were observed.
Among the 11 haplotype groups of GmCOL2, 4 were analyzed, and GmCOL2-Hap3 was significantly different from every other haplotype group (Figure 4A). The GmCOL2-Hap1/2/4 haplotype groups, which appeared in the above haplotype group, were not significantly different from each other. Analysis of GmCOL5 showed that in both years two major haplotypes showed significant differences from each other, and Hap2 was related to late flowering compared with Hap1 ( Figure 4B). In terms of GmCOL5 Hap2 (S999), the original nucleotide (A 499 ) was mutated to G 499 in the coding sequence, resulting in a missense mutation in the amino acid sequence (K 167 to E 167 ). K 167 was located between the B-box and the CCT domain. In comparison, GmCOL9, Hap1, and Hap2 ( Figure 4C) did not show significant differences from each other, and both were significantly different from GmCOL9-Hap3 and GmCOL9-Hap4 only in 2015. Likewise, GmCOL13-Hap2/3 (Figure 4D) was closely associated with flowering time because both were significantly different from GmCOL13-Hap1 in both years.
For GmCOL13 Hap2/3 (S584), the original nucleotide (G 84 ) was mutated to A 84 in the coding sequence, resulting in a silent mutation in the no. 28 amino acid sequence, which was located in the first B-box. In both years, GmCOL15-Hap3/4 ( Figure 4E) showed a significant difference compared with GmCOL15-Hap1/2. For GmCOL16, the haplotypes showed nonsignificant results among the four haplotypes (Hap1, Hap2, Hap3, and Hap4) (Figure 4F). In the first year, five haplotypes of GmCOL25 (Hap1, Hap2, Hap3, Hap4, and Hap5) were not significantly different from each other, whereas Hap3 and Hap4 were significantly different from Hap1, Hap2, and Hap5 in the second year ( Figure 4G). Hap2 of GmCOL28 was significantly different from Hap1 and Hap3 in both years ( Figure 4H). For GmCOL28 Hap2 (S1066), the original nucleotide (T 263 ) was mutated to C 263 in the coding sequence, resulting in a missense mutation in the amino acid sequence (V 88 to A 88 ), which was located in the first B-box, and may affect the protein and protein interaction. Thus, the results revealed that Hap2 of GmCOL28 was closely associated with flowering time.

Selected Soybean Varieties Showed Diversity in Flowering Time and Maturity
The varieties analyzed in this study showed the diversity of flowering time and maturity, which indicated that the flowering time (VE-R1) and the days from emergence to physiological maturity (VE-R7) were related to the variety trait and natural environment (Jiang et al., 2019). According to the photoperiod responses, many soybean varieties have evolved to adapt to a broad range of growing areas in China (Wang et al., 2016). The flowering time of the selected varieties ranged from 21 to 121 days and 13.66 to 109.2 days, and the maturity time ranged from 61.2 to 150.7 days and 68.4 to 144.4 days, respectively, for two consecutive years (2015 and 2016) (Jiang et al., 2019), indicating high diversity ( Table 2). However, Paula showed the shortest VE-R1 in the 2nd year, with the maturity group MG 0000. Paula is recognized as a high-latitude cold region (HCR) soybean variety and HCR soybean, which typically matures early (Jia et al., 2014). This observation suggested that the maturity group MG 0000 was relatively stable for days to start flowering from emergence.

GmCOL Orthologs Showed Divergence in Sequence Polymorphism
It has been shown that GmCOL1a (Glyma08g28370) and GmCOL1b (Glyma18g51320) in soybean are the closest Arabidopsis COL2 orthologs to CO (Thakare et al., 2010). The transgenic soybean line overexpressing GmCOL1a flowered late under long-day or natural conditions (Cao et al., 2015). There is a single-nucleotide substitution at the GmCOL1b CCT domain in the gmcol1b mutant, leading to the mutagenesis of conserved threonine at amino acid 314 into isoleucine, which results in early flowering of the gmcol1b mutant (Cao et al., 2015). In this study, the varieties that were taken for the observation of natural variations regarding flowering time and maturation showed divergent characteristics in flowering time (Jiang et al., 2019). Allelic variation can have an effect on flowering time (Irwin et al., 2016). In this study, the insertion and single-nucleotide substitution (InDel918, InDel919, InDel920, s958, s1138, s1139, s1218, and s1260) found in the first exon in GmCOL2 led to missense mutations, which might have an effect on flowering time and maturity. The other genes in the GmCOL family showed various types of point mutations (Figure 2 and Supplementary  Figures 1-5). The loss of functions may have an impact on gene function, and the presence of these mutations suggested that polymorphism could be the main cause of the diversity of soybean flowering time.

CONCLUSION
In summary, 21 GmCOL genes exhibited natural divergence in association with flowering and growth periods. Significant changes were found in 21 COL genome sequences, among which GmCOL1, GmCOL3, GmCOL8, GmCOL9, GmCOL10, and GmCOL13 were conserved, but the sequences of the remaining COL genes showed a wide range of changes that could alter their function. In this study, it was noticed that additional polymorphisms linked to the 21 GmCOL gene families in traitcontrolling regions might have a significant impact on flowering time and maturity.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
MA conducted the major phenotypic and genotypic works. SZ, RE, WS, TW, and BJ joined the experiments. SY, CW, WH, SS, and YF provided the materials and technical supports. MA, RE, and BJ prepared the initial draft of the manuscript. FC rewrote the manuscript. FC, BJ, MA, RE, and TH revised the manuscript. TH coordinated and supervised the project. All authors read and approved the final manuscript.